SlideShare a Scribd company logo
1 of 82
POSTGRES INDEXES:
A DEEP DIVE
INTRODUCTION
Bartosz Sypytkowski
▪ @Horusiath
▪ b.sypytkowski@gmail.com
▪ bartoszsypytkowski.com
 B+Tree: how databases manage data?
 How indexes work
 Query plans & execution
 Types of indexes
 Vector search
AGENDA
B+TREE
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
4 32
4 25
4
DATA
32 49
8
DATA
16
DATA
20
DATA
25
DATA
27
DATA
30
DATA
32
DATA
42
DATA
43
DATA
44
DATA
49
DATA
62
DATA
64
DATA
8KB
B+TREE
link
link link link
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
4 32
4 25
4
DATA
32 49
8
DATA
16
DATA
20
DATA
25
DATA
27
DATA
30
DATA
32
DATA
42
DATA
43
DATA
49
DATA
49
DATA
62
DATA
64
DATA
8KB
SEARCH BY EQUALITY
link
link link link
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
4 32
4 25
4
DATA
32 49
8
DATA
16
DATA
20
DATA
25
DATA
27
DATA
30
DATA
32
DATA
42
DATA
43
DATA
44
DATA
49
DATA
62
DATA
64
DATA
8KB
RANGE SCAN
link
link link link
POSTGRESQL
TOAST TABLES
book_id title content
1 ‘Haikus’
‘In the twilight rain
these brilliant-hued
hibiscus - A lovely
sunset.’
2 ‘Moby Dick’
public.books
chunk_id chunk_seq chunk_data
16403 0 0xFF2F000000436
16403 1 0x6167F27521F25B
16403 2 0x23FB21030E6F6
16403 3 0x7974686108676
pg_toast.pg_toast_{$OID}
select relname
from pg_class
where oid = (
select reltoastrelid
from pg_class
where relname='books')
Find name of a
corresponding
TOAST table
if(compress(content).len > 2000)
HEADER
HEADER
HEADER
HEADER
HEADER
4 25
4
DATA
8
DATA
16
DATA
20
DATA
25
DATA
27
DATA
30
DATA
TUPLE IDs
link
Table Heap Index Storage
4
TID
8
TID
16
TID
20
TID
25
TID
27
TID
30
TID
32
TID
32
DATA
<block id, tuple offset>
select ctid from my_table
QUERY EXECUTION
select *
from books
where author_id = 10
SEQ SCAN
SEQ SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. (Hopefully) sequential I/O
2. Scans all table’s related pages
3. Doesn’t use index pages
create index on books(publication_date);
select publication_date
from books
where publication_date > ‘2020/01/01’
INDEX
ONLY
SCAN
INDEX ONLY
SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Sequential I/O over index pages
2. Doesn’t use table’s related pages
create index on books(publication_date);
select title, publication_date
from books
where publication_date > ‘2020/01/01’
INDEX
SCAN
INDEX SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Uses index to find a first page of the related table…
INDEX SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Uses index to find a first page of the related table…
2. Position read cursor on the first page…
INDEX SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Uses index to find a first page of the related table…
2. Position read cursor on the first page…
3. Sequential I/O over all table’s pages until condition is done
create index on books
using gist(description_lex);
select title, publication_date
from books
where description_lex @@ ‘epic’
BITMAP
SCAN
BITMAP SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Using index create bitmap of matching pages
Bitmap
BITMAP SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Using index create bitmap of matching pages
2. Random I/O over pages covered by bitmap
Bitmap
INCLUDE & PARTIAL
INDEXES
create index ix_books_by_author
on books(author_id)
include (created_at)
where author_id is not null;
HEADER
HEADER
4
25
HEADER
HEADER
HEADER
4
DATA
7
DATA
13
DATA
16
DATA
19
DATA
25
DATA
32
DATA
47
DATA
61
DATA
4
TID
INC
16
TID
INC
25
TID
INC
32
TID
INC
duplicated
columns
Index
Storage
BTREE INDEX
create index ix_users_birthdate on users(birthdate desc);
COMPAR
ING
VECTOR
CLOCKS
B-TREE INDEX
1. Default
2. Access time: always O(logN)
3. Supports most of the index features
create index ix_users_birthdate
on users(birthdate desc);
HASH INDEX
create index ix_orders_no on orders using hash(order_no);
HEADER
HEADER
0x
01
0x
02
HEADER
HEADER
HEADER
Meta page
HASH INDEX
HEADER
Bucket 0
Overflow page Bitmap page
Bucket 1
4
TID
71
TID
13
TID
73
TID
42
TID
67
TID
86
TID
99
TID
5
TID
38
TID
7
TID
51
TID
14
TID
34
TID
66
TID
70
TID
72
TID
90
TID
79
TID
91
TID
82
TID
COMPAR
ING
VECTOR
CLOCKS
HASH INDEX
1. Access time: O(1) – O(N)
2. Doesn’t shrink in size*
create index ix_orders_no
on orders using hash(order_no);
BRIN INDEX
create index ih_events_created_at on events
using brin(created_at) with (pages_per_range = 128);
HEADER
HEADER
HEADER
2020/01/01
2020/03/14
2020/02/09
2020/11/10
2020/01/01
2020/01/21
0..127
BRIN INDEX
link
2020/01/21
2020/03/14
128..255
2020/02/09
2020/07/28
256..383
2020/03/17
2020/11/10
384..511
select tablename, attname, correlation
from pg_stats
where tablename = 'film'
tablename attname correlation
film film_id 0.9979791
film title 0.9979791
film description 0.04854498
film release_year 1
film rating 0.1953281
film last_update 1
film fulltext <null>
COLUMN-TUPLE
CORRELATION
COMPAR
ING
VECTOR
CLOCKS
BRIN INDEX
1. Imprecise
2. Very small in size
3. Good for columns aligned with tuple
insert order and immutable records
create index ih_events_created_at on events
using brin(created_at) with (pages_per_range = 128);
BLOOM INDEX
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
BLOOM FILTER
INSERT(C) =
h1(C)/len | h2(C)/len .. | hn(C)/len
BLOOM FILTER
INSERT(C) =
h1(C)/len | h2(C)/len .. | hn(C)/len
INSERT(D) =
h1(D)/len | h2(D)/len .. | hn(D)/len
BLOOM FILTER
INSERT(C) =
h1(C)/len | h2(C)/len .. | hn(C)/len
INSERT(D) =
h1(D)/len | h2(D)/len .. | hn(D)/len
CONTAINS(A) =
h1(A)/len | h2(A)/len .. | hn(A)/len
FALSE
BLOOM FILTER
INSERT(C) =
h1(C)/len | h2(C)/len .. | hn(C)/len
INSERT(D) =
h1(D)/len | h2(D)/len .. | hn(D)/len
CONTAINS(A) =
h1(A)/len | h2(A)/len .. | hn(A)/len
CONTAINS(B) =
h1(B)/len | h2(B)/len .. | hn(B)/len
FALSE
MAYBE?
COMPAR
ING
VECTOR
CLOCKS
BLOOM INDEX
1. Small in size
2. Good for exclusion/narrowing
3. False positive ratio: hur.st/bloomfilter/
create extension bloom;
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
number of bits per record
number of hashes for each
column
GiST INDEX
create index ix_books_content on books using gist(content_lex);
GiST INDEX
GEO POINTS
image: https://postgrespro.com/blog/pgsql/4175817
GiST INDEX
TSVECTOR
-- gist cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gist(description_lex);
select * from film where description_lex @@ 'epic';
Bitmap Heap Scan on film (cost=4.18..20.32 rows=5 width=416)
Recheck Cond: (description_lex @@ '''epic'''::tsquery)
-> Bitmap Index Scan on idx_film_description_lex (cost=0.00..4.18 rows=5 width=0)
Index Cond: (description_lex @@ '''epic'''::tsquery)
Query Plan
GiST INDEX
TSVECTOR
HEADER
011011 110111
HEADER
010011 011010
HEADER
100110 110001
HEADER
aab
TID
aac
TID
aba
TID HEADER
adf
TID
azf
TID
bac
TID
HEADER
brc
TID
caa
TID
cdl
TID
HEADER
cff
TID
fre
TID
klm
TID
COMPAR
ING
VECTOR
CLOCKS
GiST INDEX
1. Supports specialized operators
2. Index is not updated during deletes
create index ix_books_content
on books using gist(content_lex);
SP-GiST INDEX
create index ix_files_path on files using spgist(path);
SP-GiST INDEX
GEO POINTS
image: https://postgrespro.com/blog/pgsql/4220639
SP-GiST INDEX
TSVECTOR
-- spgist can be created on text column but not on nvarchar
create index idx_film_title on film using spgist(title);
select * from film
where title like ‘A Fast-Paced% in New Orleans';
Bitmap Heap Scan on film (cost=8.66..79.03 rows=51 width=416)
Filter: (description ~~ 'A Fast-Paced%'::text)
-> Bitmap Index Scan on idx_film_title (cost=0.00..8.64 rows=50 width=0)
Index Cond: ((description ~>=~ 'A Fast-Paced'::text) AND (description ~<~ 'A Fast-Pacee'::text))
Query Plan
COMPAR
ING
VECTOR
CLOCKS
GiST INDEX
1. Just like GiST, but faster for some ops…
2. … but unable to perform some other
3. Indexed space is partitioned into non-
overlapping regions
create index ix_files_path
on files using spgist(path);
GIN INDEX
create index ix_books_content on books using gin(content_lex);
GIN INDEX -- gist cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gin(description_lex);
select * from film where description_lex @@ 'epic';
Bitmap Heap Scan on film (cost=8.04..24.18 rows=5 width=416)
Recheck Cond: (description_lex @@ '''epic'''::tsquery)
-> Bitmap Index Scan on idx_film_description_lex (cost=0.00..8.04 rows=5 width=0)
Index Cond: (description_lex @@ '''epic'''::tsquery)
Query Plan
GIN INDEX
HEADER
ever gone
HEADER
HEADER
HEADER
HEADER
omit
call
(1,1)
(3,1)
dome
(1,2)
ever
(1,1)
(2,1)
faucet
(4,0)
gather
(2,1)
(3,2)
gone leather
(2,1)
omit
(2,1)
(3,2)
(2,2) (3,1)
HEADER
HEADER
(1,1)
(1,6)
(1,9)
(1,2) (1,3)
(1,7) (1,8)
(2,1) (2,2)
(2,3)
(2,6)
(2,8)
(2,5)
(2,7)
(3,1)
Posting list
Posting tree
COMPAR
ING
VECTOR
CLOCKS
GIN INDEX
create index ix_books_content
on books using gin(content_lex);
1. Reads usually faster than GiST
2. Writes are usually slower than GiST
3. Index size greater than GiST
RUM INDEX
create index ix_books_content on books using rum(content_lex);
RUM INDEX
HEADER
ever gone
HEADER
HEADER
HEADER
HEADER
omit
call
(1,1)
(3,1)
dome
(1,2)
ever
(1,1)
(2,1)
faucet
(4,0)
gather
(2,1)
(3,2)
gone leather
(2,1)
omit
(2,1)
(3,2)
(2,2) (3,1)
HEADER
HEADER
(1,1)
(1,6)
(1,9)
(1,2) (1,3)
(1,7) (1,8)
(2,1) (2,2)
1, 14
5
2, 5 25
1, 9
45 1, 21
9, 11
2, 10 45
1, 22
13
1
211
2, 4
2, 21
13
25
5
4, 7
(2,3)
(2,6)
(2,8)
(2,5)
(2,7)
(3,1)
2
3
1, 7
8, 13
17
1
RUM INDEX
-- similarity ranking
select description_lex <=> to_tsquery('epic’) as similarity
from books;
-- find description with 2 words located one after another
select * from books
where description_lex @@ to_tsquery(‘hello <-> world’);
COMPAR
ING
VECTOR
CLOCKS
RUM INDEX
1. GIN on steroids (bigger but more
capable)
2. Allows to query for terms and their
relative positions in text
3. Supports Index Scan and EXCLUDE
create extension rum;
create index ix_books_content
on books using rum(content_lex);
VECTOR DATABASE
1. Uses only lexical similarity
and is language-sensitive
2. Misses the context
(meaning)
3. Works only on text
LIMITS OF FULL-TEXT
SEARCH
select * from books
where description_lex @@
to_tsquery(‘white <-> castle’);
select * from books
where description_lex @@
to_tsquery(‘white <-> fortress’);
VECTOR
SIMILARITY
101
R G B
250 185 0
115 0 152
Embedding
EUCLIDEAN
DISTANCE
COSINE
DISTANCE
x
y
A
B
x
y
A
B
IVFFLAT VECTOR
INDEX
create index ix_books_content on books
using ivfflat(embedding content_lex) with (lists = 1000);
select * from items where embedding <-> ‘[3,1,2]’ < 5;
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
create index on items
using ivfflat(embedding vector_l2_ops)
with (lists=3);
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
lists
create index on items
using ivfflat(embedding vector_l2_ops)
with (lists=3);
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
centroids
create index on items
using ivfflat(embedding vector_l2_ops)
with (lists=3);
QUERYING
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
set ivfflat.probes = 2;
select * from items
where embedding <-> ‘[1,2]’ < 5;
[1,2]
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
set ivfflat.probes = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
[1,2]
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
[1,2]
set ivfflat.probes = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
[1,2]
set ivfflat.probes = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
[1,2]
set ivfflat.probes = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
HNSW VECTOR INDEX
create index on items using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
create index on items
using hnsw(embedding vector_l2_ops)
with (m=2, ef_construction=3);
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
create index on items
using hnsw(embedding vector_l2_ops)
with (m=2, ef_construction=3);
Layer 1
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
create index on items
using hnsw(embedding vector_l2_ops)
with (m=3, ef_construction=3);
Layer 1
Layer 2
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
create index on items
using hnsw(embedding vector_l2_ops)
with (m=3, ef_construction=3);
Layer 1
Layer 2
Layer 3
QUERYING
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
[1,2]
set hnsw.ef_search = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
[1,2]
set hnsw.ef_search = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
[1,2]
set hnsw.ef_search = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
[1,2]
set hnsw.ef_search = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
[1,2]
set hnsw.ef_search = 2;
select * from items
order by embedding <-> ‘[1,2]’
take 4;
1. Fast build time
2. Smaller size
3. Slower query
performance
4. Bad for frequent index
updates
1. Slow initial build time
2. Bigger index size
3. Faster performance
4. Better recall after updates
IVFFLAT
INDEX
HNSW
INDEX
SUMMARY
THANK YOU

More Related Content

Similar to Postgres indexes: how to make them work for your application

Index management in depth
Index management in depthIndex management in depth
Index management in depthAndrea Giuliano
 
lect- 3&4.ppt
lect- 3&4.pptlect- 3&4.ppt
lect- 3&4.pptmrizwan38
 
Performing Data Science with HBase
Performing Data Science with HBasePerforming Data Science with HBase
Performing Data Science with HBaseWibiData
 
Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Anastasia Lubennikova
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamZheng Shao
 
Fundamentals of data structures
Fundamentals of data structuresFundamentals of data structures
Fundamentals of data structuresNiraj Agarwal
 
Data structures question paper anna university
Data structures question paper anna universityData structures question paper anna university
Data structures question paper anna universitysangeethajames07
 
Modern Database Systems - Lecture 01
Modern Database Systems - Lecture 01Modern Database Systems - Lecture 01
Modern Database Systems - Lecture 01Michael Mathioudakis
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practiceJano Suchal
 
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2Antonios Giannopoulos
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSCobus Bernard
 
How mysql choose the execution plan
How mysql choose the execution planHow mysql choose the execution plan
How mysql choose the execution plan辛鹤 李
 
R tools for HiC data visualization
R tools for HiC data visualizationR tools for HiC data visualization
R tools for HiC data visualizationtuxette
 
Nutch and lucene_framework
Nutch and lucene_frameworkNutch and lucene_framework
Nutch and lucene_frameworksamuelhard
 
IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesKeshav Murthy
 

Similar to Postgres indexes: how to make them work for your application (20)

Index management in depth
Index management in depthIndex management in depth
Index management in depth
 
lect- 3&4.ppt
lect- 3&4.pptlect- 3&4.ppt
lect- 3&4.ppt
 
Performing Data Science with HBase
Performing Data Science with HBasePerforming Data Science with HBase
Performing Data Science with HBase
 
DATABASE MANAGEMENT SYSTEM LAB.pdf
DATABASE MANAGEMENT SYSTEM LAB.pdfDATABASE MANAGEMENT SYSTEM LAB.pdf
DATABASE MANAGEMENT SYSTEM LAB.pdf
 
Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL.
 
Data import-cheatsheet
Data import-cheatsheetData import-cheatsheet
Data import-cheatsheet
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Fundamentals of data structures
Fundamentals of data structuresFundamentals of data structures
Fundamentals of data structures
 
Data structures question paper anna university
Data structures question paper anna universityData structures question paper anna university
Data structures question paper anna university
 
Modern Database Systems - Lecture 01
Modern Database Systems - Lecture 01Modern Database Systems - Lecture 01
Modern Database Systems - Lecture 01
 
CS8391 Data Structures Part B Questions Anna University
CS8391 Data Structures Part B Questions Anna UniversityCS8391 Data Structures Part B Questions Anna University
CS8391 Data Structures Part B Questions Anna University
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practice
 
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
 
How mysql choose the execution plan
How mysql choose the execution planHow mysql choose the execution plan
How mysql choose the execution plan
 
R tools for HiC data visualization
R tools for HiC data visualizationR tools for HiC data visualization
R tools for HiC data visualization
 
Nutch and lucene_framework
Nutch and lucene_frameworkNutch and lucene_framework
Nutch and lucene_framework
 
IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql Features
 
Less08 Schema
Less08 SchemaLess08 Schema
Less08 Schema
 
Chapter 5 ds
Chapter 5 dsChapter 5 ds
Chapter 5 ds
 

More from Bartosz Sypytkowski

How do databases perform live backups and point-in-time recovery
How do databases perform live backups and point-in-time recoveryHow do databases perform live backups and point-in-time recovery
How do databases perform live backups and point-in-time recoveryBartosz Sypytkowski
 
Scaling connections in peer-to-peer applications
Scaling connections in peer-to-peer applicationsScaling connections in peer-to-peer applications
Scaling connections in peer-to-peer applicationsBartosz Sypytkowski
 
Rich collaborative data structures for everyone
Rich collaborative data structures for everyoneRich collaborative data structures for everyone
Rich collaborative data structures for everyoneBartosz Sypytkowski
 
Behind modern concurrency primitives
Behind modern concurrency primitivesBehind modern concurrency primitives
Behind modern concurrency primitivesBartosz Sypytkowski
 
Behind modern concurrency primitives
Behind modern concurrency primitivesBehind modern concurrency primitives
Behind modern concurrency primitivesBartosz Sypytkowski
 
Living in eventually consistent reality
Living in eventually consistent realityLiving in eventually consistent reality
Living in eventually consistent realityBartosz Sypytkowski
 
Virtual machines - how they work
Virtual machines - how they workVirtual machines - how they work
Virtual machines - how they workBartosz Sypytkowski
 
Akka.NET streams and reactive streams
Akka.NET streams and reactive streamsAkka.NET streams and reactive streams
Akka.NET streams and reactive streamsBartosz Sypytkowski
 
GraphQL - an elegant weapon... for more civilized age
GraphQL - an elegant weapon... for more civilized ageGraphQL - an elegant weapon... for more civilized age
GraphQL - an elegant weapon... for more civilized ageBartosz Sypytkowski
 

More from Bartosz Sypytkowski (13)

How do databases perform live backups and point-in-time recovery
How do databases perform live backups and point-in-time recoveryHow do databases perform live backups and point-in-time recovery
How do databases perform live backups and point-in-time recovery
 
Scaling connections in peer-to-peer applications
Scaling connections in peer-to-peer applicationsScaling connections in peer-to-peer applications
Scaling connections in peer-to-peer applications
 
Rich collaborative data structures for everyone
Rich collaborative data structures for everyoneRich collaborative data structures for everyone
Rich collaborative data structures for everyone
 
Behind modern concurrency primitives
Behind modern concurrency primitivesBehind modern concurrency primitives
Behind modern concurrency primitives
 
Collaborative eventsourcing
Collaborative eventsourcingCollaborative eventsourcing
Collaborative eventsourcing
 
Behind modern concurrency primitives
Behind modern concurrency primitivesBehind modern concurrency primitives
Behind modern concurrency primitives
 
Living in eventually consistent reality
Living in eventually consistent realityLiving in eventually consistent reality
Living in eventually consistent reality
 
Virtual machines - how they work
Virtual machines - how they workVirtual machines - how they work
Virtual machines - how they work
 
Short story of time
Short story of timeShort story of time
Short story of time
 
Akka.NET streams and reactive streams
Akka.NET streams and reactive streamsAkka.NET streams and reactive streams
Akka.NET streams and reactive streams
 
Collaborative text editing
Collaborative text editingCollaborative text editing
Collaborative text editing
 
The last mile from db to disk
The last mile from db to diskThe last mile from db to disk
The last mile from db to disk
 
GraphQL - an elegant weapon... for more civilized age
GraphQL - an elegant weapon... for more civilized ageGraphQL - an elegant weapon... for more civilized age
GraphQL - an elegant weapon... for more civilized age
 

Recently uploaded

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 

Recently uploaded (20)

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 

Postgres indexes: how to make them work for your application

  • 2. INTRODUCTION Bartosz Sypytkowski ▪ @Horusiath ▪ b.sypytkowski@gmail.com ▪ bartoszsypytkowski.com
  • 3.  B+Tree: how databases manage data?  How indexes work  Query plans & execution  Types of indexes  Vector search AGENDA
  • 5. HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 32 49 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 44 DATA 49 DATA 62 DATA 64 DATA 8KB B+TREE link link link link
  • 6. HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 32 49 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 49 DATA 49 DATA 62 DATA 64 DATA 8KB SEARCH BY EQUALITY link link link link
  • 7. HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 32 49 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 44 DATA 49 DATA 62 DATA 64 DATA 8KB RANGE SCAN link link link link
  • 8. POSTGRESQL TOAST TABLES book_id title content 1 ‘Haikus’ ‘In the twilight rain these brilliant-hued hibiscus - A lovely sunset.’ 2 ‘Moby Dick’ public.books chunk_id chunk_seq chunk_data 16403 0 0xFF2F000000436 16403 1 0x6167F27521F25B 16403 2 0x23FB21030E6F6 16403 3 0x7974686108676 pg_toast.pg_toast_{$OID} select relname from pg_class where oid = ( select reltoastrelid from pg_class where relname='books') Find name of a corresponding TOAST table if(compress(content).len > 2000)
  • 9. HEADER HEADER HEADER HEADER HEADER 4 25 4 DATA 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA TUPLE IDs link Table Heap Index Storage 4 TID 8 TID 16 TID 20 TID 25 TID 27 TID 30 TID 32 TID 32 DATA <block id, tuple offset> select ctid from my_table
  • 11. select * from books where author_id = 10 SEQ SCAN
  • 12. SEQ SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. (Hopefully) sequential I/O 2. Scans all table’s related pages 3. Doesn’t use index pages
  • 13. create index on books(publication_date); select publication_date from books where publication_date > ‘2020/01/01’ INDEX ONLY SCAN
  • 14. INDEX ONLY SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Sequential I/O over index pages 2. Doesn’t use table’s related pages
  • 15. create index on books(publication_date); select title, publication_date from books where publication_date > ‘2020/01/01’ INDEX SCAN
  • 16. INDEX SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table…
  • 17. INDEX SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table… 2. Position read cursor on the first page…
  • 18. INDEX SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table… 2. Position read cursor on the first page… 3. Sequential I/O over all table’s pages until condition is done
  • 19. create index on books using gist(description_lex); select title, publication_date from books where description_lex @@ ‘epic’ BITMAP SCAN
  • 20. BITMAP SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Using index create bitmap of matching pages Bitmap
  • 21. BITMAP SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Using index create bitmap of matching pages 2. Random I/O over pages covered by bitmap Bitmap
  • 22. INCLUDE & PARTIAL INDEXES create index ix_books_by_author on books(author_id) include (created_at) where author_id is not null; HEADER HEADER 4 25 HEADER HEADER HEADER 4 DATA 7 DATA 13 DATA 16 DATA 19 DATA 25 DATA 32 DATA 47 DATA 61 DATA 4 TID INC 16 TID INC 25 TID INC 32 TID INC duplicated columns Index Storage
  • 23. BTREE INDEX create index ix_users_birthdate on users(birthdate desc);
  • 24. COMPAR ING VECTOR CLOCKS B-TREE INDEX 1. Default 2. Access time: always O(logN) 3. Supports most of the index features create index ix_users_birthdate on users(birthdate desc);
  • 25. HASH INDEX create index ix_orders_no on orders using hash(order_no);
  • 26. HEADER HEADER 0x 01 0x 02 HEADER HEADER HEADER Meta page HASH INDEX HEADER Bucket 0 Overflow page Bitmap page Bucket 1 4 TID 71 TID 13 TID 73 TID 42 TID 67 TID 86 TID 99 TID 5 TID 38 TID 7 TID 51 TID 14 TID 34 TID 66 TID 70 TID 72 TID 90 TID 79 TID 91 TID 82 TID
  • 27. COMPAR ING VECTOR CLOCKS HASH INDEX 1. Access time: O(1) – O(N) 2. Doesn’t shrink in size* create index ix_orders_no on orders using hash(order_no);
  • 28. BRIN INDEX create index ih_events_created_at on events using brin(created_at) with (pages_per_range = 128);
  • 30. select tablename, attname, correlation from pg_stats where tablename = 'film' tablename attname correlation film film_id 0.9979791 film title 0.9979791 film description 0.04854498 film release_year 1 film rating 0.1953281 film last_update 1 film fulltext <null> COLUMN-TUPLE CORRELATION
  • 31. COMPAR ING VECTOR CLOCKS BRIN INDEX 1. Imprecise 2. Very small in size 3. Good for columns aligned with tuple insert order and immutable records create index ih_events_created_at on events using brin(created_at) with (pages_per_range = 128);
  • 32. BLOOM INDEX create index ix_active_codes on active_codes using bloom(keycode) with (length=80, col1=2);
  • 33. BLOOM FILTER INSERT(C) = h1(C)/len | h2(C)/len .. | hn(C)/len
  • 34. BLOOM FILTER INSERT(C) = h1(C)/len | h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len
  • 35. BLOOM FILTER INSERT(C) = h1(C)/len | h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len CONTAINS(A) = h1(A)/len | h2(A)/len .. | hn(A)/len FALSE
  • 36. BLOOM FILTER INSERT(C) = h1(C)/len | h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len CONTAINS(A) = h1(A)/len | h2(A)/len .. | hn(A)/len CONTAINS(B) = h1(B)/len | h2(B)/len .. | hn(B)/len FALSE MAYBE?
  • 37. COMPAR ING VECTOR CLOCKS BLOOM INDEX 1. Small in size 2. Good for exclusion/narrowing 3. False positive ratio: hur.st/bloomfilter/ create extension bloom; create index ix_active_codes on active_codes using bloom(keycode) with (length=80, col1=2); number of bits per record number of hashes for each column
  • 38. GiST INDEX create index ix_books_content on books using gist(content_lex);
  • 39. GiST INDEX GEO POINTS image: https://postgrespro.com/blog/pgsql/4175817
  • 40. GiST INDEX TSVECTOR -- gist cannot be applied directly on text columns alter table film add column description_lex tsvector generated always as (to_tsvector('english', description)) stored; create index idx_film_description_lex on film using gist(description_lex); select * from film where description_lex @@ 'epic'; Bitmap Heap Scan on film (cost=4.18..20.32 rows=5 width=416) Recheck Cond: (description_lex @@ '''epic'''::tsquery) -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..4.18 rows=5 width=0) Index Cond: (description_lex @@ '''epic'''::tsquery) Query Plan
  • 41. GiST INDEX TSVECTOR HEADER 011011 110111 HEADER 010011 011010 HEADER 100110 110001 HEADER aab TID aac TID aba TID HEADER adf TID azf TID bac TID HEADER brc TID caa TID cdl TID HEADER cff TID fre TID klm TID
  • 42. COMPAR ING VECTOR CLOCKS GiST INDEX 1. Supports specialized operators 2. Index is not updated during deletes create index ix_books_content on books using gist(content_lex);
  • 43. SP-GiST INDEX create index ix_files_path on files using spgist(path);
  • 44. SP-GiST INDEX GEO POINTS image: https://postgrespro.com/blog/pgsql/4220639
  • 45. SP-GiST INDEX TSVECTOR -- spgist can be created on text column but not on nvarchar create index idx_film_title on film using spgist(title); select * from film where title like ‘A Fast-Paced% in New Orleans'; Bitmap Heap Scan on film (cost=8.66..79.03 rows=51 width=416) Filter: (description ~~ 'A Fast-Paced%'::text) -> Bitmap Index Scan on idx_film_title (cost=0.00..8.64 rows=50 width=0) Index Cond: ((description ~>=~ 'A Fast-Paced'::text) AND (description ~<~ 'A Fast-Pacee'::text)) Query Plan
  • 46. COMPAR ING VECTOR CLOCKS GiST INDEX 1. Just like GiST, but faster for some ops… 2. … but unable to perform some other 3. Indexed space is partitioned into non- overlapping regions create index ix_files_path on files using spgist(path);
  • 47. GIN INDEX create index ix_books_content on books using gin(content_lex);
  • 48. GIN INDEX -- gist cannot be applied directly on text columns alter table film add column description_lex tsvector generated always as (to_tsvector('english', description)) stored; create index idx_film_description_lex on film using gin(description_lex); select * from film where description_lex @@ 'epic'; Bitmap Heap Scan on film (cost=8.04..24.18 rows=5 width=416) Recheck Cond: (description_lex @@ '''epic'''::tsquery) -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..8.04 rows=5 width=0) Index Cond: (description_lex @@ '''epic'''::tsquery) Query Plan
  • 49. GIN INDEX HEADER ever gone HEADER HEADER HEADER HEADER omit call (1,1) (3,1) dome (1,2) ever (1,1) (2,1) faucet (4,0) gather (2,1) (3,2) gone leather (2,1) omit (2,1) (3,2) (2,2) (3,1) HEADER HEADER (1,1) (1,6) (1,9) (1,2) (1,3) (1,7) (1,8) (2,1) (2,2) (2,3) (2,6) (2,8) (2,5) (2,7) (3,1) Posting list Posting tree
  • 50. COMPAR ING VECTOR CLOCKS GIN INDEX create index ix_books_content on books using gin(content_lex); 1. Reads usually faster than GiST 2. Writes are usually slower than GiST 3. Index size greater than GiST
  • 51. RUM INDEX create index ix_books_content on books using rum(content_lex);
  • 52. RUM INDEX HEADER ever gone HEADER HEADER HEADER HEADER omit call (1,1) (3,1) dome (1,2) ever (1,1) (2,1) faucet (4,0) gather (2,1) (3,2) gone leather (2,1) omit (2,1) (3,2) (2,2) (3,1) HEADER HEADER (1,1) (1,6) (1,9) (1,2) (1,3) (1,7) (1,8) (2,1) (2,2) 1, 14 5 2, 5 25 1, 9 45 1, 21 9, 11 2, 10 45 1, 22 13 1 211 2, 4 2, 21 13 25 5 4, 7 (2,3) (2,6) (2,8) (2,5) (2,7) (3,1) 2 3 1, 7 8, 13 17 1
  • 53. RUM INDEX -- similarity ranking select description_lex <=> to_tsquery('epic’) as similarity from books; -- find description with 2 words located one after another select * from books where description_lex @@ to_tsquery(‘hello <-> world’);
  • 54. COMPAR ING VECTOR CLOCKS RUM INDEX 1. GIN on steroids (bigger but more capable) 2. Allows to query for terms and their relative positions in text 3. Supports Index Scan and EXCLUDE create extension rum; create index ix_books_content on books using rum(content_lex);
  • 56. 1. Uses only lexical similarity and is language-sensitive 2. Misses the context (meaning) 3. Works only on text LIMITS OF FULL-TEXT SEARCH select * from books where description_lex @@ to_tsquery(‘white <-> castle’); select * from books where description_lex @@ to_tsquery(‘white <-> fortress’);
  • 57. VECTOR SIMILARITY 101 R G B 250 185 0 115 0 152 Embedding
  • 59. IVFFLAT VECTOR INDEX create index ix_books_content on books using ivfflat(embedding content_lex) with (lists = 1000); select * from items where embedding <-> ‘[3,1,2]’ < 5;
  • 60. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
  • 61. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 lists create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
  • 62. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 centroids create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
  • 64. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 set ivfflat.probes = 2; select * from items where embedding <-> ‘[1,2]’ < 5; [1,2]
  • 65. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4; [1,2]
  • 66. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 67. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 68. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 69. HNSW VECTOR INDEX create index on items using hnsw (embedding vector_cosine_ops) with (m = 16, ef_construction = 64);
  • 70. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=2, ef_construction=3);
  • 71. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=2, ef_construction=3); Layer 1
  • 72. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=3, ef_construction=3); Layer 1 Layer 2
  • 73. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=3, ef_construction=3); Layer 1 Layer 2 Layer 3
  • 75. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 76. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 77. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 78. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 79. INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 80. 1. Fast build time 2. Smaller size 3. Slower query performance 4. Bad for frequent index updates 1. Slow initial build time 2. Bigger index size 3. Faster performance 4. Better recall after updates IVFFLAT INDEX HNSW INDEX