Indexes are among the most crucial structures in any relational database. In this talk we'll explain how to use them efficiently, how to read query plans, and what they mean for us. We'll also cover the variety of indexing structures available in PostgreSQL and build some intuition about which one to pick depending on the situation.
12. SEQ SCAN
[Diagram: index storage (pages M1, I1, I2, I3) beside the table heap (pages T1 to T7)]
1. (Hopefully) sequential I/O
2. Scans all of the table's pages
3. Doesn't use index pages
13. INDEX ONLY SCAN
create index on books(publication_date);
select publication_date
from books
where publication_date > '2020/01/01';
14. INDEX ONLY SCAN
[Diagram: index storage (pages M1, I1, I2, I3) beside the table heap (pages T1 to T7)]
1. Sequential I/O over index pages
2. Doesn't use the table's pages
15. INDEX SCAN
create index on books(publication_date);
select title, publication_date
from books
where publication_date > '2020/01/01';
16. INDEX SCAN
[Diagram: index storage (pages M1, I1, I2, I3) beside the table heap (pages T1 to T7)]
1. Uses the index to find the first matching page of the table…
2. Positions the read cursor on that page…
3. Reads the table's pages in index order until the condition no longer holds (sequential I/O only when rows are stored in that order)
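The index scan steps, and the contrast with an index-only scan, can be sketched in Python. This is only a toy model under stated assumptions: the heap is a list of rows, a row's list position stands in for its TID, and the index is a sorted list of (key, TID) pairs. The names `heap`, `index`, `index_scan` and `index_only_scan` are made up for illustration and say nothing about how PostgreSQL lays out a B-tree.

```python
from bisect import bisect_right

# Toy heap: rows in insert order; the list position plays the role of a TID.
heap = [
    {"title": "C", "publication_date": "2021-05-01"},
    {"title": "A", "publication_date": "2019-03-10"},
    {"title": "D", "publication_date": "2022-11-20"},
    {"title": "B", "publication_date": "2020-07-04"},
]

# Toy index: (key, tid) pairs kept sorted by key.
index = sorted((row["publication_date"], tid) for tid, row in enumerate(heap))


def index_scan(lower_bound):
    """Rows with publication_date > lower_bound, fetched through the index."""
    keys = [key for key, _ in index]
    start = bisect_right(keys, lower_bound)          # first entry above the bound
    return [heap[tid] for _, tid in index[start:]]   # one heap visit per entry


def index_only_scan(lower_bound):
    """Same condition, but only the indexed column is needed: no heap access."""
    keys = [key for key, _ in index]
    return keys[bisect_right(keys, lower_bound):]


titles = [row["title"] for row in index_scan("2020-01-01")]
dates = index_only_scan("2020-01-01")
```

Note that `index_scan` jumps around the heap (positions 3, 0, 2 here), because the rows were not inserted in date order.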
19. BITMAP SCAN
create index on books
using gist(description_lex);
select title, publication_date
from books
where description_lex @@ 'epic';
20. BITMAP SCAN
[Diagram: index storage (pages M1, I1, I2, I3) beside the table heap (pages T1 to T7), with a bitmap of matching table pages]
1. Uses the index to build a bitmap of matching pages
2. Random I/O over the pages covered by the bitmap
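A minimal Python sketch of the two bitmap scan steps, assuming a toy heap with two rows per page and a dictionary standing in for the index lookup. `PAGE_SIZE`, `heap`, `index` and `bitmap_scan` are hypothetical names. Because the bitmap remembers pages rather than rows, every row on a flagged page is rechecked against the condition.

```python
PAGE_SIZE = 2  # rows per heap page in this toy layout (an assumption)

heap = ["epic tale", "short story", "an epic saga", "cook book",
        "travel guide", "epic poem", "field notes", "atlas"]

# Toy index: word -> row positions (a stand-in for the real index lookup).
index = {}
for tid, text in enumerate(heap):
    for word in text.split():
        index.setdefault(word, []).append(tid)


def bitmap_scan(word):
    # Step 1: build a bitmap with one bit per heap page.
    n_pages = (len(heap) + PAGE_SIZE - 1) // PAGE_SIZE
    bitmap = [False] * n_pages
    for tid in index.get(word, []):
        bitmap[tid // PAGE_SIZE] = True
    # Step 2: visit only the flagged pages and recheck every row on them.
    hits = []
    for page, flagged in enumerate(bitmap):
        if not flagged:
            continue
        first = page * PAGE_SIZE
        for tid in range(first, min(first + PAGE_SIZE, len(heap))):
            if word in heap[tid].split():
                hits.append(heap[tid])
    return bitmap, hits


bitmap, hits = bitmap_scan("epic")
```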
22. INCLUDE & PARTIAL INDEXES
create index ix_books_by_author
on books(author_id)
include (created_at)
where author_id is not null;
[Diagram: index storage. Header pages (keys 4, 25) lead to entries for keys 4, 16, 25, 32, each holding a TID and the INCLUDE column (a duplicated column); data rows 4, 7, 13, 16, 19, 25, 32, 47, 61 are shown alongside.]
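What the INCLUDE and WHERE clauses change can be sketched in Python. The model is a toy one: the index is a sorted list of (key, TID, included value) tuples, and the sample rows and the helper `created_at_for` are hypothetical. Rows failing the predicate get no entry at all, and the included column lets a lookup be answered without touching the heap.

```python
from bisect import bisect_left, bisect_right

heap = [
    {"title": "T1", "author_id": 4,    "created_at": "2021-01-05"},
    {"title": "T2", "author_id": None, "created_at": "2021-02-11"},
    {"title": "T3", "author_id": 16,   "created_at": "2021-03-20"},
    {"title": "T4", "author_id": 4,    "created_at": "2021-04-02"},
    {"title": "T5", "author_id": None, "created_at": "2021-05-17"},
]

# Partial: rows failing the WHERE predicate get no index entry.
# Include: created_at is copied into the entry next to the key and TID.
index = sorted(
    (row["author_id"], tid, row["created_at"])
    for tid, row in enumerate(heap)
    if row["author_id"] is not None
)


def created_at_for(author_id):
    """Answer `select created_at ... where author_id = ?` from the index alone."""
    keys = [key for key, _, _ in index]
    lo, hi = bisect_left(keys, author_id), bisect_right(keys, author_id)
    return [created_at for _, _, created_at in index[lo:hi]]
```

The price of the included column is that `created_at` is stored twice, once in the heap and once in the index.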
30. COLUMN-TUPLE CORRELATION
select tablename, attname, correlation
from pg_stats
where tablename = 'film';
tablename  attname       correlation
film       film_id       0.9979791
film       title         0.9979791
film       description   0.04854498
film       release_year  1
film       rating        0.1953281
film       last_update   1
film       fulltext      <null>
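To build intuition for the correlation column, here is a rough Python sketch that correlates each row's physical position with the rank of its value. It is a simplification under stated assumptions: PostgreSQL estimates this statistic from a sample during ANALYZE, and the function below is illustrative rather than the server's actual computation.

```python
def correlation(values):
    """Pearson correlation between physical position and value rank.

    `values` lists a column's values in physical (heap) order; needs >= 2 rows.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # positions sorted by value
    ranks = [0] * n
    for rank, pos in enumerate(order):
        ranks[pos] = rank
    mean = (n - 1) / 2  # positions and ranks are both permutations of 0..n-1
    cov = sum((pos - mean) * (ranks[pos] - mean) for pos in range(n))
    var = sum((pos - mean) ** 2 for pos in range(n))
    return cov / var


in_order = correlation([2006, 2007, 2008, 2009])   # values follow insert order
reversed_order = correlation([4, 3, 2, 1])         # values run against it
shuffled = correlation([3, 1, 4, 2])               # no relationship
```

A value near 1 or -1 means that walking the index in key order also walks the heap nearly sequentially; a value near 0 means each index entry may land on a different page.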
31. BRIN INDEX
1. Imprecise
2. Very small in size
3. Good for columns aligned with tuple insert order and immutable records
create index ih_events_created_at on events
using brin(created_at) with (pages_per_range = 128);
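Why a BRIN index is both tiny and imprecise can be sketched in Python, assuming a min/max summary per block range. The heap layout, `PAGES_PER_RANGE = 2`, and the function names are made up for illustration. The index stores one pair of values per range, so a scan can only skip whole ranges and must recheck every row in the ranges it keeps.

```python
PAGES_PER_RANGE = 2  # toy value; the slide uses pages_per_range = 128

# Heap pages with values in insert order (append-only, so roughly sorted).
pages = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]


def build_brin(pages):
    """One (min, max) summary per block range."""
    summaries = []
    for start in range(0, len(pages), PAGES_PER_RANGE):
        block = [v for page in pages[start:start + PAGES_PER_RANGE] for v in page]
        summaries.append((min(block), max(block)))
    return summaries


def brin_scan(pages, summaries, lo, hi):
    """Values in [lo, hi]; returns (matches, number of heap pages read)."""
    matches, pages_read = [], 0
    for i, (range_min, range_max) in enumerate(summaries):
        if range_max < lo or range_min > hi:
            continue  # the whole block range is skipped
        start = i * PAGES_PER_RANGE
        for page in pages[start:start + PAGES_PER_RANGE]:
            pages_read += 1
            matches.extend(v for v in page if lo <= v <= hi)  # recheck each row
    return matches, pages_read


summaries = build_brin(pages)
matches, pages_read = brin_scan(pages, summaries, 8, 9)
```

If the values were shuffled across pages, every range would span almost the whole domain and nothing could be skipped, which is why alignment with insert order matters.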
32. BLOOM INDEX
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
37. BLOOM INDEX
1. Small in size
2. Good for exclusion/narrowing
3. False positive ratio: hur.st/bloomfilter/
create extension bloom;
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
-- length: number of bits per record
-- col1: number of hashes for each column
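The signature idea behind a Bloom index can be sketched in Python. Assumptions: SHA-256 stands in for whatever hash the extension uses, and `LENGTH`, `N_BITS`, `signature` and `bloom_scan` are illustrative names mirroring the `length` and `col1` options. A signature match only means "maybe present", so candidates are rechecked against the actual rows.

```python
import hashlib

LENGTH = 80   # bits per signature (mirrors the `length` option)
N_BITS = 2    # bits set per column value (mirrors the `col1` option)


def signature(value):
    """Bit positions set for one value, derived from a hash (SHA-256 here)."""
    bits = set()
    for seed in range(N_BITS):
        digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
        bits.add(int.from_bytes(digest[:8], "big") % LENGTH)
    return bits


rows = ["AB12", "CD34", "EF56", "GH78"]
index = [signature(value) for value in rows]   # one small signature per row


def bloom_scan(value):
    wanted = signature(value)
    # A row is a candidate if all wanted bits are set in its signature.
    candidates = [i for i, sig in enumerate(index) if wanted <= sig]
    # Recheck against the rows: candidates may contain false positives.
    return [rows[i] for i in candidates if rows[i] == value]
```

Rows whose signature lacks any wanted bit are excluded without being read, which is the exclusion/narrowing use the slide mentions.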
40. GiST INDEX (TSVECTOR)
-- gist cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gist(description_lex);
select * from film where description_lex @@ 'epic';
Query Plan
Bitmap Heap Scan on film (cost=4.18..20.32 rows=5 width=416)
  Recheck Cond: (description_lex @@ '''epic'''::tsquery)
  -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..4.18 rows=5 width=0)
       Index Cond: (description_lex @@ '''epic'''::tsquery)
45. SP-GiST INDEX
-- spgist can be created on text column but not on nvarchar
create index idx_film_title on film using spgist(title);
select * from film
where title like 'A Fast-Paced% in New Orleans';
Query Plan
Bitmap Heap Scan on film (cost=8.66..79.03 rows=51 width=416)
  Filter: (description ~~ 'A Fast-Paced%'::text)
  -> Bitmap Index Scan on idx_film_title (cost=0.00..8.64 rows=50 width=0)
       Index Cond: ((description ~>=~ 'A Fast-Paced'::text) AND (description ~<~ 'A Fast-Pacee'::text))
46. SP-GiST INDEX
1. Just like GiST, but faster for some operations…
2. … but unable to perform some others
3. Indexed space is partitioned into non-overlapping regions
create index ix_files_path
on files using spgist(path);
48. GIN INDEX
-- gin cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gin(description_lex);
select * from film where description_lex @@ 'epic';
Query Plan
Bitmap Heap Scan on film (cost=8.04..24.18 rows=5 width=416)
  Recheck Cond: (description_lex @@ '''epic'''::tsquery)
  -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..8.04 rows=5 width=0)
       Index Cond: (description_lex @@ '''epic'''::tsquery)
50. GIN INDEX
create index ix_books_content
on books using gin(content_lex);
1. Reads are usually faster than GiST
2. Writes are usually slower than GiST
3. Index size is greater than GiST
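GIN is an inverted index, and the read/write trade-off above follows from that shape. A minimal Python sketch, assuming whitespace splitting as a crude stand-in for `to_tsvector` (which also stems words and drops stop words); `docs`, `postings` and `search` are hypothetical names.

```python
from collections import defaultdict

docs = {
    1: "an epic drama of a feminist",
    2: "a fast paced documentary",
    3: "an epic story of a mad scientist",
}

# Inverted index: lexeme -> sorted posting list of row ids.
# One inserted row touches one posting list per distinct lexeme (slow writes).
postings = defaultdict(list)
for doc_id in sorted(docs):
    for lexeme in set(docs[doc_id].split()):
        postings[lexeme].append(doc_id)


def search(*lexemes):
    """Row ids containing every lexeme (AND), via posting-list intersection."""
    result = None
    for lexeme in lexemes:
        ids = set(postings.get(lexeme, []))
        result = ids if result is None else result & ids
    return sorted(result or [])
```

A lookup is a direct hit on one posting list per query term, which is why reads tend to be fast.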
53. RUM INDEX
-- similarity ranking
select description_lex <=> to_tsquery('epic') as similarity
from books;
-- find description with 2 words located one after another
select * from books
where description_lex @@ to_tsquery('hello <-> world');
54. RUM INDEX
1. GIN on steroids (bigger but more capable)
2. Allows querying for terms and their relative positions in text
3. Supports Index Scan and EXCLUDE
create extension rum;
create index ix_books_content
on books using rum(content_lex);
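The "relative positions" point can be sketched in Python as an inverted index that also stores where each lexeme occurs. This is a toy model with hypothetical names (`positions`, `followed_by`), not RUM's on-disk format. Keeping positions in the index is what lets a query like `hello <-> world` be answered from the index itself.

```python
from collections import defaultdict

docs = {
    1: "hello world again",
    2: "world hello",
    3: "say hello to the world",
}

# lexeme -> {row id -> [positions]}; the positions are the extra payload
# compared with a plain lexeme -> row ids inverted index.
positions = defaultdict(dict)
for doc_id, text in docs.items():
    for pos, lexeme in enumerate(text.split()):
        positions[lexeme].setdefault(doc_id, []).append(pos)


def followed_by(first, second):
    """Row ids where `second` appears immediately after `first`."""
    hits = []
    for doc_id, first_positions in positions.get(first, {}).items():
        second_positions = set(positions.get(second, {}).get(doc_id, []))
        if any(p + 1 in second_positions for p in first_positions):
            hits.append(doc_id)
    return sorted(hits)
```

The stored positions are also the reason the index is bigger than a plain inverted index.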
56. LIMITS OF FULL-TEXT SEARCH
1. Uses only lexical similarity and is language-sensitive
2. Misses the context (meaning)
3. Works only on text
select * from books
where description_lex @@
to_tsquery('white <-> castle');
select * from books
where description_lex @@
to_tsquery('white <-> fortress');
59. IVFFLAT VECTOR INDEX
create index ix_books_content on books
using ivfflat(embedding vector_l2_ops) with (lists = 1000);
select * from items where embedding <-> '[3,1,2]' < 5;
60. INVERTED FILE WITH FLAT COMPRESSION
[Figure: points A to O on an x/y plane, grouped into three lists (L1, L2, L3) with centroids C1, C2, C3]
create index on items
using ivfflat(embedding vector_l2_ops)
with (lists=3);
64. INVERTED FILE WITH FLAT COMPRESSION
[Figure: points A to O on an x/y plane in lists L1 to L3 with centroids C1 to C3, plus the query point [1,2]]
set ivfflat.probes = 2;
select * from items
where embedding <-> '[1,2]' < 5;
65. INVERTED FILE WITH FLAT COMPRESSION
[Figure: points A to O on an x/y plane in lists L1 to L3 with centroids C1 to C3, plus the query point [1,2]]
set ivfflat.probes = 2;
select * from items
order by embedding <-> '[1,2]'
limit 4;
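The build and probe steps can be sketched in Python. Everything here is an assumption for illustration: the nine sample points, the fixed centroids (a real build would derive them with k-means), and the `search` helper. Only the lists whose centroids are closest to the query are scanned, so the result is approximate.

```python
import math

points = {"A": (1, 1), "B": (2, 1), "C": (1, 4),
          "D": (8, 8), "E": (9, 7), "F": (8, 9),
          "G": (1, 9), "H": (2, 8), "I": (0, 9)}

# Assumed centroids; a real build would derive them with k-means.
centroids = [(1.5, 1.5), (8.5, 8.0), (1.0, 8.5)]

# Build: each point goes to the list of its nearest centroid (lists=3).
lists = [[] for _ in centroids]
for name, p in points.items():
    nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
    lists[nearest].append(name)


def search(query, k, probes):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))
    candidates = [name for i in order[:probes] for name in lists[i]]
    # L2 distance, as with the <-> operator under vector_l2_ops.
    return sorted(candidates, key=lambda n: math.dist(query, points[n]))[:k]


one_probe = search((1, 2), k=4, probes=1)
two_probes = search((1, 2), k=4, probes=2)
```

With a single probe only three candidates exist in this toy data, so a limit of 4 cannot be satisfied; raising the number of probes trades speed for recall.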
69. HNSW VECTOR INDEX
create index on items using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
70. HIERARCHICAL NAVIGABLE SMALL WORLD
[Figure: points A to O on an x/y plane, with Layer 1 marked]
create index on items
using hnsw(embedding vector_l2_ops)
with (m=2, ef_construction=3);
72. HIERARCHICAL NAVIGABLE SMALL WORLD
[Figure: points A to O on an x/y plane, with Layers 1, 2 and 3 marked]
create index on items
using hnsw(embedding vector_l2_ops)
with (m=3, ef_construction=3);
75. HIERARCHICAL NAVIGABLE SMALL WORLD
[Figure: points A to O on an x/y plane, plus the query point [1,2]]
set hnsw.ef_search = 2;
select * from items
order by embedding <-> '[1,2]'
limit 4;
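A heavily simplified Python sketch of a layered graph search, to show the role of the layers and of `ef_search`. Assumptions: the three layers are hand-built here, whereas a real index assigns layers randomly on insert and limits links per node with `m`; the points, the entry node and the function names are all hypothetical.

```python
import heapq
import math

points = {"A": (0, 0), "B": (2, 0), "C": (4, 0),
          "D": (6, 0), "E": (8, 0), "F": (10, 0)}

# Hand-built layers, top first; every point appears in the bottom layer.
layers = [
    {"A": ["F"], "F": ["A"]},
    {"A": ["C"], "C": ["A", "F"], "F": ["C"]},
    {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
     "D": ["C", "E"], "E": ["D", "F"], "F": ["E"]},
]


def d(name, query):
    return math.dist(points[name], query)


def greedy(graph, entry, query):
    """Move to a closer neighbour until none exists."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: d(n, query))
        if d(best, query) >= d(current, query):
            return current
        current = best


def search(query, k, ef_search, entry="A"):
    for graph in layers[:-1]:              # coarse descent through upper layers
        entry = greedy(graph, entry, query)
    graph = layers[-1]                     # bottom layer: best-first search
    visited = {entry}
    frontier = [(d(entry, query), entry)]  # min-heap of nodes to expand
    found = [(-d(entry, query), entry)]    # max-heap of the best ef_search nodes
    while frontier:
        dist_c, c = heapq.heappop(frontier)
        if len(found) >= ef_search and dist_c > -found[0][0]:
            break                          # nothing closer left to expand
        for n in graph[c]:
            if n in visited:
                continue
            visited.add(n)
            dn = d(n, query)
            if len(found) < ef_search or dn < -found[0][0]:
                heapq.heappush(frontier, (dn, n))
                heapq.heappush(found, (-dn, n))
                if len(found) > ef_search:
                    heapq.heappop(found)   # drop the farthest candidate
    return [n for _, n in sorted((-nd, n) for nd, n in found)][:k]
```

For the query `(7.5, 0)` the top layer jumps from A straight to F, and the bottom layer then refines around F. Since at most `ef_search` candidates are kept, a small `ef_search` caps how many rows can come back.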
80. IVFFLAT INDEX
1. Fast build time
2. Smaller size
3. Slower query performance
4. Bad for frequent index updates
HNSW INDEX
1. Slow initial build time
2. Bigger index size
3. Faster performance
4. Better recall after updates