PGroonga is fast and flexible full text search extension for PostgreSQL. Zulip is a chat tool that uses PostgreSQL and PGroonga. This talk describes why PGroonga is suitable for Zulip.
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
PGroonga & Zulip
1. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga
&
Zulip
Kouhei Sutou ClearCode Inc.
Zulip & PGroonga Night
2017-09-06
2. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga
Pronunciation: píːzí:lúnɡά
読み方:ぴーじーるんが
PostgreSQL extension
PostgreSQLの拡張機能
Fast full text search
高速全文検索機能
All languages are supported!
全言語対応!
3. PGroonga & Zulip Powered by Rabbit 2.2.1
Fast?(高速?)
Need to measure to confirm
確認するには測定しないと
Targets(測定対象)
textsearch (built-in)(組み込み)
pg_bigm (third party)(外部プロダク
ト)
4. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroona and textsearch
0
0.2
0.4
0.6
0.8
1
1.2
1.4
PostgreSQL OR MySQL database America
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
Elapsedtime(ms)
(Shorterisbetter)
Query
PGroonga textsearch
5. PGroonga & Zulip Powered by Rabbit 2.2.1
As fast as textsearch
textsearchと同じくらいの速さ
textsearch uses word based
full text search
textsearchは単語ベースの全文検索実装
PostgreSQL has enough
performance for the approach
PostgreSQLはこの方法では十分な性能を出せる
6. PGroonga & Zulip Powered by Rabbit 2.2.1
textsearch and Japanese
textsearchと日本語
Asian languages including
Japanese aren't supported
日本語を含むアジア圏の言語は非サポート
Need plugin(プラグインが必要)
Plugin exists but isn't
maintained
プラグインはあるがメンテナンスされていない
7. PGroonga & Zulip Powered by Rabbit 2.2.1
Japanese support
日本語対応
Need one of them(どちらかが必要)
N-gram approach support
N-gramというやり方のサポート
Japanese specific word based
approach support
日本語を考慮した単語ベースのやり方のサポート
PGroonga supports both of them
8. PGroonga & Zulip Powered by Rabbit 2.2.1
PostgreSQL and N-gram
PostgreSQLとN-gram
PostgreSQL is slow with N-
gram approach
PostgreSQLでN-gramというやり方を使うと遅い
N-gram approach:
pg_trgm (contrib)
Japanese isn't supported by default
デフォルトでは日本語に対応していない
pg_bigm (third-party)
9. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroona and pg_bigm
0
0.5
1
1.5
2
2.5
3
311 14706 20389
Data: Japanese Wikipedia
(Many records and large documents)
N records: About 0.9millions
Average text size: 6.7KiB
Fast Fast
Elapsedtime(sec)
(Lowerisbetter)
N hits
PGroonga pg_bigm
10. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga is fast stably
PGroongaは安定して速い
PostgreSQL needs "recheck"
for N-gram approach
PostgreSQLはN-gramのときは「recheck」が必要
Seq search after index search
インデックスサーチのあとにシーケンシャルサーチ
PGroonga doesn't need
PGroongaでは必要ない
Only index search
インデックスサーチだけでOK
11. PGroonga & Zulip Powered by Rabbit 2.2.1
Wrap up
まとめ
textsearch is fast but
Asian langs aren't supported
textsearchは速いけどアジア圏の言語を未サポート
pg_bigm supports Japanese
but is slow for large hits
pg_bigmは日本語対応だがヒット数が多くなると遅い
PGroonga is fast and
supports all languages
PGroongaは速くて全言語対応
12. PGroonga & Zulip Powered by Rabbit 2.2.1
FYI: textsearch, PGroonga
and Groonga
0
0.2
0.4
0.6
0.8
1
1.2
1.4
PostgreSQL OR MySQL database America
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
Groonga is 30x faster than others
Elapsedtime(ms)
(Shorterisbetter)
Query
PGroonga Groonga textsearch
13. PGroonga & Zulip Powered by Rabbit 2.2.1
Zulip and PGroonga
Zulip uses textsearch by
default
Zulipはデフォルトでtextsearchを使用
Japanese isn't supported
日本語非対応
Zulip supports PGroonga as
option
ZulipでPGroongaも使うこともできる
Implemented by me
私が実装
14. PGroonga & Zulip Powered by Rabbit 2.2.1
Zulip: full text search
Zulipと全文検索
Zulip is chat tool
Zulipはチャットツール
Latency is important for UX
UX的にレイテンシーは大事
Index update is heavy
インデックスの更新は重い
Delay index update
インデックスの更新を後回しにしている
15. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
CREATE TABLE zerver_message (
rendered_content text,
-- ... ↓Column for full text search
search_tsvector tsvector
); -- ↓Index for full text search
CREATE INDEX zerver_message_search_tsvector
ON zerver_message
USING gin (search_tsvector);
16. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Execute append_to_fts_update_log() on change
CREATE TRIGGER
zerver_message_update_search_tsvector_async
BEFORE INSERT OR UPDATE OF rendered_content
ON zerver_message
FOR EACH ROW
EXECUTE PROCEDURE append_to_fts_update_log();
17. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Insert ID to fts_update_log table
CREATE FUNCTION append_to_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO fts_update_log (message_id)
VALUES (NEW.id);
RETURN NEW;
END
$$;
18. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Keep ID to be updated
CREATE TABLE fts_update_log (
id SERIAL PRIMARY KEY,
message_id INTEGER NOT NULL
);
19. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Execute do_notify_fts_update_log()
-- on INSERT
CREATE TRIGGER fts_update_log_notify
AFTER INSERT ON fts_update_log
FOR EACH STATEMENT
EXECUTE PROCEDURE
do_notify_fts_update_log();
20. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- NOTIFY to fts_update_log channel!
CREATE FUNCTION do_notify_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
NOTIFY fts_update_log;
RETURN NEW;
END
$$;
21. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
cursor.execute("LISTEN ftp_update_log") # Wait
cursor.execute("SELECT id, message_id FROM fts_update_log")
ids = []
for (id, message_id) in cursor.fetchall():
cursor.execute("UPDATE zerver_message SET search_tsvector = "
"to_tsvector('zulip.english_us_search', "
"rendered_content) "
"WHERE id = %s", (message_id,))
ids.append(id)
cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)",
(ids,))
22. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga: index update
PGroongaとインデックス更新
PGroonga's index update is
fast too
PGroongaはインデックス更新も速い
PGroonga's search while
index update is still fast
PGroongaはインデックス更新中の検索も速い
23. PGroonga & Zulip Powered by Rabbit 2.2.1
Perf characteristics
性能の傾向
Searchthroughput
Update throughput
PGroonga
Searchthroughput
Update throughput
GIN
Keep
search performance
while many updates
Decrease
search performance
while updating
24. PGroonga & Zulip Powered by Rabbit 2.2.1
Update and lock
更新とロック
Update without read locks
参照ロックなしで更新
Write locks are required
書き込みロックは必要
27. PGroonga & Zulip Powered by Rabbit 2.2.1
Wrap up
まとめ
Zulip: Low latency for UX
ZulipはUXのために低レイテンシーをがんばっている
Delay index update
インデックスの更新は後回し
PGroonga: Keeps fast search
with update
PGroongaは更新しながらでも高速検索を維持
Chat friendly characteristics
チャット向きの特性
28. PGroonga & Zulip Powered by Rabbit 2.2.1
More PGroonga features
PGroongaの機能いろいろ
Query expansion(クエリー展開)
Support synonyms(同義語検索をサポート)
Similar search(類似文書検索)
Find similar messages
類似メッセージ検索
Fuzzy search(あいまい検索)
Stemming(ステミング)