SlideShare a Scribd company logo
1 of 28
Download to read offline
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga
&
Zulip
Kouhei Sutou ClearCode Inc.
Zulip & PGroonga Night
2017-09-06
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga
Pronunciation: píːzí:lúnɡά
読み方:ぴーじーるんが
PostgreSQL extension
PostgreSQLの拡張機能
Fast full text search
高速全文検索機能
All languages are supported!
全言語対応!
PGroonga & Zulip Powered by Rabbit 2.2.1
Fast?(高速?)
Need to measure to confirm
確認するには測定しないと
Targets(測定対象)
textsearch (built-in)(組み込み)
pg_bigm (third party)(外部プロダク
ト)
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroona and textsearch
0
0.2
0.4
0.6
0.8
1
1.2
1.4
PostgreSQL OR MySQL database America
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
Elapsedtime(ms)
(Shorterisbetter)
Query
PGroonga textsearch
PGroonga & Zulip Powered by Rabbit 2.2.1
As fast as textsearch
textsearchと同じくらいの速さ
textsearch uses word based
full text search
textsearchは単語ベースの全文検索実装
PostgreSQL has enough
performance for the approach
PostgreSQLはこの方法では十分な性能を出せる
PGroonga & Zulip Powered by Rabbit 2.2.1
textsearch and Japanese
textsearchと日本語
Asian languages including
Japanese aren't supported
日本語を含むアジア圏の言語は非サポート
Need plugin(プラグインが必要)
Plugin exists but isn't
maintained
プラグインはあるがメンテナンスされていない
PGroonga & Zulip Powered by Rabbit 2.2.1
Japanese support
日本語対応
Need one of them(どちらかが必要)
N-gram approach support
N-gramというやり方のサポート
Japanese specific word based
approach support
日本語を考慮した単語ベースのやり方のサポート
PGroonga supports both of them
PGroonga & Zulip Powered by Rabbit 2.2.1
PostgreSQL and N-gram
PostgreSQLとN-gram
PostgreSQL is slow with N-
gram approach
PostgreSQLでN-gramというやり方を使うと遅い
N-gram approach:
pg_trgm (contrib)
Japanese isn't supported by default
デフォルトでは日本語に対応していない
pg_bigm (third-party)
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroona and pg_bigm
0
0.5
1
1.5
2
2.5
3
311 14706 20389
Data: Japanese Wikipedia
(Many records and large documents)
N records: About 0.9millions
Average text size: 6.7KiB
Fast Fast
Elapsedtime(sec)
(Lowerisbetter)
N hits
PGroonga pg_bigm
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga is fast stably
PGroongaは安定して速い
PostgreSQL needs "recheck"
for N-gram approach
PostgreSQLはN-gramのときは「recheck」が必要
Seq search after index search
インデックスサーチのあとにシーケンシャルサーチ
PGroonga doesn't need
PGroongaでは必要ない
Only index search
インデックスサーチだけでOK
PGroonga & Zulip Powered by Rabbit 2.2.1
Wrap up
まとめ
textsearch is fast but
Asian langs aren't supported
textsearchは速いけどアジア圏の言語を未サポート
pg_bigm supports Japanese
but is slow for large hits
pg_bigmは日本語対応だがヒット数が多くなると遅い
PGroonga is fast and
supports all languages
PGroongaは速くて全言語対応
PGroonga & Zulip Powered by Rabbit 2.2.1
FYI: textsearch, PGroonga
and Groonga
0
0.2
0.4
0.6
0.8
1
1.2
1.4
PostgreSQL OR MySQL database America
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
Groonga is 30x faster than others
Elapsedtime(ms)
(Shorterisbetter)
Query
PGroonga Groonga textsearch
PGroonga & Zulip Powered by Rabbit 2.2.1
Zulip and PGroonga
Zulip uses textsearch by
default
Zulipはデフォルトでtextsearchを使用
Japanese isn't supported
日本語非対応
Zulip supports PGroonga as
option
ZulipでPGroongaも使うこともできる
Implemented by me
私が実装
PGroonga & Zulip Powered by Rabbit 2.2.1
Zulip: full text search
Zulipと全文検索
Zulip is chat tool
Zulipはチャットツール
Latency is important for UX
UX的にレイテンシーは大事
Index update is heavy
インデックスの更新は重い
Delay index update
インデックスの更新を後回しにしている
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
CREATE TABLE zerver_message (
rendered_content text,
-- ... ↓Column for full text search
search_tsvector tsvector
); -- ↓Index for full text search
CREATE INDEX zerver_message_search_tsvector
ON zerver_message
USING gin (search_tsvector);
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Execute append_to_fts_update_log() on change
CREATE TRIGGER
zerver_message_update_search_tsvector_async
BEFORE INSERT OR UPDATE OF rendered_content
ON zerver_message
FOR EACH ROW
EXECUTE PROCEDURE append_to_fts_update_log();
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Insert ID to fts_update_log table
CREATE FUNCTION append_to_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO fts_update_log (message_id)
VALUES (NEW.id);
RETURN NEW;
END
$$;
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Keep ID to be updated
CREATE TABLE fts_update_log (
id SERIAL PRIMARY KEY,
message_id INTEGER NOT NULL
);
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Execute do_notify_fts_update_log()
-- on INSERT
CREATE TRIGGER fts_update_log_notify
AFTER INSERT ON fts_update_log
FOR EACH STATEMENT
EXECUTE PROCEDURE
do_notify_fts_update_log();
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- NOTIFY to fts_update_log channel!
CREATE FUNCTION do_notify_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
NOTIFY fts_update_log;
RETURN NEW;
END
$$;
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
cursor.execute("LISTEN ftp_update_log") # Wait
cursor.execute("SELECT id, message_id FROM fts_update_log")
ids = []
for (id, message_id) in cursor.fetchall():
cursor.execute("UPDATE zerver_message SET search_tsvector = "
"to_tsvector('zulip.english_us_search', "
"rendered_content) "
"WHERE id = %s", (message_id,))
ids.append(id)
cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)",
(ids,))
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga: index update
PGroongaとインデックス更新
PGroonga's index update is
fast too
PGroongaはインデックス更新も速い
PGroonga's search while
index update is still fast
PGroongaはインデックス更新中の検索も速い
PGroonga & Zulip Powered by Rabbit 2.2.1
Perf characteristics
性能の傾向
Searchthroughput
Update throughput
PGroonga
Searchthroughput
Update throughput
GIN
Keep
search performance
while many updates
Decrease
search performance
while updating
PGroonga & Zulip Powered by Rabbit 2.2.1
Update and lock
更新とロック
Update without read locks
参照ロックなしで更新
Write locks are required
書き込みロックは必要
PGroonga & Zulip Powered by Rabbit 2.2.1
GIN: Read/Write
GIN:読み書き
Conn1
Conn2
INSERT
start
SELECT
start
Blocked
INSERT
finish
SELECT
finish
GIN
Slow down!
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga: Read/Write
PGroonga:読み書き
Conn1
Conn2
INSERT
start
SELECT
start
INSERT
finish
SELECT
finish
PGroonga
No slow down!
PGroonga & Zulip Powered by Rabbit 2.2.1
Wrap up
まとめ
Zulip: Low latency for UX
ZulipはUXのために低レイテンシーをがんばっている
Delay index update
インデックスの更新は後回し
PGroonga: Keeps fast search
with update
PGroongaは更新しながらでも高速検索を維持
Chat friendly characteristics
チャット向きの特性
PGroonga & Zulip Powered by Rabbit 2.2.1
More PGroonga features
PGroongaの機能いろいろ
Query expansion(クエリー展開)
Support synonyms(同義語検索をサポート)
Similar search(類似文書検索)
Find similar messages
類似メッセージ検索
Fuzzy search(あいまい検索)
Stemming(ステミング)

More Related Content

Similar to PGroonga & Zulip

Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensCitus Data
 
Gophers Riding Elephants: Writing PostgreSQL tools in Go
Gophers Riding Elephants: Writing PostgreSQL tools in GoGophers Riding Elephants: Writing PostgreSQL tools in Go
Gophers Riding Elephants: Writing PostgreSQL tools in GoAJ Bahnken
 
Introduction to pig.
Introduction to pig.Introduction to pig.
Introduction to pig.Triloki Gupta
 
Pro PostgreSQL, OSCon 2008
Pro PostgreSQL, OSCon 2008Pro PostgreSQL, OSCon 2008
Pro PostgreSQL, OSCon 2008Robert Treat
 
Globalization autdi for Fedora Atomic
Globalization autdi for Fedora AtomicGlobalization autdi for Fedora Atomic
Globalization autdi for Fedora AtomicPravin Satpute
 
Puppet DSL: back to the basics
Puppet DSL: back to the basicsPuppet DSL: back to the basics
Puppet DSL: back to the basicsJulien Pivotto
 
Useful PostgreSQL Extensions
Useful PostgreSQL ExtensionsUseful PostgreSQL Extensions
Useful PostgreSQL ExtensionsEDB
 
Scaling Django with gevent
Scaling Django with geventScaling Django with gevent
Scaling Django with geventMahendra M
 

Similar to PGroonga & Zulip (9)

Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
 
Pig
PigPig
Pig
 
Gophers Riding Elephants: Writing PostgreSQL tools in Go
Gophers Riding Elephants: Writing PostgreSQL tools in GoGophers Riding Elephants: Writing PostgreSQL tools in Go
Gophers Riding Elephants: Writing PostgreSQL tools in Go
 
Introduction to pig.
Introduction to pig.Introduction to pig.
Introduction to pig.
 
Pro PostgreSQL, OSCon 2008
Pro PostgreSQL, OSCon 2008Pro PostgreSQL, OSCon 2008
Pro PostgreSQL, OSCon 2008
 
Globalization autdi for Fedora Atomic
Globalization autdi for Fedora AtomicGlobalization autdi for Fedora Atomic
Globalization autdi for Fedora Atomic
 
Puppet DSL: back to the basics
Puppet DSL: back to the basicsPuppet DSL: back to the basics
Puppet DSL: back to the basics
 
Useful PostgreSQL Extensions
Useful PostgreSQL ExtensionsUseful PostgreSQL Extensions
Useful PostgreSQL Extensions
 
Scaling Django with gevent
Scaling Django with geventScaling Django with gevent
Scaling Django with gevent
 

More from Kouhei Sutou

RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowRubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowKouhei Sutou
 
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Kouhei Sutou
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowKouhei Sutou
 
Rubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアRubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアKouhei Sutou
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかKouhei Sutou
 
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataApache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataKouhei Sutou
 
Redmine検索の未来像
Redmine検索の未来像Redmine検索の未来像
Redmine検索の未来像Kouhei Sutou
 
Apache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataApache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataKouhei Sutou
 
Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Kouhei Sutou
 
Apache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームApache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームKouhei Sutou
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムKouhei Sutou
 
MySQL 8.0でMroonga
MySQL 8.0でMroongaMySQL 8.0でMroonga
MySQL 8.0でMroongaKouhei Sutou
 
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Kouhei Sutou
 
MariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムMariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムKouhei Sutou
 
PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版Kouhei Sutou
 

More from Kouhei Sutou (20)

RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowRubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
 
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
 
Rubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアRubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェア
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのか
 
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataApache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory data
 
Apache Arrow 2019
Apache Arrow 2019Apache Arrow 2019
Apache Arrow 2019
 
Redmine検索の未来像
Redmine検索の未来像Redmine検索の未来像
Redmine検索の未来像
 
Apache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataApache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory data
 
Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
Apache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームApache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォーム
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
 
MySQL 8.0でMroonga
MySQL 8.0でMroongaMySQL 8.0でMroonga
MySQL 8.0でMroonga
 
My way with Ruby
My way with RubyMy way with Ruby
My way with Ruby
 
Red Data Tools
Red Data ToolsRed Data Tools
Red Data Tools
 
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
 
MariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムMariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システム
 
PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版
 

Recently uploaded

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 

Recently uploaded (20)

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 

PGroonga & Zulip

  • 1. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga & Zulip Kouhei Sutou ClearCode Inc. Zulip & PGroonga Night 2017-09-06
  • 2. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga Pronunciation: píːzí:lúnɡά 読み方:ぴーじーるんが PostgreSQL extension PostgreSQLの拡張機能 Fast full text search 高速全文検索機能 All languages are supported! 全言語対応!
  • 3. PGroonga & Zulip Powered by Rabbit 2.2.1 Fast?(高速?) Need to measure to confirm 確認するには測定しないと Targets(測定対象) textsearch (built-in)(組み込み) pg_bigm (third party)(外部プロダク ト)
  • 4. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroona and textsearch 0 0.2 0.4 0.6 0.8 1 1.2 1.4 PostgreSQL OR MySQL database America Data: English Wikipedia (Many records and large docs) N records: About 5.3millions Average text size: 6.4KiB Elapsedtime(ms) (Shorterisbetter) Query PGroonga textsearch
  • 5. PGroonga & Zulip Powered by Rabbit 2.2.1 As fast as textsearch textsearchと同じくらいの速さ textsearch uses word based full text search textsearchは単語ベースの全文検索実装 PostgreSQL has enough performance for the approach PostgreSQLはこの方法では十分な性能を出せる
  • 6. PGroonga & Zulip Powered by Rabbit 2.2.1 textsearch and Japanese textsearchと日本語 Asian languages including Japanese aren't supported 日本語を含むアジア圏の言語は非サポート Need plugin(プラグインが必要) Plugin exists but isn't maintained プラグインはあるがメンテナンスされていない
  • 7. PGroonga & Zulip Powered by Rabbit 2.2.1 Japanese support 日本語対応 Need one of them(どちらかが必要) N-gram approach support N-gramというやり方のサポート Japanese specific word based approach support 日本語を考慮した単語ベースのやり方のサポート PGroonga supports both of them
  • 8. PGroonga & Zulip Powered by Rabbit 2.2.1 PostgreSQL and N-gram PostgreSQLとN-gram PostgreSQL is slow with N- gram approach PostgreSQLでN-gramというやり方を使うと遅い N-gram approach: pg_trgm (contrib) Japanese isn't supported by default デフォルトでは日本語に対応していない pg_bigm (third-party)
  • 9. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroona and pg_bigm 0 0.5 1 1.5 2 2.5 3 311 14706 20389 Data: Japanese Wikipedia (Many records and large documents) N records: About 0.9millions Average text size: 6.7KiB Fast Fast Elapsedtime(sec) (Lowerisbetter) N hits PGroonga pg_bigm
  • 10. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga is fast stably PGroongaは安定して速い PostgreSQL needs "recheck" for N-gram approach PostgreSQLはN-gramのときは「recheck」が必要 Seq search after index search インデックスサーチのあとにシーケンシャルサーチ PGroonga doesn't need PGroongaでは必要ない Only index search インデックスサーチだけでOK
  • 11. PGroonga & Zulip Powered by Rabbit 2.2.1 Wrap up まとめ textsearch is fast but Asian langs aren't supported textsearchは速いけどアジア圏の言語を未サポート pg_bigm supports Japanese but is slow for large hits pg_bigmは日本語対応だがヒット数が多くなると遅い PGroonga is fast and supports all languages PGroongaは速くて全言語対応
  • 12. PGroonga & Zulip Powered by Rabbit 2.2.1 FYI: textsearch, PGroonga and Groonga 0 0.2 0.4 0.6 0.8 1 1.2 1.4 PostgreSQL OR MySQL database America Data: English Wikipedia (Many records and large docs) N records: About 5.3millions Average text size: 6.4KiB Groonga is 30x faster than others Elapsedtime(ms) (Shorterisbetter) Query PGroonga Groonga textsearch
  • 13. PGroonga & Zulip Powered by Rabbit 2.2.1 Zulip and PGroonga Zulip uses textsearch by default Zulipはデフォルトでtextsearchを使用 Japanese isn't supported 日本語非対応 Zulip supports PGroonga as option ZulipでPGroongaも使うこともできる Implemented by me 私が実装
  • 14. PGroonga & Zulip Powered by Rabbit 2.2.1 Zulip: full text search Zulipと全文検索 Zulip is chat tool Zulipはチャットツール Latency is important for UX UX的にレイテンシーは大事 Index update is heavy インデックスの更新は重い Delay index update インデックスの更新を後回しにしている
  • 15. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update インデックス更新を後回し CREATE TABLE zerver_message ( rendered_content text, -- ... ↓Column for full text search search_tsvector tsvector ); -- ↓Index for full text search CREATE INDEX zerver_message_search_tsvector ON zerver_message USING gin (search_tsvector);
  • 16. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- Execute append_to_fts_update_log() on change CREATE TRIGGER zerver_message_update_search_tsvector_async BEFORE INSERT OR UPDATE OF rendered_content ON zerver_message FOR EACH ROW EXECUTE PROCEDURE append_to_fts_update_log();
  • 17. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- Insert ID to fts_update_log table CREATE FUNCTION append_to_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN INSERT INTO fts_update_log (message_id) VALUES (NEW.id); RETURN NEW; END $$;
  • 18. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- Keep ID to be updated CREATE TABLE fts_update_log ( id SERIAL PRIMARY KEY, message_id INTEGER NOT NULL );
  • 19. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- Execute do_notify_fts_update_log() -- on INSERT CREATE TRIGGER fts_update_log_notify AFTER INSERT ON fts_update_log FOR EACH STATEMENT EXECUTE PROCEDURE do_notify_fts_update_log();
  • 20. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- NOTIFY to fts_update_log channel! CREATE FUNCTION do_notify_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN NOTIFY fts_update_log; RETURN NEW; END $$;
  • 21. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update インデックス更新を後回し cursor.execute("LISTEN ftp_update_log") # Wait cursor.execute("SELECT id, message_id FROM fts_update_log") ids = [] for (id, message_id) in cursor.fetchall(): cursor.execute("UPDATE zerver_message SET search_tsvector = " "to_tsvector('zulip.english_us_search', " "rendered_content) " "WHERE id = %s", (message_id,)) ids.append(id) cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)", (ids,))
  • 22. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga: index update PGroongaとインデックス更新 PGroonga's index update is fast too PGroongaはインデックス更新も速い PGroonga's search while index update is still fast PGroongaはインデックス更新中の検索も速い
  • 23. PGroonga & Zulip Powered by Rabbit 2.2.1 Perf characteristics 性能の傾向 Searchthroughput Update throughput PGroonga Searchthroughput Update throughput GIN Keep search performance while many updates Decrease search performance while updating
  • 24. PGroonga & Zulip Powered by Rabbit 2.2.1 Update and lock 更新とロック Update without read locks 参照ロックなしで更新 Write locks are required 書き込みロックは必要
  • 25. PGroonga & Zulip Powered by Rabbit 2.2.1 GIN: Read/Write GIN:読み書き Conn1 Conn2 INSERT start SELECT start Blocked INSERT finish SELECT finish GIN Slow down!
  • 26. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga: Read/Write PGroonga:読み書き Conn1 Conn2 INSERT start SELECT start INSERT finish SELECT finish PGroonga No slow down!
  • 27. PGroonga & Zulip Powered by Rabbit 2.2.1 Wrap up まとめ Zulip: Low latency for UX ZulipはUXのために低レイテンシーをがんばっている Delay index update インデックスの更新は後回し PGroonga: Keeps fast search with update PGroongaは更新しながらでも高速検索を維持 Chat friendly characteristics チャット向きの特性
  • 28. PGroonga & Zulip Powered by Rabbit 2.2.1 More PGroonga features PGroongaの機能いろいろ Query expansion(クエリー展開) Support synonyms(同義語検索をサポート) Similar search(類似文書検索) Find similar messages 類似メッセージ検索 Fuzzy search(あいまい検索) Stemming(ステミング)