SlideShare a Scribd company logo
1 of 105
Download to read offline
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PGroonga 2Make PostgreSQL rich full text
search system backend!
Kouhei Sutou ClearCode Inc.
PGConf.ASIA 2017
2017-12-05
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Targets
対象者
Want to implement full text
search with PostgreSQL
PostgreSQLで全文検索したい
Not good at full text search
全文検索はよく知らない
PGroonga 1.0.0 users
PGroonga 1.0.0は使ったことがある
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Abbreviations
略語
PG: PostgreSQL
ポスグレ: PostgreSQL
FTS: Full text search
FTS: 全文検索
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS system: Targets
全文検索システム:対象
Many tests大量のテキスト
e.g.: Text data in office
docs in file servers
例:ファイルサーバー内のオフィス文書内のテキスト
e.g.: Item descriptions,
chat logs, Wiki data, ...
例:商品説明やチャットログ、Wikiのデータなど
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS system: Goal
全文検索システム:目的
Provide
needed info
when you need必要な情報を必要なときに提供すること
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Provide needed info
必要な情報を提供
Not found
探している情報が見つからない
Found
探している情報が見つかる
Found unconscious needed
info too!
意識していなかったけど実は欲しかった情報も見つか
る!
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
When you need
必要なときに活用
Need many times to find
なかなか見つからない
Find in no time
すぐに見つかる
Already found
すでに見つかっていた
e.g.: Recommendation
例:レコメンデーション
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
How to impl.: Options
実装方法:選択肢
Use FTS server
全文検索サーバーを使う
Use PostgreSQL
PostgreSQLを使う
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS server: Pros
全文検索サーバー案:メリット
Provides all basic features
必要な機能が揃っている
Provides advanced features
+αの機能もある
Fast
速い
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS server: Cons1
全文検索サーバー案:デメリット1
Large implementation cost
実装コスト大
Learn how to use from scratch
使い方を1から学ぶ必要がある
How to implement data sync?
マスターデータの同期はどうする?
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS server: Cons2
全文検索サーバー案:デメリット2
Large maintenance cost
メンテナンスコスト大
Learn how to operate from
scratch
運用方法を1から学ぶ必要がある
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PostgreSQL: Pros1
PostgreSQL案:メリット1
Less implementation cost
実装コスト小
Less things to be learned
新しく覚えることが少ない
Can manage data at the same
place
データの一元管理
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PostgreSQL: Pros2
PostgreSQL案:メリット2
Less operation cost
メンテナンスコスト小
The current operation knowledge
is reusable
既存の運用ノウハウを使える
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PostgreSQL: Cons
PostgreSQL案:デメリット
Built-in features aren't
enough
組込機能では機能不足
SQL limits efficiency
SQLの表現力不足
e.g.: SQL needs multiple
queries for a process that can
be done by 1 query by FTS server
例:全文検索サーバーなら1クエリーで実現できる処
理にSQLだと複数クエリー必要なことがある
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
The 3rd option
第3の選択肢
Use FTS engine via
PostgreSQL (SQL)
PostgreSQL経由(SQL)で全文検索エンジンを使う
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Pros
メリット
Fast and rich features
高速で豊富な機能
Less implementation cost
実装コスト小
Less operation cost
メンテナンスコスト小
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Cons
デメリット
Need PostgreSQL extension
PostgreSQLに拡張機能が必要
Not available on DBaaS
DBaaSで使えない
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Option: No FTS knowledge
オススメの選択肢:全文検索の知識ナシ
Need only simple features
まだ単純な機能で十分
Less data: LIKE with PostgreSQL
データ少:PostgreSQLでLIKE
Need up-to-date FTS features
いまどきの全文検索機能が必要
FTS engine via PostgreSQL
PostgreSQL経由で全文検索エンジン
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Option: With FTS knowledge
オススメの選択肢:全文検索の知識アリ
Need tuned FTS feature
カリカリにチューニングしたい
PostgreSQL + FTS server
PostgreSQL+全文検索サーバー
Others
それ以外
FTS engine via PostgreSQL
PostgreSQL経由で全文検索エンジン
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Described option
説明する選択肢
FTS engine via
PostgreSQLPostgreSQL経由で全文検索エンジン
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS engine: Groonga
全文検索エンジン:Groonga(ぐるんが)
Embeddable FTS engine
組込可能な全文検索エンジン
PGroonga: Groonga in PostgreSQL
PGroonga:PostgreSQLに組込
Usable as FTS server
全文検索サーバーとして単独でも使用可能
PostgreSQL + FTS server
architecture is also available
PostgreSQL+全文検索サーバー構成もできる
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Groonga's hobby: data update
Groongaの得意な事:データの追加・更新
Make fresh data searchable!
新鮮な情報をすぐ検索可能!
Batch update is needless
バッチで更新しなくてもよい
Can use as chat backend
チャットくらいの頻度でもOK
e.g.: Zulip uses PGroonga
例:ZulipはPGroongaを採用
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Groonga's hobby: data update
Groongaの得意な事:データの追加・更新
Keep search performance
while updating!
更新中も検索性能が落ちない!
Updatable when there are many
search users
利用ユーザーが多い時でも更新可能
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PGroonga
PGroonga(ぴーじーるんが)
PostgreSQL index
PostgreSQLのインデックス
Alternative of GIN, RUM, ...
GIN・RUMなどと同じレイヤー
Usage
使用方法
CREATE INDEX ...
USING PGroonga ...
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PostgreSQL and FTS
PostgreSQLと全文検索
LIKE: Built-in(組込機能)
textsearch: Built-in(組込機能)
pg_trgm: Contrib(標準添付)
Bundled in the archive
アーカイブには含まれている
Need to install separately
別途インストールすれば使える
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
LIKE and performance
LIKEと速度
Small data
少ないデータ
Enough performance
十分実用的
Not small data
少なくないデータ
Need to tune
性能問題アリ
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
LIKE and FTS system
LIKEと全文検索システム
Enough performance
in most case
速度が実用的なことも多い
Data are small in many case
少ないデータなら
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
LIKE and FTS system
LIKEと全文検索システム
Unable to sort
それっぽい順のソート不可
Sort is important in FTS
全文検索ではソート順が重要
Users check only
the first N entries
ユーザーは先頭N件しか見ない
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
textsearch
Fast search by index
インデックスを作るので速い
Need module for each lang
言語毎にモジュールが必要
Modules for English,
French, ... are built-in
英語やフランス語などは組込
Modules for languages in Asia
aren't maintained
アジア圏の言語用のモジュールはメンテされていない
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
pg_trgm
Fast search by index
インデックスを作るので速い
Asian languages aren't
enough supported
アジア圏の言語のサポートは十分ではない
Unable to sort
それっぽい順のソート不可
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
RUM
RUM = GIN + position
RUMは位置情報付きのGIN
https://github.com/postgrespro/rum
pg_trgm/pg_bigm are slow
for much matches case
pg_trgmとpg_bigmはマッチ数が多いと遅くなる
RUM will solve it
GINの代わりにRUMを使うことで解決できるかも!
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PGroonga
Fast search by index
インデックスを作るので速い
Sortable
それっぽい順のソート可
Support all languages
全言語対応
Need to install
separately
別途インストールする必要アリ
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS system with PostgreSQL
PostgreSQLで全文検索システム
PGroonga is the best!
PGroongaがベスト!
PGroonga
Fast(高速)
Support all langs(全言語対応)
Sortable(それっぽい順でソート可)
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS system: Basic features
全文検索システム:基本機能
Fast FTS + sort
高速全文検索+ソート
Show texts around keyword
キーワード周辺テキスト表示
Highlight keyword
検索キーワードハイライト
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS system: Adv. features
全文検索システム:高度な機能
Auto complete
オートコンプリート
Similar search
類似文書検索
Synonym expansion
同義語展開
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PGroonga 1.0.0
↓ are only supported
以下の機能のみ対応
Fast FTS + sort
高速全文検索+ソート
Show texts around keyword
キーワード周辺テキスト表示
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PGroonga 2
All features
are supported!全機能対応!
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
PGroonga 1.0.0 → 2
Many new features
たくさんの新機能
Improve performance
性能改善
API is changed
APIが変わった
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
API change
API変更
Operator is changed
演算子変更
@@ → &@~
%% → &@
...
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
API change
API変更
pgroonga schema is deprecated
pgroongaスキーマを非推奨に
pgroonga.score → pgroonga_score
pgroonga.flush → pgroonga_flush
...
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
App for PGroonga 1.0.0
PGroonga 1.0.0用アプリ
Broken with PGroonga 2?
PGroonga 2では動かない?
No! Work without any changes!
何も変更しなくても動くよ!
Great! But why?
いいじゃん!でもなんで動くの?
↓
"Painless upgrade" technique
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Painless upgrade
PGroonga 2 provides
both 1 API and 2 API
PGroonga 2は1用のAPIも2用のAPIも両方提供
Can use PGroonga 2 with 1 API
PGroonga 1のAPIでPGroonga 2を使える
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Painless upgrade
The last PGroonga 1.X
provides both 1 API and
partially 2 API
PGroonga 1系の最終版は1用のAPIも2用のAPIの一部も提
供
Can use PGroonga 1 with 2 API
PGroonga 2のAPIでPGroonga 1を使える
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Painless upgrade
PGroonga 2 keeps 1 API
PGroonga 2の間は1のAPIを維持
PGroonga 3 will drop 1 API
PGroonga 3で1のAPIを削除予定
Just need to upgrade API until 3
PGroonga 3までにAPIをアップグレードすればよい
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Painless upgrade
App for PGroonga 1.0.0
doesn't work with PGroonga 2
PGroonga 1.0.0用のアプリがPGroonga 2で動かない
It's a bug. Please report it!
バグなので報告してね!
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS system: Basic features
全文検索システム:基本機能
Fast FTS + sort
高速全文検索+ソート
Show texts around keyword
キーワード周辺テキスト表示
Highlight keyword
検索キーワードハイライト
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Fast FTS + sort
高速全文検索+ソート
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Table definition
CREATE TABLE entries (
-- Need primary key
-- It's needed for sort
id integer PRIMARY KEY,
title text,
content text
);
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Index definition
-- For FTS.
-- The default is good enough!
CREATE INDEX entries_full_text_search
ON entries
-- "USING PGroonga" is important!
-- Primary key is for sort!
USING PGroonga (id, title, content);
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Insert data
-- Normal INSERT.
INSERT INTO entries
VALUES (1,
'Fast FTS with Groonga!',
'Fast FTS is needed!');
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS
SELECT title FROM entries
WHERE
-- &@~ is for FTS
-- AND search with "search" and "fast"
title &@~ 'search fast' OR
content &@~ 'search fast';
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS: LIKE
SELECT title FROM entries
WHERE
-- Index search for LIKE is supported
-- = Improve app perf without any changes
-- NOTE: &@~ is faster than LIKE
title LIKE '%search%' OR
content LIKE '%search%';
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Sort
SELECT
title,
-- pgroonga_score(TABLE_NAME) returns
-- precision as number
pgroonga_score(entries) AS score
FROM entries
WHERE -- ...
-- Sort by precision
ORDER BY score DESC LIMIT 10;
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Highlight keyword
キーワードハイライト
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Hightlight for HTML
SELECT
pgroonga_highlight_html(
title,
-- Extract keywords from query
pgroonga_query_extract_keywords('search fast'))
FROM entries
WHERE title &@~ 'search fast' OR
content &@~ 'search fast';
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Highlight for HTML: Example
Fast search with <Groonga>!
↓
<span class="keyword">Fast</span>
↑↓ Keywords are marked up with "class"
<span class="keyword">search</span>!
with &lt;Groonga&gt;! ← Escape tag
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Texts around keyword
キーワード周辺テキスト
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Texts around keyword for HTML
SELECT
pgroonga_snippet_html(
content,
-- Extract keywords from query
pgroonga_query_extract_keywords('search fast'))
FROM entries
WHERE title &@~ 'search fast' OR
content &@~ 'search fast';
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Example
...fast search with <Groonga>!...
↓
ARRAY[
↓ First
'<span class="keyword">fast</span>
↑↓ Keywords are marked up with "class"
<span class="keyword">search/span>!
with &lt;Groonga&gt;!', ← Escape tag
'...' ← Second
]
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS system: Adv. features
全文検索システム:高度な機能
Auto complete
オートコンプリート
Similar search
類似文書検索
Synonym expansion
同義語展開
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Auto complete
オートコンプリート
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Auto complete: Preparation
オートコンプリート:準備
Master table
マスターテーブル
Candidate
候補:(例:牛乳)
Readings in Katakana
(Only for Japanese)
ヨミ(日本語の場合。カタカナ。複数登録可。)
例:ギュウニュウ・ミルク
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Auto complete: Implementation
オートコンプリート:実装方法
OR search with ...
Prefix search against readings
(Only for Japanese)
ヨミを前方一致検索(日本語の場合。)
Loose FTS against candidate
候補をゆるく全文検索
Sort by candidate
候補でソート
https://pgroonga.github.io/how-to/auto-complete.html
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Table definition
CREATE TABLE terms (
term text, -- Candidate
readings text[], -- Readings
);
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Data example
INSERT INTO terms VALUES (
'milk', -- Candidate
ARRAY[
-- Reading in Katakana
'ギュウニュウ', -- "milk" in Japanese
-- Multiple readings
'ミルク' -- "milk" in Katakana
]
);
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Data management
データ管理
Easy to maintain because
it's a normal table
普通のテーブルなので管理が楽
Easy to insert/delete/update
追加・削除・更新が楽
Normal backup and replication
ダンプ・リストアもレプリケーションもいつも通り
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Index for prefix search
前方一致用インデックス
CREATE INDEX prefix_search ON terms
USING PGroonga
-- ...text_array_term_search...
(readings
pgroonga_text_array_term_search_ops_v2);
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Index for loose FTS
緩い全文検索用インデックス
CREATE INDEX loose_search ON terms
USING PGroonga (term)
-- Tokenizer for loose full text search
WITH (tokenizer='TokenBigramSplitSymbolAlphaDigit');
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
How to search
検索方法
SELECT term FROM terms
-- Prefix search against readings
WHERE readings &^~ '${INPUT}' OR
-- Loose full text search
term &@ '${INPUT}'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Search example: Candidate
検索例:候補
-- User inputs "il"
SELECT term FROM terms
-- Prefix search against readings
WHERE readings &^~ 'il' OR
-- Loose full text search (Hit)
term &@ 'li'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Search example: Katakana
検索例:カタカナ
-- User inputs "ギュウ"
SELECT term FROM terms
-- Prefix search against readings (Hit)
WHERE readings &^~ 'ギュウ' OR
-- Loose full text search
term &@ 'ギュウ'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Search example: Hiragana
検索例:ひらがな
-- User inputs "ぎゅう"
SELECT term FROM terms
-- Prefix search against readings (Hit)
WHERE readings &^~ 'ぎゅう' OR
-- Loose full text search
term &@ 'ぎゅう'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Search example: Romaji
検索例:ローマ字
-- User inputs "gyu"
SELECT term FROM terms
-- Prefix search against readings (Hit)
WHERE readings &^~ 'gyu' OR
-- Loose full text search
term &@ 'gyu'
ORDER BY term LIMIT 10; -- Sort
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Synonym expansion
同義語展開
Synonym
同義語
Same mean but different notation
同じ意味だが表記が異なる語
e.g.: "PostgreSQL" and "PG"
例:「PostgreSQL」と「ポスグレ」
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Synonym expansion
同義語展開
Users don't want to care
ユーザーは気にしたくない
Synonym expansion
同義語展開
OR search with all synonyms
同義語すべてでOR検索
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Implementation
実装方法
Create synonym table
同義語管理テーブルを作成
Expand synonyms in query
クエリー内の同義語を展開
Search by expanded query
展開後のクエリーで検索
https://pgroonga.github.io/reference/functions/
pgroonga-query-expand.html
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Table definition
CREATE TABLE synonyms (
-- Term to be expanded
term text,
-- Synonym list.
-- Including the "term" itself.
-- If you don't input the "term",
-- the "term" is unsearchable term.
terms text[]
);
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Data example
INSERT INTO synonyms
VALUES ('PostgreSQL', -- Expand "PostgreSQL"
ARRAY['PostgreSQL', 'PG']),
('PG', -- Expand "PG"
ARRAY['PG', 'PostgreSQL']);
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Data management
データ管理
Easy to maintain because
it's a normal table
普通のテーブルなので管理が楽
Easy to insert/delete/update
追加・削除・更新が楽
Normal backup and replication
ダンプ・リストアもレプリケーションもいつも通り
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Index definition
CREATE INDEX synonym_search ON synonyms
USING PGroonga
-- ...text_term_search...
-- For equal search
(term pgroonga_text_term_search_ops_v2);
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Confirm
確認方法
SELECT pgroonga_query_expand(
'synonyms', -- Table name
'term', -- Column name to be expanded
'terms', -- Column name for synonyms
'PostgreSQL' -- Query
);
-- '((PostgreSQL) OR (PG))'
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Search
検索方法
SELECT title FROM entries
WHERE
-- title &@~ 'DB ((PostgreSQL) OR (PG))'
title &@~
pgroonga_query_expand('synonyms',
'term',
'terms',
'DB PostgreSQL');
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Similar search
類似文書検索
Query is document itself
検索クエリーは文書そのもの
Not keyword
キーワードではない
Use case
利用例
Show related entries
関連エントリーの提示に使える
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Implementation
実現方法
Create dedicated index
類似検索用のインデックスを作る
Use tokenizer for target
language
対象の言語に合わせた処理で精度向上
e.g.: MeCab based tokenizer for
Japanese
例:日本語ならMeCabベースのトークナイザーを活用
Use dedicated operator
類似検索用の演算子を使う
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Index definition
CREATE INDEX entries_similar_search
ON entries
-- Target: Both title and content
-- Reason: Title is important
USING PGroonga (id, (title || ' ' || content))
-- TokenMecab is good for Japanese
WITH (tokenizer='TokenMecab');
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Search
SELECT title,
pgroonga_score(entries) AS score
FROM entries
WHERE
-- &@* is operator for similar search.
-- Search with existing document.
(title || ' ' || content) &@*
'...fast search with Groonga!...'
ORDER BY score DESC LIMIT 3;
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Result example
結果例
Query:
...search with Groonga!...
Hit example:
...search with PGroonga!...
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Wrap up: Basic features
全文検索システム:基本機能
Fast FTS + sort
高速全文検索+ソート
Show texts around keyword
キーワード周辺テキスト表示
Highlight keyword
検索キーワードハイライト
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Wrap up: Adv. features
全文検索システム:高度な機能
Auto complete
オートコンプリート
Similar search
類似文書検索
Synonym expansion
同義語展開
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
FTS system: Next step
全文検索システム:次の一歩
Support structured data
構造化データ対応
Office document, HTML, ...
オフィス文書・HTMLなど
Needed features
対応に必要な処理
Text/metadata extraction
テキスト・メタデータ抽出
Create screenshot
スクリーンショット作成
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Extraction tool
抽出ツール
Apache Tika
Apache Lucene's subproject
Many supported formats
対応フォーマットが多い
ChupaText
Groonga's subproject
Screenshot support
スクリーンショット作成対応
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
ChupaText
Supported formats(対応フォーマット)
Word/Excel/PowerPoint
ODT/ODS/ODP(OpenDocument)
PDF/HTML/XML/CSV/...
Interface(インターフェイス)
HTTP and command line
HTTPとコマンドライン
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Install
インストール
Use Docker or Vagrant
DockerかVagrantを使うのが楽
https://github.com/ranguba/chupa-text-docker
https://github.com/ranguba/chupa-text-vagrant
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
ChupaText:Docker
% GITHUB=https://github.com
% git clone 
${GITHUB}/ranguba/chupa-text-docker.git
% cd chupa-text-docker
% docker-compose up --build
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Usage
使い方
% curl 
--form data=@XXX.pdf 
http://localhost:20080/extraction.json
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Result example
結果例
{
"mime-type": "application/pdf", # MIME type for the original data
"size": 147159, # Metadata
...,
"texts": [ # Extracted texts
{
"mime-type": "text/plain", # MIME type for the extracted data
...,
"creator": "Adobe Illustrator CS3", # Metadata
"body": "This is sample PDF. ...", # Extracted text
"screenshot": {
"mime-type": "image/png", # MIME type for screenshot
"data": "iVBORw...", # Base64-ed image data
"encoding": "base64"
}
}
]
}
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Web UI
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Web UI: Extraction example
Web UI:抽出例
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Web UI: Extraction example
Web UI:抽出例
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
ChupaText:Vagrant
% GITHUB=https://github.com
% git clone 
${GITHUB}/ranguba/chupa-text-vagrant.git
% cd chupa-text-vagrant
% vagrant up
Usage is the same as Docker's
使い方はDocker版と同じ
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Use cases(活用例)
Extracted text
Insert into PGroonga
Extracted metadata
Insert into PGroonga
Use for condition(絞り込みに活用)
Created screenshot
Show in search result(検索結果で表
示)
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Wrap up
まとめ
FTS engine via PostgreSQL
PostgreSQL経由で全文検索エンジン
Provide decision info
採用の判断材料を提供
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Wrap up
まとめ
Show how to impl. FTS system
全文検索システム実装例を紹介
PGroonga
PGroonga 1.0.0 and 2
PGroonga 1.0.0と2
Painless upgrade
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Wrap up
まとめ
Show how to support
structured data
構造化データの対応方法を紹介
ChupaText
PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2
Support service
サポートサービス紹介
Install support(導入支援)
設計支援・性能検証・移行支援・…
Development support(開発支援)
サンプルコード提供・問い合わせ対応・…
Operation support(運用支援)
障害対応・チューニング支援・…
Contact(問い合わせ先)
https://www.clear-code.com/contact/?
type=groonga

More Related Content

Similar to PGroonga 2 – Make PostgreSQL rich full text search system backend!

Using Thinking Sphinx with rails
Using Thinking Sphinx with railsUsing Thinking Sphinx with rails
Using Thinking Sphinx with railsRishav Dixit
 
Nancy CLI. Automated Database Experiments
Nancy CLI. Automated Database ExperimentsNancy CLI. Automated Database Experiments
Nancy CLI. Automated Database ExperimentsNikolay Samokhvalov
 
Database Tools by Skype
Database Tools by SkypeDatabase Tools by Skype
Database Tools by Skypeelliando dias
 
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensCitus Data
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseNikolay Samokhvalov
 
OSMC 2008 | PostgreSQL Monitoring - Introduction, Internals And Monitoring S...
OSMC 2008 |  PostgreSQL Monitoring - Introduction, Internals And Monitoring S...OSMC 2008 |  PostgreSQL Monitoring - Introduction, Internals And Monitoring S...
OSMC 2008 | PostgreSQL Monitoring - Introduction, Internals And Monitoring S...NETWAYS
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화NAVER D2
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화Henry Jeong
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningUmair Shahid
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningUmair Shahid
 
groonga with PostgreSQL
groonga with PostgreSQLgroonga with PostgreSQL
groonga with PostgreSQLAkihiro Okuno
 
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-Hiroshi Yuki
 
Capistrano and SystemD
Capistrano and SystemDCapistrano and SystemD
Capistrano and SystemDAmoniac OÜ
 
#1HanoiMagentoMeetup_Magento 2 vs Magento 1
#1HanoiMagentoMeetup_Magento 2 vs Magento 1#1HanoiMagentoMeetup_Magento 2 vs Magento 1
#1HanoiMagentoMeetup_Magento 2 vs Magento 1Hanoi MagentoMeetup
 
Gtpp general truncated pyramid p2 p architecture over structured dht networks
Gtpp general truncated pyramid p2 p architecture over structured dht networksGtpp general truncated pyramid p2 p architecture over structured dht networks
Gtpp general truncated pyramid p2 p architecture over structured dht networksLixiang Liu
 
Gtpp general truncated pyramid p2 p architecture over structured dht networks
Gtpp general truncated pyramid p2 p architecture over structured dht networksGtpp general truncated pyramid p2 p architecture over structured dht networks
Gtpp general truncated pyramid p2 p architecture over structured dht networksLixiang Liu
 

Similar to PGroonga 2 – Make PostgreSQL rich full text search system backend! (20)

Using Thinking Sphinx with rails
Using Thinking Sphinx with railsUsing Thinking Sphinx with rails
Using Thinking Sphinx with rails
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Nancy CLI. Automated Database Experiments
Nancy CLI. Automated Database ExperimentsNancy CLI. Automated Database Experiments
Nancy CLI. Automated Database Experiments
 
Database Tools by Skype
Database Tools by SkypeDatabase Tools by Skype
Database Tools by Skype
 
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
 
OSMC 2008 | PostgreSQL Monitoring - Introduction, Internals And Monitoring S...
OSMC 2008 |  PostgreSQL Monitoring - Introduction, Internals And Monitoring S...OSMC 2008 |  PostgreSQL Monitoring - Introduction, Internals And Monitoring S...
OSMC 2008 | PostgreSQL Monitoring - Introduction, Internals And Monitoring S...
 
Puppet
PuppetPuppet
Puppet
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuning
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuning
 
groonga with PostgreSQL
groonga with PostgreSQLgroonga with PostgreSQL
groonga with PostgreSQL
 
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
 
Capistrano and SystemD
Capistrano and SystemDCapistrano and SystemD
Capistrano and SystemD
 
Capistrano && SystemD
Capistrano && SystemDCapistrano && SystemD
Capistrano && SystemD
 
#1HanoiMagentoMeetup_Magento 2 vs Magento 1
#1HanoiMagentoMeetup_Magento 2 vs Magento 1#1HanoiMagentoMeetup_Magento 2 vs Magento 1
#1HanoiMagentoMeetup_Magento 2 vs Magento 1
 
Cascalog
CascalogCascalog
Cascalog
 
Gtpp general truncated pyramid p2 p architecture over structured dht networks
Gtpp general truncated pyramid p2 p architecture over structured dht networksGtpp general truncated pyramid p2 p architecture over structured dht networks
Gtpp general truncated pyramid p2 p architecture over structured dht networks
 
Gtpp general truncated pyramid p2 p architecture over structured dht networks
Gtpp general truncated pyramid p2 p architecture over structured dht networksGtpp general truncated pyramid p2 p architecture over structured dht networks
Gtpp general truncated pyramid p2 p architecture over structured dht networks
 

More from Kouhei Sutou

RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowRubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowKouhei Sutou
 
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Kouhei Sutou
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowKouhei Sutou
 
Rubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアRubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアKouhei Sutou
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかKouhei Sutou
 
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataApache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataKouhei Sutou
 
Redmine検索の未来像
Redmine検索の未来像Redmine検索の未来像
Redmine検索の未来像Kouhei Sutou
 
Apache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataApache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataKouhei Sutou
 
Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Kouhei Sutou
 
Apache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームApache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームKouhei Sutou
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムKouhei Sutou
 
MySQL 8.0でMroonga
MySQL 8.0でMroongaMySQL 8.0でMroonga
MySQL 8.0でMroongaKouhei Sutou
 
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Kouhei Sutou
 
MariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムMariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムKouhei Sutou
 
PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版Kouhei Sutou
 

More from Kouhei Sutou (20)

RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowRubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
 
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
 
Rubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアRubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェア
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのか
 
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataApache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory data
 
Apache Arrow 2019
Apache Arrow 2019Apache Arrow 2019
Apache Arrow 2019
 
Redmine検索の未来像
Redmine検索の未来像Redmine検索の未来像
Redmine検索の未来像
 
Apache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataApache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory data
 
Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
Apache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームApache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォーム
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
 
MySQL 8.0でMroonga
MySQL 8.0でMroongaMySQL 8.0でMroonga
MySQL 8.0でMroonga
 
My way with Ruby
My way with RubyMy way with Ruby
My way with Ruby
 
Red Data Tools
Red Data ToolsRed Data Tools
Red Data Tools
 
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
 
MariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムMariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システム
 
PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

PGroonga 2 – Make PostgreSQL rich full text search system backend!

  • 1. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PGroonga 2Make PostgreSQL rich full text search system backend! Kouhei Sutou ClearCode Inc. PGConf.ASIA 2017 2017-12-05
  • 2. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Targets 対象者 Want to implement full text search with PostgreSQL PostgreSQLで全文検索したい Not good at full text search 全文検索はよく知らない PGroonga 1.0.0 users PGroonga 1.0.0は使ったことがある
  • 3. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Abbreviations 略語 PG: PostgreSQL ポスグレ: PostgreSQL FTS: Full text search FTS: 全文検索
  • 4. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS system: Targets 全文検索システム:対象 Many tests大量のテキスト e.g.: Text data in office docs in file servers 例:ファイルサーバー内のオフィス文書内のテキスト e.g.: Item descriptions, chat logs, Wiki data, ... 例:商品説明やチャットログ、Wikiのデータなど
  • 5. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS system: Goal 全文検索システム:目的 Provide needed info when you need必要な情報を必要なときに提供すること
  • 6. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Provide needed info 必要な情報を提供 Not found 探している情報が見つからない Found 探している情報が見つかる Found unconscious needed info too! 意識していなかったけど実は欲しかった情報も見つか る!
  • 7. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 When you need 必要なときに活用 Need many times to find なかなか見つからない Find in no time すぐに見つかる Already found すでに見つかっていた e.g.: Recommendation 例:レコメンデーション
  • 8. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 How to impl.: Options 実装方法:選択肢 Use FTS server 全文検索サーバーを使う Use PostgreSQL PostgreSQLを使う
  • 9. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS server: Pros 全文検索サーバー案:メリット Provides all basic features 必要な機能が揃っている Provides advanced features +αの機能もある Fast 速い
  • 10. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS server: Cons1 全文検索サーバー案:デメリット1 Large implementation cost 実装コスト大 Learn how to use from scratch 使い方を1から学ぶ必要がある How to implement data sync? マスターデータの同期はどうする?
  • 11. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS server: Cons2 全文検索サーバー案:デメリット2 Large maintenance cost メンテナンスコスト大 Learn how to operate from scratch 運用方法を1から学ぶ必要がある
  • 12. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PostgreSQL: Pros1 PostgreSQL案:メリット1 Less implementation cost 実装コスト小 Less things to be learned 新しく覚えることが少ない Can manage data at the same place データの一元管理
  • 13. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PostgreSQL: Pros2 PostgreSQL案:メリット2 Less operation cost メンテナンスコスト小 The current operation knowledge is reusable 既存の運用ノウハウを使える
  • 14. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PostgreSQL: Cons PostgreSQL案:デメリット Built-in features aren't enough 組込機能では機能不足 SQL limits efficiency SQLの表現力不足 e.g.: SQL needs multiple queries for a process that can be done by 1 query by FTS server 例:全文検索サーバーなら1クエリーで実現できる処 理にSQLだと複数クエリー必要なことがある
  • 15. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 The 3rd option 第3の選択肢 Use FTS engine via PostgreSQL (SQL) PostgreSQL経由(SQL)で全文検索エンジンを使う
  • 16. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Pros メリット Fast and rich features 高速で豊富な機能 Less implementation cost 実装コスト小 Less operation cost メンテナンスコスト小
  • 17. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Cons デメリット Need PostgreSQL extension PostgreSQLに拡張機能が必要 Not available on DBaaS DBaaSで使えない
  • 18. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Option: No FTS knowledge オススメの選択肢:全文検索の知識ナシ Need only simple features まだ単純な機能で十分 Less data: LIKE with PostgreSQL データ少:PostgreSQLでLIKE Need up-to-date FTS features いまどきの全文検索機能が必要 FTS engine via PostgreSQL PostgreSQL経由で全文検索エンジン
  • 19. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Option: With FTS knowledge オススメの選択肢:全文検索の知識アリ Need tuned FTS feature カリカリにチューニングしたい PostgreSQL + FTS server PostgreSQL+全文検索サーバー Others それ以外 FTS engine via PostgreSQL PostgreSQL経由で全文検索エンジン
  • 20. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Described option 説明する選択肢 FTS engine via PostgreSQLPostgreSQL経由で全文検索エンジン
  • 21. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS engine: Groonga 全文検索エンジン:Groonga(ぐるんが) Embeddable FTS engine 組込可能な全文検索エンジン PGroonga: Groonga in PostgreSQL PGroonga:PostgreSQLに組込 Usable as FTS server 全文検索サーバーとして単独でも使用可能 PostgreSQL + FTS server architecture is also available PostgreSQL+全文検索サーバー構成もできる
  • 22. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Groonga's hobby: data update Groongaの得意な事:データの追加・更新 Make fresh data searchable! 新鮮な情報をすぐ検索可能! Batch update is needless バッチで更新しなくてもよい Can use as chat backend チャットくらいの頻度でもOK e.g.: Zulip uses PGroonga 例:ZulipはPGroongaを採用
  • 23. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Groonga's hobby: data update Groongaの得意な事:データの追加・更新 Keep search performance while updating! 更新中も検索性能が落ちない! Updatable when there are many search users 利用ユーザーが多い時でも更新可能
  • 24. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PGroonga PGroonga(ぴーじーるんが) PostgreSQL index PostgreSQLのインデックス Alternative of GIN, RUM, ... GIN・RUMなどと同じレイヤー Usage 使用方法 CREATE INDEX ... USING PGroonga ...
  • 25. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PostgreSQL and FTS PostgreSQLと全文検索 LIKE: Built-in(組込機能) textsearch: Built-in(組込機能) pg_trgm: Contrib(標準添付) Bundled in the archive アーカイブには含まれている Need to install separately 別途インストールすれば使える
  • 26. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 LIKE and performance LIKEと速度 Small data 少ないデータ Enough performance 十分実用的 Not small data 少なくないデータ Need to tune 性能問題アリ
  • 27. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 LIKE and FTS system LIKEと全文検索システム Enough performance in most case 速度が実用的なことも多い Data are small in many case 少ないデータなら
  • 28. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 LIKE and FTS system LIKEと全文検索システム Unable to sort それっぽい順のソート不可 Sort is important in FTS 全文検索ではソート順が重要 Users check only the first N entries ユーザーは先頭N件しか見ない
  • 29. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 textsearch Fast search by index インデックスを作るので速い Need module for each lang 言語毎にモジュールが必要 Modules for English, French, ... are built-in 英語やフランス語などは組込 Modules for languages in Asia aren't maintained アジア圏の言語用のモジュールはメンテされていない
  • 30. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 pg_trgm Fast search by index インデックスを作るので速い Asian languages aren't enough supported アジア圏の言語のサポートは十分ではない Unable to sort それっぽい順のソート不可
  • 31. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 RUM RUM = GIN + position RUMは位置情報付きのGIN https://github.com/postgrespro/rum pg_trgm/pg_bigm are slow for much matches case pg_trgmとpg_bigmはマッチ数が多いと遅くなる RUM will solve it GINの代わりにRUMを使うことで解決できるかも!
  • 32. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PGroonga Fast search by index インデックスを作るので速い Sortable それっぽい順のソート可 Support all languages 全言語対応 Need to install separately 別途インストールする必要アリ
  • 33. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS system with PostgreSQL PostgreSQLで全文検索システム PGroonga is the best! PGroongaがベスト! PGroonga Fast(高速) Support all langs(全言語対応) Sortable(それっぽい順でソート可)
  • 34. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS system: Basic features 全文検索システム:基本機能 Fast FTS + sort 高速全文検索+ソート Show texts around keyword キーワード周辺テキスト表示 Highlight keyword 検索キーワードハイライト
  • 35. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS system: Adv. features 全文検索システム:高度な機能 Auto complete オートコンプリート Similar search 類似文書検索 Synonym expansion 同義語展開
  • 36. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PGroonga 1.0.0 ↓ are only supported 以下の機能のみ対応 Fast FTS + sort 高速全文検索+ソート Show texts around keyword キーワード周辺テキスト表示
  • 37. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PGroonga 2 All features are supported!全機能対応!
  • 38. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 PGroonga 1.0.0 → 2 Many new features たくさんの新機能 Improve performance 性能改善 API is changed APIが変わった
  • 39. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 API change API変更 Operator is changed 演算子変更 @@ → &@~ %% → &@ ...
  • 40. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 API change API変更 pgroonga schema is deprecated pgroongaスキーマを非推奨に pgroonga.score → pgroonga_score pgroonga.flush → pgroonga_flush ...
  • 41. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 App for PGroonga 1.0.0 PGroonga 1.0.0用アプリ Broken with PGroonga 2? PGroonga 2では動かない? No! Work without any changes! 何も変更しなくても動くよ! Great! But why? いいじゃん!でもなんで動くの? ↓ "Painless upgrade" technique
  • 42. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Painless upgrade PGroonga 2 provides both 1 API and 2 API PGroonga 2は1用のAPIも2用のAPIも両方提供 Can use PGroonga 2 with 1 API PGroonga 1のAPIでPGroonga 2を使える
  • 43. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Painless upgrade The last PGroonga 1.X provides both 1 API and partially 2 API PGroonga 1系の最終版は1用のAPIも2用のAPIの一部も提 供 Can use PGroonga 1 with 2 API PGroonga 2のAPIでPGroonga 1を使える
  • 44. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Painless upgrade PGroonga 2 keeps 1 API PGroonga 2の間は1のAPIを維持 PGroonga 3 will drop 1 API PGroonga 3で1のAPIを削除予定 Just need to upgrade API until 3 PGroonga 3までにAPIをアップグレードすればよい
  • 45. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Painless upgrade App for PGroonga 1.0.0 doesn't work with PGroonga 2 PGroonga 1.0.0用のアプリがPGroonga 2で動かない It's a bug. Please report it! バグなので報告してね!
  • 46. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS system: Basic features 全文検索システム:基本機能 Fast FTS + sort 高速全文検索+ソート Show texts around keyword キーワード周辺テキスト表示 Highlight keyword 検索キーワードハイライト
  • 47. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Fast FTS + sort 高速全文検索+ソート
  • 48. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Table definition CREATE TABLE entries ( -- Need primary key -- It's needed for sort id integer PRIMARY KEY, title text, content text );
  • 49. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Index definition -- For FTS. -- The default is good enough! CREATE INDEX entries_full_text_search ON entries -- "USING PGroonga" is important! -- Primary key is for sort! USING PGroonga (id, title, content);
  • 50. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Insert data -- Normal INSERT. INSERT INTO entries VALUES (1, 'Fast FTS with Groonga!', 'Fast FTS is needed!');
  • 51. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS SELECT title FROM entries WHERE -- &@~ is for FTS -- AND search with "search" and "fast" title &@~ 'search fast' OR content &@~ 'search fast';
  • 52. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS: LIKE SELECT title FROM entries WHERE -- Index search for LIKE is supported -- = Improve app perf without any changes -- NOTE: &@~ is faster than LIKE title LIKE '%search%' OR content LIKE '%search%';
  • 53. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Sort SELECT title, -- pgroonga_score(TABLE_NAME) returns -- precision as number pgroonga_score(entries) AS score FROM entries WHERE -- ... -- Sort by precision ORDER BY score DESC LIMIT 10;
  • 54. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Highlight keyword キーワードハイライト
  • 55. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Hightlight for HTML SELECT pgroonga_highlight_html( title, -- Extract keywords from query pgroonga_query_extract_keywords('search fast')) FROM entries WHERE title &@~ 'search fast' OR content &@~ 'search fast';
  • 56. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Highlight for HTML: Example Fast search with <Groonga>! ↓ <span class="keyword">Fast</span> ↑↓ Keywords are marked up with "class" <span class="keyword">search</span>! with &lt;Groonga&gt;! ← Escape tag
  • 57. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Texts around keyword キーワード周辺テキスト
  • 58. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Texts around keyword for HTML SELECT pgroonga_snippet_html( content, -- Extract keywords from query pgroonga_query_extract_keywords('search fast')) FROM entries WHERE title &@~ 'search fast' OR content &@~ 'search fast';
  • 59. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Example ...fast search with <Groonga>!... ↓ ARRAY[ ↓ First '<span class="keyword">fast</span> ↑↓ Keywords are marked up with "class" <span class="keyword">search/span>! with &lt;Groonga&gt;!', ← Escape tag '...' ← Second ]
  • 60. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS system: Adv. features 全文検索システム:高度な機能 Auto complete オートコンプリート Similar search 類似文書検索 Synonym expansion 同義語展開
  • 61. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Auto complete オートコンプリート
  • 62. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Auto complete: Preparation オートコンプリート:準備 Master table マスターテーブル Candidate 候補:(例:牛乳) Readings in Katakana (Only for Japanese) ヨミ(日本語の場合。カタカナ。複数登録可。) 例:ギュウニュウ・ミルク
  • 63. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Auto complete: Implementation オートコンプリート:実装方法 OR search with ... Prefix search against readings (Only for Japanese) ヨミを前方一致検索(日本語の場合。) Loose FTS against candidate 候補をゆるく全文検索 Sort by candidate 候補でソート https://pgroonga.github.io/how-to/auto-complete.html
  • 64. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Table definition CREATE TABLE terms ( term text, -- Candidate readings text[], -- Readings );
  • 65. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Data example INSERT INTO terms VALUES ( 'milk', -- Candidate ARRAY[ -- Reading in Katakana 'ギュウニュウ', -- "milk" in Japanese -- Multiple readings 'ミルク' -- "milk" in Katakana ] );
  • 66. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Data management データ管理 Easy to maintain because it's a normal table 普通のテーブルなので管理が楽 Easy to insert/delete/update 追加・削除・更新が楽 Normal backup and replication ダンプ・リストアもレプリケーションもいつも通り
  • 67. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Index for prefix search 前方一致用インデックス CREATE INDEX prefix_search ON terms USING PGroonga -- ...text_array_term_search... (readings pgroonga_text_array_term_search_ops_v2);
  • 68. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Index for loose FTS 緩い全文検索用インデックス CREATE INDEX loose_search ON terms USING PGroonga (term) -- Tokenizer for loose full text search WITH (tokenizer='TokenBigramSplitSymbolAlphaDigit');
  • 69. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 How to search 検索方法 SELECT term FROM terms -- Prefix search against readings WHERE readings &^~ '${INPUT}' OR -- Loose full text search term &@ '${INPUT}' ORDER BY term LIMIT 10; -- Sort
  • 70. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Search example: Candidate 検索例:候補 -- User inputs "il" SELECT term FROM terms -- Prefix search against readings WHERE readings &^~ 'il' OR -- Loose full text search (Hit) term &@ 'li' ORDER BY term LIMIT 10; -- Sort
  • 71. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Search example: Katakana 検索例:カタカナ -- User inputs "ギュウ" SELECT term FROM terms -- Prefix search against readings (Hit) WHERE readings &^~ 'ギュウ' OR -- Loose full text search term &@ 'ギュウ' ORDER BY term LIMIT 10; -- Sort
  • 72. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Search example: Hiragana 検索例:ひらがな -- User inputs "ぎゅう" SELECT term FROM terms -- Prefix search against readings (Hit) WHERE readings &^~ 'ぎゅう' OR -- Loose full text search term &@ 'ぎゅう' ORDER BY term LIMIT 10; -- Sort
  • 73. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Search example: Romaji 検索例:ローマ字 -- User inputs "gyu" SELECT term FROM terms -- Prefix search against readings (Hit) WHERE readings &^~ 'gyu' OR -- Loose full text search term &@ 'gyu' ORDER BY term LIMIT 10; -- Sort
  • 74. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Synonym expansion 同義語展開 Synonym 同義語 Same mean but different notation 同じ意味だが表記が異なる語 e.g.: "PostgreSQL" and "PG" 例:「PostgreSQL」と「ポスグレ」
  • 75. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Synonym expansion 同義語展開 Users don't want to care ユーザーは気にしたくない Synonym expansion 同義語展開 OR search with all synonyms 同義語すべてでOR検索
  • 76. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Implementation 実装方法 Create synonym table 同義語管理テーブルを作成 Expand synonyms in query クエリー内の同義語を展開 Search by expanded query 展開後のクエリーで検索 https://pgroonga.github.io/reference/functions/ pgroonga-query-expand.html
  • 77. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Table definition CREATE TABLE synonyms ( -- Term to be expanded term text, -- Synonym list. -- Including the "term" itself. -- If you don't input the "term", -- the "term" is unsearchable term. terms text[] );
  • 78. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Data example INSERT INTO synonyms VALUES ('PostgreSQL', -- Expand "PostgreSQL" ARRAY['PostgreSQL', 'PG']), ('PG', -- Expand "PG" ARRAY['PG', 'PostgreSQL']);
  • 79. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Data management データ管理 Easy to maintain because it's a normal table 普通のテーブルなので管理が楽 Easy to insert/delete/update 追加・削除・更新が楽 Normal backup and replication ダンプ・リストアもレプリケーションもいつも通り
  • 80. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Index definition CREATE INDEX synonym_search ON synonyms USING PGroonga -- ...text_term_search... -- For equal search (term pgroonga_text_term_search_ops_v2);
  • 81. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Confirm 確認方法 SELECT pgroonga_query_expand( 'synonyms', -- Table name 'term', -- Column name to be expanded 'terms', -- Column name for synonyms 'PostgreSQL' -- Query ); -- '((PostgreSQL) OR (PG))'
  • 82. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Search 検索方法 SELECT title FROM entries WHERE -- title &@~ 'DB ((PostgreSQL) OR (PG))' title &@~ pgroonga_query_expand('synonyms', 'term', 'terms', 'DB PostgreSQL');
  • 83. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Similar search 類似文書検索 Query is document itself 検索クエリーは文書そのもの Not keyword キーワードではない Use case 利用例 Show related entries 関連エントリーの提示に使える
  • 84. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Implementation 実現方法 Create dedicated index 類似検索用のインデックスを作る Use tokenizer for target language 対象の言語に合わせた処理で精度向上 e.g.: MeCab based tokenizer for Japanese 例:日本語ならMeCabベースのトークナイザーを活用 Use dedicated operator 類似検索用の演算子を使う
  • 85. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Index definition CREATE INDEX entries_similar_search ON entries -- Target: Both title and content -- Reason: Title is important USING PGroonga (id, (title || ' ' || content)) -- TokenMecab is good for Japanese WITH (tokenizer='TokenMecab');
  • 86. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Search SELECT title, pgroonga_score(entries) AS score FROM entries WHERE -- &@* is operator for similar search. -- Search with existing document. (title || ' ' || content) &@* '...fast search with Groonga!...' ORDER BY score DESC LIMIT 3;
  • 87. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Result example 結果例 Query: ...search with Groonga!... Hit example: ...search with PGroonga!...
  • 88. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Wrap up: Basic features 全文検索システム:基本機能 Fast FTS + sort 高速全文検索+ソート Show texts around keyword キーワード周辺テキスト表示 Highlight keyword 検索キーワードハイライト
  • 89. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Wrap up: Adv. features 全文検索システム:高度な機能 Auto complete オートコンプリート Similar search 類似文書検索 Synonym expansion 同義語展開
  • 90. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 FTS system: Next step 全文検索システム:次の一歩 Support structured data 構造化データ対応 Office document, HTML, ... オフィス文書・HTMLなど Needed features 対応に必要な処理 Text/metadata extraction テキスト・メタデータ抽出 Create screenshot スクリーンショット作成
  • 91. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Extraction tool 抽出ツール Apache Tika Apache Lucene's subproject Many supported formats 対応フォーマットが多い ChupaText Groonga's subproject Screenshot support スクリーンショット作成対応
  • 92. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 ChupaText Supported formats(対応フォーマット) Word/Excel/PowerPoint ODT/ODS/ODP(OpenDocument) PDF/HTML/XML/CSV/... Interface(インターフェイス) HTTP and command line HTTPとコマンドライン
  • 93. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Install インストール Use Docker or Vagrant DockerかVagrantを使うのが楽 https://github.com/ranguba/chupa-text-docker https://github.com/ranguba/chupa-text-vagrant
  • 94. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 ChupaText:Docker % GITHUB=https://github.com % git clone ${GITHUB}/ranguba/chupa-text-docker.git % cd chupa-text-docker % docker-compose up --build
  • 95. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Usage 使い方 % curl --form data=@XXX.pdf http://localhost:20080/extraction.json
  • 96. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Result example 結果例 { "mime-type": "application/pdf", # MIME type for the original data "size": 147159, # Metadata ..., "texts": [ # Extracted texts { "mime-type": "text/plain", # MIME type for the extracted data ..., "creator": "Adobe Illustrator CS3", # Metadata "body": "This is sample PDF. ...", # Extracted text "screenshot": { "mime-type": "image/png", # MIME type for screenshot "data": "iVBORw...", # Base64-ed image data "encoding": "base64" } } ] }
  • 97. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Web UI
  • 98. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Web UI: Extraction example Web UI:抽出例
  • 99. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Web UI: Extraction example Web UI:抽出例
  • 100. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 ChupaText:Vagrant % GITHUB=https://github.com % git clone ${GITHUB}/ranguba/chupa-text-vagrant.git % cd chupa-text-vagrant % vagrant up Usage is the same as Docker's 使い方はDocker版と同じ
  • 101. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Use cases(活用例) Extracted text Insert into PGroonga Extracted metadata Insert into PGroonga Use for condition(絞り込みに活用) Created screenshot Show in search result(検索結果で表 示)
  • 102. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Wrap up まとめ FTS engine via PostgreSQL PostgreSQL経由で全文検索エンジン Provide decision info 採用の判断材料を提供
  • 103. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Wrap up まとめ Show how to impl. FTS system 全文検索システム実装例を紹介 PGroonga PGroonga 1.0.0 and 2 PGroonga 1.0.0と2 Painless upgrade
  • 104. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Wrap up まとめ Show how to support structured data 構造化データの対応方法を紹介 ChupaText
  • 105. PGroonga 2 - Make PostgreSQL rich full text search system backend! Powered by Rabbit 2.2.2 Support service サポートサービス紹介 Install support(導入支援) 設計支援・性能検証・移行支援・… Development support(開発支援) サンプルコード提供・問い合わせ対応・… Operation support(運用支援) 障害対応・チューニング支援・… Contact(問い合わせ先) https://www.clear-code.com/contact/? type=groonga