SlideShare a Scribd company logo
1 of 35
Download to read offline
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
Mroonga
2016高速日本語全文検索 for MariaDB
Super fast full text search for MariaDB
Kouhei Sutou ClearCode Inc.
MariaDB Community Event in Tokyo
2016-07-21
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
Mroonga
読み方:むるんが
Pronunciation: múlúnɡά
ストレージエンジン
Storage engine
MariaDBバンドル
Bundled in MariaDB
別途インストールしなくてもよい
No need to install Mroonga separately
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
特徴
Characteristics
高速日本語全文検索(全言語OK)
Super fast full text search for all languages
カラムストアによる高速処理
Super fast processing by column store architecture
全文検索初心者でも使える
Easy to use by full text search beginners
全文検索上級者は活用できる
Features for full text search specialists
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
高速日本語全文検索
Super fast full text search
ベンチマーク
Benchmark
1.
速さの秘密
The reason why Mroonga is fast
2.
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
ベンチマーク環境
Benchmark environment
対象:Wikipedia日本語版
Target: Japanese version Wikipedia
レコード数:約185万件
The number of records: About 1.85 millions
データサイズ:約7GB
Data size: About 7GB
メモリー4GB・SSD250GB(ConoHa)
Memory: 4GB, SSD: 250GB
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
補足
Supplement
MySQL 5.7を使用
MySQL 5.7 is used
MariaDBのInnoDBは日本語未対応
InnoDB in MariaDB doesn't support Japanese yet
他人のベンチマークは参考程度
Just refer benchmark result by others
検討時は実環境でベンチマークを!
Run benchmark with the real data on real env
詳細(Detail):
https://github.com/groonga/wikipedia-search/issues/4
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
検索1
Search1
キーワード:テレビアニメ
(ヒット数:約2万3千件)
Keyword: TV animation
(N hits: About 23K)
InnoDB ngram 3m2s
InnoDB MeCab 6m20s
Mroonga:1 0.11s
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
検索2
Search2
キーワード:データベース
(ヒット数:約1万7千件)
Keyword: Database
(N hits: About 17K)
InnoDB ngram 36s
InnoDB MeCab:1 0.03s
Mroonga:2 0.09s
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
検索3
Search3
キーワード:PostgreSQL OR MySQL
(ヒット数:約400件)
Keyword: PostgreSQL OR MySQL
(N hits: About 400)
InnoDB ngram N/A(Error)
InnoDB MeCab:1 0.005s
Mroonga:2 0.028s
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
検索4
Search4
キーワード:日本
(ヒット数:約63万件)
Keyword: Japan
(N hits: About 630K)
InnoDB ngram 1.3s
InnoDB MeCab 1.3s
Mroonga:1 0.21s
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
検索まとめ
Wrap up search
Mroonga:安定して速い
Always fast
InnoDB FTS MeCab
ハマれば速い
Fast only for one token query
InnoDB FTS ngram
安定して遅い
Always slow
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
速さの秘密
The reason why Mroonga is fast
最適化された転置索引実装
Optimized inverted index implementation
2段階のデータ圧縮
2 level data compression
高速なポスティングリスト探索
Fast posting list search
検索だけでなく更新も速い
Not only search but also update is fast
11年以上開発が続いている全文検索エンジンGroongaを使用
Groonga full text search engine (11 years old) is used
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
もっと速さの秘密
More reasons why Mroonga is fast
カラムストアを活かした最適化
Optimizations based on column store architecture
ポイント1:余計なI/Oを減らす
Point1: Reduce needless I/O
ポイント2:I/Oを局所化
Point2: Localize I/O
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
カラムストア
Column store
Columns
Rows
a b c
1
2
3
V V V
V V V
V V V
Columns
Rows
a b c
1
2
3
V V V
V V V
V V V
Mroonga
Per column
InnoDB etc
Column Row
Value manage unit Per row
Fast access unit
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
必要なカラムのみアクセス
Access to only needed columns
-- Access to only a
SELECT a
FROM table
-- Access to only c
WHERE c = XXX;
-- b isn't accessed
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
減ったI/O
Reduced I/O
Columns
Rows
a b c
1
2
3
V V V
V V V
V V V
Columns
Rows
a b c
1
2
3
V V V
V V V
V V V
Mroonga
Per column
InnoDB etc
Column Row
Value manage unit Per row
Fast access unit
Not accessed
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
行カウント
Row count
-- No column values are needed
SELECT COUNT(*)
FROM table
-- Access to only full text search index of c
WHERE MATCH(c)
AGAINST('+keyword' IN BOOLEAN MODE);
-- a, b and c aren't accessed
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
減ったI/O
Reduced I/O
Columns
Rows
a b c
1
2
3
V V V
V V V
V V V
Columns
Rows
a b c
1
2
3
V V V
V V V
V V V
Mroonga
Per column
InnoDB etc
Column Row
Value manage unit Per row
Fast access unit
Not accessed
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
ORDER BY LIMIT
SELECT *
FROM table
WHERE MATCH(c)
AGAINST('+keyword' IN BOOLEAN MODE)
-- Mroonga processes ORDER BY LIMIT
-- instead of MariaDB
-- → Mroonga returns only 10 records
-- to MariaDB instead of all matched records
ORDER BY a LIMIT 10;
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
Optimized ORDER BY LIMIT
検索(Search) by Mroonga
カラム毎の処理でI/Oを局所化
(索引非使用時)
Localize I/O by per column processing
(on no index case)
ソート(Sort) by Mroonga
カラム毎の処理でI/Oを局所化
Localize I/O by per column processing
OFFSET/LIMIT by Mroonga
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
カラム毎の処理は速い
Per column processing is fast
Columns
Rows
a b c
1
2
3
V V V
V V V
V V V
Columns
Rows
a b c
1
2
3
V V V
V V V
V V V
Mroonga
Per column
InnoDB etc
Column Row
Value manage unit Per row
Fast access unit
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
最適化のまとめ
Wrap up optimization
転置索引実装が速い
Inverted index implementation is fast
検索も更新も速い
Both search and update are fast
カラムストアで速い
Fast by column store architecture
ポイント:I/O削減・I/O局所化
Points: Reduce and localize I/O
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
全文検索初心者でも使える
Easy to use by beginners
インストールが簡単
Easy to install
MySQLの標準機能のみで使える
Usable only with MySQL standard features
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
インストールが簡単
Easy to install
MariaDBバンドル
MariaDB bundles Mroonga
Apt/Yumリポジトリー
Apt/Yum repositories
MariaDB込みのWindowsバイナリ
Windows binary with MariaDB
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
標準機能のみで使える
Require only MySQL standard features
-- Create
CREATE TABLE table (
-- ...,
FULLTEXT INDEX (column)
) ENGINE=Mroonga;
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
標準機能のみで使える
Require only MySQL standard features
-- Convert
ALTER TABLE table
ADD FULLTEXT INDEX (column)
ENGINE=Mroonga;
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
標準機能のみで使える
Require only MySQL standard features
SELECT * FROM table
WHERE
MATCH(column)
AGAINST('+keyword'
IN BOOLEAN MODE);
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
全文検索上級者向け機能
Features for specialists
カスタマイズ
Customizable
デフォルト値はいい感じ
→初心者はカスタマイズなしでよい
Suitable default values
→Beginners don't need to customize
Groongaの機能をもっと使える
(高速・高機能)
Specialists can use more Groonga features
(Fast and high functionality)
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
文字正規化ルール変更
Change normalizer
CREATE TABLE table (
-- ...,
FULLTEXT INDEX (column)
--
-- Specify a parameter as comment
COMMENT='normalizer "NormalizerAuto"'
) ENGINE=Mroonga;
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
文字正規化ルール変更
Change normalizer
CREATE TABLE table (
-- ...,
FULLTEXT INDEX (column)
-- MariaDB:
-- Custom parameter can be used
NORMALIZER='NormalizerAuto'
) ENGINE=Mroonga;
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
Groongaの検索機能を使う
Use full Groonga search features
SELECT * FROM table
WHERE
-- "c1" is meaningless with "*SS" pragma
MATCH(c1)
-- "*SS" is a pragma to use
-- full Groonga search features
-- Multiple indexes can be used in A query
AGAINST('*SS c1 @ "keyword" && c2 < 100'
IN BOOLEAN MODE);
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
今後
Futures
最新機能サポート
Support the latest features
JSONを全文検索
(JSON型のデータの読み書きは対応済み)
Full text search against JSON
(Storing/fetching JSON are already supported)
virtual column/generated column
最新版をMariaDBにバンドル
Bundle the latest Mroonga to MariaDB
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
最新版をバンドル
Bundle the latest Mroonga
Mroongaは毎月リリース
Mroonga is released monthly
MariaDB 10.2.1 bundles
Mroonga 5.04
The latest Mroonga is 6.06
Mroonga supports MariaDB 10.2
since 6.03
How can we improve this?
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
まとめ1
Wrap up1
高速日本語全文検索(全言語OK)
Super fast full text search for all languages
カラムストアによる高速処理
Super fast processing by column store architecture
全文検索初心者でも使える
Easy to use by full text search beginners
全文検索上級者は活用できる
Features for full text search specialists
Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0
まとめ2
Wrap up2
今後もMroongaは便利になる
We continue to improve Mroonga
MariaDBで最新Mroongaを使える
MariaDB will bundle the latest Mroonga
MariaDBで全文検索ならMroonga!
Mroonga is the best for full text search on MariaDB!

More Related Content

Similar to Mroonga最新情報2016

Dat305 Deep Dive on Amazon Aurora PostgreSQL
Dat305 Deep Dive on Amazon Aurora PostgreSQLDat305 Deep Dive on Amazon Aurora PostgreSQL
Dat305 Deep Dive on Amazon Aurora PostgreSQLGrant McAlister
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...Rob Skillington
 
How To Access Code In Large w/ Vim
How To Access Code In Large w/ VimHow To Access Code In Large w/ Vim
How To Access Code In Large w/ VimCheng Hsien Chen
 
MariaDB TX 3.0 新機能 / ロードマップ
MariaDB TX 3.0 新機能 / ロードマップMariaDB TX 3.0 新機能 / ロードマップ
MariaDB TX 3.0 新機能 / ロードマップGOTO Satoru
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon Web Services Korea
 
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-Hiroshi Yuki
 
AWS Blackbelt NINJA Dojo – Dean Samuels
AWS Blackbelt NINJA Dojo – Dean SamuelsAWS Blackbelt NINJA Dojo – Dean Samuels
AWS Blackbelt NINJA Dojo – Dean SamuelsAmazon Web Services
 
Fosdem2012 replication-features-of-2011
Fosdem2012 replication-features-of-2011Fosdem2012 replication-features-of-2011
Fosdem2012 replication-features-of-2011Sergey Petrunya
 
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScaleHow Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScaleMariaDB plc
 
MySQL Roadmap NoSQL HA Fev17
MySQL Roadmap NoSQL HA Fev17MySQL Roadmap NoSQL HA Fev17
MySQL Roadmap NoSQL HA Fev17MySQL Brasil
 
M|18 What's New in the MariaDB AX Platform
M|18 What's New in the MariaDB AX PlatformM|18 What's New in the MariaDB AX Platform
M|18 What's New in the MariaDB AX PlatformMariaDB plc
 
M|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksM|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksMariaDB plc
 
Elastic Stack 最新动态
Elastic Stack 最新动态 Elastic Stack 最新动态
Elastic Stack 最新动态 Elasticsearch
 
State of the art: server-side javaScript - NantesJS
State of the art: server-side javaScript - NantesJSState of the art: server-side javaScript - NantesJS
State of the art: server-side javaScript - NantesJSAlexandre Morgaut
 
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマークHandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマークmoai kids
 
Perf onjs final
Perf onjs finalPerf onjs final
Perf onjs finalqi yang
 
MongoDB, RabbitMQ y Applicaciones en Nube
MongoDB, RabbitMQ y Applicaciones en NubeMongoDB, RabbitMQ y Applicaciones en Nube
MongoDB, RabbitMQ y Applicaciones en NubeSocialmetrix
 

Similar to Mroonga最新情報2016 (20)

Dat305 Deep Dive on Amazon Aurora PostgreSQL
Dat305 Deep Dive on Amazon Aurora PostgreSQLDat305 Deep Dive on Amazon Aurora PostgreSQL
Dat305 Deep Dive on Amazon Aurora PostgreSQL
 
8051 micro controller
8051 micro controller8051 micro controller
8051 micro controller
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
How To Access Code In Large w/ Vim
How To Access Code In Large w/ VimHow To Access Code In Large w/ Vim
How To Access Code In Large w/ Vim
 
MariaDB TX 3.0 新機能 / ロードマップ
MariaDB TX 3.0 新機能 / ロードマップMariaDB TX 3.0 新機能 / ロードマップ
MariaDB TX 3.0 新機能 / ロードマップ
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
 
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
Fast fulltext search in Ruby, without Java -Groonga, Rroonga and Droonga-
 
AWS Blackbelt NINJA Dojo – Dean Samuels
AWS Blackbelt NINJA Dojo – Dean SamuelsAWS Blackbelt NINJA Dojo – Dean Samuels
AWS Blackbelt NINJA Dojo – Dean Samuels
 
Fosdem2012 replication-features-of-2011
Fosdem2012 replication-features-of-2011Fosdem2012 replication-features-of-2011
Fosdem2012 replication-features-of-2011
 
AWS Blackbelt NINJA Dojo
AWS Blackbelt NINJA DojoAWS Blackbelt NINJA Dojo
AWS Blackbelt NINJA Dojo
 
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScaleHow Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
 
MySQL Roadmap NoSQL HA Fev17
MySQL Roadmap NoSQL HA Fev17MySQL Roadmap NoSQL HA Fev17
MySQL Roadmap NoSQL HA Fev17
 
M|18 What's New in the MariaDB AX Platform
M|18 What's New in the MariaDB AX PlatformM|18 What's New in the MariaDB AX Platform
M|18 What's New in the MariaDB AX Platform
 
M|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksM|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocks
 
Elastic Stack 最新动态
Elastic Stack 最新动态 Elastic Stack 最新动态
Elastic Stack 最新动态
 
State of the art: server-side javaScript - NantesJS
State of the art: server-side javaScript - NantesJSState of the art: server-side javaScript - NantesJS
State of the art: server-side javaScript - NantesJS
 
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマークHandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
 
Perf onjs final
Perf onjs finalPerf onjs final
Perf onjs final
 
Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
MongoDB, RabbitMQ y Applicaciones en Nube
MongoDB, RabbitMQ y Applicaciones en NubeMongoDB, RabbitMQ y Applicaciones en Nube
MongoDB, RabbitMQ y Applicaciones en Nube
 

More from Kouhei Sutou

RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowRubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowKouhei Sutou
 
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Kouhei Sutou
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowKouhei Sutou
 
Rubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアRubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアKouhei Sutou
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかKouhei Sutou
 
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataApache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataKouhei Sutou
 
Redmine検索の未来像
Redmine検索の未来像Redmine検索の未来像
Redmine検索の未来像Kouhei Sutou
 
Apache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataApache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataKouhei Sutou
 
Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Kouhei Sutou
 
Apache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームApache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームKouhei Sutou
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムKouhei Sutou
 
MySQL 8.0でMroonga
MySQL 8.0でMroongaMySQL 8.0でMroonga
MySQL 8.0でMroongaKouhei Sutou
 
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Kouhei Sutou
 
MariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムMariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムKouhei Sutou
 
PGroonga 2 – Make PostgreSQL rich full text search system backend!
PGroonga 2 – Make PostgreSQL rich full text search system backend!PGroonga 2 – Make PostgreSQL rich full text search system backend!
PGroonga 2 – Make PostgreSQL rich full text search system backend!Kouhei Sutou
 

More from Kouhei Sutou (20)

RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowRubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
 
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
 
Rubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアRubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェア
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのか
 
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataApache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory data
 
Apache Arrow 2019
Apache Arrow 2019Apache Arrow 2019
Apache Arrow 2019
 
Redmine検索の未来像
Redmine検索の未来像Redmine検索の未来像
Redmine検索の未来像
 
Apache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataApache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory data
 
Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6Better CSV processing with Ruby 2.6
Better CSV processing with Ruby 2.6
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
Apache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームApache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォーム
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
 
MySQL 8.0でMroonga
MySQL 8.0でMroongaMySQL 8.0でMroonga
MySQL 8.0でMroonga
 
My way with Ruby
My way with RubyMy way with Ruby
My way with Ruby
 
Red Data Tools
Red Data ToolsRed Data Tools
Red Data Tools
 
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
 
MariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムMariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システム
 
PGroonga 2 – Make PostgreSQL rich full text search system backend!
PGroonga 2 – Make PostgreSQL rich full text search system backend!PGroonga 2 – Make PostgreSQL rich full text search system backend!
PGroonga 2 – Make PostgreSQL rich full text search system backend!
 

Mroonga最新情報2016

  • 1. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 Mroonga 2016高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Kouhei Sutou ClearCode Inc. MariaDB Community Event in Tokyo 2016-07-21
  • 2. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 Mroonga 読み方:むるんが Pronunciation: múlúnɡά ストレージエンジン Storage engine MariaDBバンドル Bundled in MariaDB 別途インストールしなくてもよい No need to install Mroonga separately
  • 3. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 特徴 Characteristics 高速日本語全文検索(全言語OK) Super fast full text search for all languages カラムストアによる高速処理 Super fast processing by column store architecture 全文検索初心者でも使える Easy to use by full text search beginners 全文検索上級者は活用できる Features for full text search specialists
  • 4. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 高速日本語全文検索 Super fast full text search ベンチマーク Benchmark 1. 速さの秘密 The reason why Mroonga is fast 2.
  • 5. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 ベンチマーク環境 Benchmark environment 対象:Wikipedia日本語版 Target: Japanese version Wikipedia レコード数:約185万件 The number of records: About 1.85 millions データサイズ:約7GB Data size: About 7GB メモリー4GB・SSD250GB(ConoHa) Memory: 4GB, SSD: 250GB
  • 6. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 補足 Supplement MySQL 5.7を使用 MySQL 5.7 is used MariaDBのInnoDBは日本語未対応 InnoDB in MariaDB doesn't support Japanese yet 他人のベンチマークは参考程度 Just refer benchmark result by others 検討時は実環境でベンチマークを! Run benchmark with the real data on real env 詳細(Detail): https://github.com/groonga/wikipedia-search/issues/4
  • 7. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 検索1 Search1 キーワード:テレビアニメ (ヒット数:約2万3千件) Keyword: TV animation (N hits: About 23K) InnoDB ngram 3m2s InnoDB MeCab 6m20s Mroonga:1 0.11s
  • 8. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 検索2 Search2 キーワード:データベース (ヒット数:約1万7千件) Keyword: Database (N hits: About 17K) InnoDB ngram 36s InnoDB MeCab:1 0.03s Mroonga:2 0.09s
  • 9. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 検索3 Search3 キーワード:PostgreSQL OR MySQL (ヒット数:約400件) Keyword: PostgreSQL OR MySQL (N hits: About 400) InnoDB ngram N/A(Error) InnoDB MeCab:1 0.005s Mroonga:2 0.028s
  • 10. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 検索4 Search4 キーワード:日本 (ヒット数:約63万件) Keyword: Japan (N hits: About 630K) InnoDB ngram 1.3s InnoDB MeCab 1.3s Mroonga:1 0.21s
  • 11. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 検索まとめ Wrap up search Mroonga:安定して速い Always fast InnoDB FTS MeCab ハマれば速い Fast only for one token query InnoDB FTS ngram 安定して遅い Always slow
  • 12. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 速さの秘密 The reason why Mroonga is fast 最適化された転置索引実装 Optimized inverted index implementation 2段階のデータ圧縮 2 level data compression 高速なポスティングリスト探索 Fast posting list search 検索だけでなく更新も速い Not only search but also update is fast 11年以上開発が続いている全文検索エンジンGroongaを使用 Groonga full text search engine (11 years old) is used
  • 13. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 もっと速さの秘密 More reasons why Mroonga is fast カラムストアを活かした最適化 Optimizations based on column store architecture ポイント1:余計なI/Oを減らす Point1: Reduce needless I/O ポイント2:I/Oを局所化 Point2: Localize I/O
  • 14. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 カラムストア Column store Columns Rows a b c 1 2 3 V V V V V V V V V Columns Rows a b c 1 2 3 V V V V V V V V V Mroonga Per column InnoDB etc Column Row Value manage unit Per row Fast access unit
  • 15. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 必要なカラムのみアクセス Access to only needed columns -- Access to only a SELECT a FROM table -- Access to only c WHERE c = XXX; -- b isn't accessed
  • 16. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 減ったI/O Reduced I/O Columns Rows a b c 1 2 3 V V V V V V V V V Columns Rows a b c 1 2 3 V V V V V V V V V Mroonga Per column InnoDB etc Column Row Value manage unit Per row Fast access unit Not accessed
  • 17. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 行カウント Row count -- No column values are needed SELECT COUNT(*) FROM table -- Access to only full text search index of c WHERE MATCH(c) AGAINST('+keyword' IN BOOLEAN MODE); -- a, b and c aren't accessed
  • 18. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 減ったI/O Reduced I/O Columns Rows a b c 1 2 3 V V V V V V V V V Columns Rows a b c 1 2 3 V V V V V V V V V Mroonga Per column InnoDB etc Column Row Value manage unit Per row Fast access unit Not accessed
  • 19. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 ORDER BY LIMIT SELECT * FROM table WHERE MATCH(c) AGAINST('+keyword' IN BOOLEAN MODE) -- Mroonga processes ORDER BY LIMIT -- instead of MariaDB -- → Mroonga returns only 10 records -- to MariaDB instead of all matched records ORDER BY a LIMIT 10;
  • 20. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 Optimized ORDER BY LIMIT 検索(Search) by Mroonga カラム毎の処理でI/Oを局所化 (索引非使用時) Localize I/O by per column processing (on no index case) ソート(Sort) by Mroonga カラム毎の処理でI/Oを局所化 Localize I/O by per column processing OFFSET/LIMIT by Mroonga
  • 21. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 カラム毎の処理は速い Per column processing is fast Columns Rows a b c 1 2 3 V V V V V V V V V Columns Rows a b c 1 2 3 V V V V V V V V V Mroonga Per column InnoDB etc Column Row Value manage unit Per row Fast access unit
  • 22. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 最適化のまとめ Wrap up optimization 転置索引実装が速い Inverted index implementation is fast 検索も更新も速い Both search and update are fast カラムストアで速い Fast by column store architecture ポイント:I/O削減・I/O局所化 Points: Reduce and localize I/O
  • 23. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 全文検索初心者でも使える Easy to use by beginners インストールが簡単 Easy to install MySQLの標準機能のみで使える Usable only with MySQL standard features
  • 24. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 インストールが簡単 Easy to install MariaDBバンドル MariaDB bundles Mroonga Apt/Yumリポジトリー Apt/Yum repositories MariaDB込みのWindowsバイナリ Windows binary with MariaDB
  • 25. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 標準機能のみで使える Require only MySQL standard features -- Create CREATE TABLE table ( -- ..., FULLTEXT INDEX (column) ) ENGINE=Mroonga;
  • 26. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 標準機能のみで使える Require only MySQL standard features -- Convert ALTER TABLE table ADD FULLTEXT INDEX (column) ENGINE=Mroonga;
  • 27. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 標準機能のみで使える Require only MySQL standard features SELECT * FROM table WHERE MATCH(column) AGAINST('+keyword' IN BOOLEAN MODE);
  • 28. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 全文検索上級者向け機能 Features for specialists カスタマイズ Customizable デフォルト値はいい感じ →初心者はカスタマイズなしでよい Suitable default values →Beginners don't need to customize Groongaの機能をもっと使える (高速・高機能) Specialists can use more Groonga features (Fast and high functionality)
  • 29. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 文字正規化ルール変更 Change normalizer CREATE TABLE table ( -- ..., FULLTEXT INDEX (column) -- -- Specify a parameter as comment COMMENT='normalizer "NormalizerAuto"' ) ENGINE=Mroonga;
  • 30. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 文字正規化ルール変更 Change normalizer CREATE TABLE table ( -- ..., FULLTEXT INDEX (column) -- MariaDB: -- Custom parameter can be used NORMALIZER='NormalizerAuto' ) ENGINE=Mroonga;
  • 31. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 Groongaの検索機能を使う Use full Groonga search features SELECT * FROM table WHERE -- "c1" is meaningless with "*SS" pragma MATCH(c1) -- "*SS" is a pragma to use -- full Groonga search features -- Multiple indexes can be used in A query AGAINST('*SS c1 @ "keyword" && c2 < 100' IN BOOLEAN MODE);
  • 32. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 今後 Futures 最新機能サポート Support the latest features JSONを全文検索 (JSON型のデータの読み書きは対応済み) Full text search against JSON (Storing/fetching JSON are already supported) virtual column/generated column 最新版をMariaDBにバンドル Bundle the latest Mroonga to MariaDB
  • 33. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 最新版をバンドル Bundle the latest Mroonga Mroongaは毎月リリース Mroonga is released monthly MariaDB 10.2.1 bundles Mroonga 5.04 The latest Mroonga is 6.06 Mroonga supports MariaDB 10.2 since 6.03 How can we improve this?
  • 34. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 まとめ1 Wrap up1 高速日本語全文検索(全言語OK) Super fast full text search for all languages カラムストアによる高速処理 Super fast processing by column store architecture 全文検索初心者でも使える Easy to use by full text search beginners 全文検索上級者は活用できる Features for full text search specialists
  • 35. Mroonga 2016 - 高速日本語全文検索 for MariaDB Super fast full text search for MariaDB Powered by Rabbit 2.2.0 まとめ2 Wrap up2 今後もMroongaは便利になる We continue to improve Mroonga MariaDBで最新Mroongaを使える MariaDB will bundle the latest Mroonga MariaDBで全文検索ならMroonga! Mroonga is the best for full text search on MariaDB!