SlideShare a Scribd company logo
1 of 26
Download to read offline
Pharos as a Pluggable Secondary
Index Component
Lei Wang
China Everbright Bank Enterprise Architect
Introduce
Lei Wang 王 磊
光大银行科技部领域架构师,曾任职于IBM全球咨询服务部从事技术咨询工作,具有十余年数据领域
研发及咨询经验。目前负责全行数据领域系统的日常架构设计、评审及内部研发等工作,对分布式数据库、
Hadoop等基础架构研究有浓厚兴趣。
I Overall Introduction
II Architecture Design
III Future Development
Part I
Overall Introduction
Research Background
Existing Products or Solutions
1. Native Filter,Low performance
2. HBase +Solr(ES) , Complex architecture / Performance is not good enough / High maintenance cost
3. Phoenix,Heavy Solution / Community inactivity / Imperfect function
4. HBase On Cloud,Unable to use because of security requirements
Our requirements
1. Non-Invasive
2. High performance
3. Universal
4. Simple architecture
5. Support transaction consistency
Pharos
1. Name
comes from English word ‘pharos’
2. Business Scenarios
•Read Only , T+1 Batch Load
•Read and Less Write ( Experimental )
3. Design Principle
Non-Invasive, Simple architecture
Research Process
Startup
V0.1 Release V0.22 Release
V0.3
Single Index
Multi Condition
Multi Data Type
V0.2 Release
Multi Index
Sorting
Paging
Cache
Index Builder Improvement
Refactoring Code
Bitmap Index
CBO Improvement
More Complex Conditions
Todo…
Transaction Consistency Index
2018.4 2018.11 2019.3 2019.7 2019.11
Pharos V0.22 Features (on HBase 1.2.6 or CDH 5.8.3-HBase 1.2.0)
1. Single Index(single column、multi column),Multi Index.
2. Paging, Sorting.
3. Multi Condition Query,including equal, less (equal), greater (equal).
4. AND / OR logic operation.
5. Multiple Data Type, including Char, Date ,Double and so on.
6. Simple Function Compute, for example record count.
7. Batch Index Creation.
Components
Client
API
Coordinator
Server
Coprocessor
Major Business Logic
Builder
Create Index
Coprocessor
Client
1. Define conditions
2. Set conditions in the scan
Server
1. Intercept and parse scan
2. Scan Index
3. Set matched index to filter
Client Code Example
1. Keep the original HBase style
2. Avoiding the complexity of SQL parser
Part II
Architecture Design
Global Index VS Partition(Local)Index
Global Index
• Support unique index
• Index creating and updating will be part of distributed
transactions, performance is not good.
• Query will cross different nodes, so performance may not
be good
Partition Index
• Index and data are co-distribution ,so queries can be pushed down to
each node. We can get good performance.
• Avoiding distributed transaction
• Not support unique index or other global constraints.
Storage Policy
Shadow Column Family
Index and data are exist the same region but in different column
family. We just need to control the generating logic of the index start
rowkey. It is un-invasive.
Single Index Table
Region is the smallest unit that is balanced. We must
guarantee that an index region is distributed with the
corresponding data region. So we must the modify the
balancer . It is invasive.
Index Key
1. Start key, keep index co-distribution
with data
2. Index name / number
3. Indexed column value
4. Reference data row key
Index Value
1. Version info
2. Metadata for deserialization
3. Transaction flag
Index Data Structure
Client as Global Coordinator, Keep Arch Simple
Two-Phase Sorting
1. Region Side Sorting
Base on natural sequence of index
2. Client Side Sorting
Merge results from regions, If data skew occurs,
query again.
Sorting Mechanisms
Reason
1. Paging is a universal requirement.
2. Always, the matched index is greater than the memory.
Design Strategy
Adding global session, shielding internal complexity.
One session breakpoint mapping multi region’s breakpoint.
Implement
1.Assuming that the data is evenly distributed, the page size
is spread to each region.
2.Region side, we control return indexes number and cache
the breakpoint.
3.Client side, merge result, if not enough then query again .
Paging Mechanisms
Choose local cache instead of distributed cache
Keep Arch Simple
Client side cache + Server side cache
Cache Mechanisms
Avoid Rebuilding index due to Region Splitting.
For different rowkey design, Bulkload may lead to
region splitting.
After data loading, stable regions can be obtained.
So we can create indexes and keep co-distribution
with the reference data.
Index Builder
Architecture
Part III
Future Development
Transaction Consistency
Inspire by Google’s Percolator
Bob -> Joe $7
Deformation of 2 Phase Commit
The state of the primary data is the state of the
transaction.
In the write process, only modify the primary state.
In the subsequent query, the state of second data
can be modified asynchronously based on the
primary.
Step 4 -> Transaction complete
Transaction Consistency
In the write process, we can
complete majority consistency, but
not all;
In the read process, we can confirm
transaction state by data row state,
then update index state.
Bitmap Index
CBO Improvement
Integration with SQL Engine(Presto?)
Other features
Thanks!
公众号:金融数士

More Related Content

Similar to hbaseconasia2019 Pharos as a Pluggable Secondary Index Component

J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch ProcessingChris Adkin
 
Large Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsLarge Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsJoel Oleson
 
Qardio experience with Core Data
Qardio experience with Core DataQardio experience with Core Data
Qardio experience with Core DataDmitrii Ivanov
 
BI Environment Technical Analysis
BI Environment Technical AnalysisBI Environment Technical Analysis
BI Environment Technical AnalysisRyan Casey
 
Government and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceGovernment and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceSolarWinds
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Trivadis
 
Building High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsBuilding High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsCalpont
 
Building High Performance MySql Query Systems And Analytic Applications
Building High Performance MySql Query Systems And Analytic ApplicationsBuilding High Performance MySql Query Systems And Analytic Applications
Building High Performance MySql Query Systems And Analytic Applicationsguest40cda0b
 
IRJET- Physical Database Design Techniques to improve Database Performance
IRJET-	 Physical Database Design Techniques to improve Database PerformanceIRJET-	 Physical Database Design Techniques to improve Database Performance
IRJET- Physical Database Design Techniques to improve Database PerformanceIRJET Journal
 
MySQL HA Presentation
MySQL HA PresentationMySQL HA Presentation
MySQL HA Presentationpapablues
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 

Similar to hbaseconasia2019 Pharos as a Pluggable Secondary Index Component (20)

J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch Processing
 
Large Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsLarge Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint Deployments
 
Qardio experience with Core Data
Qardio experience with Core DataQardio experience with Core Data
Qardio experience with Core Data
 
No sql database
No sql databaseNo sql database
No sql database
 
BI Environment Technical Analysis
BI Environment Technical AnalysisBI Environment Technical Analysis
BI Environment Technical Analysis
 
Government and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceGovernment and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for Performance
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
 
Richa_Profile
Richa_ProfileRicha_Profile
Richa_Profile
 
NoSQL Consepts
NoSQL ConseptsNoSQL Consepts
NoSQL Consepts
 
Building High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsBuilding High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic Applications
 
Building High Performance MySql Query Systems And Analytic Applications
Building High Performance MySql Query Systems And Analytic ApplicationsBuilding High Performance MySql Query Systems And Analytic Applications
Building High Performance MySql Query Systems And Analytic Applications
 
Spring
SpringSpring
Spring
 
IRJET- Physical Database Design Techniques to improve Database Performance
IRJET-	 Physical Database Design Techniques to improve Database PerformanceIRJET-	 Physical Database Design Techniques to improve Database Performance
IRJET- Physical Database Design Techniques to improve Database Performance
 
Redshift
RedshiftRedshift
Redshift
 
MySQL HA Presentation
MySQL HA PresentationMySQL HA Presentation
MySQL HA Presentation
 
Ravi Kiran Resume
Ravi Kiran ResumeRavi Kiran Resume
Ravi Kiran Resume
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Isset Presentation @ EECI2009
Isset Presentation @ EECI2009Isset Presentation @ EECI2009
Isset Presentation @ EECI2009
 
Vineet Kurrewar
Vineet KurrewarVineet Kurrewar
Vineet Kurrewar
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 

More from Michael Stack

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on CloudMichael Stack
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at PinterestMichael Stack
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., LtdMichael Stack
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at DidiMichael Stack
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...Michael Stack
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at TencentMichael Stack
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...Michael Stack
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...Michael Stack
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at AlibabaMichael Stack
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at XiaomiMichael Stack
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and SparkMichael Stack
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBaseMichael Stack
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index SolutionMichael Stack
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent MemoryMichael Stack
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLMichael Stack
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBaseMichael Stack
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...Michael Stack
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteMichael Stack
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesMichael Stack
 

More from Michael Stack (20)

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinterest
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didi
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencent
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomi
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBase
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 Keynote
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
 

Recently uploaded

VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Roomdivyansh0kumar0
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)Damian Radcliffe
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of indiaimessage0108
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Roomgirls4nights
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Deliverybabeytanya
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Denver Web Design brochure for public viewing
Denver Web Design brochure for public viewingDenver Web Design brochure for public viewing
Denver Web Design brochure for public viewingbigorange77
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663Call Girls Mumbai
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts servicesonalikaur4
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneVIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneCall girls in Ahmedabad High profile
 

Recently uploaded (20)

Vip Call Girls Aerocity ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Aerocity ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Aerocity ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Aerocity ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Denver Web Design brochure for public viewing
Denver Web Design brochure for public viewingDenver Web Design brochure for public viewing
Denver Web Design brochure for public viewing
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneVIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
 

hbaseconasia2019 Pharos as a Pluggable Secondary Index Component

  • 1.
  • 2. Pharos as a Pluggable Secondary Index Component Lei Wang China Everbright Bank Enterprise Architect
  • 3. Introduce Lei Wang 王 磊 光大银行科技部领域架构师,曾任职于IBM全球咨询服务部从事技术咨询工作,具有十余年数据领域 研发及咨询经验。目前负责全行数据领域系统的日常架构设计、评审及内部研发等工作,对分布式数据库、 Hadoop等基础架构研究有浓厚兴趣。
  • 4. I Overall Introduction II Architecture Design III Future Development
  • 6. Research Background Existing Products or Solutions 1. Native Filter,Low performance 2. HBase +Solr(ES) , Complex architecture / Performance is not good enough / High maintenance cost 3. Phoenix,Heavy Solution / Community inactivity / Imperfect function 4. HBase On Cloud,Unable to use because of security requirements Our requirements 1. Non-Invasive 2. High performance 3. Universal 4. Simple architecture 5. Support transaction consistency
  • 7. Pharos 1. Name comes from English word ‘pharos’ 2. Business Scenarios •Read Only , T+1 Batch Load •Read and Less Write ( Experimental ) 3. Design Principle Non-Invasive, Simple architecture
  • 8. Research Process Startup V0.1 Release V0.22 Release V0.3 Single Index Multi Condition Multi Data Type V0.2 Release Multi Index Sorting Paging Cache Index Builder Improvement Refactoring Code Bitmap Index CBO Improvement More Complex Conditions Todo… Transaction Consistency Index 2018.4 2018.11 2019.3 2019.7 2019.11
  • 9. Pharos V0.22 Features (on HBase 1.2.6 or CDH 5.8.3-HBase 1.2.0) 1. Single Index(single column、multi column),Multi Index. 2. Paging, Sorting. 3. Multi Condition Query,including equal, less (equal), greater (equal). 4. AND / OR logic operation. 5. Multiple Data Type, including Char, Date ,Double and so on. 6. Simple Function Compute, for example record count. 7. Batch Index Creation.
  • 11. Coprocessor Client 1. Define conditions 2. Set conditions in the scan Server 1. Intercept and parse scan 2. Scan Index 3. Set matched index to filter
  • 12. Client Code Example 1. Keep the original HBase style 2. Avoiding the complexity of SQL parser
  • 14. Global Index VS Partition(Local)Index Global Index • Support unique index • Index creating and updating will be part of distributed transactions, performance is not good. • Query will cross different nodes, so performance may not be good Partition Index • Index and data are co-distribution ,so queries can be pushed down to each node. We can get good performance. • Avoiding distributed transaction • Not support unique index or other global constraints.
  • 15. Storage Policy Shadow Column Family Index and data are exist the same region but in different column family. We just need to control the generating logic of the index start rowkey. It is un-invasive. Single Index Table Region is the smallest unit that is balanced. We must guarantee that an index region is distributed with the corresponding data region. So we must the modify the balancer . It is invasive.
  • 16. Index Key 1. Start key, keep index co-distribution with data 2. Index name / number 3. Indexed column value 4. Reference data row key Index Value 1. Version info 2. Metadata for deserialization 3. Transaction flag Index Data Structure
  • 17. Client as Global Coordinator, Keep Arch Simple Two-Phase Sorting 1. Region Side Sorting Base on natural sequence of index 2. Client Side Sorting Merge results from regions, If data skew occurs, query again. Sorting Mechanisms
  • 18. Reason 1. Paging is a universal requirement. 2. Always, the matched index is greater than the memory. Design Strategy Adding global session, shielding internal complexity. One session breakpoint mapping multi region’s breakpoint. Implement 1.Assuming that the data is evenly distributed, the page size is spread to each region. 2.Region side, we control return indexes number and cache the breakpoint. 3.Client side, merge result, if not enough then query again . Paging Mechanisms
  • 19. Choose local cache instead of distributed cache Keep Arch Simple Client side cache + Server side cache Cache Mechanisms
  • 20. Avoid Rebuilding index due to Region Splitting. For different rowkey design, Bulkload may lead to region splitting. After data loading, stable regions can be obtained. So we can create indexes and keep co-distribution with the reference data. Index Builder
  • 23. Transaction Consistency Inspire by Google’s Percolator Bob -> Joe $7 Deformation of 2 Phase Commit The state of the primary data is the state of the transaction. In the write process, only modify the primary state. In the subsequent query, the state of second data can be modified asynchronously based on the primary. Step 4 -> Transaction complete
  • 24. Transaction Consistency In the write process, we can complete majority consistency, but not all; In the read process, we can confirm transaction state by data row state, then update index state.
  • 25. Bitmap Index CBO Improvement Integration with SQL Engine(Presto?) Other features