An assessment of the migration
From a relational database to a NoSQL store
September 1st, 2017 presented by Markiyan RIZUN mrizun@gmail.com
Outline. Contents of the presentation
2
I – Internship Context
II – Problem & Objective
III – Background
IV – The Migration
V – The Analysis & Results
VI – Conclusion
I – Internship Context
3
SOFTEAM R&D department + Modelio software
Working in SOFTEAM
4
MODELIOSOFT
UML Java SQL
MODELIO MODELS
… and provides many other features
Adventures with [image] at [image]
II – Problem & Objective
6
“To migrate, or not to migrate, that is the question”
– William Shakespeare, Hamlet
Problem definition
7
Migration
• data
• queries
• structure
Whether to migrate? How to migrate?
“What really matter are the business requirements and access/write pattern of the applications.”
– Chongxin Li
Objective definition
8
1. Discover the “symptoms” of a relational database that indicate the need for migration to NoSQL.
2. Propose a guideline for the migration that makes it possible to generate a NoSQL database adapted to the usage of the source relational database.
Migration
III – Background
9
A brief overview of the relational and NoSQL DBMSs*
*DBMS – database management system
Relational approach. Overview
10
• data is stored in interconnected relations (tables)
• strictly defined database schema
• introduced by E. F. Codd in 1969
• one query language (SQL) for all DBMSs
• relies on the principle of normalization
Relational approach. Normalization
11
Normalization
Data structure reorganization
Eliminates data redundancy
Consistent data
Relational approach. Normalization
12
+ no data duplication
+ consistent data
+ better flexibility
+ simplified design
+ saved space*
- slow read performance
- poor horizontal scalability
“Normalization is far from being a panacea.”
– Christopher J. Date
*according to http://www.mkomo.com/cost-per-gigabyte-update, storage cost is dropping rapidly, so nowadays saved space is almost irrelevant
Relational approach. Denormalization
13
Rapid querying and an ability to scale are critical for distributed systems.
+ fast read performance
+ better horizontal scalability
- data duplication
- inconsistent data
- worse flexibility
- messy design
- requires more space
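The trade-off between the normalized and denormalized forms can be illustrated with a minimal Python sketch. The customers/orders data and every function name here are hypothetical, chosen only for illustration. The sketch stores the same data both ways: the normalized form needs a join-like lookup on read but a single write on update, while the denormalized form reads in one lookup but must update every duplicated copy.

```python
# Hypothetical data: two normalized "tables" keyed by primary key.
customers = {1: {"name": "Alice", "city": "Paris"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]


def read_normalized(order_id):
    """Reading an order with its customer needs a join-like lookup."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]
    return {
        "order_id": order["order_id"],
        "total": order["total"],
        "customer_name": customer["name"],
        "customer_city": customer["city"],
    }


def denormalize():
    """Copy customer attributes into every order (data duplication)."""
    return {o["order_id"]: read_normalized(o["order_id"]) for o in orders}


def move_customer_normalized(customer_id, city):
    """One write: the city is stored in exactly one place."""
    customers[customer_id]["city"] = city
    return 1


def move_customer_denormalized(docs, name, city):
    """Many writes: every duplicated copy must be updated to stay consistent."""
    touched = 0
    for doc in docs.values():
        if doc["customer_name"] == name:
            doc["customer_city"] = city
            touched += 1
    return touched


# Each entry of order_docs can now be read with a single key lookup.
order_docs = denormalize()
```

Forgetting one of the duplicated copies in `move_customer_denormalized` is exactly the "inconsistent data" risk listed on the slide.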
NoSQL* approach. Overview
14
• schema-free
• non-relational
• distributed
• horizontally scalable
• no common query language
*NoSQL – Not Only SQL
NoSQL* approach. Types
15
• column
• key-value
• document
• graph
• and many more …
*NoSQL – Not Only SQL
NoSQL* approach. DBMSs
16
Relational & NoSQL. Conclusion
17
1. NoSQL is designed for modern large-scale distributed applications that work with large volumes of unstructured data.
2. NoSQL is not a replacement for the relational approach, but an alternative.
IV – The Migration
18
Review of the existing migration approaches
The migration. Definition
19
Migration
• data
• queries
• structure
The migration. Methods
20
• manual control over denormalization of database
  - manual, without guidelines
• preservation of an equivalent normalized database
  - not adapted to NoSQL data model
• full denormalization of database
  - not adapted to specific database
• heuristic-based approach to creation of database
  - simplistic, manual
None of the methods considers actual database usage.
“What really matter are the business requirements and access/write pattern of the applications.”
– Chongxin Li
V – The Analysis & Results
21
Answering the migration questions: “Whether?” and “How?”
Whether to migrate? Discovering the “symptoms”
22
• typically a relational database is normalized
  - slow querying / poor horizontal scaling
• denormalization may be a solution
  - fast querying / better horizontal scaling
• denormalization is unnatural for relational design
  - slow writes / NULL values
Denormalization is an attempt to artificially bring the structure and characteristics of a relational database closer to those of NoSQL.
Denormalization motivates and simplifies the migration to NoSQL.
How to migrate? Defining important information
23
“What really matter are the business requirements and access/write pattern of the applications.”
– Chongxin Li
• static information
  - considering indexes, views and procedures
• dynamic information
  - monitoring and logging of real database activity
  - analysis of logged information
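The analysis of logged information could look like the following minimal Python sketch. It is an assumption-laden illustration, not the tooling used during the internship: the log format (SQL text plus execution time in milliseconds), the sample queries and all names are hypothetical, and the regular expression only recognizes table names that directly follow `FROM` or `JOIN` (no subqueries, comma joins or quoted identifiers).

```python
import re
from collections import Counter, defaultdict

# Hypothetical log entries: (SQL text, execution time in milliseconds).
QUERY_LOG = [
    ("SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id", 120.0),
    ("SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id", 80.0),
    ("SELECT name FROM customers", 5.0),
]

# Naive pattern: an identifier that directly follows FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def tables_in(sql):
    """Return the sorted, de-duplicated table names found after FROM/JOIN."""
    return tuple(sorted({name.lower() for name in TABLE_RE.findall(sql)}))


def summarize(log):
    """Aggregate frequency and mean duration per set of tables accessed together."""
    counts = Counter()
    total_ms = defaultdict(float)
    for sql, duration_ms in log:
        key = tables_in(sql)
        counts[key] += 1
        total_ms[key] += duration_ms
    return {
        key: {
            "frequency": counts[key],
            "mean_ms": total_ms[key] / counts[key],
            "is_join": len(key) > 1,
        }
        for key in counts
    }


summary = summarize(QUERY_LOG)
```

Each key of `summary` is a set of tables that were queried together, so entries with `is_join` set to true point at candidate access patterns for remodeling.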
How to migrate? Proposing guidelines
24
• frequency of usage
  - makes it possible to avoid unnecessary remodeling
• execution speed
  - highlights important access patterns
• join operations
  - indicate real database access patterns
Database schema remodelling decision tree
based on the database access patterns
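The decision tree itself is an image and is not reproduced in the text, so the following Python sketch is not the author's tree. It only illustrates, under stated assumptions, how the three criteria from the guidelines (frequency of usage, execution speed, join operations) could drive a remodeling decision. The `AccessPattern` fields, the threshold values and the three outcomes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    """Observed usage of a join between tables (all fields hypothetical)."""
    tables: tuple
    frequency: int   # how often the join appeared in the log
    mean_ms: float   # average execution time in milliseconds


def remodel_decision(pattern, min_frequency=100, slow_ms=50.0):
    """Return 'keep', 'embed' or 'reference' for one join pattern.

    The thresholds are illustrative placeholders, not values from the slides.
    """
    if pattern.frequency < min_frequency:
        # Rarely used joins do not justify remodeling.
        return "keep"
    if pattern.mean_ms >= slow_ms:
        # Frequent and slow: store the joined data together in one document.
        return "embed"
    # Frequent but already fast: separate documents linked by identifier.
    return "reference"
```

With the default thresholds, a join logged 1000 times at 120 ms on average would be embedded, while one logged 5 times would be left as it is regardless of its speed.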
The migration. Summary
26
Whether to migrate?
• pre-denormalized relational database – a “symptom” that signals the necessity / possibility to migrate
How to migrate?
• static database information analysis – potential access patterns revealed by indexes and procedures
• dynamic database information analysis – real access patterns revealed by join operations, query speed & frequency
The migration. Modelio implementation*
27
Meta model
• document-oriented NoSQL store meta model – for MongoDB and Elasticsearch
Features
• automatic model generation – generation of Modelio database model from source database
• automatic Java code generation – generation of Java source code from Modelio model
*video tutorial is available at: https://youtu.be/wPDxk0YeTmw
VI – Conclusion
28
Foundation is built, now the real work begins …
Foundation. What do we have so far?
29
Theoretical result
• “symptoms” of a database that should be migrated were identified
• hypotheses for proper migration were proposed
Practical result
• meta model of a document-oriented NoSQL store and related functionality*
*for details, see the report
The real work. What will we do in the future?
30
• further develop the theoretical analysis
• continue studying the question of migration
• confirm our hypotheses in practice
• implement the migration in practice
to be done during the PhD
Your attention is much appreciated. Thank you!
31

Migration of a relational database to a NoSQL store

  • 1. An assessment of the migration September 1st, 2017 presented by Markiyan RIZUN mrizun@gmail.com From a relational database to a NoSQL store
  • 2. Outline. Contents of the presentation 2 II – Problem & Objective III – Background I – Internship context IV – The Migration V – The Analysis & Results VI – Conclusion
  • 3. I – Internship Context 3 SOFTEAM R&D department + Modelio software
  • 4. 4 MODELIOSOFT UML Java SQL MODELIO MODELS … and provides many other features Working in SOFTEAM
  • 6. II – Problem & Objective 6 “To migrate, or not to migrate, that is the question” – William Shakespeare, Hamlet
  • 7. Problem definition 7 Migration “What really matter are the business requirements and access/write pattern of the applications.” – Chongxin Li If to migrate? How to migrate? • data • queries • structure
  • 8. Objective definition 8 1. Discover the “symptoms” of a relational database that indicate the need of migration to NoSQL? 2. Propose a guideline for the migration that would allow to generate adapted NoSQL database accordingly to the usage of a source relational database. Migration
  • 9. III – Background 9 A brief overview of the relational and NoSQL DBMSs* *DBMS – database management system
  • 10. Relational approach. Overview 10 • data is stored in interconnected relations (tables) • strictly defined database schema • introduced by E. F. Codd in 1969 • one query language (SQL) for all DBMSs • relies on principle of normalization
  • 11. Relational approach. Normalization 11 Data structure reorganization Normalization Eliminates data redundancy Consistent data
  • 12. “Normalization is far from being a panacea.” – Christopher J. Date - slow read performance Relational approach. Normalization 12 + no data duplication + consistent data + better flexibility + simplified design + saved space* *accordingly to http://www.mkomo.com/cost-per-gigabyte-update storage cost is dropping rapidly, therefore nowadays it becomes almost irrelevant - poor horizontal scalability
  • 13. + fast read performance Rapid querying and an ability to scale are critical for distributed systems. Relational approach. Denormalization 13 - data duplication - inconsistent data - worse flexibility - messy design - requires more space + better horizontal scalability
  • 14. NoSQL* approach. Overview 14 • schema-free • non-relational • distributed • horizontally scalable • no common querying language *NoSQL – Not Only SQL
  • 15. NoSQL* approach. Types 15 *NoSQL – Not Only SQL • column • key-value • document • graph • and many more …
  • 17. 2. NoSQL is not a replacement of relational approach, but an alternative. Relational & NoSQL. Conclusion 17 1. NoSQL is designed for modern large scale distributed applications that work with big volumes of unstructured data.
  • 18. IV – The Migration 18 Review of the existing migration approaches
  • 19. The migration. Definition 19 Migration • data • queries • structure
  • 20. The migration. Methods 20 • manual control over denormalization of database • preservation of an equivalent normalized database • full denormalization of database • heuristic-based approach to creation of database “What really matter are the business requirements and access/write pattern of the applications.” – Chongxin Li None of the methods considers actual database usage. - manual, without guidelines - not adapted to NoSQL data model - not adapted to specific database - simplistic, manual
  • 21. V – The Analysis & Results 21 Answering the migration questions: “If?” and “How?”
  • 22. If to migrate? Discovering the “symptoms” 22 • denormalization may be a solution • typically a relational database is normalized • denormalization is unnatural for relational design Denormalization motivates and simplifies the migration to NoSQL. Denormalization is an attempt to artificially approximate structure and characteristics of a relational database to NoSQL. - fast querying / better horizontal scaling - slow querying / poor horizontal scaling - slow writes / NULL values
  • 23. How to migrate? Defining important information 23 “What really matter are the business requirements and access/write pattern of the applications.” – Chongxin Li • dynamic information • static information - monitoring and logging of real database activity - considering indexes, views and procedures - analysis of logged information
  • 24. How to migrate? Proposing guidelines 24 • frequency of usage • execution speed • join operations - allows to avoid unnecessary remodeling - highlights important access patterns - signify real database access patterns
  • 25. Database schema remodelling decision tree based on the database access patterns
  • 26. The migration. Summary 26 • static database information analysis – potential access patterns revealed by indexes and procedures • pre-denormalized relational database – a “symptom” that signifies necessity / possibility to migrate • dynamic database information analysis – real access patterns revealed by join operations, queries’ speed & frequency If to migrate? How to migrate?
  • 27. The migration. Modelio implementation* 27 • automatic model generation – generation of Modelio database model from source database • document-oriented NoSQL store meta model – for MongoDB and Elasticsearch • automatic Java code generation – generation of Java source code from Modelio model Meta model Features *video tutorial is available at: https://youtu.be/wPDxk0YeTmw
  • 28. VI – Conclusion. The foundation is built, now the real work begins …
  • 29. Foundation. What do we have so far? Theoretical result: • “symptoms” of a database to migrate were discovered • hypotheses for proper migration were proposed. Practical result: • meta model of NoSQL document-oriented store and related functionality* *for details, see the report
  • 30. The real work. What will we do in the future? • further developing the theoretical analysis • continuing to study the question of migration • confirming our hypotheses in practice • implementing the migration in practice. To be done during the PhD.
  • 31. Your attention is much appreciated. Thank you!
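
The dynamic analysis outlined on slides 23 and 24 (frequency of usage, execution speed, join operations) can be illustrated with a small sketch. It is not the method from the presentation itself: it assumes a hypothetical query log that has already been parsed into `(sql, elapsed_ms)` pairs, and the thresholds `min_count` and `slow_ms`, the table names and the keyword-based join detection are all illustrative assumptions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryStats:
    count: int = 0          # how often the query appears in the log
    total_ms: float = 0.0   # accumulated execution time
    joins: int = 0          # number of JOIN keywords in the statement

    @property
    def mean_ms(self) -> float:
        return self.total_ms / self.count


def summarize(log):
    """Group log entries (sql, elapsed_ms) by SQL text and collect per-query statistics."""
    stats = defaultdict(QueryStats)
    for sql, elapsed_ms in log:
        entry = stats[sql]
        entry.count += 1
        entry.total_ms += elapsed_ms
        # Naive join detection (assumption): count JOIN keywords in the statement.
        entry.joins = len(re.findall(r"\bJOIN\b", sql, flags=re.IGNORECASE))
    return dict(stats)


def remodelling_candidates(stats, min_count=3, slow_ms=100.0):
    """Return queries that are frequent, slow on average, and contain at least one join."""
    return [
        sql for sql, s in stats.items()
        if s.count >= min_count and s.mean_ms >= slow_ms and s.joins > 0
    ]


# Hypothetical log; table and column names are made up for illustration.
LOG = [
    ("SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id", 180.0),
    ("SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id", 220.0),
    ("SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id", 200.0),
    ("SELECT name FROM customers WHERE id = 7", 2.0),
    ("SELECT * FROM audit a JOIN users u ON a.user_id = u.id", 900.0),
]

stats = summarize(LOG)
candidates = remodelling_candidates(stats)
```

Under these assumptions only the `orders`/`customers` query is flagged: the `audit` query is slow but occurs once, and the lookup by `id` is fast and has no join, so remodelling the schema for either would likely be unnecessary.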

Editor's Notes

  1. The topic of my internship is “An assessment of the migration from a relational database to a NoSQL store”. In a nutshell, during the internship I analysed existing migration techniques and looked for ways to improve the migration process.
  2. The presentation is structured as follows. First, I will briefly present the company and the tool that I used for the development. Then I will define the problem and our objective. Next, I will talk a bit about the context of my work, in particular about relational and NoSQL database management systems. After that, I will review the existing migration techniques. Then we will arrive at the key part of the work: the analysis and its results. Finally, I will give a short conclusion and discuss future work.
  3. A bit about where I worked and what I was doing.
  4. I worked as an intern in the R&D department of Softeam, a company located in the Paris region. Throughout my research I used Modelio, a modelling tool. For example, one can create a UML diagram, a Java project model or a model of a relational database. The models can be built manually or generated automatically from source elements, for example a database.
  5. One of the most exciting events during the internship was our trip to Helsinki, Finland. My supervisor, Andrey SADOVYKH, and I went there for a meeting of the DataBio EU project, in which Softeam participates. There we discussed the contributions of the company to the project. Additionally, I presented my research work to representatives of different companies. Besides the work, we had free time to see the country, which was great!
  6. In this section I will describe the problem that we are facing and define our objectives.
  7. Nowadays, many companies wish to migrate their databases from the relational approach to NoSQL. I will explain these two approaches in a minute. The migration usually means the transformation of the database structure, the transfer of data and the modification of queries. The migration is very costly and complex, and there are no rules on how it has to be executed. To address this, many migration methods have been proposed. However, none of them defines the conditions that would indicate the need to migrate. So the first problem is to understand “if to migrate?”. Also, most of the existing methods are simplistic: during the structure transformation they do not attempt to adapt a relational database to the principles of the chosen NoSQL store, and they do not consider the actual usage of the database (e.g., queries, indexes, views), despite the fact that “What really matter are the business requirements and access/write pattern of the applications.” Therefore, the second issue is to comprehend “how to migrate?” so that the resulting database is well adapted.
  8. Therefore, the objective of our research is to answer these two questions and, as a result, to: discover the “symptoms” of a relational database that might indicate the need for migration to NoSQL; propose a guideline for the migration that makes it possible to generate a NoSQL database adapted to the usage of the source relational database.
  9. Since I am talking about migration from a relational to a NoSQL database, I will briefly discuss these two database management approaches, starting with the relational approach.
  10. The relational approach was introduced by Edgar Codd in 1969, almost half a century ago, and its principles are well established. Data is stored in tables that are connected by relationships. The structure of the tables, as well as their relationships, is strictly defined and therefore always known in advance. The relational approach is well known for its query language, SQL, which works with any relational system. Finally, the most interesting point for us in the context of the migration is the principle of normalization, which is widely applied in relational databases. Here are some examples of relational database management systems.
  11. The idea of normalization is to reorganize the structure of a database in order to achieve full data consistency. Normalization decomposes tables into a larger number of smaller tables and creates new relationships between them. A properly normalized database eliminates data redundancy, meaning that no data is duplicated. This makes it possible to avoid modification anomalies, such as having inconsistent data for the same entity in two different tables.
  12. So to sum up, normalization gives the following benefits… Nevertheless, although normalization is the recommended approach for relational database design, it has two major flaws. As the information about a single entity is scattered across many tables, it causes 1) slow read queries and 2) poor scalability (meaning horizontal scalability over a distributed system). And these two factors are very important for modern applications such as cloud applications.
  13. For these reasons there exists an opposite principle, called denormalization. It has the opposite disadvantages, such as… In contrast to normalization, read performance is much faster and horizontal scalability is improved too. These two features are crucial for supporting distributed systems. There are many different denormalization strategies that are discussed in the report, but we will not talk about them in the presentation.
  14. Normally, with quite a few exceptions, NoSQL (meaning Not Only SQL) systems satisfy the following list of properties. All of them are non-relational, and one could argue that this is the main difference. There is no strict structure definition, which is also known as being schema-free. That is why NoSQL databases can work with unstructured data and are flexible in that sense, unlike relational databases, which work with highly structured data. They are distributed, meaning that they run on clusters of machines. Therefore, they should be horizontally scalable: without restructuring a database, one can add a new node to the cluster in order to handle a growing volume of data. Usually, they are open source. Unfortunately, there is no language comparable to SQL that would be common to all systems.
  15. I will not explain each type, its data model or its characteristics in detail. What is very important to understand is that the NoSQL types propose different approaches to handling data, and each of them has its own strengths and weaknesses. For example, the key-value type has rapid query execution because of its data model, where each key points to a value, that is, the actual data. However, while querying is really fast, it is poor in terms of flexibility: there is no possibility to filter data, only to search by a key. To sum up, the idea is that one has to use a NoSQL type that is suited to the task and the data. Sometimes it may be difficult to choose which one is the best for your case, and other times it is beneficial to use a few of them for one system.
  16. Here are some examples of NoSQL systems. As you can see, there are a lot of alternatives to choose from, and each one is different from the others. They vary in NoSQL type and also in their specific implementation features. This makes it even more difficult to select the right system for your needs.
  17. The main idea is that NoSQL is meant for modern large-scale distributed applications that work with massive volumes of unstructured data. And of course, NoSQL is not a replacement for the relational approach, but an alternative to it.
  18. Ok, so now let us move to the migration and existing approaches.
  19. As a reminder, the migration of a relational database to NoSQL means the transformation of the structure, the transfer of data and the modification of queries and the application code.
  20. We classify the existing methods into four categories. The first category focuses on the preservation of the relational database schema and, therefore, of the database access patterns in queries and applications. Unfortunately, this technique does not adapt the structure to the principles of NoSQL. The methods of the second group take a similar approach, i.e., they keep the database structure unchanged, but they additionally offer the possibility to manually adjust the target database’s structure; however, they do not propose any guidelines for doing so. The third category follows the opposite direction, full denormalization of the relational database; hence the resulting structure adheres to some of the NoSQL design principles, but does not consider the specific needs of a concrete database. The techniques of the last category aim to define guidelines for the transformation of a relational database to a NoSQL database; however, the resulting heuristics are simplistic, as they resemble denormalization ideas, and, more importantly, they are completely manual.
  21. NoSQL offers the same benefits as a denormalized RDBMS, plus fast writes, no NULL values and many other features such as a dynamic schema, automatic replication, etc.
  22. The access patterns of a relational database and its business rules are the decisive factors for the generation of the target NoSQL database schema. In each specific case of migration, the NoSQL database schema and its denormalization level have to be adapted to its planned usage. Static information (indexes, views and procedures) reveals typical, or at least potentially frequent, database access patterns. Dynamic information shows actual access patterns, i.e., the real usage of a database.
  23. For the analysis step we highlight the following information from the log: execution speed, frequency of usage and join operations. We provide guidelines for analysing this information, which make it possible to assess how to adapt a NoSQL document-oriented database schema. The execution speed and frequency of usage of a query are critical factors for deciding whether the database schema needs to be adjusted to this query. They point out the queries that are worth paying attention to. If a query is slow and frequent, then clearly the structure of the database should be modified according to its access pattern. Generally, this information makes it possible to avoid unnecessary remodelling and highlights important access patterns. Join operations signify database access patterns and therefore have a critical impact on the decision-making process when remodelling the database schema.
  24. Algorithm… It is highly likely that the queries of a relational database exploit completely different access patterns, in which case no single database structure can be efficient for all of the queries. Therefore, one can create several copies of the same data using different structures, so that each query has a structure that suits it.
  25. To conclude, our approach for the migration from a relational database to a NoSQL database consists of the following recommendations: The migration should rely on a pre-denormalized relational database, which is also a “symptom” that signifies the necessity to migrate. Adjust the NoSQL schema according to static database information (potential access patterns, indexing). Remodel if needed according to the actual usage of the database (queries’ speed and frequency, real access patterns).
  27. The theoretical results presented in this work, together with the implementation of the document-oriented database meta model, will serve as a strong foundation for our future research pursuits: … To do this, we plan to obtain access to a production database that a company would like to migrate. Such an opportunity would enable us to either support or refute our hypotheses. Moreover, access to a real database would allow us to analyse its actual usage and assess how to properly migrate it. At the moment, we are negotiating with several companies that are willing to migrate their legacy relational systems to modern NoSQL databases. We are discussing the possibility of getting access to their databases within the framework of the DataBio EU project, in which SOFTEAM participates.
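
As an illustration of the remodelling idea discussed in notes 13 and 24, the sketch below embeds the rows of a child table into their parent rows. This is one common way to turn a frequently joined pair of tables into a document-oriented structure, not necessarily the transformation used in the report. The tables `customers` and `orders`, their columns and the helper `embed` are hypothetical.

```python
from collections import defaultdict


def embed(parents, children, parent_key, foreign_key, field):
    """Return one document per parent row with matching child rows nested under `field`."""
    grouped = defaultdict(list)
    for child in children:
        # Drop the foreign key: the nesting itself now expresses the relationship.
        nested = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(nested)
    return [
        {**parent, field: grouped.get(parent[parent_key], [])}
        for parent in parents
    ]


# Hypothetical normalized tables, represented as lists of rows.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Linus"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 40.0},
    {"id": 12, "customer_id": 2, "total": 15.0},
]

documents = embed(
    customers, orders, parent_key="id", foreign_key="customer_id", field="orders"
)
```

A query that previously joined `customers` and `orders` could then read a single document. The trade-off is that this structure favours one access pattern only; a query with a different pattern (for example, all orders sorted by total) may need its own copy of the data in another structure, as note 24 suggests.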