AIRBNB DATA
WAREHOUSE &
GRAPH DATABASE
GROUP 8
Nishigandha Dhanu
(10545208)
Sagar Deogirkar
(10547321)
Lamidi Abdulrahman Taiwo
(10545249)
INDEX
2
▰ Introduction
▰ Data Set
▰ Star Schema
▰ Creating Database in SQL
▰ Data Import to Tables
▰ Data Import to Dimension and Fact Table Using SSIS
▰ SSRS Reports
▰ Data Visualisation Using Tableau
▰ General information on graph databases
▰ Neo4j: data source used in its creation, steps, queries, and main differences between NoSQL and SQL
INTRODUCTION
3
1
INTRODUCTION
In this presentation we show how a data warehouse is built on a
relational database.
This includes the steps to create a database in SQL Server, import data into the tables,
populate the dimension and fact tables using SSIS, generate reports using
SSRS, and present the data using Tableau. We also use the AdventureWorks dataset
to compare a graph database with a relational DBMS.
4
DATA SET
5
2
DATA SET
The data set that we are using for the assignment comes from Airbnb, an online
marketplace for arranging or offering lodging, primarily homestays. The data describes
the owner, the property, the property location and the various types of
reviews that the host received from previous guests.
We selected this data set because it belongs to one of the top trending industries,
tourism, and because as customers we have all looked for a
hotel, resort or dormitory within our budget with the best possible ratings and reviews on different
websites and mobile apps. It is also a chance to learn how data is stored
and fetched on such platforms.
Taking the data warehouse into account, we collected the data in a fact table and generated
reports from the company’s perspective, which looks at each host’s
performance based on reviews, the facilities they provide to their guests, and the price. Based on
that, the company may decide to extend or cancel the contract, or to lower the host’s rank in
the recommendation list.
6
DATA SET
The data mainly contains three sections.
1. Host Information - This section describes the host
through the following attributes: host_id | host_name | host_since_anniversary |
host_response_time | host_response_rate
2. Property Information - This section describes the property and the facilities
available at the site. It includes the following attributes: property_id | property_type | room_type
| accommodates | bathrooms | bedrooms | beds | bed_type | price | guests_included |
extra_people | minimum_nights | neighbourhood_cleansed | city | state | zipcode | country |
latitude | longitude
3. Review - This is the most important section; its attributes capture the different types of
reviews the host received: review_id | number_of_reviews | review_scores_rating |
review_scores_accuracy | review_scores_cleanliness | review_scores_checkin |
review_scores_communication | review_scores_location | review_scores_value
7
STAR SCHEMA
8
3
STAR SCHEMA
The Star Schema is a relational database schema used to
represent multidimensional data. It is the
simplest form of data warehouse schema: one or more fact tables
reference a set of dimension tables. It is called a Star Schema because the
entity-relationship diagram between the fact tables and the
dimensions resembles the shape of a star.
The Star Schema for our data set is shown here. We have created
3 dimension tables based on the data available and our
objective.
Host_Dim contains essential information about the host, i.e.
the host’s name, the host’s response rate and response time (to queries), and
since when the host has been associated with Airbnb, stored in the
host_since_anniversary attribute. Each host is identified by a
unique Host ID.
9
STAR SCHEMA
Property_Dim contains attributes that give thorough information about the
facilities available at the site, its location, room type, price and
other details the company needs to
make decisions. It also carries host_id as a foreign
key, since it relates each property
to its host.
Calendar_Dim contains attributes that represent the date from
which the host has been associated with Airbnb, in different formats.
Best_Host_Fact is the fact table. As mentioned earlier, it takes the company’s
perspective, which looks at each host’s performance
based on the types of reviews, the facilities they provide to their guests
and the price. It uses Host ID, Calendar Key and Property ID as
foreign keys.
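The schema described above can be sketched in a few lines. This is only an illustration, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server; the column subsets, the column types and the calendar columns (full_date, year, month) are assumptions, not the project's actual definitions.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Host_Dim (
    host_id INTEGER PRIMARY KEY,
    host_name TEXT,
    host_since_anniversary TEXT,
    host_response_time TEXT,
    host_response_rate REAL
);
CREATE TABLE Property_Dim (
    property_id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES Host_Dim(host_id),
    property_type TEXT,
    room_type TEXT,
    city TEXT,
    price REAL
);
CREATE TABLE Calendar_Dim (
    calendar_key INTEGER PRIMARY KEY,
    full_date TEXT UNIQUE,   -- hypothetical column names
    year INTEGER,
    month INTEGER
);
CREATE TABLE Best_Host_Fact (
    host_id INTEGER REFERENCES Host_Dim(host_id),
    property_id INTEGER REFERENCES Property_Dim(property_id),
    calendar_key INTEGER REFERENCES Calendar_Dim(calendar_key),
    number_of_reviews INTEGER,
    review_scores_rating REAL,
    price REAL
);
""")

# List the tables the fact table points at: one foreign key per dimension.
fact_fks = conn.execute("PRAGMA foreign_key_list(Best_Host_Fact)").fetchall()
fact_parents = sorted(row[2] for row in fact_fks)
print(fact_parents)
```

The fact table sits in the middle and each dimension hangs off it through one key, which is what gives the diagram its star shape.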
10
CREATING DATABASE IN SQL
11
4
CREATING DATABASE IN SQL
12
Steps:
1. We created 3 tables, Host, Property and Review, with the attributes
listed on slide no. 7.
Host_ID is the primary key of the Host table and is used as a foreign key in the
Property table. Property_ID is the primary key of the Property table and is
used as a foreign key in the Review table, where Review_ID is the primary
key. These keys represent the relationships between the tables.
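To make the key chain concrete, here is a minimal sketch of the three source tables, again assuming SQLite via Python's sqlite3 rather than SQL Server. Only a few columns are included and the sample rows are made up. It shows what the foreign keys buy: a property that points at a non-existent host is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Host (host_id INTEGER PRIMARY KEY, host_name TEXT);
CREATE TABLE Property (
    property_id INTEGER PRIMARY KEY,
    host_id INTEGER NOT NULL REFERENCES Host(host_id),
    property_type TEXT
);
CREATE TABLE Review (
    review_id INTEGER PRIMARY KEY,
    property_id INTEGER NOT NULL REFERENCES Property(property_id),
    number_of_reviews INTEGER
);
""")
# Made-up sample rows: Host -> Property -> Review.
conn.execute("INSERT INTO Host VALUES (1, 'Example Host')")
conn.execute("INSERT INTO Property VALUES (10, 1, 'Apartment')")
conn.execute("INSERT INTO Review VALUES (100, 10, 25)")

# A property pointing at a host that does not exist is rejected.
try:
    conn.execute("INSERT INTO Property VALUES (11, 999, 'Cabin')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```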
DATA IMPORT TO TABLES
13
5
DATA IMPORT TO TABLES
14
Steps:
We imported the data from an Excel sheet into SQL Server to save it in the
respective table.
(Right click on database → Tasks → Import Data → Select data
source as ‘MS Excel’ → Select file → Select ‘SQL Server
Native Client 11.0’ as destination → Select the Excel sheet
and destination table → Click Edit Mappings to check the
mapping → Finish)
DATA IMPORT TO TABLES
15
If the steps, code, data and data types are correct, the final window shows the number of rows
transferred. We repeated the same procedure for the remaining tables to import their data.
The table’s data can be inspected (Right click on table → Select Top 1000 Rows)
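The import-then-verify idea can be sketched outside the wizard as well. The example below is an assumption-laden stand-in: a small in-memory CSV replaces the Excel sheet, SQLite replaces SQL Server, and the three host rows are invented. It loads the rows, counts what was transferred, and previews the table in the same spirit as "Select Top 1000 Rows".

```python
import csv
import io
import sqlite3

# Stand-in for the Excel sheet: a few made-up rows in CSV form.
sheet = io.StringIO(
    "host_id,host_name,host_response_rate\n"
    "1,Asha,0.95\n"
    "2,Ben,0.80\n"
    "3,Chidi,1.00\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Host (host_id INTEGER PRIMARY KEY, host_name TEXT, host_response_rate REAL)"
)

# Convert each text field to the destination column's type before inserting.
rows = [
    (int(r["host_id"]), r["host_name"], float(r["host_response_rate"]))
    for r in csv.DictReader(sheet)
]
conn.executemany("INSERT INTO Host VALUES (?, ?, ?)", rows)
conn.commit()

# Equivalent of the wizard's "rows transferred" figure and the table preview.
rows_transferred = conn.execute("SELECT COUNT(*) FROM Host").fetchone()[0]
preview = conn.execute("SELECT * FROM Host ORDER BY host_id LIMIT 1000").fetchall()
print(rows_transferred, preview[0])
```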
DATA IMPORT TO DIMENSION
AND FACT TABLE USING SSIS
16
6
DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
Steps:
Open Visual Studio 2019 → Create a new project → Choose
Integration Services Project → Enter project and solution
name → Add a ‘Data Flow’ task to the package → Open the Data
Flow → Build a new OLE DB/ADO connection to SQL
Server and the desired database → Test Connection
17
DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Host_Dim:
As all the values from the Host table go to Host_Dim, we
used an ADO NET Source and Destination.
At the destination we checked the mapping to ensure the data
goes to the desired attributes.
Click Start to execute the import.
18
DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Property_Dim:
Here we send data from the Property table to the Property_Dim table, leaving out a few
attributes that are not of interest to the company in this data warehouse.
Hence we chose OLE DB as the source and destination.
We followed the same procedure to execute the import.
19
DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Calendar_Dim:
Here we use the Host_Since_Anniversary column (from the Host
table) to supply the dates for Calendar_Dim. Hence we used the Host table
in an ADO NET Source and then used the Derived Column
transformation editor to produce the same date in different formats,
following the same procedure.
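What a Derived Column step does to one date can be sketched as follows. This is a hedged illustration in plain Python, not the SSIS expression language; the output column names (year, month, day, quarter) are hypothetical, since the deck does not list the Calendar_Dim columns.

```python
from datetime import date


def calendar_row(calendar_key: int, full_date: date) -> dict:
    """Derive several representations of one date, as a Derived Column step might."""
    return {
        "calendar_key": calendar_key,
        "full_date": full_date.isoformat(),       # e.g. 2015-08-24
        "year": full_date.year,
        "month": full_date.month,
        "day": full_date.day,
        "quarter": (full_date.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
    }


# Made-up host_since_anniversary value.
row = calendar_row(1, date(2015, 8, 24))
print(row)
```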
20
DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Best_Host_Fact:
To gather the essential data into the fact table, we used a SQL command in the ADO NET Source editor to
fetch data from the Host, Property and Review tables, selecting the required attributes and generating the
query in the query builder.
Then we used the Lookup transformation editor to match the Host_Since_Anniversary column’s data with the
full date from Calendar_Dim and selected Calendar_Key to bring it into the fact table. We
connected the Lookup Match Output to the ADO NET Destination.
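The source query plus lookup can be approximated with a single SQL statement. The sketch below assumes SQLite via Python's sqlite3, a reduced set of columns, a hypothetical full_date column in Calendar_Dim, and invented rows. The inner join to Calendar_Dim plays the role of the Lookup's match output: rows whose date finds no match are not loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Host (host_id INTEGER PRIMARY KEY, host_since_anniversary TEXT);
CREATE TABLE Property (property_id INTEGER PRIMARY KEY, host_id INTEGER, price REAL);
CREATE TABLE Review (review_id INTEGER PRIMARY KEY, property_id INTEGER,
                     number_of_reviews INTEGER, review_scores_rating REAL);
CREATE TABLE Calendar_Dim (calendar_key INTEGER PRIMARY KEY, full_date TEXT UNIQUE);
CREATE TABLE Best_Host_Fact (host_id INTEGER, property_id INTEGER, calendar_key INTEGER,
                             number_of_reviews INTEGER, review_scores_rating REAL, price REAL);

INSERT INTO Host VALUES (1, '2012-05-01'), (2, '2016-09-15');
INSERT INTO Property VALUES (10, 1, 120.0), (20, 2, 75.0);
INSERT INTO Review VALUES (100, 10, 40, 96.0), (200, 20, 5, 88.0);
-- Only host 1's date exists in the calendar, so host 2 has no lookup match.
INSERT INTO Calendar_Dim VALUES (1, '2012-05-01');
""")

# Join the three source tables, then resolve the date to its calendar key.
conn.execute("""
INSERT INTO Best_Host_Fact
SELECT h.host_id, p.property_id, c.calendar_key,
       r.number_of_reviews, r.review_scores_rating, p.price
FROM Host h
JOIN Property p ON p.host_id = h.host_id
JOIN Review r ON r.property_id = p.property_id
JOIN Calendar_Dim c ON c.full_date = h.host_since_anniversary
""")

fact_rows = conn.execute("SELECT * FROM Best_Host_Fact").fetchall()
print(fact_rows)
```

In SSIS the unmatched rows could instead be redirected to a no-match output; this sketch simply drops them.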
21
DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Best_Host_Fact (continued):
We checked the mapping in the ADO NET Destination editor by selecting Best_Host_Fact_Table as the destination, and executed it to
transfer the data into Best_Host_Fact_Table.
22
SSRS REPORTS
23
7
SSRS REPORTS
▰ Faster processing of reports on both relational and multidimensional data.
▰ Allows users to interact with information without involving IT
professionals.
▰ Supports better and more accurate decision-making for users.
▰ Graphics and images can be embedded in reports. Reports can also be integrated with external
content using SharePoint.
24
SSRS Reports
Steps:
▰ We created a new project using the Report Server Project template.
▰ In the new report project we created a shared data source (Host Data Source), which can be
shared by all the reports.
▰ To start a report, we added a new item under Reports.
25
BEST PROPERTIES REPORT
▰ We added a data source and a data set for the
report.
▰ In our first report, we show the
number of reviews received from users for the
top rated properties.
▰ This report is created with the customer
as well as the company in mind.
▰ Customers can directly choose the property
type they are looking for based on the
number of reviews, as these are among the
best properties of all.
▰ The company can also work more on the
properties that receive fewer reviews,
to increase their revenue and property value.
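The data set behind such a report is essentially an aggregate over the fact table. A minimal sketch, assuming SQLite via Python's sqlite3, trimmed-down tables and made-up review counts (the real report query is not shown in the deck):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Property_Dim (property_id INTEGER PRIMARY KEY, property_type TEXT);
CREATE TABLE Best_Host_Fact (property_id INTEGER, number_of_reviews INTEGER);
INSERT INTO Property_Dim VALUES (1, 'Apartment'), (2, 'Apartment'), (3, 'Cabin'), (4, 'Chalet');
INSERT INTO Best_Host_Fact VALUES (1, 120), (2, 80), (3, 60), (4, 3);
""")

# Total reviews per property type, most reviewed first.
report = conn.execute("""
SELECT p.property_type, SUM(f.number_of_reviews) AS total_reviews
FROM Best_Host_Fact f
JOIN Property_Dim p ON p.property_id = f.property_id
GROUP BY p.property_type
ORDER BY total_reviews DESC
""").fetchall()
print(report)
```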
26
REVENUE REPORT
▰ This report is created with the
stakeholders of the company in
mind.
▰ From this report we can
see the revenue generated by
the different properties.
▰ From this report the company can
decide whether, in
future, it needs to invest in
properties that are
generating less revenue.
27
PROPERTY REVIEWS
▰ This report is created with the host as
well as the customer in mind.
▰ From this report we can see
the reviews received by the different
properties.
▰ Depending on the reviews, hosts can
decide which
properties they should host more of in future.
▰ By checking these reviews,
customers can also decide what kind of
properties they should opt for
when they go on vacation.
28
HOST REGISTERED PER YEAR
▰ This report is created keeping
company stakeholders in mind.
▰ From this report we can see
how many hosts registered
each year.
▰ Depending on the number of hosts
registered, the company can make
informed business decisions for the
future.
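A count of hosts per registration year could come from a query like the one sketched here. It assumes SQLite via Python's sqlite3, ISO-formatted dates in host_since_anniversary, and six invented hosts; SQL Server would use its own date functions instead of strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Host_Dim (host_id INTEGER PRIMARY KEY, host_since_anniversary TEXT);
INSERT INTO Host_Dim VALUES
    (1, '2011-03-02'), (2, '2011-11-20'), (3, '2012-01-05'),
    (4, '2013-07-14'), (5, '2013-08-01'), (6, '2013-12-31');
""")

# Extract the year from each registration date and count hosts per year.
hosts_per_year = conn.execute("""
SELECT CAST(strftime('%Y', host_since_anniversary) AS INTEGER) AS year,
       COUNT(*) AS hosts_registered
FROM Host_Dim
GROUP BY year
ORDER BY year
""").fetchall()
print(hosts_per_year)
```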
29
DATA VISUALISATION
USING TABLEAU
30
8
DATA VISUALISATION USING TABLEAU
▰ Tableau is one of the best data visualization tools used in the
Business Intelligence industry.
▰ Data analysis is very fast with Tableau.
▰ The visualizations are created in the form of dashboards
and worksheets.
Steps:
▰ After connecting to the server, we connected to our database
using SQL Server.
▰ Using our fact table, we created the visualisations
in Tableau.
▰ We brought in the data from our dimension tables by using left joins
to the fact table.
▰ Finally, after clicking on upload, all the
data was loaded into Tableau.
31
Tableau Reports
▰ We created our first report using a
sheet.
▰ The report alongside shows a graphical
visualisation of the total reviews gained
by each property type.
▰ From this graph we can say that
apartments were the most preferred and
chalets the least preferred by
customers.
▰ This report is created using the
attributes Number of Reviews and
Property Type.
32
Tableau Reports
▰ The following report is a
visualisation of the properties that
are best rated, considering all the
reviews received from customers.
▰ From the graphs we can say that
property types such as apartment, boat and
cabin listed on Airbnb are among the
top rated by customers.
▰ However, property types such as chalet are the
least preferred by customers.
33
Tableau Reports
▰ The following graph shows the hosts
registered with Airbnb from 2002 to
2019.
▰ From the graph we can see that
there was a dip in 2007.
▰ However, the graph remains steady in the
following years.
▰ We can say that at least 300 hosts
register with Airbnb each year,
as it is a growing
marketplace for tourism.
34
Tableau Reports
▰ The following visualisation
represents the different room
types preferred by customers for their stay
with Airbnb.
▰ The most reviewed room type is entire
home/apartment, which suggests that most
customers prefer privacy for their stay.
▰ Very few customers preferred a shared room.
35
GRAPH DATABASE
36
9
DATA SOURCE
▰ As directed in the question, the AdventureWorks database is used as our data source for comparing the
performance of a graph database and a relational database. The data source was downloaded from
https://github.com/Microsoft/sql-server-samples/releases/download/adventureworks/AdventureWorks2017.bak
▰ The AdventureWorks database was loaded from the link above.
37
GRAPH DATABASE
A graph database like Neo4j (one of the most widely used NoSQL databases) is a
type of database that uses graph structures, with nodes, relationships and
properties, to represent, store and query data. The main elements of a graph database are
the nodes and the edges (which represent
the relationships between the nodes). In this project, CSV data files are the main source of
the information loaded into the graph. The relationships allow the data loaded from these CSV files to
be linked together directly and retrieved with one or more query operations.
38
RESOURCE DESCRIPTION FRAMEWORK (RDF)
This model is a sketch of how the nodes are related to one another
and of the connections between them. It also gives an
overview of what is happening. The diagram below is our RDF model for the
AdventureWorks database, drawn using Visio.
39
CREATION OF MAIN USED NODES IN GRAPH DATABASE
CUSTOMER NODE:
LOAD CSV WITH HEADERS FROM "file:///CustomerSales.csv" AS row
CREATE (c:Customer)
SET c = row{customer_id:row.CustomerID, territory:row.TerritoryID, accountNum:row.AccountNumber,
customerType:row.CustomerType, row_guid:row.rowguid, date:row.ModifiedDate}
RETURN c
CUSTOMER CONSTRAINT:
CREATE CONSTRAINT ON (c:Customer) ASSERT c.customer_id IS UNIQUE
SALES PERSON NODE:
LOAD CSV WITH HEADERS FROM "file:///SalesPerson.csv" AS row
CREATE (sp:SalesPerson)
SET sp = row{salesperson_id:row.SalesPersonID, spterritory_id:row.TerritoryID, salesqu:row.SalesQuota,
spbonus:row.Bonus, spcomm_pst:row.CommissionPct, spsalesytd:row.SalesYTD,
spsaleslyr:row.SalesLastYear, sprow_guid:row.rowguid, spdate:row.ModifiedDate}
RETURN sp
SALESPERSON CONSTRAINT:
CREATE CONSTRAINT ON (sp:SalesPerson) ASSERT sp.salesperson_id IS UNIQUE
40
ORDER (SALES){SaleOrderHeader}:
LOAD csv WITH HEADERS FROM "file:///SalesOrderHeader.csv" as row CREATE (sh:SalesOrderHeader) SET sh=
row{salesorder_id:row.SalesOrderID, revision_n:row.RevisionNumber, order_d:row.OrderDate,
due_d:row.DueDate, ship_d:row.ShipDate,stat:row.Status, onlineorder_f:row.OnlineOrderFlag,
salesorder_n:row.SalesOrderNumber, purchaseorder_n:row.PurchaseOrderNumber,
account_n:row.AccountNumber, customer_id:row.CustomerID, contact_id:row.ContactID,
salesperson_id:row.SalesPersonID, territory:row.TerritoryID, billaddr_id:row.BillToAddressID,
shipaddr_id:row.ShipToAddressID, shipmeth_id:row.ShipMethodID, ccard_id:row.CreditCardID,
ccardA_code:row.CreditCardApprovalCode, currR_id:row.CurrencyRateID, s_total:row.SubTotal,
taxamt:row.TaxAmt, frht:row.Freight, totalD:row.TotalDue, com:row.Comment, row_guid:row.rowguid,
date:row.ModifiedDate } return sh
ORDER (SALES){SalesOrderHeader} CONSTRAINT:
CREATE CONSTRAINT ON (sh:SalesOrderHeader) ASSERT sh.salesorder_id IS UNIQUE
42
PRODUCT NODE:
LOAD csv WITH HEADERS FROM "file:///Product.csv" as row CREATE (p:Product) SET p=
row{product_id:row.ProductID, pname:row.Name, pnum:row.ProductNumber, mkflag:row.MakeFlag,
fgflag:row.FinishedGoodsFlag, safetystockl:row.SafetyStockLevel, reorder_p:row.ReorderPoint,
standard_cost:row.StandardCost, plistprice:row.ListPrice, pdaysto_man:row.DaysToManufacture,
pss_date:row.SellStartDate, prow_guid:row.rowguid, pdate:row.ModifiedDate } return p
PRODUCT CONSTRAINT:
CREATE CONSTRAINT ON (p:Product) ASSERT p.product_id IS UNIQUE
ORDER DETAILS (SALES){SaleOrderDetails}:
LOAD csv WITH HEADERS FROM "file:///SalesOrderDetails.csv" as row CREATE (sd:SalesOrderDetails) SET sd=
row{salesorder_id:row.SalesOrderID, salesordD_id:row.SalesOrderDetailID,
carrierT_num:row.CarrierTrackingNumber, orderQ:row.OrderQty, prod_id:row.ProductID,
specialoffer_id:row.SpecialOfferID, uniprice:row.UnitPrice, unitpdisc:row.UnitPriceDiscount,
line_t:row.LineTotal, row_guid:row.rowguid, date:row.ModifiedDate } return sd
ESTABLISHMENT OF
RELATIONSHIPS BETWEEN THE
CREATED NODES
43
10
ESTABLISHMENT OF RELATIONSHIPS BETWEEN THE CREATED
NODES
CUSTOMER - ORDER (SALES){SalesOrderHeader} RELATIONSHIP:
MATCH (c:Customer), (sh:SalesOrderHeader) WHERE
c.customer_id=sh.customer_id CREATE (c)-[r:purchased]->(sh)
RETURN c, sh, r
MATCH (p:SalesOrderHeader) REMOVE
p.customer_id RETURN p
The figure shows the relationship between
customer and order, i.e. the customers that
purchased orders:
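The pattern used here (match two node sets on a shared key, create an edge, then drop the key property) can be illustrated outside Neo4j. The sketch below is plain Python with made-up ids, not Cypher and not the Neo4j driver; it only mirrors the logic of the two statements above.

```python
# Made-up rows standing in for the loaded Customer and SalesOrderHeader nodes.
customers = [{"customer_id": "1"}, {"customer_id": "2"}]
orders = [
    {"salesorder_id": "43659", "customer_id": "1"},
    {"salesorder_id": "43660", "customer_id": "1"},
    {"salesorder_id": "43661", "customer_id": "2"},
]

# Mirrors: MATCH (c), (sh) WHERE c.customer_id = sh.customer_id
#          CREATE (c)-[:purchased]->(sh)
known_customers = {c["customer_id"] for c in customers}
purchased = [
    (o["customer_id"], "purchased", o["salesorder_id"])
    for o in orders
    if o["customer_id"] in known_customers
]

# Mirrors: REMOVE sh.customer_id (the edge now carries that link).
for o in orders:
    o.pop("customer_id")

# Traversal: follow edges from one customer instead of joining on a key.
orders_of_1 = [target for source, _, target in purchased if source == "1"]
print(orders_of_1)
```

Once the edge exists, the foreign-key style property is redundant, which is why the second statement removes it.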
44
SALESPERSON - ORDER (SALES){SalesOrderHeader}
RELATIONSHIP:
MATCH (sp:SalesPerson), (sh:SalesOrderHeader) WHERE
sp.salesperson_id=sh.salesperson_id CREATE (sp)-[r:sold]->(sh)
RETURN sp, sh, r
MATCH (p:SalesOrderHeader) REMOVE p.salesperson_id RETURN
p
The figure shows the relationship between salesperson and order,
i.e. the salespeople that sold orders:
45
ESTABLISHMENT OF RELATIONSHIPS BETWEEN THE
CREATED NODES
ORDER - ORDER DETAILS - PRODUCT:
MATCH (sh:SalesOrderHeader), (sd:SalesOrderDetails),
(p:Product) WHERE sh.salesorder_id=sd.salesorder_id AND
sd.prod_id=p.product_id CREATE (sh)-[r:orders
{carrierTnum:sd.carrierT_num, orderQ:sd.orderQ,
uniprice:sd.uniprice, unitpdisc:sd.unitpdisc, line_t:sd.line_t}]->(p)
RETURN sh, p, r LIMIT 100
The figure below shows the relationship between order and
product, linked by order details.
46
INSIGHTS FROM ADVENTUREWORKS DATABASE
USING CQL CODES
This query helps the business check
the first 20 products, ordered by standard cost,
together with each product’s standard cost.
MATCH(p:Product) RETURN p.pname,
p.standard_cost ORDER BY p.standard_cost
LIMIT 20
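One caveat worth checking with this query: LOAD CSV reads every field as a string, so unless standard_cost was converted on load (for example with toFloat), ORDER BY compares text rather than numbers. The small Python sketch below, with invented products and costs, shows how the two orderings can differ.

```python
# Made-up products; costs are strings, as LOAD CSV would deliver them.
products = [
    {"pname": "Widget", "standard_cost": "9.99"},
    {"pname": "Frame", "standard_cost": "100.50"},
    {"pname": "Bolt", "standard_cost": "25.00"},
]

# Text comparison: "100.50" sorts before "25.00", which sorts before "9.99".
as_text = [p["pname"] for p in sorted(products, key=lambda p: p["standard_cost"])]

# Numeric comparison after converting the string to a float.
as_number = [p["pname"] for p in sorted(products, key=lambda p: float(p["standard_cost"]))]

print(as_text)
print(as_number)
```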
47
This query returns salespeople together with the orders they sold
(limited to 100 rows), which helps identify the top-selling salespeople.
MATCH (s:SalesPerson)-[r:sold]->(sh:SalesOrderHeader)
RETURN s, sh, r LIMIT 100
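The query above returns the raw relationships; ranking salespeople still requires counting orders per salesperson. A minimal sketch of that counting step, in plain Python over invented (salesperson_id, salesorder_id) pairs rather than real query results:

```python
from collections import Counter

# Made-up (salesperson_id, salesorder_id) pairs standing in for :sold relationships.
sold = [
    ("279", "43659"),
    ("279", "43660"),
    ("282", "43661"),
    ("279", "43662"),
    ("282", "43663"),
]

# Count orders per salesperson and pick the one with the most orders.
orders_per_salesperson = Counter(sp for sp, _ in sold)
top_seller, top_count = orders_per_salesperson.most_common(1)[0]
print(top_seller, top_count)
```

In Cypher the same idea would be expressed with an aggregate such as count(sh) grouped by salesperson.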
48
RELATIONAL DATABASE
Relational databases store
data in tables. They enforce the properties of
Atomicity, Consistency, Isolation and Durability
(ACID) and a strict schema. RDBMSs
(relational database management systems) make
use of SQL to manage tables, which can be bulky.
49
RELATIONAL DATABASE
50
No. | GRAPH | RDBMS
1 | Data is in graph form | Data is in tabular form
2 | Better performance on relationship-heavy queries | Data is normalized, meaning lots of joins, which affects speed
3 | Highly scalable | Harder to scale horizontally
4 | Removes the need for an expensive search or match computation | Joins are expensive
5 | Constraints can be represented using relationships | Depends on key constraints
6 | Queried with a graph query language (CQL in Neo4j) rather than SQL | Structured and organized data, queried with SQL
7 | Many NoSQL stores favour eventual consistency over ACID (Neo4j itself supports ACID transactions) | Follows the ACID properties
REFERENCES
Data warehouse data set download link: https://data.world/aewart/airbnb-raw-data/workspace/file?filename=Unit_1_Project_Dataset+%281%29.csv
Data set download link for Graph Database: https://github.com/Microsoft/sql-server-samples/releases/download/adventureworks/AdventureWorks2017.bak
51
52
THANK YOU!

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

AIRBNB DATA WAREHOUSE & GRAPH DATABASE

  • 1. AIRBNB DATA WAREHOUSE & GRAPH DATABASE GROUP 8 Nishigandha Dhanu (10545208) Sagar Deogirkar (10547321) Lamidi Abdulrahman Taiwo (10545249)
  • 2. INDEX ▰ Introduction ▰ Data Set ▰ Star Schema ▰ Creating Database in SQL ▰ Data Import to Tables ▰ Data Import to Dimension and Fact Table Using SSIS ▰ SSRS Reports ▰ Data Visualisation Using Tableau ▰ General information on graph databases ▰ Neo4j: data source used in its creation, steps, queries, and main differences between NoSQL and SQL.
  • 4. INTRODUCTION In this presentation we show how a data warehouse is built on a relational database. This includes the steps to create a database in SQL Server, import data into the tables, populate the dimension and fact tables using SSIS, generate reports using SSRS, and present the data using Tableau. We also use the Adventure Works dataset to compare a graph database with a relational DBMS.
  • 6. DATA SET The data set used for this assignment comes from Airbnb, an online marketplace for arranging or offering lodging, primarily homestays. The data describes the owner (host), the property, the property location, and the various types of reviews the host received from previous guests. We chose this data set because it belongs to one of the most popular industries, tourism, and because as customers we have all looked for a hotel, resort, or dormitory within our budget with the best possible ratings and reviews on different websites and mobile apps. It was also a chance to learn how data is stored and fetched on such platforms. From the data warehouse point of view, we collected the data in a fact table and generated reports from the company's perspective, assessing each host's performance based on reviews, the facilities they provide to guests, and the price. Based on that, the company may decide to extend or cancel a host's contract or lower their rank in the recommendation list.
  • 7. DATA SET The data mainly contains three sections. 1. Host Information: this section describes the host through the following attributes: host_id | host_name | host_since_anniversary | host_response_time | host_response_rate 2. Property Information: this section describes the property and the facilities available at the site through the following attributes: property_id | property_type | room_type | accommodates | bathrooms | bedrooms | beds | bed_type | price | guests_included | extra_people | minimum_nights | neighbourhood_cleansed | city | state | zipcode | country | latitude | longitude 3. Review: this is the most important section; its attributes capture the different types of reviews the host received: review_id | number_of_reviews | review_scores_rating | review_scores_accuracy | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value
  • 9. STAR SCHEMA The star schema is a relational database schema used to represent multidimensional data. It is the simplest form of dimensional schema and contains one or more fact tables and dimension tables. It is called a star schema because the entity-relationship diagram between the fact tables and the dimensions resembles a star. The star schema for our data set is shown here. We created three dimension tables based on the available data and our objective. Host_Dim contains essential information about the host: the host's name, the host's response rate and response time (to queries), and, in the host_since_anniversary attribute, the date since which the host has been associated with Airbnb. Each host is identified by a unique host ID.
  • 10. STAR SCHEMA Property_Dim contains attributes that describe the facilities available at the site, its location, room type, price, and other details the company needs to make decisions. It also carries host_id as a foreign key, since each property belongs to a host. Calendar_Dim contains attributes that represent, in different formats, the date since which a host has been associated with Airbnb. The Best_Host_Fact table, as mentioned earlier, takes the company's perspective and measures each host's performance based on the types of reviews, the facilities they provide to guests, and the price. It uses Host ID, Calendar key, and Property ID as foreign keys.
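The star schema described on slides 9 and 10 can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 rather than SQL Server; the table and key names follow the slides, but the trimmed column lists, the column types, and the `full_date` column name are assumptions made for the example.

```python
import sqlite3

# Assumed, trimmed-down DDL: the slides name the tables and keys but do not
# show the full definitions, so types and column choices are illustrative.
STAR_SCHEMA_DDL = """
CREATE TABLE Host_Dim (
    host_id INTEGER PRIMARY KEY,
    host_name TEXT,
    host_since_anniversary TEXT,
    host_response_time TEXT,
    host_response_rate REAL
);
CREATE TABLE Property_Dim (
    property_id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES Host_Dim(host_id),
    property_type TEXT,
    room_type TEXT,
    price REAL,
    city TEXT
);
CREATE TABLE Calendar_Dim (
    calendar_key INTEGER PRIMARY KEY,
    full_date TEXT UNIQUE
);
CREATE TABLE Best_Host_Fact (
    host_id INTEGER NOT NULL REFERENCES Host_Dim(host_id),
    property_id INTEGER NOT NULL REFERENCES Property_Dim(property_id),
    calendar_key INTEGER NOT NULL REFERENCES Calendar_Dim(calendar_key),
    number_of_reviews INTEGER,
    review_scores_rating REAL,
    price REAL
);
"""


def build_star_schema():
    """Create the three dimensions and the fact table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(STAR_SCHEMA_DDL)
    return conn


conn = build_star_schema()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

With foreign keys switched on, a fact row that points at a host, property, or calendar key missing from its dimension is rejected, which is the property that keeps the fact table consistent with the dimensions.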
  • 12. CREATING DATABASE IN SQL Steps: 1. We created three tables, Host, Property, and Review, with the attributes listed on slide 7. Host_ID is the primary key of the Host table and is used as a foreign key in the Property table. Property_ID is the primary key of the Property table and is used as a foreign key in the Review table, where Review_ID is the primary key. These keys represent the relationships between the tables.
  • 13. DATA IMPORT TO TABLES
  • 14. DATA IMPORT TO TABLES Steps: We imported the data from an Excel sheet into SQL Server and saved it in the respective tables. (Right-click the database → Tasks → Import Data → select 'MS Excel' as the data source → select the file → select 'SQL Server Native Client 11.0' as the destination → select the Excel sheet and the destination table → click Edit Mappings to check the mapping → Finish)
  • 15. DATA IMPORT TO TABLES If the steps, code, data, and data types are correct, the final window shows the number of rows transferred. We repeated the same procedure to import the data for the remaining tables. A table's data can then be viewed (right-click the table → Select Top 1000 Rows).
  • 16. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
  • 17. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS Steps: Open Visual Studio 2019 → create a new project → choose Integration Services Project → enter the project and solution names → add a 'Data Flow' task to the package → open the Data Flow → build a new OLE DB/ADO connection to SQL Server and the desired database → Test Connection
  • 18. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS For Host_Dim: Because all the values from the Host table go to Host_Dim, we used an ADO NET Source and Destination. At the destination we checked the mapping to ensure each value goes to the intended attribute, then clicked Start to execute the import.
  • 19. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS For Property_Dim: Here we send data from the Property table to the Property_Dim table, leaving out a few attributes that are not relevant to the company's interest in this data warehouse. We therefore chose OLE DB as the source and destination and followed the same procedure to execute the import.
  • 20. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS For Calendar_Dim: Here we use the Host_Since_Anniversary column (from the Host table) to supply dates to Calendar_Dim. We used the Host table in the ADO NET Source and then used the Derived Column Transformation Editor to produce the same date in different formats, following the same procedure.
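What the Derived Column step does for Calendar_Dim can be illustrated with a small sketch: one input date is expanded into several representations. The slides do not list the Calendar_Dim columns, so the field names below and the YYYYMMDD style of `calendar_key` are assumptions, not the authors' actual design.

```python
from datetime import date


def calendar_row(full_date: date) -> dict:
    """Derive several representations of one date (hypothetical column names)."""
    return {
        "calendar_key": int(full_date.strftime("%Y%m%d")),  # assumed key style
        "full_date": full_date.isoformat(),
        "year": full_date.year,
        "quarter": (full_date.month - 1) // 3 + 1,
        "month": full_date.month,
        "month_name": full_date.strftime("%B"),    # locale-dependent text
        "day_of_week": full_date.strftime("%A"),   # locale-dependent text
    }


row = calendar_row(date(2015, 8, 14))
print(row)
```

Each distinct Host_Since_Anniversary value would produce one such row, so the fact table can later refer to the date through a single key.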
  • 21. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS For Best_Host_Fact: To gather the essential data into the fact table, we used a SQL command in the ADO NET Source editor to fetch data from the Host, Property, and Review tables, selecting the required attributes to generate the query in the query builder. We then used the Lookup Transformation Editor to match the Host_Since_Anniversary column against the full date in Calendar_Dim, selected Calendar_Key to carry into the fact table, and routed the Lookup Match Output to the ADO NET Destination.
  • 22. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS For Best_Host_Fact (continued): We checked the mapping in the ADO NET Destination editor by selecting Best_Host_Fact_Table as the destination, and executed the package to transfer the data into Best_Host_Fact_Table.
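The fact-table load on slides 21 and 22 amounts to a join of the three source tables plus a lookup against Calendar_Dim. A rough equivalent, written with Python's sqlite3 instead of SSIS, is sketched below. The column subsets, the `full_date` column name, and the sample rows are invented for illustration; the inner join on Calendar_Dim plays the role of the Lookup Match Output, so rows without a matching date are not loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Host (host_id INTEGER PRIMARY KEY, host_since_anniversary TEXT);
CREATE TABLE Property (property_id INTEGER PRIMARY KEY, host_id INTEGER, price REAL);
CREATE TABLE Review (review_id INTEGER PRIMARY KEY, property_id INTEGER,
                     number_of_reviews INTEGER, review_scores_rating REAL);
CREATE TABLE Calendar_Dim (calendar_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE Best_Host_Fact (host_id INTEGER, property_id INTEGER, calendar_key INTEGER,
                             number_of_reviews INTEGER, review_scores_rating REAL,
                             price REAL);

-- Invented sample rows.
INSERT INTO Host VALUES (1, '2012-03-01'), (2, '2016-07-15');
INSERT INTO Property VALUES (10, 1, 80.0), (20, 2, 120.0);
INSERT INTO Review VALUES (100, 10, 25, 96.0), (200, 20, 4, 88.0);
-- Only host 1's date exists in the calendar, so host 2 has no lookup match.
INSERT INTO Calendar_Dim VALUES (20120301, '2012-03-01');
""")

# Source query (Host + Property + Review) followed by the calendar lookup.
conn.execute("""
INSERT INTO Best_Host_Fact
SELECT h.host_id, p.property_id, c.calendar_key,
       r.number_of_reviews, r.review_scores_rating, p.price
FROM Host h
JOIN Property p ON p.host_id = h.host_id
JOIN Review r ON r.property_id = p.property_id
JOIN Calendar_Dim c ON c.full_date = h.host_since_anniversary
""")

fact_rows = conn.execute("SELECT * FROM Best_Host_Fact").fetchall()
print(fact_rows)
```

Loading Calendar_Dim before the fact table matters here: a date that is missing from the dimension silently drops the corresponding fact rows.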
  • 24. SSRS REPORTS ▰ Faster processing of reports on both relational and multidimensional data. ▰ Allows users to interact with information without involving IT professionals. ▰ Supports better and more accurate decision-making for users. ▰ Graphics and images can be embedded in reports, and external content can be integrated using SharePoint.
  • 25. SSRS REPORTS Steps: ▰ We created a new project using the Report Server Project template. ▰ Under the new report project, we created a shared data source (Host Data Source), which is shared across all reports. ▰ To start a report, we added a new item under Reports.
  • 26. BEST PROPERTIES REPORT ▰ We added a data source and a data set for the new report. ▰ Our first report shows the number of reviews received for the top-rated properties. ▰ This report is created with both the customer and the company in mind. ▰ Customers can directly choose the property type they are looking for based on the number of reviews, as these are among the best properties overall. ▰ The company can focus on the properties that receive fewer reviews in order to increase revenue and property value.
  • 27. REVENUE REPORT ▰ This report is created with the company's stakeholders in mind. ▰ The report shows the revenue generated by each property type. ▰ From this report the company can decide whether to keep investing in properties that generate less revenue.
  • 28. PROPERTY REVIEWS ▰ This report is created with both the host and the customer in mind. ▰ The report shows the reviews received by each property type. ▰ Based on the reviews, hosts can decide which kinds of properties to offer more of in future. ▰ By checking these reviews, customers can decide which kinds of properties to choose for their holidays.
  • 29. HOST REGISTERED PER YEAR ▰ This report is created with the company's stakeholders in mind. ▰ The report shows how many hosts registered each year. ▰ Based on the number of hosts registered, the company can make future business decisions.
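The data set behind a report such as "Host Registered per Year" is essentially a count grouped by registration year. The sketch below shows that query shape with Python's sqlite3; the actual SSRS data set query is not shown in the slides, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Host_Dim (host_id INTEGER PRIMARY KEY, host_since_anniversary TEXT)")
conn.executemany(
    "INSERT INTO Host_Dim VALUES (?, ?)",
    [(1, "2012-03-01"), (2, "2012-11-20"), (3, "2016-07-15")],  # invented rows
)

# Count hosts by the year part of their registration date.
hosts_per_year = conn.execute("""
SELECT strftime('%Y', host_since_anniversary) AS year, COUNT(*) AS hosts
FROM Host_Dim
GROUP BY year
ORDER BY year
""").fetchall()
print(hosts_per_year)
```

In SQL Server the year would typically be extracted with `YEAR(...)` instead of SQLite's `strftime`.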
  • 31. DATA VISUALISATION USING TABLEAU ▰ Tableau is one of the most widely used data visualisation tools in the business intelligence industry. ▰ Data analysis is very fast with Tableau. ▰ Visualisations are created in the form of dashboards and worksheets. Steps: ▰ After connecting to the server, we connected to our database using the SQL Server connector. ▰ We built the visualisations in Tableau from our fact table. ▰ We used left joins to bring in data from our dimension tables. ▰ Finally, after clicking Upload, all the data was loaded into Tableau.
  • 32. TABLEAU REPORTS ▰ We created our first report on a worksheet. ▰ The report alongside shows the total number of reviews received by each property type. ▰ From this graph we can say that apartments were the most preferred and chalets the least preferred by customers. ▰ This report uses the attributes number of reviews and property type.
  • 33. TABLEAU REPORTS ▰ This report visualises the best-rated properties, considering all the reviews received from customers. ▰ From the graphs we can say that property types such as apartment, boat, and cabin listed on Airbnb are among the top rated by customers. ▰ Property types such as chalet, however, are the least preferred by customers.
  • 34. TABLEAU REPORTS ▰ This graph shows the hosts registered with Airbnb from 2002 to 2019. ▰ The graph shows a dip in 2007. ▰ After that, registrations stay roughly constant in the following years. ▰ At least 300 hosts register with Airbnb each year, reflecting a growing marketplace for tourism.
  • 35. TABLEAU REPORTS ▰ This visualisation shows the room types customers preferred for their stays with Airbnb. ▰ The most reviewed room type is entire home/apartment, suggesting that most customers prefer privacy during their stay. ▰ Very few customers preferred a shared room.
  • 37. DATA SOURCE ▰ As directed in the question, the Adventure Works database is used as our data source for comparing the performance of a graph database and a relational database. The data source was downloaded from https://github.com/Microsoft/sql-server-samples/releases/download/adventureworks/AdventureWorks2017.bak. ▰ The Adventure Works database was loaded from the link above.
  • 38. GRAPH DATABASE A graph database such as Neo4j (one of the most widely used NoSQL tools) is a type of database that uses graph structures, with nodes, relationships, and properties, to represent, store, and query data. Its main elements are the nodes and the edges, which represent the relationships that link the nodes. Data can be loaded from CSV files, which serve as the main source of information to work on. Relationships allow the loaded data to be linked together directly and retrieved with one or more query operations.
  • 39. RESOURCE DESCRIPTION FRAMEWORK (RDF) This model sketches how the nodes are related to one another and the connections between them, giving an overview of the structure. The diagram below is our RDF model for the Adventure Works database, drawn using Visio.
  • 40. CREATION OF MAIN USED NODES IN GRAPH DATABASE
    CUSTOMER NODE:
      LOAD CSV WITH HEADERS FROM "file:///CustomerSales.csv" AS row
      CREATE (c:Customer)
      SET c = row{customer_id: row.CustomerID, territory: row.TerritoryID, accountNum: row.AccountNumber, customerType: row.CustomerType, row_guid: row.rowguid, date: row.ModifiedDate}
      RETURN c
    CUSTOMER CONSTRAINT:
      CREATE CONSTRAINT ON (c:Customer) ASSERT c.customer_id IS UNIQUE
    SALES PERSON NODE:
      LOAD CSV WITH HEADERS FROM "file:///SalesPerson.csv" AS row
      CREATE (sp:SalesPerson)
      SET sp = row{salesperson_id: row.SalesPersonID, spterritory_id: row.TerritoryID, salesqu: row.SalesQuota, spbonus: row.Bonus, spcomm_pst: row.CommissionPct, spsalesytd: row.SalesYTD, spsaleslyr: row.SalesLastYear, sprow_guid: row.rowguid, spdate: row.ModifiedDate}
      RETURN sp
    SALESPERSON CONSTRAINT (the property name must match the key set above, salesperson_id, because property names are case-sensitive):
      CREATE CONSTRAINT ON (sp:SalesPerson) ASSERT sp.salesperson_id IS UNIQUE
  • 41. ORDER (SALES){SaleOrderHeader}:
      LOAD CSV WITH HEADERS FROM "file:///SalesOrderHeader.csv" AS row
      CREATE (sh:SalesOrderHeader)
      SET sh = row{salesorder_id: row.SalesOrderID, revision_n: row.RevisionNumber, order_d: row.OrderDate, due_d: row.DueDate, ship_d: row.ShipDate, stat: row.Status, onlineorder_f: row.OnlineOrderFlag, salesorder_n: row.SalesOrderNumber, purchaseorder_n: row.PurchaseOrderNumber, account_n: row.AccountNumber, customer_id: row.CustomerID, contact_id: row.ContactID, salesperson_id: row.SalesPersonID, territory: row.TerritoryID, billaddr_id: row.BillToAddressID, shipaddr_id: row.ShipToAddressID, shipmeth_id: row.ShipMethodID, ccard_id: row.CreditCardID, ccardA_code: row.CreditCardApprovalCode, currR_id: row.CurrencyRateID, s_total: row.SubTotal, taxamt: row.TaxAmt, frht: row.Freight, totalD: row.TotalDue, com: row.Comment, row_guid: row.rowguid, date: row.ModifiedDate}
      RETURN sh
    ORDER (SALES){SaleOrderHeader} CONSTRAINT:
      CREATE CONSTRAINT ON (sh:SalesOrderHeader) ASSERT sh.salesorder_id IS UNIQUE
  • 42. PRODUCT NODE:
      LOAD CSV WITH HEADERS FROM "file:///Product.csv" AS row
      CREATE (p:Product)
      SET p = row{product_id: row.ProductID, pname: row.Name, pnum: row.ProductNumber, mkflag: row.MakeFlag, fgflag: row.FinishedGoodsFlag, safetystockl: row.SafetyStockLevel, reorder_p: row.ReorderPoint, standard_cost: row.StandardCost, plistprice: row.ListPrice, pdaysto_man: row.DaysToManufacture, pss_date: row.SellStartDate, prow_guid: row.rowguid, pdate: row.ModifiedDate}
      RETURN p
    PRODUCT CONSTRAINT:
      CREATE CONSTRAINT ON (p:Product) ASSERT p.product_id IS UNIQUE
    ORDER DETAILS (SALES){SaleOrderDetails}:
      LOAD CSV WITH HEADERS FROM "file:///SalesOrderDetails.csv" AS row
      CREATE (sd:SalesOrderDetails)
      SET sd = row{salesorder_id: row.SalesOrderID, salesordD_id: row.SalesOrderDetailID, carrierT_num: row.CarrierTrackingNumber, orderQ: row.OrderQty, prod_id: row.ProductID, specialoffer_id: row.SpecialOfferID, uniprice: row.UnitPrice, unitpdisc: row.UnitPriceDiscount, line_t: row.LineTotal, row_guid: row.rowguid, date: row.ModifiedDate}
      RETURN sd
  • 43. ESTABLISHMENT OF RELATIONSHIPS BETWEEN THE CREATED NODES
  • 44. ESTABLISHMENT OF RELATIONSHIPS BETWEEN THE CREATED NODES
    CUSTOMER-ORDER (SALES){SaleOrderHeader} RELATIONSHIP:
      MATCH (c:Customer), (sh:SalesOrderHeader)
      WHERE c.customer_id = sh.customer_id
      CREATE (c)-[r:purchased]->(sh)
      RETURN c, sh, r
      MATCH (p:SalesOrderHeader) REMOVE p.customer_id RETURN p
    The figure below shows the relationship between customer and order, i.e. the customers that purchased each order:
  • 45. ESTABLISHMENT OF RELATIONSHIPS BETWEEN THE CREATED NODES
    SALESPERSON-ORDER (SALES){SaleOrderHeader} RELATIONSHIP:
      MATCH (sp:SalesPerson), (sh:SalesOrderHeader)
      WHERE sp.salesperson_id = sh.salesperson_id
      CREATE (sp)-[r:sold]->(sh)
      RETURN sp, sh, r
      MATCH (p:SalesOrderHeader) REMOVE p.salesperson_id RETURN p
    The figure shows the relationship between salesperson and order, i.e. the salesperson who sold each order:
  • 46. ESTABLISHMENT OF RELATIONSHIPS BETWEEN THE CREATED NODES
    ORDER-ORDER DETAILS-PRODUCT (the relationship properties read sd.carrierT_num, the key set when the SalesOrderDetails nodes were created):
      MATCH (sh:SalesOrderHeader), (sd:SalesOrderDetails), (p:Product)
      WHERE sh.salesorder_id = sd.salesorder_id AND sd.prod_id = p.product_id
      CREATE (sh)-[r:orders {carrierTnum: sd.carrierT_num, orderQ: sd.orderQ, uniprice: sd.uniprice, unitpdisc: sd.unitpdisc, line_t: sd.line_t}]->(p)
      RETURN sh, p, r LIMIT 100
    The figure below shows the relationship between order and product, linked through order details:
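The order-to-product step can be pictured outside Neo4j as turning each order-detail row into an edge that carries the detail's values as properties. The following Python sketch mimics that idea with plain dictionaries; the sample rows and the `build_order_product_edges` helper are invented for illustration and are not part of the authors' Cypher.

```python
from collections import defaultdict

# Invented sample rows using the property names from the slides.
order_details = [
    {"salesorder_id": "1", "prod_id": "A", "orderQ": "2", "uniprice": "10.0"},
    {"salesorder_id": "1", "prod_id": "B", "orderQ": "1", "uniprice": "5.5"},
    {"salesorder_id": "2", "prod_id": "Z", "orderQ": "3", "uniprice": "1.0"},
]
product_ids = {"A", "B"}  # products that exist as nodes


def build_order_product_edges(details, known_products):
    """Map each order id to (product id, edge properties) pairs."""
    edges = defaultdict(list)
    for d in details:
        # Like the WHERE clause: keep only details whose product node exists.
        if d["prod_id"] in known_products:
            props = {"orderQ": d["orderQ"], "uniprice": d["uniprice"]}
            edges[d["salesorder_id"]].append((d["prod_id"], props))
    return dict(edges)


edges = build_order_product_edges(order_details, product_ids)
print(edges)
```

Once the edges exist, finding the products of an order is a direct lookup from the order, with no intermediate detail table to join through.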
  • 47. INSIGHTS FROM ADVENTUREWORKS DATABASE USING CQL CODES
    This query lets the business list the 20 products with the lowest standard cost, together with that cost. Because LOAD CSV reads every value as a string, the cost is converted with toFloat so that the ordering is numeric:
      MATCH (p:Product)
      RETURN p.pname, p.standard_cost
      ORDER BY toFloat(p.standard_cost)
      LIMIT 20
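Because LOAD CSV reads every field as a string, a cost property that is not converted (for example with toFloat) is ordered as text rather than as a number. The difference is easy to see in a short Python illustration; the cost values below are invented.

```python
# Invented cost values, stored as strings the way LOAD CSV would store them.
costs = ["98.77", "1059.31", "343.65", "9.50"]

as_text = sorted(costs)                # character-by-character comparison
as_numbers = sorted(costs, key=float)  # numeric comparison

print(as_text)
print(as_numbers)
```

The text ordering places "1059.31" first because it starts with the character "1", even though it is the largest number in the list.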
  • 48. This query returns salespersons together with the orders they sold (limited to 100 rows), which can be inspected to see which salespersons sell the most:
      MATCH (s:SalesPerson)-[r:sold]->(sh:SalesOrderHeader)
      RETURN s, sh, r LIMIT 100
  • 49. RELATIONAL DATABASE Relational databases store data in tables. They enforce the properties of Atomicity, Consistency, Isolation, and Durability (ACID) and adhere strictly to a schema. Relational database management systems (RDBMSs) use SQL to manage large tables.
  • 50. RELATIONAL DATABASE Graph database vs. RDBMS:
    1. Graph: data is stored in graph form | RDBMS: data is stored in tabular form
    2. Graph: better performance for relationship-heavy queries | RDBMS: data is normalized, meaning many joins, which affects speed
    3. Graph: highly scalable | RDBMS: harder to scale horizontally
    4. Graph: removes the need for expensive search or match computation | RDBMS: joins are expensive
    5. Graph: constraints can be represented using relationships | RDBMS: depends on key constraints
    6. Graph: queried with a graph query language (Cypher in Neo4j) rather than SQL | RDBMS: structured, organized data queried with SQL
    7. Graph: many NoSQL stores favour eventual consistency over ACID, although Neo4j itself supports ACID transactions | RDBMS: follows the ACID properties
  • 51. REFERENCES Data warehouse data set download link: https://data.world/aewart/airbnb-raw-data/workspace/file?filename=Unit_1_Project_Dataset+%281%29.csv Data set download link for graph database: https://github.com/Microsoft/sql-server-samples/releases/download/adventureworks/AdventureWorks2017.bak