This presentation covers the steps to create a database in SQL Server, import data into the tables, populate the dimension and fact tables using SSIS, generate reports using SSRS, and present the data using Tableau, all based on an Airbnb data set. It also uses the AdventureWorks data set to compare a graph database with a relational DBMS.
AIRBNB DATA WAREHOUSE & GRAPH DATABASE
1. AIRBNB DATA WAREHOUSE & GRAPH DATABASE
GROUP 8
Nishigandha Dhanu (10545208)
Sagar Deogirkar (10547321)
Lamidi Abdulrahman Taiwo (10545249)
2. INDEX
▰ Introduction
▰ Data Set
▰ Star Schema
▰ Creating Database in SQL
▰ Data Import to Tables
▰ Data Import to Dimension and Fact Table Using SSIS
▰ SSRS Reports
▰ Data Visualisation Using Tableau
▰ General Information on Graph Databases
▰ Neo4j: Data Source, Creation Steps, Queries, and Main Differences Between NoSQL and SQL
4. INTRODUCTION
In this presentation we show how a data warehouse is built on a
relational database.
This includes the steps to create a database in SQL Server, import data into the tables,
populate the dimension and fact tables using SSIS, generate reports using
SSRS, and present the data using Tableau. We also use the AdventureWorks data set
to compare a graph database with a relational DBMS.
6. DATA SET
The data set we are using for the assignment comes from Airbnb, an online
marketplace for arranging or offering lodging, primarily homestays. The data describes
the owner (host), the property, the property location, and the various types of
reviews that the host received from previous guests.
We selected this data set mainly because it belongs to one of the most popular industries,
tourism, and because as customers we have all looked for a hotel, resort or dormitory
within our budget with the best possible ratings and reviews on different
websites and mobile apps. It is also a learning experience to see how data is stored
and fetched on such platforms.
With the data warehouse in mind, we collected the data in a fact table and generated
reports from the company's perspective, which assesses the host's
performance based on reviews, the facilities they provide to guests, and the price. Based on
that, the company may decide to extend or cancel the contract, or to lower the host's rank in the
recommendation list.
7. DATA SET
The data mainly contains three sections.
1. Host Information - This section focuses on the host's information.
It includes the following attributes: host_id | host_name | host_since_anniversary |
host_response_time | host_response_rate
2. Property Information - This section contains information about the property and the facilities
available at the site. It includes the following attributes: property_id | property_type | room_type
| accommodates | bathrooms | bedrooms | beds | bed_type | price | guests_included |
extra_people | minimum_nights | neighbourhood_cleansed | city | state | zipcode | country |
latitude | longitude
3. Review - This is the most important section; its attributes capture the different types of
reviews the host received: review_id | number_of_reviews | review_scores_rating |
review_scores_accuracy | review_scores_cleanliness | review_scores_checkin |
review_scores_communication | review_scores_location | review_scores_value
9. STAR SCHEMA
The Star Schema is a relational database schema used to
represent multidimensional data. It is the
simplest form of data warehouse schema and contains one or more
fact tables, each referencing a set of dimension tables. It is called a Star Schema because the
entity-relationship diagram between the fact tables and the
dimensions resembles the shape of a star.
The Star Schema for our data set is shown here. We created
3 dimension tables based on the data available and our
objective.
Host_Dim contains essential information about the host, i.e. the
host's name, the host's response rate and response time (to queries), and
the date since which the host has been associated with Airbnb
(the host_since_anniversary attribute). Each host is identified by
a unique Host ID.
10. STAR SCHEMA
Property_Dim contains thorough information about the
facilities available at the site, its location, room type, price, and
other details the company needs to
make decisions. It also carries host_id as a foreign
key, since each property belongs to a host.
Calendar_Dim contains attributes that represent, in different formats, the date from
which the host has been associated with Airbnb.
The Best_Host_Fact table, as mentioned earlier, takes the company's
perspective: it is used to assess each host's performance
based on the types of reviews, the facilities provided to guests,
and the price. It uses Host ID, Calendar Key and Property ID as
foreign keys.
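A minimal sketch of the star schema described above, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The dimension columns are abbreviated, the measure columns in the fact table are our assumption (the slides do not list them), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite is used only so the sketch is easy to run;
# the presentation itself uses SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Host_Dim (
    host_id INTEGER PRIMARY KEY,
    host_name TEXT,
    host_since_anniversary TEXT,
    host_response_time TEXT,
    host_response_rate REAL
);
CREATE TABLE Property_Dim (
    property_id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES Host_Dim(host_id),
    property_type TEXT,
    room_type TEXT,
    price REAL
);
CREATE TABLE Calendar_Dim (
    calendar_key INTEGER PRIMARY KEY,
    full_date TEXT,
    year INTEGER,
    month INTEGER
);
-- The fact table sits at the centre of the star and points at each dimension.
CREATE TABLE Best_Host_Fact (
    host_id INTEGER REFERENCES Host_Dim(host_id),
    property_id INTEGER REFERENCES Property_Dim(property_id),
    calendar_key INTEGER REFERENCES Calendar_Dim(calendar_key),
    number_of_reviews INTEGER,      -- assumed measure
    review_scores_rating REAL       -- assumed measure
);
""")

# Invented sample rows, one per table.
conn.execute("INSERT INTO Host_Dim VALUES (1, 'Asha', '2015-06-01', 'within an hour', 0.98)")
conn.execute("INSERT INTO Property_Dim VALUES (10, 1, 'Apartment', 'Entire home/apt', 120.0)")
conn.execute("INSERT INTO Calendar_Dim VALUES (20150601, '2015-06-01', 2015, 6)")
conn.execute("INSERT INTO Best_Host_Fact VALUES (1, 10, 20150601, 42, 96.0)")

# A typical star-schema query: start at the fact table, join out to each dimension.
star_rows = conn.execute("""
SELECT h.host_name, p.property_type, c.year, f.review_scores_rating
FROM Best_Host_Fact AS f
JOIN Host_Dim AS h ON h.host_id = f.host_id
JOIN Property_Dim AS p ON p.property_id = f.property_id
JOIN Calendar_Dim AS c ON c.calendar_key = f.calendar_key
""").fetchall()
print(star_rows)
```

Every report query follows the same shape: one fact table in the middle, one join per dimension.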
12. CREATING DATABASE IN SQL
Steps:
1. We created three tables, Host, Property and Review, with the attributes
listed on slide no. 7.
Host_ID is the primary key of the Host table and is used as a foreign key in the
Property table. Property_ID is the primary key of the Property table and
is used as a foreign key in the Review table, where Review_ID is the primary
key. These keys represent the relationships between the tables.
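The key relationships above can be sketched as follows, again assuming sqlite3 in place of SQL Server. Only a few columns per table are shown, and the sample values are invented. The last insert deliberately references a host that does not exist, to show the foreign key rejecting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Host (
    host_id INTEGER PRIMARY KEY,
    host_name TEXT
);
CREATE TABLE Property (
    property_id INTEGER PRIMARY KEY,
    host_id INTEGER NOT NULL REFERENCES Host(host_id),
    property_type TEXT
);
CREATE TABLE Review (
    review_id INTEGER PRIMARY KEY,
    property_id INTEGER NOT NULL REFERENCES Property(property_id),
    number_of_reviews INTEGER
);
""")

# Valid chain: host 1 owns property 10, which has review row 100.
conn.execute("INSERT INTO Host VALUES (1, 'Asha')")
conn.execute("INSERT INTO Property VALUES (10, 1, 'Apartment')")
conn.execute("INSERT INTO Review VALUES (100, 10, 42)")

# Host 999 does not exist, so the foreign key should reject this row.
try:
    conn.execute("INSERT INTO Property VALUES (11, 999, 'Cabin')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

property_count = conn.execute("SELECT COUNT(*) FROM Property").fetchone()[0]
print(fk_rejected, property_count)
```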
14. DATA IMPORT TO TABLES
Steps:
We imported the data from an Excel sheet into SQL Server and saved it in the
respective tables.
(Right-click the database > Tasks > Import Data > Select 'Microsoft Excel' as the data
source > Select the file > Select 'SQL Server
Native Client 11.0' as the destination > Select the Excel sheet
and destination table > Click Edit Mappings to check the
mapping > Finish)
15. DATA IMPORT TO TABLES
If the steps, code, data and data types are correct, the final window shows the number of rows
transferred. We repeated the same procedure to import the data for the remaining tables.
The table's data can then be inspected (right-click the table > Select Top 1000 Rows).
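For readers without the SQL Server Import wizard, the same load-then-check flow can be sketched in Python. This is not the wizard itself: it assumes the sheet has been exported to CSV, uses sqlite3 as the target, and the three sample columns and rows are invented.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the exported Host sheet.
host_csv = io.StringIO(
    "host_id,host_name,host_response_rate\n"
    "1,Asha,0.98\n"
    "2,Brendan,0.75\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Host ("
    "host_id INTEGER PRIMARY KEY, host_name TEXT, host_response_rate REAL)"
)

# Convert each CSV field to the column's data type before inserting,
# which is the role the column mapping plays in the wizard.
rows = [
    (int(r["host_id"]), r["host_name"], float(r["host_response_rate"]))
    for r in csv.DictReader(host_csv)
]
conn.executemany("INSERT INTO Host VALUES (?, ?, ?)", rows)
conn.commit()

# Equivalent of the "rows transferred" count in the final window.
rows_transferred = conn.execute("SELECT COUNT(*) FROM Host").fetchone()[0]
# Equivalent of "Select Top 1000 Rows".
preview = conn.execute("SELECT * FROM Host LIMIT 1000").fetchall()
print(rows_transferred, preview)
```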
16. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
17. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
Steps:
Open Visual Studio 2019 > Create a new project > Choose
Integration Services Project > Enter project and solution
name > Add a 'Data Flow' task to the package > Open the Data
Flow > Build a new OLE DB/ADO connection to the SQL
Server instance and the desired database > Test Connection
18. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Host_Dim:
As all the values from the Host table go to Host_Dim, we
used an ADO NET Source and Destination.
At the destination we checked the mapping to ensure each column
goes to the intended attribute.
Click Start to execute the import.
19. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Property_Dim:
Here we sent data from the Property table to the Property_Dim table, leaving out a few
attributes that are not of interest to the company in this data warehouse.
Hence we chose OLE DB as the source and destination.
We followed the same procedure to execute the import.
20. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Calendar_Dim:
Here we used the Host_Since_Anniversary column (from the Host
table) to supply the dates for Calendar_Dim. We used the Host table
in an ADO NET Source and then used the Derived Column
Transformation Editor to produce the same date in different formats,
following the same procedure to execute the import.
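The Derived Column step can be pictured as a function that takes one date and returns it in several formats. The sketch below is an illustration in Python rather than an SSIS expression. The output column names other than calendar_key and full_date are hypothetical, since the slides do not list the Calendar_Dim columns.

```python
from datetime import date


def calendar_row(full_date: date) -> dict:
    """Derive several representations of one date, as a Calendar_Dim row might hold."""
    return {
        # Integer key in YYYYMMDD form, a common convention for date dimensions.
        "calendar_key": int(full_date.strftime("%Y%m%d")),
        "full_date": full_date.isoformat(),
        "year": full_date.year,
        "quarter": (full_date.month - 1) // 3 + 1,
        "month": full_date.month,
        "day": full_date.day,
        # ISO weekday: Monday is 1, Sunday is 7.
        "iso_weekday": full_date.isoweekday(),
    }


# Invented example value for a host's Host_Since_Anniversary date.
row = calendar_row(date(2015, 6, 1))
print(row)
```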
21. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Best_Host_Fact:
To gather the essential data into the fact table, we used a SQL command in the ADO NET Source Editor to
fetch data from the Host, Property and Review tables, selecting the required attributes to generate
the query in the query builder.
Then we used the Lookup Transformation Editor to match the Host_Since_Anniversary column with the
full date in Calendar_Dim and selected Calendar_Key to bring it into the fact table. We
connected the Lookup Match Output to the ADO NET Destination.
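A rough equivalent of this data flow, assuming sqlite3 instead of SSIS: the three source tables are joined, and an inner join against Calendar_Dim plays the role of the Lookup transformation. Rows with no matching date are dropped, which mirrors sending only the Lookup Match Output to the destination. Table contents and the choice of fact columns are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Host (host_id INTEGER PRIMARY KEY, host_since_anniversary TEXT);
CREATE TABLE Property (property_id INTEGER PRIMARY KEY, host_id INTEGER, price REAL);
CREATE TABLE Review (review_id INTEGER PRIMARY KEY, property_id INTEGER,
                     number_of_reviews INTEGER, review_scores_rating REAL);
CREATE TABLE Calendar_Dim (calendar_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE Best_Host_Fact (host_id INTEGER, property_id INTEGER, calendar_key INTEGER,
                             price REAL, number_of_reviews INTEGER,
                             review_scores_rating REAL);

INSERT INTO Host VALUES (1, '2015-06-01'), (2, '2016-01-15');
INSERT INTO Property VALUES (10, 1, 120.0), (11, 2, 80.0);
INSERT INTO Review VALUES (100, 10, 42, 96.0), (101, 11, 5, 88.0);
-- Only one date exists in the calendar, so host 2 will find no match.
INSERT INTO Calendar_Dim VALUES (20150601, '2015-06-01');
""")

# Source query (Host + Property + Review) followed by the calendar lookup.
conn.execute("""
INSERT INTO Best_Host_Fact
SELECT h.host_id, p.property_id, c.calendar_key,
       p.price, r.number_of_reviews, r.review_scores_rating
FROM Host AS h
JOIN Property AS p ON p.host_id = h.host_id
JOIN Review AS r ON r.property_id = p.property_id
JOIN Calendar_Dim AS c ON c.full_date = h.host_since_anniversary
""")

fact_rows = conn.execute("SELECT * FROM Best_Host_Fact").fetchall()
print(fact_rows)
```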
22. DATA IMPORT TO DIMENSION AND FACT TABLE USING SSIS
For Best_Host_Fact (continued):
We checked the mapping in the ADO NET Destination Editor with Best_Host_Fact_Table selected as the destination, and executed the package to
transfer the data into Best_Host_Fact_Table.
24. SSRS REPORTS
▰ Faster processing of reports on both relational and multidimensional data.
▰ Allows users to interact with information without involving IT
professionals.
▰ Supports better and more accurate decision-making for users.
▰ Graphics and images can be embedded in reports, and reports can be integrated with external
content using SharePoint.
25. SSRS Reports
Steps:
▰ We created a new project using the Report Server Project template.
▰ In the new report project, we created a shared data source (Host Data Source), which can be
shared across reports.
▰ To start a report, we added a new item under Reports.
26. BEST PROPERTIES REPORT
We added a data source and a data set for the new
report.
Our first report shows the
number of reviews received from users for the
top-rated properties.
This report was created with both the customer
and the company in mind.
Customers can directly choose the property
type they are looking for based on the
number of reviews, as these are among the
best properties overall.
The company can also focus on the
properties that receive fewer reviews,
to increase their revenue and property value.
27. REVENUE REPORT
▰ This report was created with the
company's stakeholders in
mind.
▰ The report shows the
revenue generated by each of the
different properties.
▰ From this report the company can
decide whether, in
future, to keep investing in
properties that
generate less revenue.
28. PROPERTY REVIEWS
▰ This report was created with both the host
and the customer in mind.
▰ The report shows the
reviews received by each of the different
properties.
▰ Depending on the reviews, hosts can
decide which
properties they should host more of in future.
▰ By checking these reviews,
customers can also decide what kind of
properties to opt for
when they go on vacation.
29. HOST REGISTERED PER YEAR
▰ This report was created with the
company's stakeholders in mind.
▰ The report shows
how many hosts registered
each year.
▰ The company can use the number of hosts
registered each year to
inform its future
business decisions.
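The data set behind a report like this is a simple aggregation. The sketch below assumes sqlite3, a Host_Dim table with host_since_anniversary stored as ISO date text, and invented sample rows. In the actual report the query would be written in T-SQL against SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Host_Dim (host_id INTEGER PRIMARY KEY, host_since_anniversary TEXT)"
)
conn.executemany(
    "INSERT INTO Host_Dim VALUES (?, ?)",
    [(1, "2015-06-01"), (2, "2015-09-12"), (3, "2016-01-15")],
)

# Count hosts per registration year; strftime('%Y', ...) extracts the year as text.
hosts_per_year = conn.execute("""
SELECT strftime('%Y', host_since_anniversary) AS reg_year,
       COUNT(*) AS hosts
FROM Host_Dim
GROUP BY reg_year
ORDER BY reg_year
""").fetchall()
print(hosts_per_year)
```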
31. DATA VISUALISATION USING TABLEAU
▰ Tableau is one of the most widely used data visualisation tools in the
business intelligence industry.
▰ Data analysis is very fast with Tableau.
▰ The visualisations it creates take the form of dashboards
and worksheets.
Steps:
After connecting to the server, we connected to our database
using the SQL Server connector.
We built the visualisations in Tableau from our
fact table.
We used left joins from the fact table to bring in
data from our dimension tables.
Finally, clicking upload loaded all the
data into Tableau.
32. Tableau Reports
▰ We created our first report using a
sheet.
▰ The adjacent report is a graphical
visualisation of the total reviews received
by each property type.
▰ From this graph we can say that
apartments were the most preferred and
chalets the least preferred by
customers.
▰ This report was created using the
attributes Number of Reviews and
Property Type.
33. Tableau Reports
▰ The following report is a
visualisation of the best-rated properties,
considering all the
reviews received from customers.
▰ From the graphs we can say that
property types such as apartment, boat and
cabin listed on Airbnb are among the
top rated by customers.
▰ However, property types such as chalet are the
least preferred by customers.
34. Tableau Reports
▰ The following graph shows the hosts
registered with Airbnb from 2002 to
2019.
▰ From the graph we can say that
there was a dip in 2007.
▰ However, the graph remained steady from
the following year.
▰ We can say that at least 300 hosts
register with Airbnb each year,
as it is a growing
marketplace for tourism.
35. Tableau Reports
▰ The following visualisation
represents the different room
types preferred by customers for their stay
with Airbnb.
▰ The most reviewed room type is entire
home/apartment, as most
customers prefer privacy during their stay.
▰ Very few customers preferred a shared room.
37. DATA SOURCE
▰ As directed in the question, the AdventureWorks database is used as our data source for
comparing the performance of a graph database and a relational database. It was
downloaded from https://github.com/Microsoft/sql-server-samples/releases/download/adventureworks/AdventureWorks2017.bak.
▰ The AdventureWorks database was loaded from the link above.
38. GRAPH DATABASE
A graph database such as Neo4j (one of the most widely used NoSQL databases) is a
type of database that uses graph structures, with nodes and
properties, to represent, store and query data. The main elements of a graph database are
the nodes and the edges, which represent
the relationships that link the nodes. In our case, CSV files are the main source of
data loaded into the graph. The relationships allow the data loaded from these CSV files to
be linked together directly and retrieved with one or more query operations.
39. RESOURCE DESCRIPTION FRAMEWORK (RDF)
This model sketches how the nodes are related to one another
and shows the connections between them, giving an
overview of what is happening in the data. The diagram below is our RDF model for the
AdventureWorks database, drawn using Visio.
40. CREATION OF MAIN USED NODES IN GRAPH DATABASE
CUSTOMER NODE:
LOAD CSV WITH HEADERS FROM "file:///CustomerSales.csv" AS row
CREATE (c:Customer)
SET c = row{customer_id: row.CustomerID, territory: row.TerritoryID, accountNum: row.AccountNumber,
  customerType: row.CustomerType, row_guid: row.rowguid, date: row.ModifiedDate}
RETURN c
CUSTOMER CONSTRAINT:
CREATE CONSTRAINT ON (c:Customer) ASSERT c.customer_id IS UNIQUE
SALES PERSON NODE:
LOAD CSV WITH HEADERS FROM "file:///SalesPerson.csv" AS row
CREATE (sp:SalesPerson)
SET sp = row{salesperson_id: row.SalesPersonID, spterritory_id: row.TerritoryID, salesqu: row.SalesQuota,
  spbonus: row.Bonus, spcomm_pst: row.CommissionPct, spsalesytd: row.SalesYTD,
  spsaleslyr: row.SalesLastYear, sprow_guid: row.rowguid, spdate: row.ModifiedDate}
RETURN sp
SALESPERSON CONSTRAINT:
// Property names are case-sensitive, so the constraint uses salesperson_id exactly as set above.
CREATE CONSTRAINT ON (sp:SalesPerson) ASSERT sp.salesperson_id IS UNIQUE
41. ORDER (SALES) {SalesOrderHeader}:
LOAD CSV WITH HEADERS FROM "file:///SalesOrderHeader.csv" AS row
CREATE (sh:SalesOrderHeader)
SET sh = row{salesorder_id: row.SalesOrderID, revision_n: row.RevisionNumber, order_d: row.OrderDate,
  due_d: row.DueDate, ship_d: row.ShipDate, stat: row.Status, onlineorder_f: row.OnlineOrderFlag,
  salesorder_n: row.SalesOrderNumber, purchaseorder_n: row.PurchaseOrderNumber,
  account_n: row.AccountNumber, customer_id: row.CustomerID, contact_id: row.ContactID,
  salesperson_id: row.SalesPersonID, territory: row.TerritoryID, billaddr_id: row.BillToAddressID,
  shipaddr_id: row.ShipToAddressID, shipmeth_id: row.ShipMethodID, ccard_id: row.CreditCardID,
  ccardA_code: row.CreditCardApprovalCode, currR_id: row.CurrencyRateID, s_total: row.SubTotal,
  taxamt: row.TaxAmt, frht: row.Freight, totalD: row.TotalDue, com: row.Comment, row_guid: row.rowguid,
  date: row.ModifiedDate}
RETURN sh
ORDER (SALES) {SalesOrderHeader} CONSTRAINT:
CREATE CONSTRAINT ON (sh:SalesOrderHeader) ASSERT sh.salesorder_id IS UNIQUE
42. PRODUCT NODE:
LOAD CSV WITH HEADERS FROM "file:///Product.csv" AS row
CREATE (p:Product)
SET p = row{product_id: row.ProductID, pname: row.Name, pnum: row.ProductNumber, mkflag: row.MakeFlag,
  fgflag: row.FinishedGoodsFlag, safetystockl: row.SafetyStockLevel, reorder_p: row.ReorderPoint,
  standard_cost: row.StandardCost, plistprice: row.ListPrice, pdaysto_man: row.DaysToManufacture,
  pss_date: row.SellStartDate, prow_guid: row.rowguid, pdate: row.ModifiedDate}
RETURN p
PRODUCT CONSTRAINT:
CREATE CONSTRAINT ON (p:Product) ASSERT p.product_id IS UNIQUE
ORDER DETAILS (SALES) {SalesOrderDetails}:
LOAD CSV WITH HEADERS FROM "file:///SalesOrderDetails.csv" AS row
CREATE (sd:SalesOrderDetails)
SET sd = row{salesorder_id: row.SalesOrderID, salesordD_id: row.SalesOrderDetailID,
  carrierT_num: row.CarrierTrackingNumber, orderQ: row.OrderQty, prod_id: row.ProductID,
  specialoffer_id: row.SpecialOfferID, uniprice: row.UnitPrice, unitpdisc: row.UnitPriceDiscount,
  line_t: row.LineTotal, row_guid: row.rowguid, date: row.ModifiedDate}
RETURN sd
44. ESTABLISHMENT OF RELATIONSHIPS BETWEEN THE CREATED NODES
CUSTOMER - ORDER (SALES) {SalesOrderHeader}:
MATCH (c:Customer), (sh:SalesOrderHeader)
WHERE c.customer_id = sh.customer_id
CREATE (c)-[r:purchased]->(sh)
RETURN c, sh, r
Once the relationship exists, the redundant customer_id property is removed from the order nodes:
MATCH (p:SalesOrderHeader)
REMOVE p.customer_id
RETURN p
The figure below shows the relationship between
customer and order, i.e. the customers that
purchased each order:
45. SALESPERSON - ORDER (SALES) {SalesOrderHeader} RELATIONSHIP:
MATCH (sp:SalesPerson), (sh:SalesOrderHeader)
WHERE sp.salesperson_id = sh.salesperson_id
CREATE (sp)-[r:sold]->(sh)
RETURN sp, sh, r
Once the relationship exists, the redundant salesperson_id property is removed from the order nodes:
MATCH (p:SalesOrderHeader)
REMOVE p.salesperson_id
RETURN p
The figure shows the relationship between salesperson and order,
i.e. the salesperson that sold each order:
46. ESTABLISHMENT OF RELATIONSHIPS BETWEEN THE CREATED NODES
ORDER - ORDER DETAILS - PRODUCT:
MATCH (sh:SalesOrderHeader), (sd:SalesOrderDetails), (p:Product)
WHERE sh.salesorder_id = sd.salesorder_id AND sd.prod_id = p.product_id
// carrierT_num must match the property name set when the SalesOrderDetails nodes were created.
CREATE (sh)-[r:orders {carrierTnum: sd.carrierT_num, orderQ: sd.orderQ,
  uniprice: sd.uniprice, unitpdisc: sd.unitpdisc, line_t: sd.line_t}]->(p)
RETURN sh, p, r
LIMIT 100
The figure below shows the relationship between order and
product, linked through the order details.
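To make the matching logic concrete, here is a small Python sketch of what the ORDER - ORDER DETAILS - PRODUCT statement does conceptually: each order-detail row is matched to its order and product by key, and becomes an "orders" edge carrying the detail values as properties. It is an illustration only, not how Neo4j executes the query, and the sample records are invented. The values are strings because LOAD CSV reads every field as a string.

```python
# Invented sample nodes, using the property names from the slides.
orders = [{"salesorder_id": "1"}, {"salesorder_id": "2"}]
products = [{"product_id": "700", "pname": "Sample product"}]
details = [
    {"salesorder_id": "1", "prod_id": "700", "orderQ": "3", "line_t": "30.00"},
    # This detail row points at a product that was never loaded, so it yields no edge.
    {"salesorder_id": "2", "prod_id": "999", "orderQ": "1", "line_t": "5.00"},
]

# Index nodes by key so each detail row needs two dictionary lookups
# rather than a scan over every order and product combination.
orders_by_id = {o["salesorder_id"]: o for o in orders}
products_by_id = {p["product_id"]: p for p in products}

edges = []
for d in details:
    order = orders_by_id.get(d["salesorder_id"])
    product = products_by_id.get(d["prod_id"])
    if order is not None and product is not None:
        # (start node, relationship type, end node, relationship properties)
        edges.append(
            (
                order["salesorder_id"],
                "orders",
                product["product_id"],
                {"orderQ": d["orderQ"], "line_t": d["line_t"]},
            )
        )

print(edges)
```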
47. INSIGHTS FROM ADVENTUREWORKS DATABASE USING CQL CODES
This query lets the business list 20 products
together with their standard cost,
ordered from the lowest standard cost.
MATCH (p:Product)
RETURN p.pname, p.standard_cost
// LOAD CSV reads every field as a string, so convert to a number for numeric ordering.
ORDER BY toFloat(p.standard_cost)
LIMIT 20
48. This query shows the salespeople and the orders they sold
(limited to 100 results).
MATCH (s:SalesPerson)-[r:sold]->(sh:SalesOrderHeader)
RETURN s, sh, r
LIMIT 100
49. RELATIONAL DATABASE
Relational databases are often described simply as data in tables, because they store
data in tables. They enforce the properties of
Atomicity, Consistency, Isolation and Durability
(ACID) and adhere strictly to a schema. RDBMSs
(relational database management systems) make
use of SQL to manage tables, which can be bulky.
50. RELATIONAL DATABASE
GRAPH | RDBMS
1. Data is stored in graph form | Data is stored in tabular form
2. Better performance on relationship-heavy queries | Data is normalized, meaning lots of joins, thus affecting speed
3. Highly scalable | Harder to scale horizontally
4. Removes the need for an expensive search or match computation | Joins are expensive
5. Constraints can be represented using relationships | Depends on key constraints
6. No standard query language shared across products (Neo4j uses Cypher) | Standard query language (SQL) over structured and organized data
7. Many NoSQL stores offer eventual consistency rather than ACID (Neo4j itself supports ACID transactions) | Follows the ACID properties
51. REFERENCES
Data warehouse data set download link: https://data.world/aewart/airbnb-raw-data/workspace/file?filename=Unit_1_Project_Dataset+%281%29.csv
Data set download link for graph database: https://github.com/Microsoft/sql-server-samples/releases/download/adventureworks/AdventureWorks2017.bak