CONTENTS
1. Introduction
1.2. Reasons for selecting the subject area
1.3. Vision and Goals
1.4. Key Stakeholders
1.5. Business requirements
2. SCHEMA
3. ETL
4. VISUALIZATIONS AND REPORTS
4.1. Visualizations
4.2. Reports
5. Include XML and Schema
6. Graph Databases
6.1. Comparison to relational databases
7. Conclusions
8. Bibliography
Appendix A – VISUALIZATIONS Code
Appendix B – Neo4j code
1. INTRODUCTION
Today's world is changing at an ever faster rate, and growing a business or organization
is a daunting task that demands enormous and sustained effort. For any company to
flourish, the key factor is customer satisfaction. A company's ability to keep pace with
constant change is always being tested, and that pressure drives the innovation that
leads to success.
Every company aims for higher profit, so data plays an integral role. Huge amounts of
data are created every day. The data runs to gigabytes, is often unstructured, and is
growing at an exponential rate, which makes it hard to control. As the data grows,
fetching and studying it becomes tedious, and data patterns tend to change after an
algorithm has been built. In such cases the team that accumulates and analyzes the data
plays a key role: it presents insights to the head of the company, its stakeholders,
and the marketing and finance units so that they can act on how the company is doing.
Because nothing stays constant while data keeps growing, new algorithms for structuring
the data have to be built so that the company can address its customers' needs more
effectively.
Business intelligence (BI) is a technology-driven process for analyzing data and
presenting actionable information to help executives, managers and other corporate end
users make informed business decisions. BI consists of a wide variety of tools,
applications and methodologies that enable organizations to collect data from internal
systems and external sources, prepare it for analysis, develop and run queries against
that data, and create reports, dashboards and data visualizations to make the analytical
results available to corporate decision makers, as well as operational workers.
1.2. REASONS FOR SELECTING THE SUBJECT AREA AND DATA
The Flipkart dataset used in this assignment is a sample dataset held in SQL Server.
The main aim of the assignment is to create a data warehouse (data mart) using ETL
processes and to deliver reports, a series of dashboards and well-defined
visualizations that answer the business requirements.
This is a pre-crawled dataset, taken as a subset of a bigger dataset (more than 5.8 million
products) that was created by extracting data from Flipkart, a prominent Indian
e-commerce giant. The dataset contains product listings.
The data has been taken from data.world.
(Source: https://data.world/promptcloud/product-details-on-flipkart-com)
1.3. VISION AND GOALS
In this project we have chosen the Flipkart dataset to represent product sales on the
e-commerce website according to category, brand, discounted price and time. We
visualize how each factor, modelled as a dimension, has a direct or indirect impact on
product sales.
➢ Goals
To provide better discounts so that more customers are attracted and sales improve.
To manage the inventory according to customers' requirements.
To analyze product sales by category and brand for better marketing and advertising
on the website.
1.4. KEY STAKEHOLDERS
• Brand Owner
• Manufacturer
• Customer
• Logistics and Shipping
1.5. BUSINESS REQUIREMENTS
Business requirements focus on information needs. Working with any dataset means
identifying and analyzing the data requirements: what kind of data has to be extracted
from the database, and which reports have to be generated from it.
We therefore considered the following business requirements to frame our dataset:
1) What is the revenue of each brand in 2015 and 2016?
2) How many brands, and which of their product categories, have a product price greater than 50000?
3) What is the total number of brands, with the revenue of each?
4) What are the total units sold and the revenue of each product category across both years?
5) What are the quarter-wise sales of the top 10 product categories across brands?
Tools
A. Data Warehousing Tools
❖ Microsoft SQL Server Management Studio (SSMS)
❖ Microsoft SQL Server Integration Services (SSIS)
B. Reporting Tool:
❖ Microsoft SQL Server Reporting Services (SSRS)
C. Visualization Tool:
❖ RStudio
D. Graph data visualization
❖ Neo4j
2. SCHEMA
Dimensional Model
For our dimensional model we have chosen the star schema because it is easy to generate an
ETL process from it and because each dimension table is connected directly to the fact
table. The schema looks like a star: the fact table sits at the center and acts as the pivot,
and the dimension tables are attached around it, each related to the fact table through a
foreign key. The fact table also contains the measurable quantities; these computed columns
help us to analyze business profit.
Dimensions of the Data Warehouse
A dimension table is made up of descriptive columns such as Brand_Name and Category_Name.
Each dimension has its own primary key, which uniquely identifies each row of that dimension.
Fact table of the Data Warehouse
The fact table contains the quantitative data that we store against our dimensions. It is
the central point of the star schema and holds a foreign key to the primary key of every
dimension together with the measurable quantities.
The fact table is designed to give insight into the revenue hierarchy, such as which brand
is the best seller or which category has the highest revenue, as well as into how to manage
the inventory of multiple products properly. Moreover, sales can be improved by targeting
advertising and marketing according to the units sold per category and brand.
Data Warehouse Design and Architecture:
The warehouse is designed to analyze the e-commerce website from different aspects, such as
the revenue generated by products according to category, brand and discounted price, and
the number of products sold in a month, a year or a specific season.
CREATE DATABASE FKART_DW
GO
USE FKART_DW
GO
-- Creating Brand Dimension
CREATE TABLE Brand_Dim(
Brand_Key INT NOT NULL IDENTITY PRIMARY KEY,
Brand_ID VARCHAR(10),
Brand_Name VARCHAR(50));
GO
CREATE UNIQUE INDEX B_Index ON Brand_Dim(Brand_Key,Brand_ID)
GO
-- Creating Category Dimension
CREATE TABLE Category_Dim(
Category_Key INT NOT NULL IDENTITY PRIMARY KEY,
Category_ID INT,
Category_Name VARCHAR(50));
Go
CREATE UNIQUE INDEX C_Index ON Category_Dim(Category_Key,Category_ID)
GO
-- Creating Calender Dimension
CREATE TABLE Calender_Dim(
Calender_Key INT NOT NULL IDENTITY PRIMARY KEY,
Calender_ID INT,
Full_Date DATE,
Day_of_Week VARCHAR(20),
Day_of_Month INT,
Month_ INT,
Quarter_ VARCHAR(10),
Year_ INT)
GO
CREATE UNIQUE INDEX D_Index ON Calender_Dim(Calender_Key,Calender_ID)
GO
-- Creating Order Dimension
CREATE TABLE Order_Dim
(Order_Key INT NOT NULL IDENTITY PRIMARY KEY,
Order_ID VARCHAR(50),
Order_date Date,
URL_ NVARCHAR(255),
Category_Name VARCHAR(150),
Order_Details NVARCHAR(255),
Retail_Cost INT,
Discounted_Cost INT);
GO
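CREATE UNIQUE INDEX O_Index ON Order_Dim(Order_Key,Order_ID)
GO
-- Creating Fact Table
CREATE TABLE Ecom_Fact (
Order_Key INT REFERENCES Order_Dim(Order_Key),
Calender_Key INT REFERENCES Calender_Dim(Calender_Key),
Category_Key INT REFERENCES Category_Dim(Category_Key),
Brand_Key INT REFERENCES Brand_Dim(Brand_Key),
Retail_Cost INT,
Discounted_Cost INT,
CategoryWise_Rev INT,
-- The source listing breaks off after CategoryWise_Rev. The two columns below are the
-- other measures named in the ETL section and used by the report queries; their INT
-- type is an assumption.
Brand_Rev INT,
CategoryUnit_sold INT);
GO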
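The star-schema layout described in this section can be tried out without a SQL Server instance. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for FKART_DW, a cut-down Ecom_Fact with only the brand key and cost columns, and invented sample rows (BrandA and BrandB are not brands from the Flipkart data). It shows the basic pattern every report in this document relies on: join the fact table to a dimension on the surrogate key, then aggregate a measure.

```python
import sqlite3

# In-memory stand-in for FKART_DW; SQLite is an assumption made for portability.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Brand_Dim (
    Brand_Key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    Brand_ID   TEXT,
    Brand_Name TEXT
);
CREATE TABLE Ecom_Fact (
    Brand_Key       INTEGER REFERENCES Brand_Dim(Brand_Key),
    Retail_Cost     INTEGER,
    Discounted_Cost INTEGER
);
""")

# Hypothetical sample rows, not taken from the Flipkart dataset.
conn.executemany(
    "INSERT INTO Brand_Dim (Brand_ID, Brand_Name) VALUES (?, ?)",
    [("B1", "BrandA"), ("B2", "BrandB")],
)
conn.executemany(
    "INSERT INTO Ecom_Fact VALUES (?, ?, ?)",
    [(1, 1000, 800), (1, 500, 450), (2, 2000, 1500)],
)


def brand_revenue(connection):
    """Join the fact table to the brand dimension and total the discounted cost."""
    return connection.execute("""
        SELECT b.Brand_Name, SUM(f.Discounted_Cost) AS Revenue
        FROM Ecom_Fact AS f
        JOIN Brand_Dim AS b ON f.Brand_Key = b.Brand_Key
        GROUP BY b.Brand_Name
        ORDER BY Revenue DESC
    """).fetchall()


print(brand_revenue(conn))
```

The other dimensions follow the same shape: one more key column in the fact table and one more JOIN in the query.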
3. ETL
Making of data warehouse through ETL
In our project we use Microsoft SQL Server Integration Services (SSIS) to load our data into
the database.
ETL is the general procedure for loading data from one or more sources into a destination;
any source or destination format can be used, such as a flat file, an Excel file or ADO.NET.
To carry it out we created five SSIS packages.
In this project ETL is applied to four dimensions whose source data is in CSV format. We
extract this data into staging tables, populate the dimension tables from the staging tables,
and finally populate the fact table with the help of the SSIS Lookup transformation (a join).
The process is explained below with screenshots.
Overall ETL Process:
Flat-file source → Staging area → Loading of database dimensions → Making of fact table
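The flow from flat file to staging table to dimension to fact table can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite stands in for SQL Server and SSIS, the CSV content is invented, the Staging table and the run_etl function are hypothetical names, and a plain SQL join plays the role of the SSIS Lookup step.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract; column names follow the report, the rows are invented.
SOURCE_CSV = """Order_ID,Brand_ID,Brand_Name,Discounted_Cost
O1,B1,BrandA,800
O2,B2,BrandB,1500
O3,B1,BrandA,450
"""


def run_etl(csv_text):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Staging (Order_ID TEXT, Brand_ID TEXT, Brand_Name TEXT,
                          Discounted_Cost INTEGER);
    CREATE TABLE Brand_Dim (Brand_Key INTEGER PRIMARY KEY AUTOINCREMENT,
                            Brand_ID TEXT, Brand_Name TEXT);
    CREATE TABLE Ecom_Fact (Order_ID TEXT, Brand_Key INTEGER,
                            Discounted_Cost INTEGER);
    """)
    # Extract: copy the flat file into the staging table unchanged.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        "INSERT INTO Staging VALUES "
        "(:Order_ID, :Brand_ID, :Brand_Name, :Discounted_Cost)",
        rows,
    )
    # Load the dimension: one row per distinct brand; the surrogate key
    # is generated automatically on insert.
    conn.execute("""
        INSERT INTO Brand_Dim (Brand_ID, Brand_Name)
        SELECT DISTINCT Brand_ID, Brand_Name FROM Staging ORDER BY Brand_ID
    """)
    # Lookup: replace the business key with the surrogate key while
    # loading the fact table.
    conn.execute("""
        INSERT INTO Ecom_Fact
        SELECT s.Order_ID, b.Brand_Key, s.Discounted_Cost
        FROM Staging AS s
        JOIN Brand_Dim AS b ON s.Brand_ID = b.Brand_ID
    """)
    return conn


warehouse = run_etl(SOURCE_CSV)
print(warehouse.execute("SELECT * FROM Ecom_Fact ORDER BY Order_ID").fetchall())
```

Three order rows collapse to two dimension rows, and each fact row carries the generated Brand_Key rather than the repeating Brand_ID.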
Brand_Dim:
The Brand dimension consists of Brand_ID, Brand_Name and Brand_Key. Brand_Key is the primary
key of this dimension. It is generated when the Brand dimension is loaded into our database,
through the column definition Brand_Key INT NOT NULL IDENTITY PRIMARY KEY, after which we set
the sort key position to 1 in the Advanced Editor. Brand_Key was generated even though
Brand_ID already existed because a primary key must be unique: the order data contains
purchases of products that share a brand, so Brand_ID repeats and the column is not distinct.
To remove this redundancy we generated Brand_Key as the primary key of the dimension.
The remaining columns, Brand_Name and Brand_ID, hold the name and the ID of the brand
respectively. With these we can analyze which brand sells the most and calculate brand-wise
units sold.
Category_Dim:
The Category dimension has Category_Key as its primary key. Category_ID contains the ID of
the category and Category_Name contains its name. Using these we can analyze which category
generates the highest revenue and calculate category-wise units sold.
Order_Dim:
The Order dimension has Order_Key as its primary key. Order_ID is the ID of a particular
order and Order_date contains the date on which the order was executed. URL_ points to the
order details and is helpful for viewing a particular order with its price tag, image and so
on. Finally, Category_Name, Order_Details (the product name), Retail_Cost and
Discounted_Cost are also present in this dimension. With these we can create a hierarchical
view of revenue per category.
Calender_Dim:
The Calender dimension has Calender_Key as its primary key. Calender_ID is created from
Order_date. Full_Date contains the date on which the order was executed, and Day_of_Week,
Day_of_Month, Month_, Quarter_ and Year_ break that date down. This dimension is helpful for
calculating yearly, monthly and quarterly revenue.
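The calendar attributes are all functions of the order date, so they can be derived rather than typed in. The sketch below assumes a YYYYMMDD integer for Calender_ID (the report only says the ID is created from the order date, so that convention is a guess) and uses a hypothetical helper named calendar_row.

```python
from datetime import date

# Fixed English names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def calendar_row(d):
    """Derive the Calender_Dim attributes for one order date."""
    return {
        # Assumed convention: the date written as a YYYYMMDD integer.
        "Calender_ID": d.year * 10000 + d.month * 100 + d.day,
        "Full_Date": d.isoformat(),
        "Day_of_Week": DAY_NAMES[d.weekday()],
        "Day_of_Month": d.day,
        "Month_": d.month,
        # Months 1-3 map to Q1, 4-6 to Q2, 7-9 to Q3, 10-12 to Q4.
        "Quarter_": f"Q{(d.month - 1) // 3 + 1}",
        "Year_": d.year,
    }


print(calendar_row(date(2015, 12, 31)))
```

For 31 December 2015, one of the order dates that appears in the reports, this yields quarter Q4 and year 2015.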
Fact_Dim:
To analyze the insights we created one fact table, Ecom_Fact, which is connected to each
dimension table via a foreign key relationship.
It has three columns for analyzing sales on the e-commerce website:
1) CategoryWise_Rev - the category-wise revenue generated from executed orders.
2) Brand_Rev - the brand-wise revenue generated from executed orders.
3) CategoryUnit_sold - the category-wise units sold.
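How the three measures relate to individual orders can be shown with a short aggregation. This is a sketch only: the order rows are invented, revenue is taken to be the sum of the discounted cost (as the report queries in Appendix A do), and one unit per order row is an assumption.

```python
from collections import defaultdict

# Hypothetical executed orders: (category, brand, discounted cost).
ORDERS = [
    ("Jewellery", "BrandA", 800),
    ("Jewellery", "BrandB", 1500),
    ("Furniture", "BrandC", 450),
]


def fact_measures(orders):
    """Return category revenue, brand revenue and category units sold."""
    category_rev = defaultdict(int)
    brand_rev = defaultdict(int)
    category_units = defaultdict(int)
    for category, brand, cost in orders:
        category_rev[category] += cost
        brand_rev[brand] += cost
        category_units[category] += 1  # assumes one unit per order row
    return dict(category_rev), dict(brand_rev), dict(category_units)


cat_rev, brand_rev, cat_units = fact_measures(ORDERS)
print(cat_rev, brand_rev, cat_units)
```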
4. REPORTS AND VISUALIZATIONS
RStudio is used to produce the data visualizations.
The Flipkart data warehouse (data mart) is connected to RStudio with the
RODBC package.
Multiple packages, such as ggplot2, readr, dplyr, plotxy and ggthemes, are used to
produce the visualizations.
4.1. VISUALIZATIONS
Data visualization is the technique of encoding information from the data in visual
objects (i.e. points, lines, bars or pie charts), because a picture is worth far more
than a hundred words.
In this part of the project we make some quick visualizations to drill down into our
sales data set. For data visualization we used RStudio with the ggplot2 visualization
package, connected to our SQL database through the RODBC package.
Key components to consider in our flipkart sales data:
➢ Product category
➢ Product name
➢ Brand name
➢ Total sales ( by category /brand)
➢ Total revenue(by category / brand)
Reason behind the visualization for business requirement:
Breaking revenue down by category or brand helps to explain the company's
profit and its long-term growth. It helps the company determine which products
bring in more profit and which are in greater demand. We can also show the
highest-selling product for a specific time span (i.e. month, quarter or year),
which helps to maintain product stock for customer satisfaction.
CATEGORY WISE UNIT SOLD
1. This bar graph visualizes the top 10 product categories by units sold. The bars
differ considerably from one another. Jewellery tops the list with 3521 units
sold, while the Kitchen and Tools categories sell the least among the ten, with
362 and 386 units respectively.
HIERARCHY OF BRAND REVENUE
2. This graph shows revenue by brand. Karatcraft has the highest revenue,
followed by Radiant Bay. By exploring this visualization the company can
prepare its future development agenda.
Pie chart labels: Jewellery = 21.48, Home Decor = 16.12, Automotive = 2.14, Mobiles = 23.15, Clothing = 7.12
3. This pie chart shows the proportion of products sold in 2015. The highest-selling
category in 2015 is jewellery and the lowest is home decor. The benefit of this
visualization is that the company can identify the lowest-selling products in its
product list and then add clearance offers or adopt other strategies to attract
new customers for those specific products.
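The percentage labels on a pie chart of this kind are each slice divided by the total, rounded to a whole number. The sketch below reproduces that arithmetic with a hypothetical helper, percent_labels. The input uses the five largest unit counts from the CatWise_UnitSold&Revenue report, which covers both years, so the shares are not expected to match the 2015-only chart described above.

```python
# Unit counts taken from the CatWise_UnitSold&Revenue report (2015 and 2016 combined).
TOP_FIVE = {
    "Jewellery": 3521,
    "Mobiles & Accessories": 1097,
    "Automotive": 1002,
    "Clothing": 887,
    "Home Decor & Festive Needs": 859,
}


def percent_labels(units_by_category):
    """Build 'name NN%' labels, each slice as a rounded share of the total."""
    total = sum(units_by_category.values())
    return [
        f"{name} {round(units / total * 100)}%"
        for name, units in units_by_category.items()
    ]


for label in percent_labels(TOP_FIVE):
    print(label)
```

Because each share is rounded independently, the labels can add up to slightly more or less than 100.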
4.2. REPORTS
We use SSRS as the tool for producing reports and displaying the results for our
business questions.
The report below shows the hierarchical distribution of brand revenue by brand name.
BrandWise_Revenue
Brand Name    Brand Rev (Rupee)
Karatcraft 5632898
Radiant Bay 3601945
BlueStone 2796709
Durian 1480125
ARRA 973097
Rakam 774110
Jewels5 713743
Fullcutdiamond 675287
Allure Auto 663819
Shashvat Jewels 544387
Diti Jewellery 374850
GAGA 370060
WearYourShine by PCJ 324096
JacknJewel 254698
Raymond 254438
Slim 244008
DailyObjects 242640
P.N.Gadgil Jewellers 208079
Highest_Sold_Order
Brand Name  Category Name          Order date              Discounted Cost  Product Price
ARRA        Furniture              12/31/2015 12:00:00 AM  57500            57500
                                                           51400            51400
                                                           61800            61800
                                                           53300            53300
                                                           68400            68400
                                                           65900            65900
                                                           86500            86500
Audeze      Mobiles & Accessories  3/11/2016 12:00:00 AM   116292           116292
Durian      Furniture              12/31/2015 12:00:00 AM  36660            56400
                                                           47775            73500
                                                           105300           162000
                                                           70200            108000
                                                           54795            84300
                                                           47970            73800
                                                           60840            93600
                                                           55575            85500
                                                           70785            108900
                                                           141375           217500
                                                           45045            69300
                                                           132990           204600
                                                           162825           250500
                                                           35295            54300
                                                           48945            75300
NITGEN      Pens & Stationery      3/20/2016 12:00:00 AM   44804            71687
                                                           36575            58520
The above report lists, by brand and product category, the orders for products priced above
50,000.
The report below shows the quarterly revenue generated by each category over 2015 and
2016.
Quarterly_Revenue(Cat_wise)
Category Name               Year  Quarter  Quarter Wise Rev
Automotive                  2015  Q4       886717
                            2016  Q1       180412
                                  Q2       120520
Baby Care                   2015  Q4       219256
                            2016  Q1       52669
                                  Q2       123318
Clothing                    2015  Q4       290292
                            2016  Q1       203585
                                  Q2       543949
Computers                   2015  Q4       834349
                            2016  Q1       311177
                                  Q2       47208
Furniture                   2015  Q4       2592373
                            2016  Q1       149487
                                  Q2       337846
Home Decor & Festive Needs  2015  Q4       652441
                            2016  Q1       876731
                                  Q2       116100
Home Furnishing             2015  Q4       562086
                            2016  Q1       258211
                                  Q2       42985
Jewellery                   2015  Q4       1416970
                            2016  Q1       18040871
                                  Q2       117968
Mobiles & Accessories       2015  Q4       212633
                            2016  Q1       805051
                                  Q2       32547
Tools & Hardware            2015  Q4       14131
                            2016  Q1       310358
                                  Q2       9700
Yearly_Category_Revenue
Category Name Year Yearly Revenue
Automation & Robotics 2016 17000
Automotive 2015 886717
2016 300932
Baby Care 2015 219256
2016 175987
Bags 2016 184939
Beauty and Personal Care 2015 1687
2016 176954
Cameras & Accessories 2015 72329
2016 21108
Clothing 2015 290292
2016 747534
Computers 2015 834349
2016 358385
Eyewear 2016 12253
Food & Nutrition 2016 1955
Footwear 2016 129603
Furniture 2015 2592373
2016 487333
Gaming 2016 30714
Health & Personal Care Appliances 2016 139669
Home & Kitchen 2015 4348
2016 79113
Home Decor & Festive Needs 2015 652441
2016 992831
Home Entertainment 2015 25620
2016 30681
The report below shows the category-wise revenue generated in 2015 and 2016.
CatWise_UnitSold&Revenue
Category Name                      Category Wise Rev  Category Unit sold
Jewellery                          19575809           3521
Furniture                          3079706            179
Home Decor & Festive Needs         1645272            859
Computers                          1192734            572
Automotive                         1187649            1002
Mobiles & Accessories              1050231            1097
Clothing                           1037826            887
Home Furnishing                    863282             699
Baby Care                          395243             455
Tools & Hardware                   334189             386
Kitchen & Dining                   297572             362
Toys & School Supplies             211869             101
Pens & Stationery                  194837             173
Bags                               184939             151
Beauty and Personal Care           178641             154
Health & Personal Care Appliances  139669             43
Footwear                           129603             191
Sports & Fitness                   128618             107
Cameras & Accessories              93437              72
Home Improvement                   87456              78
Home & Kitchen                     83461              24
Home Entertainment                 56301              19
Gaming                             30714              35
Watches                            24628              48
Automation & Robotics              17000              1
Eyewear                            12253              10
Pet Supplies                       12194              29
Sunglasses                         10911              22
Food & Nutrition                   1955               1
Household Supplies                 1917               4
Wearable Smart Devices             978                2
The above report is a combined report that shows category-wise revenue along with units
sold.
5. XML AND SCHEMA
1.a. XML of Brand Dimension
1.b. XSD of Brand Dimension
1.c. XML and XSD Validation
2.a. XML of Calendar Dimension
2.b. XSD of Calendar Dimension
2.c. XML and XSD Validation
3.a. XML of Category Dimension
3.b. XSD of Category Dimension
3.c. XML and XSD Validation
4.a. XML of Fact Dimension
4.b. XSD of Fact Dimension
4.c. XML and XSD Validation
5.a. XML of Order Dimension
5.b. XSD of Order Dimension
5.c. XML and XSD Validation
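The XML and XSD documents themselves appear only as screenshots in the original report, so their exact element names are not known. As a rough illustration of the idea, the sketch below serialises a few brand rows to XML and then runs a hand-written structural check. The element names (Brand_Dim, Brand) and both function names are assumptions, and the check is a simplified stand-in: Python's standard library does not validate against XSD, which needs a dedicated validator.

```python
import xml.etree.ElementTree as ET

EXPECTED_FIELDS = ["Brand_Key", "Brand_ID", "Brand_Name"]


def brands_to_xml(rows):
    """Serialise (key, id, name) tuples; element names are hypothetical."""
    root = ET.Element("Brand_Dim")
    for key, brand_id, name in rows:
        brand = ET.SubElement(root, "Brand")
        ET.SubElement(brand, "Brand_Key").text = str(key)
        ET.SubElement(brand, "Brand_ID").text = brand_id
        ET.SubElement(brand, "Brand_Name").text = name
    return ET.tostring(root, encoding="unicode")


def check_structure(xml_text):
    """Check child order and that Brand_Key is numeric (not real XSD validation)."""
    root = ET.fromstring(xml_text)
    return all(
        [child.tag for child in brand] == EXPECTED_FIELDS
        and brand.findtext("Brand_Key").isdigit()
        for brand in root.findall("Brand")
    )


document = brands_to_xml([(1, "B1", "BrandA"), (2, "B2", "BrandB")])
print(document)
print(check_structure(document))
```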
6. GRAPH DATABASES
• Neo4j is a graph database. Graphs are structures containing vertices (which
denote entities) and edges (which denote connections between vertices).
• Neo4j stores data on nodes as key-value pairs; property values can be
strings, numbers or Booleans.
• Graph databases are mostly schema-less, which gives them the flexibility of a
document or key/value store. Moreover, they support relationships in a
manner similar to a traditional relational database.
• Below are the graphs and the code written to load the dataset into Neo4j. We
loaded the CSV files of the required tables and matched them according to the
corresponding data table.
We first created the nodes, which correspond to the required tables, then created
constraints and matched the nodes to the required tables.
❖ Load Brand Dimension
❖ Load Calendar Dimension
❖ Load Category Dimension
❖ Load Fact Dimension
❖ Load Order Dimension
❖ Connect Category Dimension to Fact Dimension
❖ Connect Calendar Dimension to Fact Dimension
❖ Connect Order Dimension with Fact Dimension
❖ Graph to find count of brand name
❖ Graph to find Brand name starting from R
NEO4J VS RELATIONAL DATABASE
Neo4j code to relate (join) Brand, Order and Fact:
Match(p:Fact_table),(b:Brand),(o:Od) where p.Brand_ID=b.Brand_ID and
p.Od_ID=o.Od_ID return p,b,o
SQL Query
The above SQL query returns Brand_ID, Order_ID and Discounted_Cost. The SQL output
is not graphical, whereas the Neo4j output is interactive and visually appealing.
The Neo4j code looks simpler and easier to understand, whereas the SQL code is
tedious and takes time to understand when relating tables or columns. The outputs
are also very different: Neo4j presents a graph, whereas SQL shows only rows.
In Neo4j we retrieve data by accessing the respective nodes, whereas in a
relational database we retrieve data by running a SELECT query on a table.
While implementing we found Cypher queries in Neo4j easier to work with than SQL.
In Neo4j we formed relationships using constraints and matching alone, whereas a
relational database relies on additional concepts such as foreign keys and
surrogate keys.
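It may help to see what the MATCH query in this section actually computes. A pattern with no relationships and a WHERE clause comparing Brand_ID and Od_ID is a join on property values, much like the SQL version; a graph traversal only comes into play once relationships have been created. The sketch below mimics both steps with plain dictionaries. All rows and both function names are hypothetical.

```python
# Hypothetical nodes keyed by the same IDs the report uses in Neo4j.
BRANDS = {"B1": {"Brand_Name": "BrandA"}, "B2": {"Brand_Name": "BrandB"}}
ORDERS = {"O1": {"Discounted_Cost": 800}, "O2": {"Discounted_Cost": 1500}}
FACTS = [
    {"Brand_ID": "B1", "Od_ID": "O1"},
    {"Brand_ID": "B2", "Od_ID": "O2"},
]


def match_by_property(facts, brands, orders):
    """Join on equal property values, as the WHERE clause in the query does."""
    return [
        (fact, brands[fact["Brand_ID"]], orders[fact["Od_ID"]])
        for fact in facts
        if fact["Brand_ID"] in brands and fact["Od_ID"] in orders
    ]


def build_edges(facts):
    """Store the links once, as CREATE (s)-[r]->(p) would, for later traversal."""
    return {
        index: {"brand": fact["Brand_ID"], "order": fact["Od_ID"]}
        for index, fact in enumerate(facts)
    }


for fact, brand, order in match_by_property(FACTS, BRANDS, ORDERS):
    print(brand["Brand_Name"], order["Discounted_Cost"])
print(build_edges(FACTS))
```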
7. CONCLUSIONS
• In this project we concentrated on order transactions on the e-commerce
website. We extracted the major components by cleaning the dataset.
• We analyzed the Flipkart order transaction data for 2015 and 2016 and
determined the hierarchy of revenue generated by brand as well as by
product category.
• We visualized how each factor, modelled as a dimension, has a direct or
indirect impact on product sales.
8. BIBLIOGRAPHY
https://www.youtube.com/watch?v=J326LIUrZM8
https://www.youtube.com/watch?v=CHYPF7jxlik
https://www.youtube.com/watch?v=9gOw3joU4a8
https://www.youtube.com/watch?v=Go3P73-KV30
https://www.youtube.com/watch?v=49fADBfcDD4
https://www.youtube.com/watch?v=hsf9tpxe_LI
https://freevideolectures.com/course/3635/microsoft-business-intelligence
https://www.cathrinewilhelmsen.net/2014/05/27/sql-server-ssis-and-biml-data-types/
https://intellipaat.com/blog/tutorial/msbi-tutorial/
APPENDIX A – VISUALIZATIONS CODE
install.packages("RODBC")
library(RODBC)
library(ggplot2)
library(dplyr)
myconn <- odbcDriverConnect(connection = paste0("Driver={SQL Server};server=SHREEM;",
"database=FKARTDW;trusted_connection=yes;"))
myconn
######################################################################
# To get category wise Unit Sold (Bar_Chart)
rd <- sqlQuery(myconn,"
select distinct(b.Category_Name),a.CategoryWise_Rev,a.CategoryUnit_sold from
Ecom_Fact a, Category_Dim b, Calender_Dim c
where a.Category_Key=b.Category_Key
and a.Calender_Key=c.Calender_Key
order by a.CategoryWise_Rev desc")
rd
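# top_n() ranks on the last column by default; CategoryUnit_sold is named explicitly here
dt <- rd %>% top_n(10, CategoryUnit_sold)
ggplot(dt, aes(Category_Name, CategoryUnit_sold, label = CategoryUnit_sold, color = "red")) +
geom_bar(stat = "identity") + geom_text(vjust = 2)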
# To get the Brand wise revenue (Bar_Chart)
rvn<- sqlQuery(myconn,"select distinct(b.Brand_Name),a.Brand_Rev from Ecom_Fact
a , Brand_Dim b
where a.Brand_Key=b.Brand_Key
order by Brand_Rev desc")
rvn
bo <- rvn %>% top_n(10, Brand_Rev)
barplot(bo$Brand_Rev,main ="Brand Wise Revenue",
xlab = "brand",ylab = "revenue", names.arg=bo$Brand_Name,
col = rainbow(length(unique(bo$Brand_Name))),
legend.text = unique(bo$Brand_Name),
args.legend = list(horiz=TRUE,x="topleft"))
######################################################################
# To get category wise Unit Sold (Pie_Chart)
tu<-sqlQuery(myconn,"select
distinct(b.Category_Name),a.CategoryWise_Rev,a.CategoryUnit_sold from Ecom_Fact
a, Category_Dim b, Calender_Dim c
where a.Category_Key=b.Category_Key
and a.Calender_Key=c.Calender_Key
order by a.CategoryWise_Rev desc")
tu
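to <- tu %>% select(Category_Name, CategoryUnit_sold) %>% top_n(5, CategoryUnit_sold)
to
slices <- to$CategoryUnit_sold
# as.character() keeps the names readable if the column was read as a factor
lbls <- as.character(to$Category_Name)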
pct<- round(slices/sum(slices)*100)
lbls<- paste(lbls,pct)
lbls <- paste(lbls,"%",sep = "")
pie(slices,labels = lbls,col = rainbow(length(lbls)),
main="pie chart of unit sold")
######################################################################
SSRS Queries
-- Brand wise revenue in the year 2015 and 2016
select distinct(b.Brand_Name),a.Brand_Rev from Ecom_Fact a , Brand_Dim b
where a.Brand_Key=b.Brand_Key
order by Brand_Rev desc
-- Total Unit sold and Revenue for all Categories in 2015 and 2016
select distinct(b.Category_Name),a.CategoryWise_Rev,a.CategoryUnit_sold from
Ecom_Fact a, Category_Dim b, Calender_Dim c
where a.Category_Key=b.Category_Key
and a.Calender_Key=c.Calender_Key
order by a.CategoryWise_Rev desc
-- Year wise Category Revenue
select b.Category_Name,c.Year_,SUM(a.Discounted_Cost) yearWise_CatRev from
Ecom_Fact a, Category_Dim b, Calender_Dim c
where a.Category_Key=b.Category_Key
and a.Calender_Key=c.Calender_Key
group by b.Category_Name,c.Year_
order by b.Category_Name
-- Week wise revenue of all Categories in the Year 2015 and 2016
select c.Category_Name,b.Year_,b.Day_of_Week,SUM(a.Discounted_Cost) AS
WeekWise_JewlREv from Ecom_Fact a, Calender_Dim b, Category_Dim c
where a.Calender_Key=b.Calender_Key
and a.Category_Key=c.Category_Key
and c.Category_ID in( select Category_ID from Category_Dim )
group by c.Category_Name,b.Year_,b.Day_of_Week
order by c.Category_Name desc
-- Quarter_ wise Revenue of top_10 Category in the Yr 2015 and 2016
select c.Category_Name,b.Year_,b.Quarter_,SUM(a.Discounted_Cost) AS
QuatWise_Rev from Ecom_Fact a, Calender_Dim b, Category_Dim c
where a.Calender_Key=b.Calender_Key
and a.Category_Key=c.Category_Key
and c.Category_ID in
(10015,10001,10006,10020,10007,10012,10004,10010,10011,10008)
group by c.Category_Name,b.Year_,b.Quarter_
order by c.Category_Name desc
Highest_Sold_Order
-- Which Brand has got the order for Product whose retail cost is more than 50000
select b.Brand_Name,c.Category_Name,c.Order_date,c.Order_Details,
c.Discounted_Cost ,MAX(a.Retail_Cost) AS Product_Price
from Ecom_Fact a, Brand_Dim b, Order_Dim c
where a.Brand_Key=b.Brand_Key
and a.Order_Key=c.Order_Key
group by
b.Brand_Name,c.Category_Name,c.Order_date,c.Order_Details,c.Discounted_Cost
having MAX(a.Retail_Cost) > 50000
order by Product_Price Desc
APPENDIX B – NEO 4J CODE
Queries for Neo4j
❖ Load Brand Dimension:
LOAD CSV WITH HEADERS FROM "file:///Brand_dim.csv" as row create(b:Brand) set
b=row{Brand_ID:row.Brand_ID,Brand_Name:row.Brand_Name} return b
Create CONSTRAINT on (b:Brand) Assert b.Brand_ID IS UNIQUE
❖ Load Calendar Dimension
LOAD CSV WITH HEADERS FROM "file:///Calender_dim.csv" as row create(d:Date)
set d=row{Calender_ID:row.Calender_ID,Full_Date:row.Full_Date,
Day_of_Week:row.Day_of_Week,Month_:row.Month_,Quarter_:row.Quarter_,
Year_:row.Year_} return d
Create CONSTRAINT on (d:Date) Assert d.Calender_ID IS UNIQUE
❖ Load category Dimension
LOAD CSV WITH HEADERS FROM "file:///Category_dim.csv" as row
create(c:Category) set
c=row{Category_ID:row.Category_ID,Category_Name:row.Category_Name}return c
Create CONSTRAINT on (c:Category) Assert c.Category_ID IS UNIQUE
❖ Load Fact Dimension
LOAD CSV WITH HEADERS FROM "file:///Fact_dim.csv" as row CREATE(f:Fact_table)
SET f=row{Od_ID:row.Od_ID,Calender_ID:row.Calender_ID,Category_ID:row.Category_ID,
Brand_ID:row.Brand_ID,Retail_Cost:row.Retail_Cost,
Discounted_Cost:row.Discounted_Cost,Brand_Rev:row.Brand_Rev,
CategoryUnit_sold:row.CategoryUnit_sold}
❖ Load Order Dimension
LOAD CSV WITH HEADERS FROM "file:///Order_dim.csv" as row create(o:Od)
set o=row{Od_ID:row.Od_ID,Od_date:row.Od_date,URL_:row.URL_,
Category_Name:row.Category_Name,Od_Details:row.Od_Details,
Retail_Cost:row.Retail_Cost,Discounted_Cost:row.Discounted_Cost}
return o
Create CONSTRAINT on (o:Od) Assert o.Od_ID IS UNIQUE
Relationship queries
❖ Connect Category Dimension to Fact Dimension
match(s:Category),(p:Fact_table) where s.Category_ID=p.Category_ID
create(s)-[r:Category_Name]->(p) return s,p,r
❖ Connect Calendar Dimension to Fact Dimension
match(s:Date),(p:Fact_table) where s.Calender_ID=p.Calender_ID
create(s)-[r:Quarter_]->(p) return s,p,r
❖ Connect Order Dimension with Fact Dimension
match(s:Od),(p:Fact_table) where s.Od_ID=p.Od_ID create(s)-[r:Retail_Cost]->(p)
return s,p,r
❖ Query to find count of brand name
match(n:Brand) return count(n.Brand_Name)
❖ Query to find brand_name whose name starts with R.
match (c:Brand) where c.Brand_Name starts with "R" return c
❖ Interconnect query connect Order_Brand_Fact
Match(p:Fact_table),(b:Brand),(o:Od) where p.Brand_ID=b.Brand_ID and
p.Od_ID=o.Od_ID return p,b,o
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Recently uploaded (20)

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 

Data warehousing

  • 2. 1. INTRODUCTION
    Today's world is changing at an ever faster rate, and growing a business or organization is a daunting task that always demands enormous effort. For any company to flourish, the key factor is customer satisfaction. A company's ability to keep pace with constant change is continually tested, and that pressure drives the innovation that leads to success. Every company has an eye on higher profit, so data plays an integral role. Huge amounts of data are created every day, within nanoseconds. This data runs to gigabytes, is often unstructured, and is growing at an exponential rate, so it is becoming hard to control. As the data grows, fetching and studying it becomes tedious, and data patterns tend to change after an algorithm has been built. In such cases the team that gathers the data plays a key role: it presents insights to the head of the company, its stakeholders, and the business marketing and finance units, so that they can act on how the company is performing. Because the data keeps growing and nothing stays constant, new algorithms for structuring the data have to be built so that the company can address customers' needs more effectively.
    Business intelligence (BI) is a technology-driven process for analyzing data and presenting actionable information to help executives, managers and other corporate end users make informed business decisions. BI consists of a wide variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources, prepare it for analysis, develop and run queries against that data, and create reports, dashboards and data visualizations to make the analytical results available to corporate decision makers as well as operational workers.
  • 3. 1.2. REASONS FOR SELECTING THE SUBJECT AREA AND DATA
    The Flipkart dataset used in this assignment is a large sample dataset held in SQL Server. The main aim of the assignment is to create a data warehouse (data mart) using ETL processes and to deliver reports, a series of dashboards, and well-defined visualizations that address the business conditions. This is a pre-crawled dataset, taken as a subset of a bigger dataset (more than 5.8 million products) that was created by extracting data from Flipkart, a prominent Indian e-commerce company. The dataset contains product listings and was taken from data.world. (Source: https://data.world/promptcloud/product-details-on-flipkart-com)
    1.3. VISION AND GOALS
    In this project we chose the Flipkart dataset to represent product sales on the e-commerce website by category, brand, discounted price and time. We visualize how each factor, modeled as a dimension, directly or indirectly affects product sales.
    ➢ Goals
    To provide better discounts so that more customers are attracted and sales improve.
    To manage inventory according to customers' requirements.
    To analyze product sales by category and brand for better marketing and advertising on the website.
    1.4. KEY STAKEHOLDERS
    • Brand Owner
    • Manufacturer
    • Customer
    • Logistics and Shipping
  • 4. 1.5. BUSINESS REQUIREMENTS
    Business requirements focus on information needs. To work on any dataset we need to identify and analyze the data requirements: what kind of data to extract from the database, and which reports to generate from it. We therefore framed our dataset around the following business requirements:
    1) What is the revenue of all brands in 2015 and 2016?
    2) How many brands, and which of their product categories, have a product price greater than 50,000?
    3) What is the total number of brands, with their revenue?
    4) What are the total units sold and the revenue for each product category in both years?
    5) What are the quarter-wise sales made by the top 10 product categories across brands?
  • 5. Tools
    A. Data Warehousing Tools
    ❖ Microsoft SQL Server Management Studio (SSMS)
    ❖ Microsoft SQL Server Integration Services (SSIS)
    B. Reporting Tool
    ❖ Microsoft SQL Server Reporting Services (SSRS)
    C. Visualization Tool
    ❖ RStudio
    D. Graph Data Visualization
    ❖ Neo4j
  • 6. 2. SCHEMA Dimensional Model
    For our dimensional model we chose the star schema because it is easy to build an ETL process from it and because each dimension table is connected directly to the fact table. The schema looks like a star: the fact table sits at the center and acts as the pivot, while multiple dimensions are attached around it, each related to the fact table via a foreign key. The fact table also contains the measurable quantities; these computed columns help us analyze business profit.
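The star layout described above can be sketched in a few lines. This is only an illustration, assuming Python's standard-library sqlite3 module as a stand-in for SQL Server; the table and column names follow the report's naming, while the sample rows and the helper names `build_star` and `revenue_by_brand` are hypothetical.

```python
import sqlite3

def build_star():
    """Create a tiny in-memory star schema: one dimension and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Brand_Dim (
            Brand_Key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            Brand_ID   TEXT,
            Brand_Name TEXT
        );
        CREATE TABLE Ecom_Fact (
            Brand_Key       INTEGER REFERENCES Brand_Dim(Brand_Key),
            Discounted_Cost INTEGER
        );
    """)
    # Hypothetical sample rows, not taken from the Flipkart dataset.
    conn.executemany(
        "INSERT INTO Brand_Dim (Brand_ID, Brand_Name) VALUES (?, ?)",
        [("B1", "BrandA"), ("B2", "BrandB")],
    )
    conn.executemany(
        "INSERT INTO Ecom_Fact (Brand_Key, Discounted_Cost) VALUES (?, ?)",
        [(1, 100), (1, 250), (2, 400)],
    )
    return conn

def revenue_by_brand(conn):
    """Join the fact table to its dimension and sum the measure per brand."""
    return conn.execute("""
        SELECT b.Brand_Name, SUM(f.Discounted_Cost) AS revenue
        FROM Ecom_Fact AS f
        JOIN Brand_Dim AS b ON f.Brand_Key = b.Brand_Key
        GROUP BY b.Brand_Name
        ORDER BY revenue DESC
    """).fetchall()

star_result = revenue_by_brand(build_star())
```

Every analytical query against a star schema has this shape: start from the fact table, join outward to whichever dimensions supply the labels, and aggregate a measure.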
  • 7. Dimensions of the Data Warehouse
    A dimension table is made up of descriptive columns such as Brand_Name and Category_Name. Each dimension has its own primary key, which uniquely identifies each row of that dimension.
    Fact table of the Data Warehouse
    The fact table contains the quantitative data that we store for our dimensions. It is the central point of the star schema and holds the keys of all the dimensions (as foreign keys) together with the measurable quantities. It is designed to give insight into the revenue ranking, such as which brand is the best-selling or which category has the highest revenue, and into how to manage the inventory of multiple products properly. Moreover, we can improve sales through advertising and marketing based on units sold per category and brand.
    Data Warehouse Design and Architecture:
    The warehouse supports analysis of this e-commerce website from different angles, such as how much revenue products generate by category, brand and discounted price, and how many products are sold in a month, a year or a specific season.
    CREATE DATABASE FKART_DW;
    GO
    USE FKART_DW;
    GO
    -- Creating Brand Dimension
    CREATE TABLE Brand_Dim (
        Brand_Key  INT NOT NULL IDENTITY PRIMARY KEY,
        Brand_ID   VARCHAR(10),
        Brand_Name VARCHAR(50));
    GO
    CREATE UNIQUE INDEX B_Index ON Brand_Dim(Brand_Key, Brand_ID);
    GO
  • 8. -- Creating Category Dimension
    CREATE TABLE Category_Dim (
        Category_Key  INT NOT NULL IDENTITY PRIMARY KEY,
        Category_ID   INT,
        Category_Name VARCHAR(50));
    GO
    CREATE UNIQUE INDEX C_Index ON Category_Dim(Category_Key, Category_ID);
    GO
    -- Creating Calender Dimension
    CREATE TABLE Calender_Dim (
        Calender_Key INT NOT NULL IDENTITY PRIMARY KEY,
        Calender_ID  INT,
        Full_Date    DATE,
        Day_of_Week  VARCHAR(20),
        Day_of_Month INT,
        Month_       INT,
        Quarter_     VARCHAR(10),
        Year_        INT);
    GO
    CREATE UNIQUE INDEX D_Index ON Calender_Dim(Calender_Key, Calender_ID);
    GO
  • 9. -- Creating Order Dimension
    CREATE TABLE Order_Dim (
        Order_Key       INT NOT NULL IDENTITY PRIMARY KEY,
        Order_ID        VARCHAR(50),
        Order_date      DATE,
        URL_            NVARCHAR(255),
        Category_Name   VARCHAR(150),
        Order_Details   NVARCHAR(255),
        Retail_Cost     INT,
        Discounted_Cost INT);
    GO
    CREATE UNIQUE INDEX O_Index ON Order_Dim(Order_Key, Order_ID);
    GO
    -- Creating Fact Table
    CREATE TABLE Ecom_Fact (
        Order_Key        INT REFERENCES Order_Dim(Order_Key),
        Calender_Key     INT REFERENCES Calender_Dim(Calender_Key),
        Category_Key     INT REFERENCES Category_Dim(Category_Key),
        Brand_Key        INT REFERENCES Brand_Dim(Brand_Key),
        Retail_Cost      INT,
        Discounted_Cost  INT,
        CategoryWise_Rev INT,
  • 12. 3. ETL Making of data warehouse through ETL
    In our project we use Microsoft SQL Server Integration Services (SSIS) to load our data into the database, and to achieve this we created five SSIS packages. ETL (extract, transform, load) is the general procedure for loading data from one or more sources into a destination; both source and destination can be in any supported format, such as flat file, Excel file or ADO.NET. In this project ETL is applied to four dimensions whose sources are in CSV format. We extract the data into staging tables, populate the dimension tables from the staging tables, and finally populate the fact table using the SSIS Lookup tool (a join). The process is explained below with screenshots.
    Overall ETL Process: FLAT_FILE SOURCE → STAGING_AREA → LOADING OF DATABASE DIMENSIONS → MAKING OF FACT_TABLE
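The flow above (flat file, staging, dimension load, lookup into the fact table) can be approximated outside SSIS as follows. This is a minimal sketch in plain Python, not the SSIS packages themselves: the CSV content, the function names and the one-dimension layout are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical flat-file content standing in for one CSV source.
SOURCE_CSV = """Order_ID,Brand_ID,Brand_Name,Discounted_Cost
O1,B1,BrandA,100
O2,B2,BrandB,400
O3,B1,BrandA,250
"""

def extract(text):
    """Extract: read the flat file into a list of staging rows (dicts)."""
    return list(csv.DictReader(io.StringIO(text)))

def load_brand_dim(staging):
    """Load the dimension: one row per distinct Brand_ID with a surrogate key."""
    dim = {}
    for row in staging:
        if row["Brand_ID"] not in dim:
            dim[row["Brand_ID"]] = {
                "Brand_Key": len(dim) + 1,  # mimics an IDENTITY column
                "Brand_Name": row["Brand_Name"],
            }
    return dim

def load_fact(staging, brand_dim):
    """Lookup step: swap the business key for the dimension's surrogate key."""
    return [
        {
            "Order_ID": row["Order_ID"],
            "Brand_Key": brand_dim[row["Brand_ID"]]["Brand_Key"],
            "Discounted_Cost": int(row["Discounted_Cost"]),
        }
        for row in staging
    ]

staging_rows = extract(SOURCE_CSV)
brand_dim = load_brand_dim(staging_rows)
fact_rows = load_fact(staging_rows, brand_dim)
```

The dimension has to be loaded before the fact table because the lookup needs the surrogate keys to exist, which is why the dimension packages run first in the process described above.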
  • 13. Brand_Dim: The Brand dimension consists of Brand_ID, Brand_Name and Brand_Key. Brand_Key is the primary key of this dimension. It is generated when the Brand dimension is loaded into the database, through the column definition Brand_Key INT NOT NULL IDENTITY PRIMARY KEY, and in the advanced editor we set the sort key position to 1. Brand_Key was generated even though Brand_ID already existed because a primary key must be unique, that is, no value may repeat. Orders can contain products of the same brand, so Brand_ID repeats and the column is not distinct; to remove this redundancy we generated Brand_Key as the primary key of the dimension. The remaining columns, Brand_Name and Brand_ID, hold the name and the ID of the brand respectively. With them we can analyze which brand sells the most and calculate brand-wise units sold.
  • 14. Category_Dim: The Category dimension has Category_Key as its primary key. Category_ID contains the ID of the category and Category_Name contains the name of that category. Using this dimension we can analyze which category generates the highest revenue and calculate category-wise units sold.
  • 15. Order_Dim: The Order dimension has Order_Key as its primary key. Order_ID is the ID of a particular order and Order_date contains the date on which the order was executed. The URL points to the order details, which makes this dimension helpful for viewing a particular order with its price tag, image and so on. Finally, Category_Name, Order_Details (the product name), Retail_Cost and Discounted_Cost are also present in this dimension. With its help we can create a hierarchical view of revenue by category.
  • 16. Calender_Dim: The Calender dimension has Calender_Key as its primary key. Calender_ID is created from Order_date. Full_Date contains the date on which the order was executed and Day_of_Week the weekday on which it was executed, followed by Day_of_Month, Month, Quarter and Year. This dimension is helpful for calculating yearly, monthly and quarterly revenue.
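As an illustration of how the calendar attributes can be derived from a single order date, here is a small Python sketch. The column names follow Calender_Dim; the function name `calendar_row` and the YYYYMMDD format chosen for Calender_ID are assumptions, since the report does not say how the ID is built.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def calendar_row(d):
    """Derive the descriptive calendar attributes for one order date."""
    return {
        "Calender_ID": d.year * 10000 + d.month * 100 + d.day,  # assumed YYYYMMDD
        "Full_Date": d.isoformat(),
        "Day_of_Week": DAY_NAMES[d.weekday()],  # weekday(): Monday is 0
        "Day_of_Month": d.day,
        "Month_": d.month,
        "Quarter_": "Q{}".format((d.month - 1) // 3 + 1),
        "Year_": d.year,
    }

cal_row = calendar_row(date(2015, 12, 31))
```

Precomputing quarter and year once per date is what lets the reports group revenue by those columns without any date arithmetic at query time.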
  • 17. Fact_Dim: To analyze the insights we created one fact table, which is connected to each dimension table via a foreign key relationship. It has three columns for analyzing sales on this e-commerce website.
    1) CategoryWise_Rev - contains the category-wise revenue generated from executed orders.
    2) Brand_Rev - contains the brand-wise revenue generated from executed orders.
    3) CategoryUnit_sold - contains the category-wise units sold.
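The three measures listed above are plain aggregations over order rows. The sketch below shows one way to compute them in Python; it assumes that revenue is the sum of the discounted cost and that each order row counts as one unit sold, neither of which is stated in the report, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows: (category, brand, discounted cost).
ORDERS = [
    ("Jewellery", "BrandA", 500),
    ("Jewellery", "BrandB", 300),
    ("Furniture", "BrandA", 900),
]

def fact_measures(orders):
    """Aggregate the three measures described for the fact table."""
    category_rev = defaultdict(int)
    brand_rev = defaultdict(int)
    category_units = defaultdict(int)
    for category, brand, cost in orders:
        category_rev[category] += cost  # CategoryWise_Rev
        brand_rev[brand] += cost        # Brand_Rev
        category_units[category] += 1   # CategoryUnit_sold (assumes 1 unit per row)
    return dict(category_rev), dict(brand_rev), dict(category_units)

cat_rev, brand_rev_totals, cat_units = fact_measures(ORDERS)
```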
  • 18. 4. REPORTS AND VISUALIZATIONS
    RStudio is used to produce the data visualizations. The Flipkart data warehouse (data mart) is connected to RStudio with the RODBC package, and packages such as ggplot2, readr, dplyr, plotxy and ggthemes are used to produce the visualizations.
    4.1. VISUALIZATIONS
    Data visualization is the technique of encoding information from the data in visual objects (points, lines, bars or pie charts), because a picture is worth more than a hundred words. In this part of the project I make some quick visualizations to drill down into our sales dataset. For this I used RStudio with the ggplot2 visualization package, connected to my SQL database through the RODBC package.
    Key components to consider in our Flipkart sales data:
    ➢ Product category
    ➢ Product name
    ➢ Brand name
    ➢ Total sales (by category / brand)
    ➢ Total revenue (by category / brand)
    Reason behind the visualization for business requirement:
    We can break revenue down by category or brand to explain the company's profit and long-term growth. This helps the company determine which products give more profit or are in higher demand. We can also show the highest-selling product for a specific time span (month, quarter or year), which helps maintain product stock for customer satisfaction.
  • 19. CATEGORY WISE UNIT SOLD 1. This bar graph visualizes the top 10 product categories, and the bars differ markedly from one another. The graph shows that Jewellery is the top-selling category with a count of 3521, while the Kitchen and Tools categories sell the least, with counts of 326 and 386 respectively.
  • 20. HIERARCHY OF BRAND REVENUE 2. This graph describes brand revenue. The Karatcraft brand has the highest revenue, followed by Radiant Bay. By exploring this visualization the company can prepare its future development agenda.
  • 21. Pie chart labels: Jewellery = 21.48, Home Decor = 16.12, Automotive = 2.14, Mobiles = 23.15, Clothing = 7.12
    3. This pie chart describes the proportion of products sold in the year 2015. In 2015 the highest-selling product category is jewellery and the lowest is home decor. The benefit of this visualization is that the company can identify the lowest-selling products in its product list, so it can add clearance offers or adopt other strategies to attract new customers for those specific products.
  • 22. 4.2. REPORTS
    We use SSRS to produce reports that display the results for our business questions. The report below shows the ranked distribution of brand revenue by brand name.
    BrandWise_Revenue
    Brand Name               Brand Rev (Rupee)
    Karatcraft               5632898
    Radiant Bay              3601945
    BlueStone                2796709
    Durian                   1480125
    ARRA                     973097
    Rakam                    774110
    Jewels5                  713743
    Fullcutdiamond           675287
    Allure Auto              663819
    Shashvat Jewels          544387
    Diti Jewellery           374850
    GAGA                     370060
    WearYourShine by PCJ     324096
    JacknJewel               254698
    Raymond                  254438
    Slim                     244008
    DailyObjects             242640
    P.N.Gadgil Jewellers     208079
  • 23. Highest_Sold_Order
    Brand Name   Category Name           Order date               Discounted Cost   Product Price
    ARRA         Furniture               12/31/2015 12:00:00 AM   57500             57500
                                                                  51400             51400
                                                                  61800             61800
                                                                  53300             53300
                                                                  68400             68400
                                                                  65900             65900
                                                                  86500             86500
    Audeze       Mobiles & Accessories   3/11/2016 12:00:00 AM    116292            116292
    Durian       Furniture               12/31/2015 12:00:00 AM   36660             56400
                                                                  47775             73500
                                                                  105300            162000
                                                                  70200             108000
                                                                  54795             84300
                                                                  47970             73800
                                                                  60840             93600
                                                                  55575             85500
                                                                  70785             108900
                                                                  141375            217500
                                                                  45045             69300
                                                                  132990            204600
                                                                  162825            250500
                                                                  35295             54300
                                                                  48945             75300
    NITGEN       Pens & Stationery       3/20/2016 12:00:00 AM    44804             71687
                                                                  36575             58520
    The report above lists, for each brand and product category, the orders whose product price is greater than 50,000.
  • 24. The report below shows the quarterly revenue generated by each category from 2015 to 2016.
    Quarterly_Revenue(Cat_wise)
    Category Name                Year   Quarter   Quarter Wise Rev
    Automotive                   2015   Q4        886717
                                 2016   Q1        180412
                                        Q2        120520
    Baby Care                    2015   Q4        219256
                                 2016   Q1        52669
                                        Q2        123318
    Clothing                     2015   Q4        290292
                                 2016   Q1        203585
                                        Q2        543949
    Computers                    2015   Q4        834349
                                 2016   Q1        311177
                                        Q2        47208
    Furniture                    2015   Q4        2592373
                                 2016   Q1        149487
                                        Q2        337846
    Home Decor & Festive Needs   2015   Q4        652441
                                 2016   Q1        876731
                                        Q2        116100
    Home Furnishing              2015   Q4        562086
                                 2016   Q1        258211
                                        Q2        42985
    Jewellery                    2015   Q4        1416970
                                 2016   Q1        18040871
                                        Q2        117968
    Mobiles & Accessories        2015   Q4        212633
                                 2016   Q1        805051
                                        Q2        32547
    Tools & Hardware             2015   Q4        14131
                                 2016   Q1        310358
                                        Q2        9700
  • 25. Yearly_Category_Revenue
    Category Name                       Year   Yearly Revenue
    Automation & Robotics               2016   17000
    Automotive                          2015   886717
                                        2016   300932
    Baby Care                           2015   219256
                                        2016   175987
    Bags                                2016   184939
    Beauty and Personal Care            2015   1687
                                        2016   176954
    Cameras & Accessories               2015   72329
                                        2016   21108
    Clothing                            2015   290292
                                        2016   747534
    Computers                           2015   834349
                                        2016   358385
    Eyewear                             2016   12253
    Food & Nutrition                    2016   1955
    Footwear                            2016   129603
    Furniture                           2015   2592373
                                        2016   487333
    Gaming                              2016   30714
    Health & Personal Care Appliances   2016   139669
    Home & Kitchen                      2015   4348
                                        2016   79113
    Home Decor & Festive Needs          2015   652441
                                        2016   992831
    Home Entertainment                  2015   25620
                                        2016   30681
  • 26. The report below shows the category-wise revenue and units sold for 2015 to 2016.
    CatWise_UnitSold&Revenue
    Category Name                       Category Wise Rev   Category Unit sold
    Jewellery                           19575809            3521
    Furniture                           3079706             179
    Home Decor & Festive Needs          1645272             859
    Computers                           1192734             572
    Automotive                          1187649             1002
    Mobiles & Accessories               1050231             1097
    Clothing                            1037826             887
    Home Furnishing                     863282              699
    Baby Care                           395243              455
    Tools & Hardware                    334189              386
    Kitchen & Dining                    297572              362
    Toys & School Supplies              211869              101
    Pens & Stationery                   194837              173
    Bags                                184939              151
    Beauty and Personal Care            178641              154
    Health & Personal Care Appliances   139669              43
    Footwear                            129603              191
    Sports & Fitness                    128618              107
    Cameras & Accessories               93437               72
    Home Improvement                    87456               78
    Home & Kitchen                      83461               24
    Home Entertainment                  56301               19
    Gaming                              30714               35
    Watches                             24628               48
  • 27. Category Name                   Category Wise Rev   Category Unit sold
    Automation & Robotics               17000               1
    Eyewear                             12253               10
    Pet Supplies                        12194               29
    Sunglasses                          10911               22
    Food & Nutrition                    1955                1
    Household Supplies                  1917                4
    Wearable Smart Devices              978                 2
    The report above is a combined report that shows the category-wise revenue along with the units sold.
  • 28. 5. XML AND SCHEMA 1.a. XML of Brand Dimension 1.b. XSD of Brand Dimension
  • 29. 1.c. Validate XML against XSD 2.a. XML of Calendar Dimension
  • 30. 2.b. XSD of Calendar Dimension 2.c. XML and XSD Validation
  • 31. 3.a. XML of Category Dimension 3.b. XSD of Category Dimension
  • 32. 3.c. XML and XSD Validation 4.a. XML of Fact Dimension
  • 33. 4.b. XSD of Fact Dimension 4.c. XML and XSD Validation
  • 34. 5.a. XML of Order Dimension 5.b. XSD of Order Dimension
  • 35. 5.c. XML and XSD Validation
  • 36. 6. GRAPH DATABASES
    • Neo4j is a graph database. Graphs are structures made of vertices (which represent entities) and edges (which represent connections between vertices).
    • Neo4j stores data as key-value pairs: its properties can hold values such as strings, numbers or Booleans.
    • Graph databases are mostly schema-less, which gives them the flexibility of a document or key/value store. They also support relationships, in a manner comparable to a traditional relational database.
    • Below are the graphs and the code written to load the dataset into Neo4j. We loaded the CSV files of the required tables: we first created the nodes that correspond to each table, then created constraints, and then matched the nodes to the corresponding data table.
    ❖ Load Brand Dimension
  • 37. ❖ Load Calendar Dimension ❖ Load Category Dimension
  • 38. ❖ Load Fact Dimension ❖ Load Order Dimension
  • 39. ❖ Connect Category Dimension to Fact Dimension ❖ Connect Calendar Dimension to Fact Dimension
  • 40. ❖ Connect Order Dimension with Fact Dimension ❖ Graph to find count of brand name
  • 41. ❖ Graph to find Brand name starting from R
  • 42. NEO4J VS RELATIONAL DATABASE
    Neo4j code to create the relation (join) between Brand, Order and Fact:
    MATCH (p:Fact_table), (b:Brand), (o:Od)
    WHERE p.Brand_ID = b.Brand_ID AND p.Od_ID = o.Od_ID
    RETURN p, b, o
  • 43. SQL Query
    The SQL query above returns Brand_ID, Order_ID and the corresponding Discounted_Cost. The SQL output is not graphical, since only rows are shown, whereas the Neo4j output is interactive and more presentable. We found the Neo4j code simpler and easier to understand, whereas the SQL code took more time because we had to work out how to relate tables and columns. In Neo4j we can retrieve data quickly by accessing the respective nodes, whereas in a relational database we retrieve data by running a SELECT query against tables. While implementing, we observed that Cypher queries in Neo4j were easier to work with than SQL: in our Neo4j implementation relationships were formed using constraints alone, whereas a relational database involves additional concepts such as foreign keys and surrogate keys.
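As a rough illustration of the relational side of this comparison, the sketch below expresses the same Brand, Order and Fact match as a SQL join, run through Python's standard-library sqlite3 module rather than SQL Server. It is not the authors' original query: the table layouts and sample rows are hypothetical, and only the join keys (Brand_ID, Od_ID) and the returned columns follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal tables named after the Neo4j labels.
    CREATE TABLE Brand (Brand_ID TEXT, Brand_Name TEXT);
    CREATE TABLE Od (Od_ID TEXT, Discounted_Cost INTEGER);
    CREATE TABLE Fact_table (Brand_ID TEXT, Od_ID TEXT);
    INSERT INTO Brand VALUES ('B1', 'BrandA'), ('B2', 'BrandB');
    INSERT INTO Od VALUES ('O1', 100), ('O2', 400);
    INSERT INTO Fact_table VALUES ('B1', 'O1'), ('B2', 'O2');
""")

# Two joins play the role of the two equality conditions in the Cypher WHERE clause.
joined = conn.execute("""
    SELECT b.Brand_ID, o.Od_ID, o.Discounted_Cost
    FROM Fact_table AS p
    JOIN Brand AS b ON p.Brand_ID = b.Brand_ID
    JOIN Od    AS o ON p.Od_ID    = o.Od_ID
    ORDER BY b.Brand_ID
""").fetchall()
```

Both forms state the same matching conditions; the difference discussed above lies mainly in how the result is presented (rows versus a rendered graph).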
  • 44. 7. CONCLUSIONS
    • In this project we looked briefly at order transactions on an e-commerce website. We then cleaned the dataset and extracted its major components.
    • We analyzed the Flipkart (e-commerce website) order transaction data for 2015 to 2016 and determined the ranking of revenue generated by brand as well as by product category.
    • We visualized how every factor, modeled as a dimension, has a direct or indirect impact on product sales.
  • 46. APPENDIX A – VISUALIZATIONS CODE
    install.packages("RODBC")
    library(RODBC)
    library(ggplot2)
    library(dplyr)

    # Connect to the FKARTDW warehouse on SQL Server using a trusted connection
    myconn <- odbcDriverConnect(connection = "Driver={SQL Server};server=SHREEM;database=FKARTDW;trusted_connection=yes;")
    myconn
    ######################################################################
    # Category-wise units sold (bar chart)
    rd <- sqlQuery(myconn, "
      select distinct(b.Category_Name), a.CategoryWise_Rev, a.CategoryUnit_sold
      from Ecom_Fact a, Category_Dim b, Calender_Dim c
      where a.Category_Key = b.Category_Key and a.Calender_Key = c.Calender_Key
      order by a.CategoryWise_Rev desc")
    rd
    # With no column named, top_n() ranks by the last column (CategoryUnit_sold)
    dt <- rd %>% top_n(10)
    ggplot(dt, aes(Category_Name, CategoryUnit_sold, label = CategoryUnit_sold, color = "red")) +
      geom_bar(stat = "identity") +
      geom_text(vjust = 2)
  • 47.
    # Brand-wise revenue (bar chart)
    rvn <- sqlQuery(myconn, "
      select distinct(b.Brand_Name), a.Brand_Rev
      from Ecom_Fact a, Brand_Dim b
      where a.Brand_Key = b.Brand_Key
      order by Brand_Rev desc")
    rvn
    # With no column named, top_n() ranks by the last column (Brand_Rev)
    bo <- rvn %>% top_n(10)
    # Bars are vertical by default, so brands run along the x axis and revenue along the y axis
    barplot(bo$Brand_Rev,
            main = "Brand Wise Revenue",
            xlab = "brand", ylab = "revenue",
            names.arg = bo$Brand_Name,
            col = rainbow(length(unique(bo$Brand_Name))),
            legend.text = unique(bo$Brand_Name),
            args.legend = list(horiz = TRUE, x = "topleft"))
    ######################################################################
  • 48.
    # Category-wise units sold (pie chart)
    tu <- sqlQuery(myconn, "
      select distinct(b.Category_Name), a.CategoryWise_Rev, a.CategoryUnit_sold
      from Ecom_Fact a, Category_Dim b, Calender_Dim c
      where a.Category_Key = b.Category_Key and a.Calender_Key = c.Calender_Key
      order by a.CategoryWise_Rev desc")
    tu
    # Keep the five categories with the most units sold
    to <- tu %>% select(Category_Name, CategoryUnit_sold) %>% top_n(5)
    to
    slices <- c(to$CategoryUnit_sold)
    lbls <- c(to$Category_Name)
    # Label each slice with its share of the plotted total, e.g. "Clothing 42%"
    pct <- round(slices / sum(slices) * 100)
    lbls <- paste(lbls, pct)
    lbls <- paste(lbls, "%", sep = "")
    pie(slices, labels = lbls, col = rainbow(length(lbls)), main = "pie chart of unit sold")
    ######################################################################
  • 49. SSRS Queries
    -- Brand-wise revenue in 2015 and 2016
    select distinct(b.Brand_Name), a.Brand_Rev
    from Ecom_Fact a, Brand_Dim b
    where a.Brand_Key = b.Brand_Key
    order by Brand_Rev desc

    -- Total units sold and revenue for all categories in 2015 and 2016
    select distinct(b.Category_Name), a.CategoryWise_Rev, a.CategoryUnit_sold
    from Ecom_Fact a, Category_Dim b, Calender_Dim c
    where a.Category_Key = b.Category_Key and a.Calender_Key = c.Calender_Key
    order by a.CategoryWise_Rev desc

    -- Year-wise category revenue
    select b.Category_Name, c.Year_, SUM(a.Discounted_Cost) yearWise_CatRev
    from Ecom_Fact a, Category_Dim b, Calender_Dim c
    where a.Category_Key = b.Category_Key and a.Calender_Key = c.Calender_Key
    group by b.Category_Name, c.Year_
    order by b.Category_Name

    -- Week-wise revenue of all categories in 2015 and 2016
  • 50.
    select c.Category_Name, b.Year_, b.Day_of_Week, SUM(a.Discounted_Cost) AS WeekWise_JewlREv
    from Ecom_Fact a, Calender_Dim b, Category_Dim c
    where a.Calender_Key = b.Calender_Key and a.Category_Key = c.Category_Key
      and c.Category_ID in (select Category_ID from Category_Dim)
    group by c.Category_Name, b.Year_, b.Day_of_Week
    order by c.Category_Name desc

    -- Quarter-wise revenue of the top 10 categories in 2015 and 2016
    select c.Category_Name, b.Year_, b.Quarter_, SUM(a.Discounted_Cost) AS QuatWise_Rev
    from Ecom_Fact a, Calender_Dim b, Category_Dim c
    where a.Calender_Key = b.Calender_Key and a.Category_Key = c.Category_Key
      and c.Category_ID in (10015,10001,10006,10020,10007,10012,10004,10010,10011,10008)
    group by c.Category_Name, b.Year_, b.Quarter_
    order by c.Category_Name desc

    Highest_Sold_Order
    -- Which brand received orders for a product whose retail cost is more than 50000
    select b.Brand_Name, c.Category_Name, c.Order_date, c.Order_Details, c.Discounted_Cost,
           MAX(a.Retail_Cost) AS Product_Price
    from Ecom_Fact a, Brand_Dim b, Order_Dim c
    where a.Brand_Key = b.Brand_Key and a.Order_Key = c.Order_Key
    group by b.Brand_Name, c.Category_Name, c.Order_date, c.Order_Details, c.Discounted_Cost
    having MAX(a.Retail_Cost) > 50000
    order by Product_Price desc
  • 51. APPENDIX B – NEO4J CODE
    Queries for Neo4j
    ❖ Load Brand Dimension
    LOAD CSV WITH HEADERS FROM "file:///Brand_dim.csv" AS row
    CREATE (b:Brand)
    SET b = row{Brand_ID: row.Brand_ID, Brand_Name: row.Brand_Name}
    RETURN b

    // Labels are case-sensitive, so the constraint must name Brand, not brand
    CREATE CONSTRAINT ON (b:Brand) ASSERT b.Brand_ID IS UNIQUE

    ❖ Load Calendar Dimension
    LOAD CSV WITH HEADERS FROM "file:///Calender_dim.csv" AS row
    CREATE (d:Date)
    SET d = row{Calender_ID: row.Calender_ID, Full_Date: row.Full_Date, Day_of_Week: row.Day_of_Week,
                Month_: row.Month_, Quarter_: row.Quarter_, Year_: row.Year_}
    RETURN d

    CREATE CONSTRAINT ON (d:Date) ASSERT d.Calender_ID IS UNIQUE

    ❖ Load Category Dimension
  • 52.
    LOAD CSV WITH HEADERS FROM "file:///Category_dim.csv" AS row
    CREATE (c:Category)
    SET c = row{Category_ID: row.Category_ID, Category_Name: row.Category_Name}
    RETURN c

    CREATE CONSTRAINT ON (c:Category) ASSERT c.Category_ID IS UNIQUE

    ❖ Load Fact Dimension
    LOAD CSV WITH HEADERS FROM "file:///Fact_dim.csv" AS row
    CREATE (f:Fact_table)
    SET f = row{Od_ID: row.Od_ID, Calender_ID: row.Calender_ID, Category_ID: row.Category_ID,
                Brand_ID: row.Brand_ID, Retail_Cost: row.Retail_Cost, Discounted_Cost: row.Discounted_Cost,
                Brand_Rev: row.Brand_Rev, CategoryUnit_sold: row.CategoryUnit_sold}

    ❖ Load Order Dimension
    LOAD CSV WITH HEADERS FROM "file:///Order_dim.csv" AS row
    CREATE (o:Od)
    SET o = row{Od_ID: row.Od_ID, Od_date: row.Od_date, URL_: row.URL_, Category_Name: row.Category_Name,
                Od_Details: row.Od_Details, Retail_Cost: row.Retail_Cost, Discounted_Cost: row.Discounted_Cost}
    RETURN o

    CREATE CONSTRAINT ON (o:Od) ASSERT o.Od_ID IS UNIQUE

    Relationship queries
    ❖ Connect Category Dimension to Fact Dimension
    MATCH (s:Category), (p:Fact_table)
    WHERE s.Category_ID = p.Category_ID
    CREATE (s)-[r:Category_Name]->(p)
    RETURN s, p, r

    ❖ Connect Calendar Dimension to Fact Dimension
    // Calendar nodes were loaded with the label Date, so that label is matched here
    MATCH (s:Date), (p:Fact_table)
    WHERE s.Calender_ID = p.Calender_ID
    CREATE (s)-[r:Quarter_]->(p)
    RETURN s, p, r

    ❖ Connect Order Dimension with Fact Dimension
    MATCH (s:Od), (p:Fact_table)
    WHERE s.Od_ID = p.Od_ID
    CREATE (s)-[r:Retail_Cost]->(p)
    RETURN s, p, r

    ❖ Query to find count of brand name
    MATCH (n:Brand)
    RETURN count(n.Brand_Name)
  • 53.
    ❖ Query to find brand names that start with R
    MATCH (c:Brand)
    WHERE c.Brand_Name STARTS WITH "R"
    RETURN c

    ❖ Interconnect query to connect Order, Brand and Fact
    MATCH (p:Fact_table), (b:Brand), (o:Od)
    WHERE p.Brand_ID = b.Brand_ID AND p.Od_ID = o.Od_ID
    RETURN p, b, o