DATA WAREHOUSING
AND
DATA MINING
UNIT – 1
Prepared by
Mr. P. Nandakumar
Assistant Professor,
Department of IT, SVCET
DBMS Schemas for Decision Support
A schema is a logical description of the entire database. It includes the name
and description of records of all record types, including all associated data
items and aggregates.
Much like a database, a data warehouse also requires a schema to be maintained.
A database uses the relational model, while a data warehouse uses the Star,
Snowflake, or Fact Constellation schema.
Star Schema
• Each dimension in a star schema is represented with only one dimension table.
• This dimension table contains the set of attributes.
• The following diagram shows the sales data of a company with respect to four
dimensions, namely time, item, branch, and location.
• There is a fact table at the center. It contains the keys to each of the four
dimensions.
• The fact table also contains the attributes, namely dollars sold and units sold.
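The star schema above can be sketched in SQL as below. Only the dimension names, the fact-table keys, and the two measures come from the slide; all other column names and the data types are assumptions added for illustration.

```sql
-- Dimension tables: one denormalized table per dimension (extra columns are illustrative).
CREATE TABLE dim_time     (time_key     INT PRIMARY KEY, day DATE, month INT, quarter INT, year INT);
CREATE TABLE dim_item     (item_key     INT PRIMARY KEY, item_name VARCHAR(100), type VARCHAR(50),
                           brand VARCHAR(50), supplier_type VARCHAR(50));
CREATE TABLE dim_branch   (branch_key   INT PRIMARY KEY, branch_name VARCHAR(100), branch_type VARCHAR(50));
CREATE TABLE dim_location (location_key INT PRIMARY KEY, street VARCHAR(100), city VARCHAR(50),
                           state VARCHAR(50), country VARCHAR(50));

-- Fact table at the center: keys to each of the four dimensions plus the two measures.
CREATE TABLE fact_sales (
    time_key     INT REFERENCES dim_time(time_key),
    item_key     INT REFERENCES dim_item(item_key),
    branch_key   INT REFERENCES dim_branch(branch_key),
    location_key INT REFERENCES dim_location(location_key),
    dollars_sold DECIMAL(12,2),
    units_sold   INT
);
```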
Star Schema
Snowflake Schema
• Some dimension tables in the snowflake schema are normalized.
• The normalization splits up the data into additional tables.
• Unlike the star schema, the dimension tables in a snowflake schema are
normalized.
• For example, the item dimension table in the star schema is normalized and
split into two dimension tables, namely the item and supplier tables.
• Now the item dimension table contains the attributes item_key, item_name,
type, brand, and supplier_key.
• The supplier_key is linked to the supplier dimension table. The supplier
dimension table contains the attributes supplier_key and supplier_type.
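A minimal SQL sketch of the normalization just described, splitting the item dimension into item and supplier tables (only the column names come from the slide; data types are assumptions):

```sql
-- Supplier attributes are moved out of the item dimension into their own table.
CREATE TABLE dim_supplier (
    supplier_key  INT PRIMARY KEY,
    supplier_type VARCHAR(50)
);

-- Normalized item dimension (replaces the denormalized dim_item of the star-schema sketch):
-- supplier details are now reached via supplier_key.
CREATE TABLE dim_item (
    item_key     INT PRIMARY KEY,
    item_name    VARCHAR(100),
    type         VARCHAR(50),
    brand        VARCHAR(50),
    supplier_key INT REFERENCES dim_supplier(supplier_key)
);
```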
Snowflake Schema
Fact Constellation Schema
 A fact constellation has multiple fact tables. It is also known as a galaxy
schema.
 The following diagram shows two fact tables, namely sales and shipping.
 The sales fact table is the same as that in the star schema.
 The shipping fact table has five dimensions, namely item_key, time_key,
shipper_key, from_location, and to_location.
 The shipping fact table also contains two measures, namely dollars sold and
units sold.
 It is also possible to share dimension tables between fact tables. For example,
the time, item, and location dimension tables are shared between the sales and
shipping fact tables.
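A sketch of the second fact table in the constellation, reusing the shared dimension tables from the star-schema sketch above. Only the five keys and the two measures are taken from the slide; the shipper dimension's columns and all data types are illustrative assumptions.

```sql
-- Shipper dimension used only by the shipping fact table (columns are illustrative).
CREATE TABLE dim_shipper (
    shipper_key  INT PRIMARY KEY,
    shipper_name VARCHAR(100),
    shipper_type VARCHAR(50)
);

-- Second fact table: shares dim_time, dim_item and dim_location with fact_sales.
CREATE TABLE fact_shipping (
    item_key      INT REFERENCES dim_item(item_key),
    time_key      INT REFERENCES dim_time(time_key),
    shipper_key   INT REFERENCES dim_shipper(shipper_key),
    from_location INT REFERENCES dim_location(location_key),
    to_location   INT REFERENCES dim_location(location_key),
    dollars_sold  DECIMAL(12,2),
    units_sold    INT
);
```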
Fact Constellation Schema
Schema Definition
A multidimensional schema is defined using the Data Mining Query Language
(DMQL). Two primitives, cube definition and dimension definition, can be used
for defining data warehouses and data marts.
Data extraction, clean up and
transformation tools
1. Tool requirements:
 The tools enable sourcing of the proper data content and formats from
operational and external data stores into the data warehouse. The tasks include:
Data transformation from one format to another.
Data transformation and calculation based on the application of business rules,
e.g., deriving age from date of birth.
Data consolidation (merging several source records into a single record) and
integration.
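As a concrete illustration of the calculation and consolidation tasks listed above, a hypothetical transformation step written in SQL (PostgreSQL-style date functions; every table and column name here is invented for the example):

```sql
-- Derive age from date of birth while staging customer data, and consolidate
-- several source records per customer into a single record.
INSERT INTO stage_customer (customer_id, full_name, age, total_orders)
SELECT
    c.customer_id,
    MAX(c.full_name)                                           AS full_name,
    DATE_PART('year', AGE(CURRENT_DATE, MAX(c.date_of_birth))) AS age,
    COUNT(o.order_id)                                          AS total_orders
FROM src_customer c
LEFT JOIN src_order o ON o.customer_id = c.customer_id
GROUP BY c.customer_id;   -- one consolidated row per customer
```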
Data extraction, clean up and
transformation tools
 Metadata synchronization and management include storing or updating metadata
definitions.
 When implementing a data warehouse, several selection criteria that affect the
tool's ability to transform, integrate, and repair the data should be considered:
 The ability to identify the data source
 Support for flat files and indexed files
 Ability to merge data from multiple data sources
 Ability to read information from data dictionaries
 The code generated by the tool should be maintainable in the development
environment
 The ability to perform data type and character set translation is a requirement
when moving data between incompatible systems.
Data extraction, clean up and
transformation tools
 The ability to perform summarization and aggregation of records.
 The data warehouse database management system should be able to perform
the load directly from the tool using the native API.
2. Vendor approaches:
 The tasks of capturing data from a source system, cleaning and transforming it,
and loading the result into a target system can be carried out either by separate
products or by a single integrated solution. The integrated solutions are
described below:
Code generators:
 Create tailored 3GL/4GL transformation programs based on source and target
data definitions.
 The data transformation and enhancement rules are defined by the developer,
and the generated programs employ a data manipulation language.
Data extraction, clean up and
transformation tools
Database data replication tools:
 Capture changes to a single data source on one system and apply the changes
to a copy of the source data located on a different system.
Rule-driven dynamic transformation engines (also known as data mart builders):
 Capture data from a source system at user-defined intervals, transform the
data, then send and load the result into a target system.
 Data transformation and enhancement is based on a script or function logic
defined to the tool.
Data extraction, clean up and
transformation tools
3. Access to legacy data:
 Today many businesses are adopting client/server technologies and data
warehousing to meet customer demand for new products and services and to
obtain a competitive advantage.
 The majority of the information required to support business applications and
the analytical power of data warehousing is located behind mainframe-based
legacy systems.
 While protecting their heavy financial investment in existing hardware and
software, many organizations turn to middleware solutions to meet this goal.
 The middleware strategy is the foundation for enterprise access; it is designed
for scalability and manageability in a data warehousing environment.
Data extraction, clean up and
transformation tools
4. Vendor solutions:
4.1 Prism Solutions:
 Prism Warehouse Manager provides a solution for data warehousing by mapping
source data to the target database management system.
 The Prism Warehouse Manager generates code to extract and integrate data,
create and manage metadata, and create a subject-oriented historical database.
 It extracts data from multiple sources - DB2, IMS, VSAM, RMS, and sequential
files.
Data extraction, clean up and
transformation tools
4.2 SAS Institute:
 SAS data access engines serve as extraction tools to combine common
variables and transform data representation forms for consistency.
 SAS also supports decision reporting and graphing, so it can act as the front end.
4.3 Carleton Corporation's PASSPORT and MetaCenter:
 Carleton's PASSPORT and the MetaCenter fulfill the data extraction and
transformation needs of data warehousing.
Metadata
1. Metadata defined
 Data about data; it contains the location and description of the data warehouse
components.
 Names, definitions, structure, and content of the data warehouse.
 Identification of data sources.
 Integration and transformation rules used to populate the data warehouse and
deliver data to end users.
 Information delivery information.
 Data warehouse operational information.
 Security authorization.
 Metadata interchange initiative: used to develop standard specifications for
exchanging metadata.
Metadata
2. Metadata Interchange Initiative
 It was launched to develop standard specifications for a metadata interchange
format, allowing vendors to exchange common metadata and avoid the
difficulties of exchanging, sharing, and managing metadata.
 The initial goals include:
 Creating a vendor-independent, industry-defined and maintained standard access
mechanism and standard API.
 Enabling individual tools to satisfy their specific metadata access requirements
freely and easily within the context of an interchange model.
 Defining a clean, simple interchange implementation infrastructure.
 Creating a process and procedures for extending and updating the standard.
Metadata
 The Metadata Interchange Initiative has defined two distinct meta models:
 The application metamodel - holds the metadata for a particular application.
 The metadata metamodel - the set of objects that the metadata interchange
standard can be used to describe.
 The above models can be represented by one or more classes of tools (data
extraction, cleanup, replication).
 Metadata interchange standard framework
 Metadata itself may be stored in any type of storage facility or format, such as
relational tables, ASCII files, fixed formats, or customized formats; the metadata
interchange standard framework translates an access request into the
interchange standard syntax and format.
Metadata
 The metadata interchange standard framework can be accomplished through the
following approaches:
Procedural approach.
ASCII batch approach - an ASCII file containing the metadata standard schema
and access parameters is reloaded whenever a tool accesses metadata
through the API.
Hybrid approach - follows a data-driven model by implementing a table-driven
API that supports only fully qualified references for each metadata element.
The components of the metadata interchange standard framework:
The standard metadata model - refers to the ASCII file format used to
represent the metadata.
Metadata
The standard access framework - describes the minimum number of API
functions needed to communicate metadata.
Tool profile - a file that describes what aspects of the interchange standard
metamodel a particular tool supports.
The user configuration - a file describing the legal interchange paths for
metadata in the user's environment.
Metadata
3. Metadata Repository
 It is implemented as part of the data warehouse framework and provides the
following benefits:
 It provides enterprise-wide metadata management.
 It reduces and eliminates information redundancy and inconsistency.
 It simplifies management and improves organizational control.
 It increases the flexibility, control, and reliability of application development.
 Ability to utilize existing applications.
 It eliminates redundancy with the ability to share and reuse metadata.
Metadata
4. Metadata Management
 Collecting, maintaining, and distributing metadata is needed for a successful
data warehouse implementation, so these tools need to be carefully evaluated
before any purchasing decision is made.
5. Implementation Examples
 Implementation approaches adopted by:
 Platinum Technology,
 R&O,
 Prism Solutions, and
 Logic Works.
Metadata
6. Metadata trends
 The process of integrating external and internal data into the warehouse faces
a number of challenges:
 Inconsistent data formats
 Missing or invalid data
 Different levels of aggregation
 Semantic inconsistency
 Different types of databases (text, audio, full-motion video, images, temporal
databases, etc.)
 The above issues put an additional burden on the collection and management
of common metadata definitions; this is addressed by the Metadata Coalition's
metadata interchange specification.
Reporting, Query Tools and
Applications
Tool Categories: There are five categories of decision support tools
 Reporting
 Managed query
 Executive information system
 OLAP
 Data Mining
Reporting Tools
 Production reporting tools
 Used by companies to generate regular operational reports or support high-volume
batch jobs, such as calculating and printing paychecks.
 Report writers (e.g., Crystal Reports, Actuate Reporting System)
 Let users design and run reports without having to rely on the IS department.
Reporting, Query Tools and
Applications
Managed query tools
 Managed query tools shield end users from the complexities of SQL and database
structures by inserting a metalayer between the user and the database.
 Metalayer: software that provides subject-oriented views of a database and
supports point-and-click creation of SQL, as sketched below.
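A hedged illustration of what such a metalayer might look like in SQL terms: a subject-oriented view hides the joins, and the tool generates a simple query over it from the user's point-and-click choices. All names are hypothetical and reuse the star-schema sketch from earlier.

```sql
-- Hypothetical metalayer object: a subject-oriented view that hides the joins.
CREATE VIEW sales_by_region AS
SELECT l.country, l.city, i.type AS product_type, t.year,
       SUM(f.dollars_sold) AS dollars_sold
FROM fact_sales f
JOIN dim_location l ON l.location_key = f.location_key
JOIN dim_item     i ON i.item_key     = f.item_key
JOIN dim_time     t ON t.time_key     = f.time_key
GROUP BY l.country, l.city, i.type, t.year;

-- Query a managed query tool might generate for "dollars sold by country in 2023".
SELECT country, SUM(dollars_sold) AS dollars_sold
FROM sales_by_region
WHERE year = 2023
GROUP BY country;
```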
Executive information systems (EIS)
 First deployed on mainframe systems.
 Predate report writers and managed query tools.
 Used to build customized, graphical decision support applications or briefing books.
 Provide a high-level view of the business and access to external sources, e.g.,
custom, on-line news feeds.
 EIS applications highlight exceptions to business activity or rules by using
color-coded graphics.
Reporting, Query Tools and
Applications
OLAP Tools
 Provide an intuitive way to view corporate data.
 Provide navigation through hierarchies and dimensions with a single click.
 Aggregate data along common business subjects or dimensions.
 Users can drill down, across, or up levels (see the sketch after this list).
Data mining Tools
 Provide insights into corporate data that are not easily discerned with managed
query or OLAP tools.
 Use a variety of statistical and AI algorithms to analyze the correlation of
variables in the data.
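To make the drill-down idea concrete, a hedged SQL sketch against the hypothetical fact_sales table from the star-schema example: the same measure is aggregated at the year level and then drilled down one level to quarters.

```sql
-- Roll-up: total units sold per year.
SELECT t.year, SUM(f.units_sold) AS units_sold
FROM fact_sales f
JOIN dim_time t ON t.time_key = f.time_key
GROUP BY t.year;

-- Drill-down: the same measure one level lower, per year and quarter.
SELECT t.year, t.quarter, SUM(f.units_sold) AS units_sold
FROM fact_sales f
JOIN dim_time t ON t.time_key = f.time_key
GROUP BY t.year, t.quarter
ORDER BY t.year, t.quarter;
```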
Data Warehousing - OLAP
 OLAP stands for Online Analytical Processing.
 It uses database tables (fact and dimension tables) to enable multidimensional
viewing, analysis and querying of large amounts of data.
 E.g. OLAP technology could provide management with fast answers to complex
queries on their operational data or enable them to analyze their company’s
historical data for trends and patterns.
 Online Analytical Processing (OLAP) applications and tools are those that are
designed to ask complex queries of large multidimensional collections of
data. For this reason, OLAP is closely associated with data warehousing.
Data Warehousing - OLAP
Need
 The key driver of OLAP is the multidimensional nature of the business
problem.
 These problems are characterized by retrieving a very large number of
records, which can reach gigabytes and terabytes, and summarizing this data into
a form of information that can be used by business analysts.
 One of the limitations of SQL is that it cannot easily represent these complex
problems.
 A single business query may be translated into several SQL statements. These
SQL statements will involve multiple joins, intermediate tables, sorting,
aggregations, and a large amount of temporary storage to hold these tables.
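For example, even a simple three-dimensional business question such as "units sold per product type, per city, per quarter" already expands into a multi-join aggregation; a hypothetical query against the star schema sketched earlier:

```sql
-- One business question, several joins: every extra dimension adds
-- another join and another GROUP BY column.
SELECT i.type, l.city, t.quarter, SUM(f.units_sold) AS units_sold
FROM fact_sales f
JOIN dim_item     i ON i.item_key     = f.item_key
JOIN dim_location l ON l.location_key = f.location_key
JOIN dim_time     t ON t.time_key     = f.time_key
GROUP BY i.type, l.city, t.quarter;
```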
Data Warehousing - OLAP
 Online Analytical Processing Server (OLAP) is based on the
multidimensional data model.
 It allows managers and analysts to gain insight into the information through
fast, consistent, and interactive access to information.
 It provides an intuitive way to view corporate data.
Types of OLAP Servers:
We have four types of OLAP servers −
 Relational OLAP (ROLAP)
 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
 Specialized SQL Servers
OLAP Vs OLTP
Sr.No. Data Warehouse (OLAP) vs. Operational Database (OLTP)
1. OLAP involves historical processing of information. OLTP involves day-to-day
processing.
2. OLAP systems are used by knowledge workers such as executives, managers,
and analysts. OLTP systems are used by clerks, DBAs, or database professionals.
3. OLAP is useful in analyzing the business. OLTP is useful in running the business.
4. OLAP focuses on information out. OLTP focuses on data in.
5. OLAP is based on the Star, Snowflake, and Fact Constellation schemas. OLTP is
based on the Entity-Relationship model.
6. OLAP contains historical data. OLTP contains current data.
OLAP Vs OLTP
Sr.No. Data Warehouse (OLAP) vs. Operational Database (OLTP)
7. OLAP provides summarized and consolidated data. OLTP provides primitive and
highly detailed data.
8. OLAP provides a summarized and multidimensional view of data. OLTP provides
a detailed and flat relational view of data.
9. OLAP: the number of users is in the hundreds. OLTP: the number of users is in
the thousands.
10. OLAP: the number of records accessed is in the millions. OLTP: the number of
records accessed is in the tens.
11. OLAP: database size ranges from 100 GB to 1 TB. OLTP: database size ranges
from 100 MB to 1 GB.
12. OLAP is highly flexible. OLTP provides high performance.
Multidimensional Data Model
 The multidimensional data model is an integral part of On-Line Analytical
Processing, or OLAP.
 One way to understand the multidimensional data model is to view the data as a
cube. The cube at the left contains detailed sales data by product, market, and
time. The cube on the right associates the sales number (units sold) with the
dimensions product type, market, and time, with the unit variables organized as
cells in an array.
 This cube can be expanded to include another array, price, which can be
associated with all or only some dimensions. As the number of dimensions
increases, the number of cube cells increases exponentially.
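A hedged sketch of how this cube view maps onto SQL: GROUP BY CUBE (part of SQL:1999 and supported by many warehouse engines) computes an aggregate for every combination of the chosen dimensions, i.e. every cell, every subtotal plane, and the grand total, in one statement. Table and column names reuse the hypothetical star schema sketched earlier.

```sql
-- One aggregate per cell of the (product type, city, quarter) cube,
-- including all subtotal planes and the grand total.
SELECT i.type, l.city, t.quarter, SUM(f.units_sold) AS units_sold
FROM fact_sales f
JOIN dim_item     i ON i.item_key     = f.item_key
JOIN dim_location l ON l.location_key = f.location_key
JOIN dim_time     t ON t.time_key     = f.time_key
GROUP BY CUBE (i.type, l.city, t.quarter);
```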
ETL Process in Data Warehouse
 ETL stands for Extract, Transform, Load and it is a process used in data
warehousing to extract data from various sources, transform it into a format
suitable for loading into a data warehouse, and then load it into the
warehouse. The process of ETL can be broken down into the following three
stages:
 Extract: The first stage in the ETL process is to extract data from various
sources such as transactional systems, spreadsheets, and flat files. This step
involves reading data from the source systems and storing it in a staging area.
 Transform: In this stage, the extracted data is transformed into a format that is
suitable for loading into the data warehouse. This may involve cleaning and
validating the data, converting data types, combining data from multiple
sources, and creating new data fields.
ETL Process in Data Warehouse
 Load: After the data is transformed, it is loaded into the data warehouse. This
step involves creating the physical data structures and loading the data into
the warehouse.
 The ETL process is an iterative process that is repeated as new data is added
to the warehouse. The process is important because it ensures that the data in
the data warehouse is accurate, complete, and up-to-date. It also helps to
ensure that the data is in the format required for data mining and reporting.
 Additionally, there are many different ETL tools and technologies available,
such as Informatica, Talend, DataStage, and others, that can automate and
simplify the ETL process.
 ETL is a process in Data Warehousing and it stands for Extract, Transform
and Load. It is a process in which an ETL tool extracts the data from various
data source systems, transforms it in the staging area, and then finally, loads it
into the Data Warehouse system.
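A minimal, hedged sketch of the three stages expressed directly in SQL against hypothetical staging and warehouse tables; real ETL tools such as the ones named above generate or orchestrate similar logic, and every table and column name here is invented for the example.

```sql
-- Extract: copy raw rows from the source system into a staging table.
INSERT INTO stage_sales (sale_id, sold_on, item_code, amount_text, qty)
SELECT sale_id, sold_on, item_code, amount_text, qty
FROM   source_sales;                            -- e.g. read via a file load or linked source

-- Transform + Load: clean, convert types, look up dimension keys, load the warehouse table.
INSERT INTO wh_fact_sales (time_key, item_key, dollars_sold, units_sold)
SELECT t.time_key,
       i.item_key,
       CAST(s.amount_text AS DECIMAL(12,2)),    -- convert data types
       COALESCE(s.qty, 0)                       -- clean missing values
FROM   stage_sales s
JOIN   dim_time t ON t.day       = s.sold_on    -- look up surrogate keys
JOIN   dim_item i ON i.item_code = s.item_code
WHERE  s.amount_text IS NOT NULL;               -- validate before loading
```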
ETL Process in Data Warehouse
ETL Tools: The most commonly used ETL tools include Hevo, Sybase, Oracle
Warehouse Builder, CloverETL, and MarkLogic.
Data Warehouses: The most commonly used data warehouses include Snowflake,
Redshift, and BigQuery, among others.
Overall, the ETL process is an essential process in data warehousing that helps
to ensure that the data in the data warehouse is accurate, complete, and
up-to-date.
ETL Process
ADVANTAGES and DISADVANTAGES
Advantages of ETL process in data warehousing:
 Improved data quality: ETL process ensures that the data in the data
warehouse is accurate, complete, and up-to-date.
 Better data integration: ETL process helps to integrate data from multiple
sources and systems, making it more accessible and usable.
 Increased data security: ETL process can help to improve data security by
controlling access to the data warehouse and ensuring that only authorized
users can access the data.
 Improved scalability: ETL process can help to improve scalability by
providing a way to manage and analyze large amounts of data.
 Increased automation: ETL tools and technologies can automate and simplify
the ETL process, reducing the time and effort required to load and update data
in the warehouse.
ETL Process
ADVANTAGES AND DISADVANTAGES
Disadvantages of ETL process in data warehousing:
 High cost: ETL process can be expensive to implement and maintain,
especially for organizations with limited resources.
 Complexity: ETL process can be complex and difficult to implement,
especially for organizations that lack the necessary expertise or resources.
 Limited flexibility: ETL process can be limited in terms of flexibility, as it
may not be able to handle unstructured data or real-time data streams.
 Limited scalability: ETL process can be limited in terms of scalability, as it
may not be able to handle very large amounts of data.
 Data privacy concerns: ETL process can raise concerns about data privacy, as
large amounts of data are collected, stored, and analyzed.
10 Best Data Warehouse Tools to Explore
in 2023
1. Hevo Data
2. Amazon Web Services Data Warehouse Tools
3. Google Data Warehouse Tools
4. Microsoft Azure Data Warehouse Tools
5. Oracle Autonomous Data Warehouse
6. Snowflake
7. IBM Data Warehouse Tools
8. Teradata Vantage
9. SAS Cloud
10. SAP Data Warehouse Cloud
IMPORTANT WEBSITE LINKS
1. AWS Redshift: Best for real-time and predictive analytics
2. Oracle Autonomous Data Warehouse: Best for autonomous management
capabilities
3. Azure Synapse Analytics: Best for intelligent workload management
4. IBM Db2 Warehouse: Best for fully managed cloud versions
5. Teradata Vantage: Best for enhanced analytics capabilities
6. SAP BW/4HANA: Best for advanced analytics and tailored applications
7. Google BigQuery: Best for built-in query acceleration and serverless
architecture
8. Snowflake for Data Warehouse: Best for separate computation and storage
IMPORTANT WEBSITE LINKS
9. Cloudera Data Platform: Best for faster scaling
10. Micro Focus Vertica: Best for improved query performance
11. MarkLogic: Best for complex data challenges
12. MongoDB: Best for sophisticated access management
13. Talend: Best for simplified data governance
14. Informatica: Best for intelligent data management
15. Arm Treasure Data: Best for connected customer experience