Data Governance is often referred to as the people, processes, and policies around data and information, and these aspects are critical to the success of any data governance implementation. But just as critical is the technical infrastructure that supports the diverse data environments that run the business. Data models can be the critical link between business definitions and rules and the technical data systems that support them. Without the valuable metadata these models provide, data governance often lacks the “teeth” to be applied in operational and reporting systems.
Join Donna Burbank and her guest, Nigel Turner, as they discuss how data models & metadata-driven data governance can be applied in your organization to improve data quality.
Data Modeling, Data Governance, & Data Quality
1. Data Modeling, Data Governance & Data Quality
Donna Burbank & Nigel Turner
Global Data Strategy Ltd.
Lessons in Data Modeling DATAVERSITY Series
December 5th, 2017
2. Global Data Strategy, Ltd. 2017
Donna Burbank
Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership.

She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past positions, she has served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market.

As an active contributor to the data management community, she is a long-time DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was awarded the Excellence in Data Management Award from DAMA International in 2016. She was on the review committee for the Object Management Group's Information Management Metamodel (IMM) and the Business Process Modeling Notation (BPMN). Donna is also an analyst at the Boulder BI Brain Trust (BBBT), where she provides advice and gains insight on the latest BI and analytics software in the market.

She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books, Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler, and is a regular contributor to industry publications. She can be reached at donna.burbank@globaldatastrategy.com.

Donna is based in Boulder, Colorado, USA.
Follow on Twitter @donnaburbank
Today’s hashtag: #LessonsDM
3. Nigel Turner
Nigel Turner has worked in Information Management (IM) and related areas for over 20 years. This experience has embraced Data Governance, Information Strategy, Data Quality, Master Data Management, & Business Intelligence.

He spent much of his career at British Telecommunications Group (BT), where he led a series of enterprise-wide IM & data governance initiatives.

After leaving BT in 2010, Nigel became VP of Information Management Strategy at Harte Hanks Trillium Software, a leading global provider of Data Quality & Data Governance tools and consultancy. There he engaged with over 150 customer organizations from all parts of the globe.

Currently Principal Consultant for EMEA at Global Data Strategy, Ltd., he has also been a principal consultant at firms such as FromHereOn and IPL, where he led Data Governance engagements with customers such as First Great Western.

Nigel is a well-known thought leader in Information Management and has presented at many international conferences. He has also lectured part-time at Cardiff University, where he taught Data Governance modules to both undergraduate and graduate students. In addition, he was a part-time Associate Lecturer at the UK Open University, where he taught Systems & Management.

Nigel is very active in professional Data Management organizations and is an elected Data Management Association (DAMA) UK Committee member. He was the joint winner of DAMA International's 2015 Community Award for the work he initiated and led in setting up a mentoring scheme in the UK, where experienced DAMA professionals coach and support newer data management professionals.

Nigel is based in Cardiff, Wales, UK.
Follow on Twitter @NigelTurner8
4. DATAVERSITY Lessons in Data Modeling Series – This Year's Line Up
• January (on demand): How Data Modeling Fits Into an Overall Enterprise Architecture
• February (on demand): Data Modeling and Business Intelligence
• March (on demand): Conceptual Data Modeling – How to Get the Attention of Business Users
• April (on demand): The Evolving Role of the Data Architect – What Does It Mean for Your Career?
• May (on demand): Data Modeling & Metadata Management
• June (on demand): Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling
• July (on demand): Data Modeling & Metadata for Graph Databases
• August (on demand): Data Modeling & Data Integration
• September (on demand): Data Modeling & Master Data Management (MDM)
• October (on demand): Agile & Data Modeling – How Can They Work Together?
• December: Data Modeling, Data Quality & Data Governance
5. DATAVERSITY Data Architecture Strategies – Next Year's Line Up for 2018 (New, Broader Focus)
• January: Panel: Emerging Trends in Data Architecture – What's the Next Big Thing?
• February: Building an Enterprise Data Strategy – Where to Start?
• March: Modern Metadata Strategies
• April: The Rise of the Graph Database: Practical Use Cases & Approaches to Benefit your Business
• May: Data Architecture Best Practices for Today's Rapidly Changing Data Landscape
• June: Artificial Intelligence: Real-World Applications for Your Organization
• July: Panel: Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic Asset
• August: Data Lake Architecture – Modern Strategies & Approaches
• September: Master Data Management: Practical Strategies for Integrating into Your Data Architecture
• October: Business-Centric Data Modeling: Strategies for Maximizing Business Benefit
• December: Panel: Self-Service Reporting and Data Prep – Benefits & Risks
6. What We'll Cover Today
• Data Governance is often referred to as the people, processes, and policies around data and information, and these aspects are critical to the success of any data governance implementation.
• But just as critical is the technical infrastructure that supports the diverse data environments that run the business.
• Data models can be the critical link between business definitions and rules and the technical data systems that support them. Without the valuable metadata these models provide, data governance often lacks the “teeth” to be applied in operational and reporting systems.
• Self-service data prep and analytics add further complexity, as a more diverse set of users has access to manipulate, model, and report on enterprise data.
• This presentation offers practical guidance on how to integrate governance to balance enterprise standards with self-service agility.
7. Business Drivers for Data Architecture – What's Driving the Need?
• As more organizations see data as a strategic asset, and with the drive towards Digital Business Transformation on the rise, analyzing, understanding & governing core data assets continues to be a key goal.
From Trends in Data Architecture 2017, by Donna Burbank & Charles Roe
8. Who is Responsible for Creating a Data Architecture?
Wide Range of Responses Shows Need for Collaboration
• With a greater business focus on data and a wider range of technologies associated with Data Management…
• … it is not surprising that there is a concomitant rise in the diversity of roles responsible for developing a Data Architecture.
• … the data architect, not surprisingly, continues to play a large role.
Collaboration is Key
From Trends in Data Architecture 2017, by Donna Burbank & Charles Roe
9. Data Modeling, Data Governance & Data Quality – the Virtuous Circle
(Diagram: Data Modeling, Data Quality and Data Governance arranged in a circle, linked by the labels “Provides the means to deliver”, “Drives the need for”, and “Scopes & helps prioritize”.)
What is Data Quality?
Data that is demonstrably fit for business purposes
What is Data Governance?
A continuous process of managing and improving data for the benefit of all stakeholders
What is Data Modeling?
A process for translating business rules & definitions to the technical data systems & structures that support them
10. How Data Modeling, Governance & Quality Interact
Data Modeling
• Maps out the overall relationships between data entities and their attributes
• Helps to scope and prioritize the data that really matters for Governance and DQ improvement
• Starts to identify the key data stakeholders who may become data owners & data stewards
• Acts as a communication tool to improve understanding of the data estate
• First step in defining DQ KPIs and metrics
• Creates the link from business rules > data definitions > database design & implementation
Data Quality
• Data profiling identifies & baselines the current state of key data entities and attributes
• Raises awareness of DQ issues and problems in source data, and their impact
• Delivers the real benefits of better data through data cleanse, enrichment & sustenance
• Enables automation of business rules enforcement via the deployment of data quality tools
• Provides an empirical foundation for action and improvement – KPIs and metrics
• Helps build the business case for investment in a more strategic approach
Data Governance
• Provides an overarching strategic framework for data improvement
• Assigns accountable data owners and data stewards to lead data improvement efforts
• Ensures the business knowledge to define business rules and DQ thresholds
• Ensures data improvement aligns and evolves with changing business needs
• Creates the cross-business teams needed to tackle data problems & issues
• Helps to build and deliver the business case for improvement
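Data profiling, listed above as the Data Quality starting point, can be illustrated with a minimal Python sketch that baselines two simple measures per column: completeness (the share of populated values) and the number of distinct values. The `profile` function and the sample customer rows are hypothetical; dedicated profiling tools measure far more (patterns, ranges, rule violations).

```python
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, tuple[float, int]]:
    """Baseline each column: (completeness ratio, count of distinct populated values)."""
    columns = sorted({col for row in rows for col in row})
    baseline: dict[str, tuple[float, int]] = {}
    for col in columns:
        # Treat missing keys, None and empty strings as "not populated".
        filled = [row.get(col) for row in rows if row.get(col) not in (None, "")]
        completeness = len(filled) / len(rows) if rows else 0.0
        baseline[col] = (completeness, len(set(filled)))
    return baseline


# Hypothetical sample of CUSTOMER rows, including some gaps.
customers = [
    {"customer_id": 1, "customer_name": "John Smith", "customer_state": "CO"},
    {"customer_id": 2, "customer_name": "", "customer_state": "CO"},
    {"customer_id": 3, "customer_name": "Acme Ltd", "customer_state": None},
    {"customer_id": 4, "customer_name": "Jane Doe", "customer_state": "TX"},
]

for column, (completeness, distinct) in profile(customers).items():
    print(f"{column}: {completeness:.0%} complete, {distinct} distinct values")
```

Running the same profile before and after a cleanse would give the kind of empirical baseline that DQ KPIs and metrics can be built on.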
11. Data Governance – Overarching Framework
Managing the Complex Interactions between Technology, Process and People
• Organization & People
• Process & Workflows
• Data Management & Measures
• Culture & Communication
• Vision & Strategy
• Tools & Technology
• Business Goals & Objectives
• Data Issues & Challenges
12. Data Improvement – From Firefighting to Fire Prevention
13. What is a Data Model?
Translates Business Rules & Definitions to the Technical Data Systems & Structures that Support Them
14. Data Modeling is Hotter than Ever
In a recent DATAVERSITY survey, over 96% of respondents were engaged in Data Modeling in their organizations.
15. What is a Data Model?
Translates Regulations, Policies & Procedures to the Technical Data Systems & Structures that Support Them
• Regulation: e.g. GDPR
• Policy: “All Personally Identifiable Information (PII) must be anonymized for the purpose of information sharing between departments.”
• Which data fields constitute PII in our databases?
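One way to make such a policy answerable is to record a privacy classification against each physical column, so the question above becomes a metadata query. The following Python sketch assumes a hypothetical in-memory catalog built from the EMPLOYEE and CUSTOMER example tables on the next slide; the classifications are illustrative, and replacing values with `***` is simple masking, not full anonymization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    table: str
    column: str
    privacy_level: str  # assumed scheme: "PII" or "Internal"


# Hypothetical catalog: business privacy metadata attached to physical columns.
CATALOG = [
    ColumnMetadata("EMPLOYEE", "employee_id", "Internal"),
    ColumnMetadata("EMPLOYEE", "employee_fname", "PII"),
    ColumnMetadata("EMPLOYEE", "employee_lname", "PII"),
    ColumnMetadata("EMPLOYEE", "employee_ssn", "PII"),
    ColumnMetadata("CUSTOMER", "customer_id", "Internal"),
    ColumnMetadata("CUSTOMER", "customer_name", "PII"),
    ColumnMetadata("CUSTOMER", "customer_address", "PII"),
    ColumnMetadata("CUSTOMER", "customer_state", "Internal"),
]


def pii_fields(catalog: list[ColumnMetadata]) -> list[str]:
    """Answer 'Which data fields constitute PII?' from the metadata alone."""
    return [f"{c.table}.{c.column}" for c in catalog if c.privacy_level == "PII"]


def mask_for_sharing(table: str, row: dict[str, str],
                     catalog: list[ColumnMetadata]) -> dict[str, str]:
    """Mask every PII column of a row before it is shared between departments."""
    pii = {c.column for c in catalog if c.table == table and c.privacy_level == "PII"}
    return {k: ("***" if k in pii else v) for k, v in row.items()}


print(pii_fields(CATALOG))
print(mask_for_sharing("CUSTOMER", {"customer_id": "42", "customer_name": "John Smith"}, CATALOG))
```

Because the rule reads the classification rather than hard-coding column names, reclassifying a column in the metadata changes what gets masked without touching the sharing code.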
16. Technical & Business Metadata
• Technical Metadata describes the structure, format, and rules for storing data.
• Business Metadata describes the business definitions, rules, and context for data.
• Data represents actual instances (e.g. John Smith).
Technical Metadata:
```sql
CREATE TABLE EMPLOYEE (
    employee_id    INTEGER     NOT NULL,
    department_id  INTEGER     NOT NULL,
    employee_fname VARCHAR(50) NULL,
    employee_lname VARCHAR(50) NULL,
    employee_ssn   CHAR(9)     NULL);

CREATE TABLE CUSTOMER (
    customer_id      INTEGER      NOT NULL,
    customer_name    VARCHAR(50)  NULL,
    customer_address VARCHAR(150) NULL,
    customer_city    VARCHAR(50)  NULL,
    customer_state   CHAR(2)      NULL,
    customer_zip     CHAR(9)      NULL);
```
Business Metadata:
Term: Employee
Definition: An employee is an individual who currently works for the organization or who has been employed by it within the past 6 months.
Term: Customer
Definition: A customer is a person or organization who has purchased from the organization within the past 2 years and has an active loyalty card or maintenance contract.
Data: John Smith
17. Business vs. Technical Metadata
• The following are examples of types of business & technical metadata.
Business Metadata
• Definitions & Glossary
• Data Steward
• Organization
• Privacy Level
• Security Level
• Acronyms & Abbreviations
• Business Rules
• Etc.
Technical Metadata
• Column structure of a database table
• Data Type & Length (e.g. VARCHAR(20))
• Domains
• Standard abbreviations (e.g. CUSTOMER -> CUST)
• Nullability
• Keys (primary, foreign, alternate, etc.)
• Validation Rules
• Data Movement Rules
• Permissions
• Etc.
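Much of the technical metadata listed above (column structure, data type & length, nullability, keys) can be harvested directly from a database's own catalog rather than typed in by hand. This Python sketch uses the standard-library `sqlite3` module with an in-memory SQLite database as a stand-in for a production DBMS; the `harvest_columns` helper is hypothetical, and the CUSTOMER DDL is adapted from the earlier example with a primary key added. Other platforms expose similar information through their own catalogs (for example, `INFORMATION_SCHEMA.COLUMNS` in many SQL databases).

```python
import sqlite3

# CUSTOMER example table, adapted with a primary key so key metadata can be shown.
DDL = """
CREATE TABLE CUSTOMER (
    customer_id      INTEGER      NOT NULL PRIMARY KEY,
    customer_name    VARCHAR(50),
    customer_address VARCHAR(150),
    customer_city    VARCHAR(50),
    customer_state   CHAR(2),
    customer_zip     CHAR(9)
);
"""


def harvest_columns(conn: sqlite3.Connection, table: str) -> list[dict[str, object]]:
    """Return one technical-metadata record per column of `table`."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so only pass trusted identifiers.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "column": name,
            "data_type": declared_type,
            "nullable": not notnull,
            "primary_key": pk > 0,
        }
        for _cid, name, declared_type, notnull, _default, pk in rows
    ]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
for record in harvest_columns(conn, "CUSTOMER"):
    print(record)
```

Harvested records like these can then be joined to business metadata (definitions, stewards, privacy levels) to link the two views of the same column.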
18. Human Metadata
• Much business metadata and the history of the business exist in employees' heads.
• It is important to capture this metadata in an electronic format for sharing with others.
• Avoid the dreaded “I just know”.
Example: “Part Number is what used to be called Component Number before the acquisition.”
Where to capture it: Business Glossary, Metadata Repository, Data Models, Collaboration Tools, Etc.
19. Business Definitions
From Data Modeling for the Business by Hoberman, Burbank, Bradley, Technics Publications, 2009
20. Publishing Business Definitions in a Data Model
• Data models are a great place to store business definitions:
• Display them on the model for a business audience
• Store them in the model repository for reuse across the organization (various users, tools, etc.)
21. Creating a Technical Data Inventory
• Data models & the associated metadata can create a real-world inventory of the data storage associated with key business data domains under the control of a data governance program.
Linking business definitions to technical implementations
(Diagram: the business term Customer linked to the data stores that hold customer data: Customer Database (Oracle), Sales Database (DB2), Marketing Database (Netezza), Customer Database (SQL Server), SAP, Data Lake on Hadoop, CRM Database, and POS Data Store.)
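A technical data inventory can be as simple as a mapping from each physical data store to the business data domains it holds, which lets a governance program answer “where does our Customer data live?”. The Python sketch below is hypothetical: store and platform names follow the diagram, “unspecified” marks stores the diagram shows without a platform, and the Product domain assignments are invented only so the lookup has something to filter.

```python
from typing import NamedTuple


class DataStore(NamedTuple):
    name: str
    platform: str  # "unspecified" where the diagram names no platform
    domains: frozenset[str]  # business data domains the store holds


# Store names follow the diagram; the Product assignments are hypothetical.
INVENTORY = [
    DataStore("Customer Database", "Oracle", frozenset({"Customer"})),
    DataStore("Customer Database", "SQL Server", frozenset({"Customer"})),
    DataStore("Sales Database", "DB2", frozenset({"Customer", "Product"})),
    DataStore("Marketing Database", "Netezza", frozenset({"Customer"})),
    DataStore("SAP", "unspecified", frozenset({"Customer", "Product"})),
    DataStore("Data Lake", "Hadoop", frozenset({"Customer"})),
    DataStore("CRM Database", "unspecified", frozenset({"Customer"})),
    DataStore("POS Data Store", "unspecified", frozenset({"Customer", "Product"})),
]


def stores_for(domain: str) -> list[str]:
    """List every physical store that holds data for a business domain."""
    return [f"{s.name} ({s.platform})" for s in INVENTORY if domain in s.domains]


print(stores_for("Customer"))
print(stores_for("Product"))
```

In practice the inventory would be populated from data model and metadata repository contents rather than maintained by hand.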
22. Data Lineage
• In the data warehouse example below, metadata for CUSTOMER exists in a number of tools & data stores.
• This lineage can be tracked in many data modeling tools and their associated metadata & governance solutions.
(Diagram: CUSTOMER traced from the Business Glossary through a Logical Data Model, Physical Data Models, and database tables named CUSTOMER, CUST and TBL_C1, via ETL tools, into a Dimensional Data Model and a BI Tool producing a Sales Report.)
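Lineage of this kind is naturally a directed graph, so upstream questions (“where does the Sales Report's CUSTOMER data come from?”) and downstream impact questions (“what is affected if TBL_C1 changes?”) become graph traversals. This is a minimal Python sketch using breadth-first search; the node names borrow labels from the diagram, but the specific edges and the ETL job names (`load_customer`, `build_dimension`) are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage: each node maps to the nodes it feeds.
LINEAGE: dict[str, list[str]] = {
    "Glossary: Customer": ["Logical Model: CUSTOMER"],
    "Logical Model: CUSTOMER": ["Table: CUST", "Table: TBL_C1"],
    "Table: CUST": ["ETL: load_customer"],
    "Table: TBL_C1": ["ETL: load_customer"],
    "ETL: load_customer": ["Table: DW.CUSTOMER"],
    "Table: DW.CUSTOMER": ["ETL: build_dimension"],
    "ETL: build_dimension": ["Dimensional Model: CUSTOMER"],
    "Dimensional Model: CUSTOMER": ["BI Tool: Sales Report"],
}


def _reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first search returning every node reachable from `start`."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def downstream(node: str) -> set[str]:
    """Impact analysis: everything fed, directly or indirectly, by `node`."""
    return _reachable(node, LINEAGE)


def upstream(node: str) -> set[str]:
    """Lineage: everything that feeds `node`, found by walking edges in reverse."""
    reverse: dict[str, list[str]] = defaultdict(list)
    for source, targets in LINEAGE.items():
        for target in targets:
            reverse[target].append(source)
    return _reachable(node, reverse)


print(sorted(upstream("BI Tool: Sales Report")))
print(sorted(downstream("Table: TBL_C1")))
```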
23. Technical Metadata Makes Data Governance Actionable
• Data models can help take the business rules & definitions defined in policies and make them actionable in physical systems, maintaining a lineage & audit trail.
Policies & Procedures > Business Rules & Definitions > Technical Implementation > Audit & Lineage
Data models are a good vehicle for this.
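To make this concrete, the Python sketch below (standard-library `sqlite3`, in-memory database) takes a hypothetical business rule, “a customer's state is recorded as a two-letter code”, implements it in the physical model as a `CHECK` constraint, and uses a trigger to keep a simple audit trail of accepted inserts. The rule, the `CUSTOMER_AUDIT` table, and the trigger name are illustrative assumptions; constraint and trigger syntax differs between database platforms.

```python
import sqlite3

# Hypothetical rule from a policy, made actionable in the physical schema.
# Note: a NULL state passes the CHECK; nullability would be a separate rule.
SCHEMA = """
CREATE TABLE CUSTOMER (
    customer_id    INTEGER NOT NULL PRIMARY KEY,
    customer_name  VARCHAR(50),
    customer_state CHAR(2) CHECK (length(customer_state) = 2)
);
CREATE TABLE CUSTOMER_AUDIT (
    customer_id INTEGER,
    action      TEXT,
    logged_at   TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Audit trail: record every insert that the rule accepted.
CREATE TRIGGER customer_insert_audit AFTER INSERT ON CUSTOMER
BEGIN
    INSERT INTO CUSTOMER_AUDIT (customer_id, action) VALUES (NEW.customer_id, 'INSERT');
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute("INSERT INTO CUSTOMER VALUES (1, 'John Smith', 'CO')")  # satisfies the rule
try:
    conn.execute("INSERT INTO CUSTOMER VALUES (2, 'Jane Doe', 'Colorado')")  # violates it
except sqlite3.IntegrityError as err:
    print("Rejected by business rule:", err)

print(conn.execute("SELECT customer_id, action FROM CUSTOMER_AUDIT").fetchall())
```

Keeping the rule's source (the policy) recorded against the constraint in the data model is what lets the audit trail be traced back to governance.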
24. Data Quality Improvement – Why Bother?
• 90% of all data has been created in the last 2 years
• Average business data volumes double every 1.2 years
• 2.5 quintillion bytes of new data are created every day
• By comparison, there are an estimated 7.5 quintillion grains of sand on Earth
25. Data Quality Problems – Recent Evidence
Source: “Only 3% of Companies’ Data Meets Basic Quality Standards”, Tadhg Nagle, Thomas C. Redman & David Sammon, Harvard Business Review, September 11, 2017
26. Some Industry Statistics
• Raw data used in Self-Service Analytics and BI environments is often so poor that many data scientists and BI professionals spend an estimated 50–90% of their time cleaning and reformatting data to make it fit for purpose. (Source: DataCenterJournal.com)
• Correcting poor data quality is a Data Scientist’s least favorite task, consuming on average 80% of their working day. (Source: Forbes 2016)
• Lack of effective Data Governance and the absence of shared data definitions and metadata are cited as the main impediments to the success of Data Lakes. (Source: Radiant Advisors 2015)
• The US economy loses $3.1 trillion a year because of poor data quality. (Source: Artemis Ventures)
27. Traps for the Unwary – Why DQ & Data Governance Can Fail
• Lack of business leadership and commitment
• Failure to link DQ / DG to organizational goals and benefits
• Failure to focus on the data that really matters
• Giving people data responsibility but not equipping them to succeed
• Placing too much emphasis on data monitoring and not enough on data improvement
• Thinking new technology alone will solve the problems
• Forgetting that DQ / DG must embrace all who use data across an organization
• Not delivering business value early and regularly
28. Why It Can Be Hard – the Horizontal Data Flow
(Diagram: Customer, Product, Finance and Employee data flowing horizontally across the Sales, Operations, Dispatch and Finance functions.)
29. The Newton’s Cradle Effect
Problems often emerge far away from the cause.
30. Creating the Data Improvement ‘Sweet Spot’ – Focus on Key Data
Improving core data through Data Modeling, Data Governance & Data Quality
(Diagram: Core Data sits at the ‘Sweet Spot’ where Data Governance, Data Modeling and Data Quality meet.)
• DATA GOVERNANCE: A management framework for data accountability & data improvement
• DATA QUALITY: Approaches & tools for improving data accuracy, completeness & consistency
• DATA MODELING: The visual representation of data relationships & their physical storage in technical platforms
• CORE DATA: Data which is widely used by many people & processes across the business and which is critical to business success
31. Implement “Just Enough” Data Governance
• Know what to manage closely and what to leave alone.
• As a general rule, the more the data is shared across & beyond the organization, the more formal governance needs to be.
Core Enterprise Data
• Common data elements used by multiple stakeholders across BUs, LOBs, functional areas, applications, etc.
• Highly governed
• Highly published & shared
• Examples: Common Financial Metrics for Financial & Regulatory Reporting; Common Attributes, i.e. core attributes reused across multiple areas (e.g. Customer name, Account ID, Address)
Master & Reference Data
• Common data elements used by multiple stakeholders across functional areas, applications, etc.
• Highly governed
• Highly published & shared
• Examples: Reference Data (Procedure codes, Country Codes, etc.); Master Data (Location, Customer, Product)
Functional & Operational Data
• Lightly modeled & prepared data for limited sharing & reuse
• Collaboration-based governance
• May be future candidates for core data
• Examples: Operational Reporting; Non-productionized analytical model data; Ad hoc reporting & discovery
Exploratory Data
• Raw or lightly prepped data for exploratory analysis
• Mainly ad hoc, one-off analysis
• Light-touch governance
• Examples: Raw data sets for exploratory analytics; External & Open data sources
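The rule of thumb above can be expressed as a small decision function that assigns a governance level from how widely a data set is shared. In this Python sketch the thresholds (one consuming business unit, two, three or more) and the parameter names are hypothetical; the slide states the principle, not the cut-off points.

```python
from enum import Enum


class Governance(Enum):
    LIGHT_TOUCH = "Light-touch governance"            # exploratory data
    COLLABORATIVE = "Collaboration-based governance"  # functional & operational data
    HIGHLY_GOVERNED = "Highly governed"               # core enterprise, master & reference data


def governance_level(consuming_units: int, shared_externally: bool = False,
                     master_or_reference: bool = False) -> Governance:
    """The wider the sharing, the more formal the governance (thresholds are assumptions)."""
    if master_or_reference or shared_externally or consuming_units >= 3:
        return Governance.HIGHLY_GOVERNED
    if consuming_units == 2:
        return Governance.COLLABORATIVE
    return Governance.LIGHT_TOUCH


print(governance_level(1).value)
print(governance_level(2).value)
print(governance_level(1, master_or_reference=True).value)
```

Re-running the classification as usage grows reflects the note that functional & operational data “may be future candidates for core data”.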
32. The Rise of Self-Service BI, Analytics, & Data Prep
• Interest in self-service data reporting has increased among data-savvy business users.
• The availability of tools & data sets has made it easier for business people to do their own data manipulation & reporting:
• Self-Service BI & Data Manipulation – the tools are slick!
• Accessible Data & Open Data Sets – the amount of data available is amazing!
• Tech-Savvy Business Users – this isn’t any harder than a spreadsheet!
• While this offers great opportunities, it can also be fraught with challenges.
• Data modelers, and the models & metadata they create, can make the job of business intelligence easier for both BI professionals and the casual BI reporting user,
• particularly for enterprise-wide, standardized data.
• But what about non-standard, non-relational, and discovery data?
33. The Self-Service User
• “If there are standardized data sets, I’d love to use them!” (e.g. Master Data, Data Warehouse)
• “Published documentation, metadata, & standard definitions are super-helpful!” (e.g. Glossaries, data models, etc.)
• “I want to integrate these data sets with my own exploratory data for analysis & modeling!” (e.g. Self-Service Data Prep & Analysis Tools)
• “How can I leverage what other people have done, and see what is most relevant?” (e.g. Data Cataloguing & Crowdsourcing)
34. Crowdsourcing Governance & Metadata Definitions
• Many data governance projects (& vendors) are embracing the concept of “crowdsourcing”, i.e. the Wikipedia vs. Encyclopedia approach:
• Open editing
• Popularity & Usage Rankings
• Dynamically changing
Encyclopedia (for standardized, enterprise data sets)
• Created by a few, then published as read-only
• Single source of “vetted” truth
• Static
Wikipedia (for self-service data prep & analytics)
• Created by many, edited by many
• Eventual consistency with multiple inputs
• Dynamic
35. Harnessing “Tribal Knowledge”
Helpfulness Ranking
• Which definitions are most complete & helpful?
• Which algorithms offer a helpful starting point?
• Which queries offer great logic to share?
• Etc.
Usage Ranking
• Which queries are others using?
• Which tables are accessed the most?
• Which glossary terms are most often searched?
• Etc.
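Both rankings can start from very simple signals. In the Python sketch below, a usage ranking counts hypothetical access-log events with `collections.Counter`, and a helpfulness ranking averages hypothetical user ratings; the log format and the 1–5 rating scale are assumptions, and real data catalogs typically combine richer signals.

```python
from collections import Counter

# Hypothetical access log: (object kind, object name), one entry per search or query.
ACCESS_LOG = [
    ("glossary", "Part Number"),
    ("table", "CUSTOMER"),
    ("glossary", "Customer"),
    ("glossary", "Part Number"),
    ("table", "CUSTOMER"),
    ("table", "TBL_C1"),
    ("glossary", "Part Number"),
]


def usage_ranking(log: list[tuple[str, str]], kind: str, top: int = 3) -> list[tuple[str, int]]:
    """Most frequently used objects of one kind (e.g. glossary terms or tables)."""
    return Counter(name for k, name in log if k == kind).most_common(top)


def helpfulness_ranking(ratings: dict[str, list[int]]) -> list[tuple[str, float]]:
    """Items ordered by average user rating, highest first; unrated items are skipped."""
    averages = {item: sum(r) / len(r) for item, r in ratings.items() if r}
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)


print(usage_ranking(ACCESS_LOG, "glossary"))
print(usage_ranking(ACCESS_LOG, "table"))
print(helpfulness_ranking({"Part Number": [5, 4], "Customer": [3], "Account ID": []}))
```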
Collaboration & Crowdsourcing
Term: Part Number
Alternate Names: Component Number
Definition: A part number is an 8-digit alphanumeric field that uniquely identifies a machine part used in the manufacturing process.
Comment: “Is this truly the same as the old Component Number? That was a 10-digit numeric field. It didn’t have letters.”
Reply: “Yes, it is. I had the same problem for the finance app, and I wrote a quick program to convert the numbers. We just strip off the first two chars now. Click here to find it.”
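The “quick program” mentioned in the thread is not shown, but the conversion rule it describes can be sketched in Python: validate a 10-digit numeric Component Number, strip the first two characters, and check the result against the 8-character alphanumeric Part Number format. The function name and the validation patterns are hypothetical; the sketch also assumes, as the thread implies, that the two leading digits carry no identifying information.

```python
import re

# Formats as described in the glossary thread (the patterns are assumptions).
COMPONENT_NUMBER = re.compile(r"[0-9]{10}")   # legacy: 10-digit numeric
PART_NUMBER = re.compile(r"[A-Za-z0-9]{8}")   # current: 8-character alphanumeric


def component_to_part_number(component_number: str) -> str:
    """Convert a legacy Component Number by stripping its first two characters."""
    if not COMPONENT_NUMBER.fullmatch(component_number):
        raise ValueError(f"not a 10-digit Component Number: {component_number!r}")
    part_number = component_number[2:]
    if not PART_NUMBER.fullmatch(part_number):
        # Cannot happen with the current rule; guards against future rule changes.
        raise ValueError(f"conversion produced an invalid Part Number: {part_number!r}")
    return part_number


print(component_to_part_number("0012345678"))
```

Publishing the rule next to the glossary term turns one person's “I just know” into shared, reusable metadata.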
36. Finding the Right Balance
• When implementing successful data governance in today’s rapidly changing, self-service data landscape, it is important to find a balance between:
Standards-based Governance
• Well-suited for enterprise-wide data standards
Collaboration-based Governance
• Well-suited for self-service data preparation & analytics
The two methods work well together, using the right approach depending on the data usage.
37. Summary
• Data governance requires a mix of people, processes, and technologies.
• Data models & metadata support the policies & procedures defined by data governance.
• Data model metadata supports actionable data governance through:
• Linking business & technical definitions & business rules
• Providing standardization & consistency
• Supporting data lineage & audit trails
• It is important to establish the right level of governance for each unique data use case.
• Self-service data prep & analytics require a new paradigm for “crowdsourcing” metadata.
• A combination of standards-driven + collaborative governance provides a powerful mix that offers value across the organization.
38. About Global Data Strategy, Ltd
• Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.
• Our passion is data, and helping organizations enrich their business opportunities through data and information.
• Our core values center around providing solutions that are:
• Business-Driven: We put the needs of your business first, before we look at any technology solution.
• Clear & Relevant: We provide clear explanations using real-world examples.
• Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography.
• High Quality & Technically Precise: We pride ourselves on excellence of execution, with years of technical expertise in the industry.
Data-Driven Business Transformation: Business Strategy Aligned With Data Strategy
Visit www.globaldatastrategy.com for more information
40. White Paper: Trends in Data Architecture
Free Download
• Available for download on dataversity.net
41. White Paper: Emerging Trends in Metadata Management
Free Download
• Download from www.globaldatastrategy.com under ‘Whitepapers’