Do you have data? Do you have users? Do they use that data to solve problems? Then you have a data architecture. Maybe your architecture is organic and accidental, or maybe it’s an accumulation of the latest practices and technologies you heard about on Stack Overflow.
Spoiler: data architecture is about people and how they use data, not the latest pipeline framework or AI model. Data architecture is about enabling users to be productive, not adding the next “shiny object” and then blaming the users for using it wrong. What you design needs to focus on a different subject than either technology or data.
Join Kevin Bogusch, Ecosystem Architect, as he talks with Mark Madsen, Fellow at the Technology Innovation Office, on the crucial elements you’re missing in a successful data architecture: people and process. Find out why Mark says, “don’t buy one problem to solve another problem.”
Data Architecture - Focus on People
1. Data Architecture
OMG – It’s Made of People!
Mark Madsen, Teradata
@markmadsen
https://www.linkedin.com/in/markmadsen/
2. The Man. The Myth. The Mark.
Fellow
Technology & Innovation Office
President
Autonomous Robotics
Artificial Intelligence
3. Data work is not easy. Ask any user.
Technology exists to help the organization be more productive.
Organizations are made of people.
Our goal is to make it easy for organizations (people) to use data.
Data architecture is the foundation on which this work depends.
Why This Topic?
Have you tried turning it off and on again?
4. What Do We Mean By Data Architecture?
Data Storage? Data Models? Data Technologies?
Data Architecture is the processes, standards, and policies that address an organization’s collection, storage, management, and use of data.
It tells you something about what and how but doesn’t dictate implementation.
You should be able to answer these
key questions:
1. What do you collect, and why?
2. Where do you keep data, and why?
3. How do you organize, curate, and
integrate data?
7. Where to focus?
Do you focus on organizing
books?
That’s the data-first
approach. Organize
everything up front without
knowing how it is used.
Organize the data wrong
and nobody can find or use
anything.
8. Where to focus?
Do you focus on the
building that stores books?
That’s the technology-first
approach. Don’t organize
anything in advance. Use
technology to sort it out.
You may have a catalog of
all the contents. Good luck
finding what you need.
9. Focus on the people and what they do.
Not the books.
Not the building.
10. What people say
I want self-service!
What they mean
Users think “self-service” in
terms of a finished data
product – self service equals
an answer to a question.
11. What people say
I want self-service!
What developers
hear
Developers think
“self-service” is data access,
which means the user must
be self-reliant.
12. When you hear a need, ask:
“Why is this an unmet need?”
Bad IT and organizational policies cause more problems
than technology failures or bad data.
Policy is a part of architecture that is often ignored.
14. • Get a quick answer
• Solve a one-off problem
• Analyze causes of a problem
• Build a predictive model
• Make repetitive decisions
• Use data in a routine process
• Make a complex decision
• Do experiments and analyze results
• Explain a situation to someone else
• Choose a course of action
• Convince others to take action
Architecture focuses on
what people want to do
15. How To Understand What Data Is Being Used?
Monitor the data environments.
Capture what data is used.
Catalogs of data don’t tell you anything
about use – and use changes over time.
This means users shouldn’t control
storage. Copies they make outside your
view are invisible.
So: you must give them a place to work
and not restrict it.
Focus on visibility of use
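As a rough illustration of "capture what data is used", here is a minimal sketch that summarizes an access log into per-dataset usage. The record layout (day, user, dataset), the dataset names, and the function `summarize_use` are all hypothetical; in practice the events would come from whatever query or audit logging your data environments already provide.

```python
from collections import Counter
from datetime import date

# Hypothetical access-log records: (day, user, dataset).
# Real events would come from query or audit logs, not a hard-coded list.
access_log = [
    (date(2024, 1, 3), "ana", "sales.orders"),
    (date(2024, 1, 3), "raj", "sales.orders"),
    (date(2024, 2, 10), "ana", "sales.orders"),
    (date(2024, 2, 11), "ana", "scratch.ana_churn_test"),
]


def summarize_use(log):
    """Return {dataset: (access_count, last_used)} from access events."""
    counts = Counter(dataset for _, _, dataset in log)
    last_used = {}
    for day, _, dataset in log:
        # Keep the most recent day on which each dataset was touched.
        if dataset not in last_used or day > last_used[dataset]:
            last_used[dataset] = day
    return {dataset: (counts[dataset], last_used[dataset]) for dataset in counts}


usage = summarize_use(access_log)
for dataset, (count, last) in sorted(usage.items()):
    print(f"{dataset}: {count} accesses, last used {last}")
```

Because use changes over time, a summary like this would need to be recomputed regularly rather than captured once. Note that the scratch dataset only shows up because the user's work area is inside the monitored environment.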
16. Different Views – Data and Users
The value of data is tied to its use.
This shows relationships between
people and data used.
70% of the data is used and reused
constantly. 30% of the data is used
by one or a few people, often new
data with undetermined value.
Usage information shows where and
how you should focus curation –
what you need to manage based on
the people using data.
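One way to turn usage information into a curation focus is to count distinct users per dataset and separate widely reused data from data touched by only one or a few people. The sketch below assumes hypothetical (user, dataset) events and an arbitrary threshold `min_users`; neither comes from the talk, and the right cutoff depends on your organization.

```python
from collections import defaultdict

# Hypothetical (user, dataset) usage events, e.g. derived from query logs.
events = [
    ("ana", "sales.orders"),
    ("raj", "sales.orders"),
    ("li", "sales.orders"),
    ("ana", "sales.orders"),
    ("ana", "crm.accounts"),
    ("raj", "crm.accounts"),
    ("li", "crm.accounts"),
    ("ana", "scratch.ana_churn_test"),
]


def users_per_dataset(events):
    """Map each dataset to the set of distinct people who used it."""
    users = defaultdict(set)
    for user, dataset in events:
        users[dataset].add(user)
    return users


def curation_tiers(events, min_users=3):
    """Split datasets into 'shared' (many users) and 'niche' (few users).

    min_users is an assumed, tunable threshold, not a recommended value.
    """
    tiers = {"shared": [], "niche": []}
    for dataset, users in sorted(users_per_dataset(events).items()):
        tier = "shared" if len(users) >= min_users else "niche"
        tiers[tier].append(dataset)
    return tiers


tiers = curation_tiers(events)
print(tiers)
```

Datasets in the shared tier are candidates for active curation. Niche datasets are often new data with undetermined value, so they may be better watched than managed until their use grows.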
17. Finally: establish curation practices based on data use
Curation is about what data is used, by whom, and for what purposes
Collect, Label, Link
Categorize, Organize
Index, Catalog, Place
The amount of available data
is vast. You can’t store it all.
You can’t analyze it all.
Choose wisely.
There’s a difference between
organizing datasets and data
modeling. One is oriented to
datasets and their use, and
one to the contents of the
datasets.
An important and oft-ignored
element of data architecture is
making sure the data is
findable and accessible by the
people who need it. This is a
curation task, not a data
management task.