COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
Taking advantage of Big Data analytics
TAKING ADVANTAGE OF BIG DATA ANALYTICS
Vaults of structured and unstructured data can point the way to higher revenue and competitive advantages. But efforts to capture and analyze big data need careful planning and firm shepherding.
BY RICK SHERMAN
UNLOCKING THE BUSINESS BENEFITS IN BIG DATA
1 BIG DATA QUESTION TIME
2 SMALL STEPS BRING BIG REWARDS
3 ARCHITECTING A SUCCESSFUL DEPLOYMENT
4 WHO’S ON THE TEAM?
“Big data” is a hot topic not only in IT circles and technology publications but also in business magazines and other mainstream media outlets. Numerous stories have examined its use in applications ranging from tracking customer sentiment and identifying social media trends to successfully predicting the outcome of the 2012 U.S. presidential election. Given the amount of attention (and yes, hype) that big data technologies are receiving, one would be forgiven for thinking that their adoption and deployment are already pervasive. But the fact is that most companies are still trying to get a handle on what big data is, how to manage it effectively and how to get tangible business benefits from their investments in big data tools.
The first of those three questions is easy to answer: Big data environments consist of high-volume pools of information, often including a variety of structured and unstructured data types that are updated frequently. For example, data captured from social media sites, Internet clickstreams, server logs, sensors and mobile networks is commonly found in big data systems. The goal is to find business value in that information: analytical insights that point to new revenue opportunities and ways to improve internal processes and operations.
But managing and using big data isn’t so easy. To plan and implement a successful big data analytics project, an organization needs to consider a range of different technologies and determine what kind of architecture it is going to deploy. Resource requirements are another key factor to take into account, as are the scope of the project and how it should be structured and managed. Let’s take a closer look at those four elements and how best to approach them to put deployments of big data analytics tools and applications on the right track.
Initially, many big data projects flew under IT’s radar; they were launched independently by data analysts, programmers and technology-savvy users taking advantage of
the open source nature of Hadoop
and other components of the big
data technology stack. But now that
big data is squarely in the spotlight,
projects often start off like the first
generation of data warehouse,
enterprise reporting and business
intelligence (BI) dashboard projects
did—with IT saying, “If we build it,
they will come.” Whenever a new
wave of technology is promoted so
extensively, there’s a tendency for
enterprises to buy into the hype and
assume that the new technology fits
their needs. Frequently, the result is
expensive projects that fail to meet
expectations and set back future
efforts to invest in, and benefit from,
the technology in question.
1 BIG DATA QUESTION TIME
Before blithely beginning a big data
project, get answers to the following
questions:
• Why is the business interested in big data? What are the long-term business objectives for implementing big data analytics applications? Is it, for example, to track what is trending on social networks? Increase the effectiveness of marketing campaigns? Improve supply chain performance? Knowing the “why” is essential to establishing the business scope and determining the expected return on investment (ROI) for these projects.
• Where in the organization is big data going to be used? Once you know why you’re building a big data analytics system, you need to catalog the business processes, applications and data sources that will be involved. That information is essential to assessing the impact not just from a technology perspective but also from the standpoint of people, processes and the corporate culture, so you can develop a change management plan up front. Not doing so can imperil efforts to unlock the business value of big data.
• What kinds of information need to be included in your big data implementation? Discussions about big data often concentrate on data from social media sites such as Facebook, LinkedIn and Twitter, but as mentioned above, there’s a lot more to it than that. To begin planning a big data analytics deployment, project managers need to determine which of the various types of data that could be captured business users actually want to analyze. Answering that question will also help identify applicable big data
applications designed to handle
specific data types.
A critical factor that many organizations ignore at this stage is integrating structured transaction data with unstructured forms of information as part of an overall data warehousing and big data architecture. It’s terrific, for example, to use textual data from social networks and other sources to analyze how well your marketing campaigns are being received by customers and prospective buyers. But even greater business value can be derived by correlating that information with analytical findings on how valuable individual customers are: how much they’ve bought, what the profit margins were, whether they’re repeat buyers and how much it costs to retain them. Big data systems can become big data silos if they’re designed solely for analyzing certain information for its own sake, without a broader focus.
• How big does your big data system need to be? Once the required data types have been identified, the anticipated data volumes and update frequency (that is, velocity) need to be factored into your planning. Those two characteristics are often coupled with data variety and referred to as the three V’s of big data. Although rapid updates and significant data volumes are commonly assumed, the reality is that the needs of companies vary widely based on size and the intensity of information usage. Accurately assessing your organization’s requirements will help you determine the architecture and the technology investments needed to effectively capture, manage and analyze big data.
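To make the volume and velocity questions concrete, here is a back-of-the-envelope sizing sketch in Python. The event rate, event size, retention period, compression ratio and free-space headroom are illustrative assumptions, not recommendations, and the function names are hypothetical. The replication factor of 3 matches the default block replication in Hadoop’s HDFS, but your platform and settings may differ.

```python
SECONDS_PER_DAY = 86_400


def daily_ingest_gb(events_per_second: float, avg_event_bytes: float) -> float:
    """Velocity: convert a sustained event rate into gigabytes per day (1 GB = 10**9 bytes)."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY / 1e9


def raw_storage_tb(daily_gb: float, retention_days: int,
                   replication_factor: int = 3,
                   compression_ratio: float = 1.0,
                   headroom: float = 0.25) -> float:
    """Volume: rough raw disk capacity (TB) needed to retain the data.

    replication_factor=3 mirrors the HDFS default; compression_ratio and
    headroom (fraction of disk kept free for temporary and intermediate
    data) are planning assumptions, not measured values.
    """
    if daily_gb < 0 or retention_days < 0:
        raise ValueError("ingest and retention must be non-negative")
    if replication_factor < 1 or compression_ratio <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid replication, compression or headroom setting")
    stored_gb = daily_gb * retention_days / compression_ratio  # logical data kept on disk
    replicated_gb = stored_gb * replication_factor              # every block stored N times
    return replicated_gb / (1 - headroom) / 1000                # leave free space, GB -> TB


if __name__ == "__main__":
    # Hypothetical workload: 5,000 clickstream events per second, ~500 bytes each.
    ingest = daily_ingest_gb(events_per_second=5_000, avg_event_bytes=500)
    print(f"{ingest:.0f} GB/day")                         # 216 GB/day
    print(f"{raw_storage_tb(ingest, 365):.0f} TB raw")    # one year of retention
```

Plugging in your own measured rates, rather than the commonly assumed “huge and fast” figures, is what keeps the architecture and hardware investment proportionate to actual needs.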
2 SMALL STEPS BRING BIG REWARDS
It’s tempting to believe that big data analytics success is within your grasp provided you buy the right technology and commit enough resources to the project. In reality, a big data deployment typically requires significant systems and data integration work; introduces new tools and analytics techniques; and calls for new skills on both the systems management and analytics sides. Trying to boil the ocean will result only in doing too much, too fast, which is a recipe for frustration and failure.
For better results, an organization should plan to build its big data environment incrementally and iteratively. An incremental program is the most cost- and resource-effective
approach; it also reduces risks compared with an all-at-once project, and it enables the organization to grow its skills and experience levels and then apply the new capabilities to the next part of the overall project.
An architectural framework still needs to be established early on to help guide the plans for individual elements of a big data program. But because the initial big data efforts likely will be a learning experience, and because technology is rapidly advancing and business requirements are all but sure to change, the architectural framework will need to be adaptive.
3 ARCHITECTING A SUCCESSFUL DEPLOYMENT
Hadoop, MapReduce, NoSQL databases and other big data technologies initially were developed by companies, Google and Yahoo among them, that needed to store and analyze large amounts of unstructured and semi-structured data that weren’t a good fit for mainstream relational databases. The open source technologies have been used successfully by those organizations and other early adopters, and they’re now widely available in commercial versions supported by big data software vendors. But a key issue to consider in designing a big data architecture is how many of your data analysis needs can be met by Hadoop and its cohorts on their own.
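To make the MapReduce programming model a little more concrete, here is a minimal single-process sketch of its map, shuffle and reduce phases, applied to short, unstructured social media posts. It is only an illustration of the model: it does not use Hadoop’s actual API, the function names and sample posts are hypothetical, and a real cluster would run each phase in parallel across many nodes.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator

PUNCTUATION = ".,!?;:\"'()"


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one unstructured text record."""
    for token in record.lower().split():
        word = token.strip(PUNCTUATION)
        if word:  # skip tokens that were nothing but punctuation
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value by its key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the grouped values for one key into a single result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> dict[str, int]:
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


posts = ["Great product!", "great service, slow shipping", "Slow shipping again"]
print(word_count(posts))
# {'great': 2, 'product': 1, 'service': 1, 'slow': 2, 'shipping': 2, 'again': 1}
```

Counting terms like this is useful on its own, but it says nothing about which of those customers matter most to the business, which is where structured transaction data comes in.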
As I wrote earlier, combining the unstructured data prevalent in big data systems with structured transaction data provides the most complete view of a company’s business operations, enabling it to deploy analytics applications that can yield valuable insights to aid in improving business processes and increasing revenue. This data integration requirement drives the need to create an enterprisewide architecture that includes both types of data.
In such cases, the architectural
options include moving all of the
relevant data to either a big data
platform or a traditional enterprise
data warehouse for analysis, or
building a hybrid architecture that
incorporates and ties together the
two kinds of systems.
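Whichever platform ends up hosting the data, the combination of unstructured and structured data described above comes down to joining the two on a shared key, such as a customer ID. The sketch below assumes that some upstream text analytics step has already produced per-customer sentiment scores between -1.0 and 1.0; the class names, the at-risk rule and the sample figures are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Purchase:
    customer_id: str
    revenue: float
    margin: float  # profit margin as a fraction of revenue, e.g. 0.3 = 30%


@dataclass
class CustomerView:
    customer_id: str
    avg_sentiment: float  # from unstructured text; -1.0 negative .. 1.0 positive (assumed scale)
    orders: int
    profit: float

    @property
    def repeat_buyer(self) -> bool:
        return self.orders > 1


def combine(sentiment: dict[str, list[float]], purchases: list[Purchase]) -> list[CustomerView]:
    """Inner-join per-customer sentiment with aggregated transaction data, most profitable first."""
    orders: dict[str, int] = {}
    profit: dict[str, float] = {}
    for p in purchases:
        orders[p.customer_id] = orders.get(p.customer_id, 0) + 1
        profit[p.customer_id] = profit.get(p.customer_id, 0.0) + p.revenue * p.margin
    views = [
        CustomerView(cid, mean(scores), orders[cid], profit[cid])
        for cid, scores in sentiment.items()
        if cid in orders and scores  # keep only customers present on both sides
    ]
    return sorted(views, key=lambda v: v.profit, reverse=True)


def at_risk(views: list[CustomerView], min_profit: float, max_sentiment: float = 0.0) -> list[str]:
    """Hypothetical rule: valuable customers whose average sentiment is negative."""
    return [v.customer_id for v in views
            if v.profit >= min_profit and v.avg_sentiment < max_sentiment]


sentiment = {"c1": [-0.6, -0.2], "c2": [0.8], "c3": [0.1]}
purchases = [Purchase("c1", 500.0, 0.3), Purchase("c1", 300.0, 0.2),
             Purchase("c2", 50.0, 0.1), Purchase("c4", 1000.0, 0.5)]
views = combine(sentiment, purchases)
print(at_risk(views, min_profit=100.0))  # ['c1']
```

Neither data set alone flags c1: the sentiment data does not show that c1 is a profitable repeat buyer, and the transaction data does not show that c1 is unhappy.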
Ultimately, because of the fundamental differences between
structured and unstructured data, it doesn’t make sense to try to host both types of data on just one of the platforms. The best approach is a mixed architecture that could also include data marts and specialized analytical databases, such as columnar systems. Choosing the hybrid option creates a logical infrastructure that leverages existing IT investments in data warehouses and relational databases while enabling organizations to channel data processing and analytics workloads to the most appropriate platforms.
Preconfigured appliance systems are also emerging from a variety of vendors for use in big data analytics applications. The appliances combine hardware and software components and offer the promise of lower costs and shorter implementation times compared with manually piecing together big data systems; they can also reduce deployment risks and minimize the new development and management skills that organizations need.
In addition, database and data integration vendors have added capabilities for exchanging data between big data systems, data warehouses and analytical databases, eliminating the need for extensive amounts of custom integration coding. For example, connector software for linking Hadoop clusters and relational databases has become widely available.
MIX IT UP
A hybrid architecture for big data analytics can include the following components:
• Hadoop and other big data tools for storing, managing and analyzing unstructured data;
• A data warehouse and data marts for storing transaction data and the aggregated results of unstructured data analysis processes;
• Standalone analytical databases for doing heavy-duty data analysis;
• Data integration technologies, such as extract, transform and load (ETL) tools, data virtualization software and Hadoop connectors, for tying together information on different platforms and delivering it to data analysts and business users; and
• Business intelligence and analytics tools.
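A hybrid architecture works by channeling each workload to the platform best suited to it. The sketch below expresses that idea as a small routing table; the platform names follow the hybrid-architecture components discussed in this section, but the data categories, workload labels, rules and job names are hypothetical and would differ in every organization.

```python
from enum import Enum


class Platform(Enum):
    BIG_DATA = "Hadoop or other big data platform"
    WAREHOUSE = "data warehouse / data marts"
    ANALYTICAL_DB = "standalone analytical database"


# Hypothetical routing rules: (kind of data, kind of workload) -> target platform.
ROUTING_RULES: dict[tuple[str, str], Platform] = {
    ("unstructured", "batch_processing"): Platform.BIG_DATA,
    ("unstructured", "text_analytics"): Platform.BIG_DATA,
    ("transactional", "reporting"): Platform.WAREHOUSE,
    ("aggregated_results", "reporting"): Platform.WAREHOUSE,
    ("transactional", "heavy_analysis"): Platform.ANALYTICAL_DB,
}


def route(data_kind: str, workload: str) -> Platform:
    """Return the platform for a workload, failing loudly when no rule covers it."""
    try:
        return ROUTING_RULES[(data_kind, workload)]
    except KeyError:
        raise ValueError(f"no routing rule for {data_kind!r} / {workload!r}") from None


def plan(jobs: list[tuple[str, str, str]]) -> dict[Platform, list[str]]:
    """Group (job_name, data_kind, workload) tuples by the platform each is routed to."""
    by_platform: dict[Platform, list[str]] = {}
    for name, data_kind, workload in jobs:
        by_platform.setdefault(route(data_kind, workload), []).append(name)
    return by_platform


jobs = [
    ("social_sentiment", "unstructured", "text_analytics"),
    ("weekly_sales", "transactional", "reporting"),
    ("churn_model", "transactional", "heavy_analysis"),
]
for platform, names in plan(jobs).items():
    print(f"{platform.value}: {', '.join(names)}")
```

Keeping the rules in one explicit table, rather than scattered across individual projects, is also one way to make the placement decisions visible to the governance processes discussed next.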
Because of the relative immaturity of big data technology, and the under-the-radar nature of many big data projects, implementations often have been treated as the Wild West of analytics application development and management, with no rules or corporate standards. But as the focus of big data projects shifts to producing tangible and sustainable business value, more discipline is needed. Building a hybrid architecture to support big data analytics processes also makes it easier to apply internal policies and procedures on data management, governance, quality, security and privacy.
4 WHO’S ON THE TEAM?
An often-overlooked aspect of successful big data analytics projects is the importance of getting the right people with the right skills in place, both to develop and manage the systems and to use them. Assembling a project team is complicated by a shortage of technical and analytics professionals with big data experience. As a result, organizations likely will need to train existing employees to handle roles they can’t fill through hiring. That’s another good reason to adopt a strategy of incrementally building a big data environment.
The required IT resources include a mix of architects, developers and business analysts, the latter to help identify relevant data and develop project requirements. On the user side, data scientists and other analytics professionals with skills in realms such as predictive and statistical modeling as well as text analytics are needed to do the heavy lifting on analyzing data. In addition to their analytics skills, those workers must have extensive business and industry knowledge, or work side by side with business users who can provide that know-how, in order to generate useful insights from big data analytics tools.
In the past, predictive analytics, data mining and statistical analysis applications often were constrained by limited data volumes and an inability to include nontransactional data types. With the advance of big data technologies, analytics
pros have been able to expand the breadth and depth of their work, increasing its potential business value. Data scientists don’t come cheap; if your organization doesn’t already have people who can analyze big data in-house, hiring them can be a big budget item, assuming you’re able to find candidates in the first place. But the ROI they make possible can easily justify their salaries.
There’s no doubt that big data technologies are currently at the peak of hyped expectations. And although there certainly is significant business value to be gained from them, there are also significant risks because of technology immaturity, still-developing deployment and management methodologies, and the shortage of available expertise.
In addition, big data systems run the risk of being the next data silo if they’re developed in isolation from existing BI, analytics and data warehouse systems. Don’t turn a blind eye to the challenges and let your big data analytics initiatives go down the wrong path. With big data now on the radar screens not only of IT managers but also of corporate and business executives, the success (or failure) of projects surely won’t go unnoticed.
BIG DATA ANALYTICS ROSTER
The project team for a deployment of big data analytics tools should include these members:
• Development manager
• Data and systems architects
• Big data developers (experienced with Hadoop, NoSQL and other big data tools)
• Data integration developers
• BI and analytics developers
• Business analysts
• Data scientists or analytics professionals