Intelligent search-in-the-age-of-big-data-may-2013 (Source: KM World)
May 2013
Best Practices in Intelligent Search
in the Age of Big Data
Supplement to KMWorld
Premium Sponsor
Andy Moore: The Purpose-Driven Search Life
Writing about enterprise search is not the cakewalk it used to be. With customers demanding
more business value, and vendors responding by becoming more “purpose-driven” and
specialized, the search market has fragmented into a series of business applications that only
opaquely rely on “the search engine” to accomplish their tasks. I often call it “the technology
arc.” At first, all you have to do is say “enterprise search,” and you have the attention of the
users and the investors. Then after a while, you have to ask, “What can this new technology
do for me?” Then after a while and the shine is off the lily (or however that expression goes),
you need to ask, “Where is the business process improvement...”
Jerome Levadoux, HP Autonomy: Revolutionize Your Approach to Knowledge Management
Ninety percent of the world’s data has been created in just the last two years. But when it comes to
information, there is no immediate benefit to simply amassing exabytes of content. The real gains
come when organizations are able to translate, understand and apply the insight that is contained
within this flood of information.
One of the main challenges posed by today’s information explosion is that content is fragmented into
disparate “silos,” such as file servers, collaboration suites, email systems and other repository types.
Also, more information is quickly migrating to the cloud. In this scenario, traditional KM systems fall
short because they are not equipped to derive the intelligence contained in this information. . . .
Excerpted from “Measuring Return on Knowledge in a Big Data World,” Coveo: Advanced Indexing Technology
Big data. Unstructured data. Semi-structured data. Data is all over the technology news, and
for good reason. It is overwhelming organizations, requiring them to find new ways to operate,
stay competitive, better serve their customers and bring new products to market faster.
Companies are finding themselves with piles of information within multiple channels, locked
away in silos: different systems, different departments, different geographies and different data
types, making it impossible to connect the dots and make sense of critical business information.
Hidden inside streams of structured and unstructured data across cloud, social and on-premise
systems are information relationships that answer questions employees haven't even thought to
ask, but need to be asking. . . .
Martin Garland, Concept Searching, Inc.: Solving the Inadequacies and Failures in Enterprise Search
The inability to identify the value in unstructured content is the primary challenge in any
application that requires the use of metadata. If you aren’t managing it, you won’t find it. At the
most basic level, enterprise search has become inadequate. Bells and whistles abound but the
unsolved problem still exists. Search cannot find and deliver relevant information in the right
context, at the right time. This laissez-faire approach, starting with executive management
on down, illustrates the inability of organizations to elevate search to a key component and
critical enabler for improving business outcomes. An information governance approach that
creates the infrastructure framework to encompass automated intelligent metadata generation,
auto-classification, and the use of goal- and mission-aligned taxonomies is required. . . .
business... new entries into the space (as I
said before…) — I wanted to know from
Jerome which of these things seemed to
matter most to him?
“The enterprise search market is enter-
ing its third wave,” he began. And once
Jerome begins, it’s best just to lean in and
listen. “The market was created in the early
2000s, and was driven by the adoption of
portals and the arrival of more and more
sophisticated websites. You need search
engines for those things! The second wave
happened around information compliance
and e-discovery... there was a recession
around 2007, and compliance was consid-
ered a great way (by the vendors) to drive
business. There was a need that could be
matched with a budget,” he explained.
“We are now at the beginning of the
third wave, and it is being driven by two
things: One is big data. I know it’s a buzz-
word, but every buzzword reflects an
underlying truth. And the underlying trend
to big data is that people are trying now to
analyze new kinds of data in novel ways.
And the old techniques don’t work to get
real insights into data,” he said.
“The second truth—and we’re really just
at the beginning of this—is the appearance
of mobile, social and cloud. Information is
becoming much more abundant on one hand,
but also much more siloed, and a lot harder
to find, and that’s causing a lot of headache
in terms of productivity. That’s driving a
whole new need to integrate those silos and
allow people to get value from the informa-
tion. That’s a great new role for search.
“The thing is this,” he continued. “As an
information worker, I have my usual
SharePoint and fileshares and content man-
agement systems, etc., but on top of that
I have SalesForce and WorkDay and
DropBox and Box.net and GoogleDocs...
it’s a flood of silos! I can’t connect those
sources of information. The same is true of
all my social media and collaboration apps.
Yammer and LinkedIn... I’m using all these
things at once, trying to extract knowledge
from this very siloed world. That is the next
opportunity for search.”
He continued, and I’m still leaned in.
“And on the subject of big data... big data
is such a big deal because there are novel
forms of information that people want to
analyze. In the ‘old days’ (he means like 10
years ago, tops, I’m thinking) people
would look primarily at financial data in
rows and columns, and use BI to draw pret-
ty charts, and maybe use some basic level
of analytics to gain value from the infor-
mation. Today, for many different reasons,
and mainly because of the proliferation of
non-database data, people now feel the
need to apply analytics to things like social
data, data on the Web, input from cus-
tomers on their websites, and things like
that. It is coming in totally unstructured, in
random formats and random languages,
sometimes in slang...”
He thought for a minute... “Oh, and
then there’s video,” he added. “Whether it’s
security footage or cameras from surveillance
drones, the amount of video is beyond any-
body’s ability to process. Universities are
streaming their courses, and they need that
material to be usable (and thus searchable).
Same goes for images and voice. So all these
forms of data that nobody bothered to analyze
before is suddenly very important to look at,
analyze and unlock the value hidden within.
The Purpose-Driven
Search Life
Writing about enterprise search is not the
cakewalk it used to be. With customers de-
manding more business value, and vendors
responding by becoming more “purpose-
driven” and specialized, the search market
has fragmented into a series of business ap-
plications that only opaquely rely on “the
search engine” to accomplish their tasks.
I often call it “the technology arc.” At
first, all you have to do is say “enterprise
search,” and you have the attention of the
users and the investors. Then after a while,
you have to ask, “What can this new tech-
nology do for me?” Then after a while and
the shine is off the lily (or however that
expression goes), you need to ask, “Where
is the business process improvement I
should expect for my (fill in the blank)
financial services/manufacturing/health-
care/gardening shop... it no longer is about
the technology underpinnings. It’s about
the work you need to get done.”
There is no better example than enterprise search
over the last few years. And no better inter-
view could have fallen into my lap than
the opportunity to speak with Jerome
Levadoux, senior vice president for prod-
ucts at (what is now officially known as)
HP Autonomy.
Now, here’s what I can tell you about
that. Not much. The little bit I know, any-
way. Autonomy was once the powerhouse
software license godhead for enterprise
search. Still is an impressive player in the
market, for certain. But in the meantime,
many smaller startups, many of them based
on open-source software and thus carrying a
“hipper than thou” aesthetic, rolled onto the
scene. Autonomy (like FAST Search) found
itself a member of a much larger, much
more complex marketplace. And, being as
respected and enduring as they are, they
were natural targets for acquisition.
Which they became. I will not dwell
here on any of the fall-out regarding that
acquisition. It remains to smarter people to
sort that out. In fact, when Jerome and I
talked, the subject didn’t even come up.
But what DID come up was the vastly
and rapidly changing role that search plays
in the information management landscape.
New markets opening up... new companies
experiencing happy upticks in their
By Andy Moore, Editorial Director, KMWorld Specialty Publishing Group
Andy Moore is the
publisher of KMWorld
Magazine. In addition,
as the editorial
director of the
KMWorld Specialty
Publishing Group,
Andy Moore oversees
the content of the
monthly “KMWorld
Best Practices White
Paper series,” in print
and online, as well as
assisting with the creation and content of several
single-sponsored “positioning papers” per year.
He is also the host and moderator of the popular
KMWorld Web event online broadcast series.
Moore is based in Camden, Maine, and can be
reached at andy_moore@kmworld.com
Andy Moore
It makes former database operations pale
in comparison. “For example, ERP databases
are not really that big. After they’re com-
pressed, they’re usually less than a terabyte,”
he claims. (Jerome comes from a background
at SAP, so I take his word on this.) “I don’t
think there’s any company in the world that
has a petabyte of ERP data. So the real ‘big’
data these days are things like sensor data
from machines, or click-stream data from the
Web at large. Then there’s also what I call
‘human data’—text, social feeds, etc. That’s
where big data really comes into play. The
irony is that the online repositories are actual-
ly easier for knowledge workers to access than
many of the legacy tools that were never easy
to access!” And in this I agree. BI systems and
financial analytic tools have always been
cumbersome and “non-democratic.” And that
was a problem. But now we have the opposite
problem—information is now TOO damn
democratic. You can’t swing a cat in the average
organization without hitting a “content
provider” of one kind or another.
A Brave New Market
Here’s how Jerome describes it in his
great article on the following pages:
“Today’s workers are increasingly on-
the-go, embracing the newest mobile tech-
nologies to stay connected and productive,
as they continue to fuel the migration of
content to the cloud. In addition to driving
the great shift of data to the cloud, there
is also fragmentation of knowledge
among multiple systems and repositories.
Information today has many addresses. It
lives in email, on mobile devices, in
Dropbox and Evernote, and in whatever
applications people may choose to install.
The consumerization of content has also
meant that devices and applications are
used for both professional and personal
purposes,” he writes.
“An important distinction to remember
about information growth is that it is not just
about documents. We are becoming a multimedia-focused
world, watching and listening
more and reading less. Many meetings are
now conducted remotely over video, and
training sessions are often recorded. This
means your search technology is required to
handle these new and pervasive content for-
mats. The number of files, images, records
and other digital information is predicted to
grow by a factor of 67 from 2009 to 2020,
with corresponding growth of IT profession-
als globally by a wimpy 1.4.”
Coming from a guy from HP
Autonomy, the next part of our conversa-
tion was rather revealing. I asked Jerome
whether it was ironic that Autonomy, that
once wanted to be the omnipotent search
engine for the masses, was now somewhat
softer about that, and was willing to admit
that “enterprise search” was kind of a non
sequitur... that in fact, enterprise search
was more of a strategy than a product, and
the key to success was to develop a plan
that made it all work together.
“Yes, there are specialized search engines
for specialized search problems,” he admit-
ted readily. “For example, we are developing
specialized analytics tools for processing
unstructured data for healthcare applications.
There will be such specialized tools for cer-
tain markets. But at the end of the day, for
personal productivity, people are still looking
for a single way to navigate and access all
their data. They don’t have that today.
There’s an opportunity there.”
Isn’t this problem being addressed by
SharePoint, I wondered, where the solution
solves 80% of the problem, and that’s good
enough for most people?
“That’s what the IT people are always
hoping for... a neat solution where the user
can put everything into a nice little sandbox
box and control what he’s doing. But that
world is no more. I use DropBox and
SendIt and SharePoint, too... IT would like
me to use only one solution, but that just
isn’t the way it is anymore. SharePoint is
only one of the many things I use.”
The same goes for search in SharePoint.
“Microsoft has bundled FAST Search into
SharePoint, but that’s all you can search...
SharePoint! That’s ignoring the fundamental
problem. Information is very distributed,”
he exclaimed.
“The way we look at it is this: We want to
connect people with their networks of people,
associates and repositories, regardless of who
they are and what tool they’re using. That was
the origin of enterprise search, but it will soon
look very different than the original enterprise
search because it’s consumed in such a very
different way. It has to address information
that really didn’t exist 10 years ago, such as
mobile and social. Every vendor of content
management repositories, whether it’s
Microsoft or Google or whomever, all assume
that every user is going to put all their infor-
mation in those repositories. That’s just not
gonna happen.”
The Road Ahead
He couldn’t resist putting his marketing
hat on for a minute: “We currently have
many customers who combine tools for col-
laboration and information management,
and use Autonomy to search across those.
We are still developing other connectors.
But we are now able to look for data across
many different silos, on-premises as well as
in the cloud.
“We have customers who have us host
their search for cloud-based repositories.
But the reality is that most organizations
have some information they prefer to keep
on premises, behind a firewall, and some
they have in the cloud. That hybrid
approach covers about everyone. Except
for some new start-ups maybe, I know of
no company that is willing to put every-
thing in the cloud. So we have to provide a
means to search both on-premise informa-
tion as well as ‘outside’ information. We
also maintain the search engines on behalf
of many companies. We will soon have a
version of IDOL that will run in the cloud
only, and we expect that will be the trend
that most organizations will follow.”
The next challenge, insists Jerome, will
be how companies deal with unstructured
data. Companies have a lot of it, but
haven’t spent much time thinking about
how to use it. How can we extract value
from it? How can we add this data to
improve a business process?
“As a best practice, you first have to
have a strategy,” he said. “Instead of index-
ing every single piece of data and hoping it
might be useful someday, you first have to
think about: ‘What kind of data do I have?
What can be the value of this? What can I
get rid of?’ There’s not a universal solution.
It depends on what kinds of business the
companies are in... what kind of vertical
market do they service... what kinds of
problems are they trying to solve...?”
Jerome talks about a really brave new
world. So do the other writers in this White
Paper. Please read on and join in.
In today’s information-rich organization,
there are three key capabilities that an
intelligent search technology must support
to deliver effective knowledge management
in the era of big data:
Build a knowledge graph of the organ-
ization by analyzing social networks and
deriving people’s expertise based on
employee behavior. This will expedite
knowledge transfer, reduce duplicate
efforts, and encourage a collaborative work
environment. Generating a knowledge
graph is a complex process, in the same way
that people’s relationships are multi-faceted
and ever-evolving—it cannot be owned by a
single content management system. That’s
why your search technology must under-
stand relationships by analyzing a variety
of information such as communication pat-
terns, work groups, project hierarchies and
other attributes.
Deliver contextualized search results
personalized to the user. Without a
context-aware solution, the same search
query will mean different things to different
people. The same search query could even
mean different things to the same person
when executed at different points of the
day or on different devices. Your search
technology should use context and profile
data to not only personalize the delivery of
content, but anticipate your needs and
proactively push information.
Search across any repository from any
device. Information is becoming increasingly
fragmented, and the boundaries between personal
and work productivity are blurring. In
our BYOD world, people are putting person-
al and work items in Dropbox, Evernote,
Yammer, Salesforce, Google Drive, and
accessing content from desktops, mobile
devices, tablets, “phablets” and any of the lat-
est devices in the market. They are merging
personal and work identities in their social
networks. Your search technology must be
able to access data from all systems, and
understand the data in all its disparate forms.
Search for Today’s Worker On-the-Go
Today’s workers are increasingly on-the-
go, embracing the newest mobile technolo-
gies to stay connected and productive, as
they continue to fuel the migration of con-
tent to the cloud. In addition to driving the
great shift of data to the cloud, there is also
fragmentation of knowledge among multi-
ple systems and repositories. Information
today has many addresses. It lives in email,
on mobile devices, in Dropbox and
Evernote, and in whatever applications peo-
ple may choose to install. The consumeriza-
tion of content has also meant that devices
and applications are used for both profes-
sional and personal purposes.
An important distinction to remember
about information growth is that it is not
just about documents. We are becoming a
multimedia-focused world—watching and
listening more and reading less. Many
meetings are now conducted remotely over
video, and training sessions are often
recorded. This means your search technol-
ogy is required to handle these new and
Intelligent Search for Big Data
Revolutionize Your
Approach to Knowledge
Management
Ninety percent of the world’s data has
been created in just the last two years. But
when it comes to information, there is no
immediate benefit to simply amassing ex-
abytes of content. The real gains come when
organizations are able to translate, under-
stand and apply the insight that is contained
within this flood of information.
One of the main challenges posed by
today’s information explosion is that con-
tent is fragmented into disparate “silos,”
such as file servers, collaboration suites,
email systems and other repository types.
Information is also being migrated to cloud
deployments, effectively creating another
silo. Traditional KM systems, however, are
not equipped to derive intelligence from
information scattered across different sys-
tems. Unfortunately, when an organization
is unable to leverage information for its
highest value, this can hinder its competi-
tive advantage.
Getting the Most Value
From Information
Within the volumes of big data, busi-
nesses today have more information than
ever before about their employees, their
competitors and their customers. This puts
a greater emphasis on search capabilities to
not only understand information generated
by users, but also understand how the
information flows and connects between
users. In essence, users today each create
their own unique social network. Systems
that can leverage this shift enable businesses
to get more from their information.
Traditional KM vendors have focused
on the capture side of the equation: making
people enter their information into docu-
ment management systems, and relying on
that process to provide intelligence. But
that is not how people work today. To
accommodate these changes, a different
system is needed—one that mirrors the
way people work and think.
As senior vice
president of products
for HP Autonomy,
Jerome Levadoux is
responsible for the
development and
execution of strategy
and product offerings
in the areas of big data,
content analytics and
content management.
Prior to joining
Autonomy, Jerome was
senior vice president and general manager at SAP,
where he was responsible for product direction,
marketing, partnerships and business development
for the IT Management Suite.
Jerome Levadoux
By Jerome Levadoux, Senior Vice President of Products, HP Autonomy
“The inability to
leverage information
ultimately reduces
its value—and hinders
the business’ ability
to compete.”
pervasive content formats. The number of
files, images, records and other digital
information is predicted to grow by a factor
of 67 from 2009 to 2020, with correspon-
ding growth of IT professionals globally by
a wimpy 1.4.
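To make the gap concrete, the two growth factors quoted above can be turned into a rough per-person load and an implied yearly rate. This is only a back-of-the-envelope sketch: it assumes smooth compound growth across the 11 years from 2009 to 2020, which the forecast itself does not claim, and the function names are illustrative.

```python
# Growth factors quoted in the text for 2009 to 2020.
DATA_GROWTH = 67.0    # files, images, records and other digital information
STAFF_GROWTH = 1.4    # IT professionals globally
YEARS = 2020 - 2009   # 11 years


def load_per_professional(data_growth: float, staff_growth: float) -> float:
    """How many times more information each IT professional must handle."""
    return data_growth / staff_growth


def implied_annual_rate(total_growth: float, years: int) -> float:
    """Yearly growth rate, ASSUMING smooth compounding (illustration only)."""
    return total_growth ** (1.0 / years) - 1.0


if __name__ == "__main__":
    load = load_per_professional(DATA_GROWTH, STAFF_GROWTH)
    print(f"information per professional: x{load:.1f}")
    print(f"data growth per year:  {implied_annual_rate(DATA_GROWTH, YEARS):.1%}")
    print(f"staff growth per year: {implied_annual_rate(STAFF_GROWTH, YEARS):.1%}")
```

Under those assumptions, each professional would be responsible for roughly 48 times as much information in 2020 as in 2009.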
By eliminating barriers between reposito-
ries, devices, communication channels and
deployment choices, you free the worker to be
fully engaged and productive wherever they
are. This search experience should be seam-
less and yield consistent results. To experience
real agility, you must be able to search any
repository from any device, and then search
any file, regardless of its format.
This means your search technology
must first have access to the system. If you
can’t search it, you can’t find it. Secondly,
your search technology should be advanced
enough to derive contextual and conceptu-
al signals. If you store all facets of your life
in the cloud, for instance, is your search
technology smart enough to return work-
related items at the top when you query a
common keyword? Or at least categorize
them according to concept? Can it separate
the relevant items from the noise?
There are many advantages to uniting a
fragmented data landscape. Most obvious-
ly, the ability to find an item quickly will
increase a worker’s productivity. But it can
also help organizations monitor social
media or their email systems—even ana-
lyze embedded or attached rich media, to
flag anomalies and find confidential data.
Successful search technologies should
operate like a conversation, tailored to the
user. Much like in the real world, the phrase
“tips for conflict resolution” can mean dif-
ferent things depending on the location or
context. For instance, at a customer site, you
would want to learn more about customer
service-oriented advice. At the office, tips
geared toward coworker or managerial reso-
lutions. And at the vendor site, better com-
munication of goals. For this reason, search
technology should also respond differently
based on your context. Given today’s para-
digm of the ever-mobile professional, it’s
easy to see how big data can use a wide
range of contextual elements—time, loca-
tion, content, even weather—when deliver-
ing search results. This idea is an important
one for today’s organizations—one that can
be applied in a wide range of business
applications.
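The conflict-resolution example can be sketched as a simple re-ranking step. What follows is an illustration, not a description of any vendor's engine: the `Document` fields, the context tags and the `boost` value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    keyword_score: float  # hypothetical base relevance from a keyword match
    tags: frozenset       # hypothetical context tags attached at index time


def rerank(documents, context_tags, boost=0.5):
    """Order documents by keyword score plus a boost per matching context tag."""
    def score(doc):
        return doc.keyword_score + boost * len(doc.tags & context_tags)
    return sorted(documents, key=score, reverse=True)


# Three results that a plain keyword match for "tips for conflict resolution"
# would score identically.
RESULTS = [
    Document("Handling customer complaints", 1.0, frozenset({"customer_site"})),
    Document("Mediating disputes between coworkers", 1.0, frozenset({"office"})),
    Document("Communicating goals to suppliers", 1.0, frozenset({"vendor_site"})),
]


def top_title(location):
    """Return the best result for the same query issued from a given location."""
    return rerank(RESULTS, frozenset({location}))[0].title
```

The query never changes; only the context set does, and that alone decides which result surfaces first. Time of day, device or other signals could be added as further tags in the same way.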
There is currently a movement under-
way in the search community to deliver
more personalized, intent-based search. But
what about going one step beyond and
anticipating people’s needs? Here is an
example: You arrive in the morning to find
an email from your manager asking for a
presentation about a corporate alliance
formed before you joined the company. You
see that your search engine has already
begun working by locating and presenting
conceptually relevant pieces of information
on your desktop. The first piece is a video
file. When you click to view the video, you
are taken to the exact point in the video that
discusses the alliance; there’s no sifting or
time-wasting through irrelevant frames. You
are also presented with pertinent press
releases, a PowerPoint covering key points,
and the partnership contract. At your finger-
tips is information you had no idea existed,
which may not have included the exact
metadata to produce the same result using a
keyword search. Without an intelligent
search technology, you may have asked
colleagues, searched for files, and then
viewed the entire video to get the kernel of
information you needed.
Own Your Social Network and Build
Your Knowledge Graph
Knowledge is not counted by what is
produced, but what is shared. Content rep-
resents only a subset of that knowledge. To
truly maximize an organization’s intellectu-
al assets, you must build a knowledge graph
by identifying and continually updating
people’s respective skills and social net-
works. When this happens, people can
quickly leverage and share expertise, and
connect and collaborate on similar projects.
They can leverage existing work that might
not live in an enterprise system, but perhaps
in the expert’s personal storage. This type of
socializing and accessibility encourages
mentorship and leverages knowledge across
the organization.
A knowledge graph is not something
that a single content management system
can own. People interact across different
systems, and relationships are constantly
evolving. But it is something that an intel-
ligent search technology can infer. In the
era of big data, we have abundant informa-
tion regarding people’s content browsing,
consumption and contribution habits. We
know who emails with which group and
which individuals. We know who is work-
ing cross-functionally on a certain project.
We know who works on the same account
team. Using this type of contextual data to
deliver precise search results can change
the course of business—if it can be under-
stood quickly enough to make a difference.
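One way to picture inferring a knowledge graph from behavior is to accumulate weighted edges from interaction records. The sketch below is a minimal illustration under stated assumptions: the interaction log, the people in it and the per-interaction weights are all invented for the example, and a real system would draw on far richer signals.

```python
from collections import defaultdict

# Hypothetical interaction log: (person_a, person_b, kind of interaction).
INTERACTIONS = [
    ("ana", "ben", "email"),
    ("ana", "ben", "project"),
    ("ana", "cho", "email"),
    ("ben", "cho", "account_team"),
    ("ana", "ben", "email"),
]

# Assumed weights: working on a project together counts for more than an email.
WEIGHTS = {"email": 1.0, "project": 3.0, "account_team": 2.0}


def build_graph(interactions, weights=WEIGHTS):
    """Accumulate an undirected, weighted graph of who works with whom."""
    graph = defaultdict(lambda: defaultdict(float))
    for a, b, kind in interactions:
        weight = weights.get(kind, 0.0)
        graph[a][b] += weight
        graph[b][a] += weight
    return graph


def closest_colleagues(graph, person, limit=3):
    """Rank a person's connections by accumulated weight, strongest first."""
    neighbours = graph.get(person, {})
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]
```

Because the edges come from several systems' records rather than from one repository, no single content management system needs to own the graph; it is rebuilt or updated as new interactions arrive.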
People often exaggerate or misrepresent
their level of expertise, or they just fail to
keep their profile updated. But sophisticated
technology can properly combine the self-
professed profile with an automated analysis
of content. Data will tell you not what some-
one says they know, but what they actually
know; not who they claim to know, but who
they actually know well. These types of
connections can be used to add another layer
of context to provide a better, more person-
alized search experience.
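Combining a self-professed profile with an automated analysis of content could look something like the following. This is a sketch under assumptions, not a documented method: the `claim_weight` of 0.3, the use of simple topic counts as evidence, and the sample topics are all hypothetical.

```python
def expertise_scores(claimed_skills, authored_docs, claim_weight=0.3):
    """Blend self-declared skills with evidence counted from authored content.

    claimed_skills: set of topics the person lists on their profile.
    authored_docs:  list of topic lists, one per document the person wrote.
    """
    # Count how many documents touch each topic.
    evidence = {}
    for topics in authored_docs:
        for topic in set(topics):
            evidence[topic] = evidence.get(topic, 0) + 1

    top = max(evidence.values(), default=0)
    scores = {}
    for topic in set(claimed_skills) | set(evidence):
        claimed = 1.0 if topic in claimed_skills else 0.0
        observed = evidence.get(topic, 0) / top if top else 0.0
        # Observed behavior is deliberately weighted above the claim.
        scores[topic] = claim_weight * claimed + (1 - claim_weight) * observed
    return scores
```

With this weighting, a skill that is claimed but never evidenced scores lower than one that is evidenced but never claimed, which mirrors the point that data shows what someone actually knows rather than what they say they know.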
But understanding a knowledge network
requires the ability to interact with influ-
encers—those individuals who shape opin-
ion within communities—the people you
can turn to for support and insight. While
this may sound simple, the challenge lies in
analyzing significant volumes of data in
near-real time to determine these relationships
and influences in a manner that optimizes an
employee’s search experience.
Search Should Be Data-Driven
Data-driven predictions and decisions
are gaining a lot of momentum and visibil-
ity. From Nate Silver’s accurate prediction
of all 50 states’ results in the last presiden-
tial race, to the increasing use of statistics
by law enforcement to combat crime,
people are finding new ways to apply a
data-focused methodology to conventional
thinking. This same data-driven rigor
should be applied to search. Search should
not be a static experience, but constantly
deliver customized conversations based on
the data related to the user, the environment
and the context.
When organizations choose search
technology that understands the concepts
and context of all information—regardless
of where it resides, how it is accessed, or its
format—it is possible to remove the irrele-
vant noise contained in much of big data.
Employees who are able to leverage
the wide array of information that exists
inside and outside their enterprise can gain
an immediate competitive advantage that
you can only get from one source—your
information, in all its forms.
A recent Coveo survey of 120 executives
shows only 13% said employees can effec-
tively tap into the collective knowledge of
their organizations.
There are three main reasons why:
Information overload. Knowledge work-
ers, says IDC, spend 15% to 35% of their time
searching for information. And, most people
don’t know where to look, or how to ask
for what they are seeking. This challenge is
heightened by the explosion of big data.
Inability to find information. With information
scattered across an organization’s
repositories, directories and intranets, employees
cannot easily locate information they need
in order to make critical business decisions. The
result? Wasted search efforts and decisions
made in the absence of information.
Recreating knowledge that already
exists. Knowledge workers spend more time
recreating existing information than they do
turning out information that does not already
exist. IDC suggests that 90% of the time
knowledge workers spend in creating new
reports or other products is spent in recreat-
ing information that already exists.
Knowledge workers spend a lot of time
looking for and processing information, at a
high cost. If we can make it easier for
employees to find the information they need,
and gain new insights from it, organizations
will get a higher return on their greatest
intangible—knowledge. Employee produc-
tivity will rise and profits will soar.
The Path to Return on Knowledge
Return on knowledge is linked to your
people’s ability to access collective knowl-
edge efficiently.
Think of a world in which every piece of
information all employees need, from any
and all systems, is instantly organized,
indexed and combined in ways consumers
find commonplace on the Internet, and yet
companies have been unable to achieve.
Unified indexing technology makes this
a reality. It is the least disruptive and most
effective path to on-demand access to
relevant knowledge. The technology democ-
ratizes knowledge access by providing con-
textually relevant content to every user and
ensuring that companies build on past
knowledge rather than recreating the wheel
90% of the time.
Advanced indexing technology ties
together the vast variety of systems, both on-
premise and in the cloud—email, databases,
CRM, ERP, social media, file shares, etc.—
to unify, normalize and enrich the informa-
tion to uncover hidden relationships for
new insights.
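The "unify, normalize" steps can be illustrated with a toy inverted index that spans two sources. This is a generic sketch and not Coveo's implementation: the record fields (`message_id`, `case_no` and so on), the shared schema and the sample data are assumptions chosen for the example.

```python
import re
from collections import defaultdict


def normalize_email(message: dict) -> dict:
    """Map a hypothetical email record onto a shared schema."""
    return {
        "source": "email",
        "id": message["message_id"],
        "text": f'{message["subject"]} {message["body"]}',
    }


def normalize_crm(case: dict) -> dict:
    """Map a hypothetical CRM case onto the same schema."""
    return {"source": "crm", "id": case["case_no"], "text": case["summary"]}


def build_index(records):
    """Build one inverted index (token -> set of (source, id)) over all sources."""
    index = defaultdict(set)
    for record in records:
        for token in re.findall(r"\w+", record["text"].lower()):
            index[token].add((record["source"], record["id"]))
    return index


def search(index, query):
    """Return the items, from any source, that contain every query token."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


EMAILS = [{"message_id": "m1", "subject": "Printer outage",
           "body": "Firmware update fixed the outage"}]
CASES = [{"case_no": "c7",
          "summary": "Customer reports printer outage after firmware update"}]

INDEX = build_index(
    [normalize_email(m) for m in EMAILS] + [normalize_crm(c) for c in CASES]
)
```

A single query such as `search(INDEX, "printer outage")` then returns both the email and the CRM case, which is the kind of cross-system relationship a per-repository search would miss.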
Here’s an example: A customer service
team with on-demand, actionable insight
can troubleshoot and solve customer prob-
lems quickly and consistently—helping cus-
tomers to get more value from your products
and services, and retaining their loyalty.
There is fierce international competition
for every dollar of profit in today’s global
economy. Organizations must treat knowl-
edge and knowledge workers as strategic
assets in order to compete and meet the chal-
lenges of the future. Very few organizations
are far along the maturity curve in dealing
with big data, but the incentive is there.
Increasingly, companies will differenti-
ate themselves on the basis of what they
know—by tapping into their return on col-
lective knowledge, and by unlocking the
value inherent in every company’s disparate
sources of data. Gaining real-time, relevant
insight and knowledge is the best way to
empower employees and help them perform
their jobs exceedingly well.
Coveo’s advanced, Unified Indexing and
Insight platform redefines how people
access and share fragmented knowledge
around the social enterprise. Coveo brings
together the collective and yet fragmented
information from cloud-based, social and
on-premise systems, and injects it into the
context of every user, every time. More than
2 million people globally and more than 500
companies use Coveo to achieve their busi-
ness goals. Among Coveo customers are CA
Technologies, L’Oreal Switzerland, Lock-
heed Martin, YUM! Brands, GEICO and
SunGard.
For more information, visit www.coveo.com, follow us
on Twitter @coveo or like us on Facebook.
The Key to Return on Knowledge in a Big Data World
Advanced Indexing
Technology
The following is an excerpt from the
Coveo eBook, “Measuring Return on
Knowledge in a Big Data World.” Visit
www.coveo.com for your free copy.
Big data. Unstructured data. Semi-struc-
tured data. Data is all over the technology
news, and for good reason. It is over-
whelming organizations, requiring them to
find new ways to operate, stay competitive,
better serve their customers and bring new
products to market faster.
Companies are finding themselves with
piles of information within multiple channels,
locked away in silos—different systems, dif-
ferent departments, different geographies and
different data types, making it impossible to
connect the dots and make sense of critical
business information.
Hidden inside streams of structured and
unstructured data across cloud, social and
on-premise systems are information rela-
tionships that answer questions employees
haven’t even thought to ask, but need to
be asking.
The speed at which business moves
today, combined with the sheer volume of
data created by the digitized world, requires
new approaches to deriving value from
data and knowledge.
In this big data world, knowledge is your
company’s greatest asset and best differentia-
tor. So how do you get a return on knowledge?
Leveraging Knowledge
How is collective knowledge being
leveraged today? In a word: poorly.
Over the past decade, research firm IDC
has regularly conducted research on what
NOT finding information might cost an
organization. In IDC’s most recent survey
of more than 700 knowledge workers, the
firm found that most companies are losing
more than $50,000 per employee per year
in lost productivity.
What’s more, according to a 2000 study
by University of Southern California’s Mar-
shall School of Business, just over 10% of
people reported having access to “lessons
learned” in other parts of their organization.
identified. The elimination of end-user
tagging and the resulting organizational
ambiguity enables the enriched metadata to
be used by any search engine index, for
example, conceptSearch, SharePoint, Solr,
Autonomy or Google Search Appliance.
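Retrieval by concept rather than by exact term, as this article describes it, can be sketched in a few lines. This is a simplified illustration, not Concept Searching's algorithm: the two-entry `TAXONOMY`, the sample documents and the function names are all hypothetical, and a real system would derive concepts statistically rather than from a hand-written term list.

```python
import re

# Hypothetical taxonomy: each concept maps to terms that express it.
TAXONOMY = {
    "vehicle": {"car", "automobile", "truck"},
    "contract": {"contract", "agreement", "lease"},
}


def tokenize(text):
    """Lowercase the text and return its distinct alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def concepts_in(text):
    """Return every taxonomy concept expressed by at least one term."""
    tokens = tokenize(text)
    return {concept for concept, terms in TAXONOMY.items() if tokens & terms}


def tag(documents):
    """Generate concept metadata automatically, with no end-user tagging."""
    return {doc_id: concepts_in(text) for doc_id, text in documents.items()}


def concept_search(query, metadata):
    """Return documents whose metadata covers every concept in the query."""
    wanted = concepts_in(query)
    return sorted(d for d, found in metadata.items() if wanted and wanted <= found)


documents = {
    "a": "The lease for the truck expires in June.",
    "b": "Quarterly sales report.",
}
metadata = tag(documents)
print(concept_search("automobile agreement", metadata))
```

The query shares no words with document "a", yet the document is returned because both map to the same concepts. Since the generated metadata is plain data, it could in principle be handed to any index that accepts metadata fields.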
Only when metadata is consistently
accurate and trusted by the organization can
improvements be achieved in text analytics,
e-discovery and litigation support. In the
exploding age of big data, and more specif-
ically text analytics, sentiment analysis and
even open source intelligence, the ability to
harness the meaning of unstructured content
in real time improves decision-making and
enables organizations to proactively act
with greater certainty on rapidly changing
business complexities. To achieve an effec-
tive information governance strategy for
unstructured content, results are predicated
on the ability to find information and elim-
inate inappropriate information. The core
enterprise search component must be able to
incorporate and digest content from any
repository, including faxes, scanned content,
social sites (blogs, wikis, communities of
interest, Twitter), emails, and websites. This
provides a 360-degree corporate view of
unstructured content, regardless of where it
resides or how it was acquired.
Ensuring that the right information is avail-
able to end users and decision makers is fun-
damental to trusting the accuracy of the
information, another key requirement in intel-
ligent search. Organizations can then find the
descriptive needles in the haystack to gain com-
petitive advantage and increase business agility.
An intelligent metadata enabled solution for
text analytics analyzes and extracts highly
correlated concepts from very large document
collections. This enables organizations to attain
an ecosystem of semantics that delivers under-
standable and trusted results and is continually
updated in real time.
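One simple way to picture the extraction of correlated concepts is term co-occurrence across a collection. The sketch below is an assumption-laden stand-in rather than the vendor's method: it scores term pairs with the Jaccard coefficient, and the stopword list, the `min_docs` threshold and the sample documents are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "on"}


def terms(text):
    """Return the distinct non-stopword terms of a document."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def correlated_pairs(documents, min_docs=2):
    """Rank term pairs by how consistently they appear together.

    The score is the Jaccard coefficient: documents containing both
    terms divided by documents containing either term.
    """
    doc_freq = Counter()   # term -> number of documents containing it
    pair_freq = Counter()  # (term, term) -> number of documents containing both
    for text in documents:
        found = sorted(terms(text))
        doc_freq.update(found)
        pair_freq.update(combinations(found, 2))

    scored = []
    for (a, b), both in pair_freq.items():
        if both >= min_docs:
            jaccard = both / (doc_freq[a] + doc_freq[b] - both)
            scored.append((jaccard, a, b))
    # Highest score first, then alphabetical for a stable order.
    return sorted(scored, key=lambda s: (-s[0], s[1], s[2]))


docs = [
    "Patent litigation hold notice",
    "Litigation hold for patent dispute",
    "Quarterly patent filing summary",
]
pairs = correlated_pairs(docs)
for score, a, b in pairs:
    print(f"{a} + {b}: {score:.2f}")
```

Here "hold" and "litigation" always appear together and so score highest, while "patent" also occurs alone and is ranked lower. On very large collections the pair counting would need to be pruned or distributed.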
Consider intelligent search applied to
e-discovery and litigation. Traditional
information retrieval systems use
“keyword searches” of text and metadata as
a means of identifying and filtering docu-
ments, while the challenges and costs
of e-discovery and litigation support con-
tinue to escalate. The use of intelligent
search reduces costs and alleviates many of
the challenges. Content can be presented to
knowledge professionals in a manner that
enables them to more rapidly identify rele-
vant information and increase accuracy. This
approach has also been proven to reduce
time and effort in collection and forensic
investigations, early-case assessment, ESI
processing and Web-based document
review. Significant benefits can be achieved
by removing the ambiguity in content and
the identification of concepts within a large
corpus of information. This methodology
delivers expediencies, and reduces costs,
offering an effective solution that overcomes
many of the challenges typically not solved
in e-discovery and litigation support.
Enabling organizations to access and
fully exploit their unstructured con-
tent won’t happen overnight. Organizations
must incorporate an approach that addresses
the lack of an intelligent metadata infrastruc-
ture, which is the fundamental problem.
Intelligent search, a by-product of the infra-
structure, must encourage, not hamper, the
use and reuse of information and be rapidly
extendable to address text mining, sentiment
analysis, e-discovery and litigation support.
The additional components of auto-classifi-
cation and taxonomies complete the core
infrastructure to deploy intelligent metadata
enabled solutions, including records man-
agement, data privacy, and migration. Search
can no longer be evaluated on features, but
on proven results that deliver insight into all
unstructured content.
Concept Searching specializes in metadata generation,
auto-classification and taxonomy management, and is a
Microsoft managed partner with a Gold competency in
Application Development. Its technologies encompass
the entire portfolio of unstructured information, in
on-premise, cloud or hybrid environments. Clients
are using the technologies to improve search, records
management, data privacy, migration and text analytics.
Solving the Inadequacies
and Failures in
Enterprise Search
The inability to identify the value in unstruc-
tured content is the primary challenge in any
application that requires the use of metadata.
If you aren’t managing it, you won’t find it. At
the most basic level, enterprise search has
become inadequate. Bells and whistles abound
but the unsolved problem still exists. Search
cannot find and deliver relevant information
in the right context, at the right time. This
laissez-faire approach, starting with executive
management on down, illustrates the inability
of organizations to elevate search to a key
component and critical enabler for improving
business outcomes. An information gover-
nance approach that creates the infrastructure
framework to encompass automated intelli-
gent metadata generation, auto-classification,
and the use of goal- and mission-aligned tax-
onomies is required. From this framework,
intelligent metadata enabled solutions can be
rapidly developed and implemented. Only
then can organizations leverage their knowl-
edge assets to support search, litigation, e-dis-
covery, text mining, sentiment analysis and
open source intelligence.
Manual tagging is still the primary
approach used to identify the description of
content, and often lacks any alignment with
enterprise business goals. This subjectivity
and ambiguity is applied to search, result-
ing in inaccuracy and the inability to find
relevant information across the enterprise.
Metadata used by search engines may be
comprised of end user tags, pre-defined
tags, or generated using system defined
metadata, keyword and proximity matching,
extensive rule building, end-user ratings, or
artificial intelligence. Typically, search
engines provide no way to rapidly adapt to
meet organizational needs or account for an
organization’s unique nomenclature.
More effective is implementing an enter-
prise metadata infrastructure that consis-
tently generates intelligent metadata using
concept identification. This is a profoundly
different approach: relevant documents, regard-
less of where they reside, will be retrieved
even if they don’t contain the exact search
terms, because the concepts and relation-
ships between similar content have been
One of the founders of Concept Searching, Martin Garland has
more than 21 years’ experience in ECM. His understanding of the
information management landscape and his business acumen
provide a foundation for guiding organizations to
achieve their business objectives using best
practices, industry experience and technology.
Martin’s expertise has been instrumental in
assisting multinational clients in diverse industries
to understand the value of managing unstructured
content to improve business processes.
Martin Garland
By Martin Garland, CEO, Concept Searching, Inc.
www.infotoday.com
Produced by:
KMWorld Magazine
Specialty Publishing Group
For information on participating in the next white paper in the “Best Practices” series, contact:
paul_rosenlund@kmworld.com or kathy_rogals@kmworld.com • 561-483-5190
Kathryn Rogals, 561-483-5190, kathy_rogals@kmworld.com
Paul Rosenlund, 561-483-5190, paul_rosenlund@kmworld.com
Andy Moore, 207-236-8524 Ext. 309, andy_moore@kmworld.com
For more information on the companies who contributed to
this white paper, visit their websites or contact them directly:
www.kmworld.com
Concept Searching, Inc.
8300 Greensboro Drive, Suite 800
McLean VA 22102
PH: 703.531.8567
Twitter: @conceptsearch
Contact: info-usa@conceptsearching.com
Web: www.conceptsearching.com
Coveo
Contact: info@coveo.com
Web: www.coveo.com
HP Autonomy
One Market Plaza
Spear Tower, Suite 1900
San Francisco CA 94105
PH: 415.243.9955
Contact: autonomyinfo@hp.com
Web: www.autonomy.com