2. Cloudera Roadmap & Release Plan - DISCLAIMER
The information in this document is proprietary to Cloudera. No part of this document may be reproduced,
copied or transmitted in any form for any purpose without the express prior written permission of Cloudera.
This document is not subject to any agreement with Cloudera. This document contains only intended
strategies, developments and functionalities of Cloudera products and is not intended to be binding upon
Cloudera to any particular course of business, product strategy and/or development. Please note that this
document is subject to change and may be changed by Cloudera at any time without notice.
This document is provided without a warranty of any kind, either express or implied, including but not limited
to the implied warranties of merchantability, fitness for a particular purpose or non-infringement.
Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or
consequential damages that may result from the use of these materials. The limitation shall not apply in
cases of gross negligence.
3. Goals for this session
• Give you visibility into where Cloudera is going next
• Help explain why Cloudera is investing where it is
• Get you to be a part of it
Copyright 2010 Cloudera Inc.
4. Cloudera’s product strategy
• Provide the reference distribution for the Hadoop platform
• Functionally complete
• Performant and secure
• Integrated & tested
• Easy to trial & consume
• 100% Apache licensed
• Open to partners and the extended IT ecosystem
• Provide a commercial solution that helps enterprises run Hadoop in production
• Software & services
• Increase transparency, consistency & reliability
• Lower the cost & complexity of administration
• Improve compliance with policies & processes
[Slide labels: Cloudera’s Distribution for Hadoop; Cloudera Enterprise]
5. Cloudera’s release strategy
• The platform release
• Releases annually
• Public beta
• Comprises several open source software component versions & patches
• The applications release
• Releases semi-annually
• Private beta
• Depends on specific platform (CDH) version(s)
[Slide labels: Cloudera’s Distribution for Hadoop; Cloudera Enterprise]
6. CDH themes
• CDH3 – move from kernel to platform
• Provide the features of a platform - expand the functional footprint & take ownership of combining & integrating a single platform, versus the current industry practice of “roll your own”
• Enable others to build on the platform - better functionality for integration of RDBMS and BI
• Centralize basic functions of the platform - configuration and service management (more on this…)
• Incremental enhancements to existing components
• CDH4 – consistency and performance
• Extend high availability & authorization throughout the platform
• Rationalize duplicate functions across the platform
• Improve performance throughout
• Incremental enhancements to existing components
7. What’s CMF?
[Diagram: CMF coordinates a CMF Agent alongside each managed Hadoop service or daemon. Labeled components: Datanodes, Region Servers, Collectors, Flume, Processors, Masters, Job Trackers, Name Node, Secondary NN, Task Trackers, ZooKeeper Quorum, Workflow Servers, Hue.]
[CMF functions: service & process mgt; unified config; monitor; binary distribution (optional).]
[External tools shown next to CMF: systems monitor (Hyperic, Zenoss, Nagios, etc.); configuration management (Puppet, Chef, cfEngine); server / VM provisioning (Bladelogic, HP, IBM, Eucalyptus).]
A framework that helps organizations to operate Hadoop services and resources as a unified system
In scope:
• Governance of distributed services and individual daemons (start, stop, restart, flag)
• Service configuration
• “Movement” of services across physical hosts
Out of scope:
• Change management database
• Cross-system issues (e.g. dev – test – prod)
• Operating system and / or JVM configuration management
• Resource (e.g. server, network, VM) provisioning
8. Cloudera Enterprise
• Reduces the risks of running Hadoop in production
• Improves consistency and compliance, and reduces administrative overhead
Management applications
• Authorization mgmt & provisioning
• Monitoring
• Resource mgmt
• System lifecycle (planned)
• Production support for CDH & certified integrations (Oracle, Netezza, Teradata, Greenplum, Aster Data)
9. Enterprise management applications themes
• Enterprise 3.0 – cover some immediate enterprise needs
• Extend authorization management & administration to meet the needs of more complex organizations
• Track the usage of scarce cluster resources
• Monitor incoming data via Flume
• Enterprise 3.5 – improve transparency & automation
• Real-time activity monitor (more on this…)
• Expand file browser to show provenance & ownership of data, including multi-parameter search
• Extended authorization management & administration
• Enhancement to existing components
10. Why an activity monitor?
• SLAs are typically measured as activity completion time or slot availability rate
• Four different trackers to log into (Hive, Pig, Oozie, MapReduce), all with different and incomplete metrics
• No means of setting policies to correct or fix misbehaving activities
• Currently no data available to drive continuous improvement
• Frustrating that ops can’t reliably measure what they are supposed to be measured on!
• Incomplete and inconsistent metrics, measured differently by activity
• Even with proper use of the scheduler, misbehaving activities can drag down a cluster
• “Just add more boxes?”
11. Get involved!
• The best Cloudera products and features are built in partnership with customers
• Contact charles@cloudera.com if you are interested