So if we take our examples from the previous slide: Healthcare & Retail is mostly a batch-oriented process, while Location-based is mostly a real-time service. Each has specific requirements around how it uses and processes the data. Depending on how you want to use and process the data, you need to choose the proper technology to store and acquire it.
Given those scenarios, here's how the data might be stored and managed. HDFS is a great distributed file system: parallel and highly scalable. However, it's tuned primarily for bulk sequential reads and writes of file blocks. There are no indices for fast access to specific data records, and it's not well suited to lots of small files or to updating files that have already been written. It's primarily a batch system: write lots of data, then read it all in parallel, over and over. A NoSQL DB is a distributed key-value database. It has indices. It's designed for high-volume reads and writes of simple data. It's not tuned for reading or writing huge files; use a file system for that.
Bottom line: NoSQL is about "data management scalability at cost" first and foremost. There are technical features that also matter, but they come second. With enough effort (hardware and software) you can solve most of the technical problems with RDBMS systems. However, the whole reason NoSQL was invented was that it's too expensive to manage Big Data using general-purpose RDBMS systems. Regarding CAP (http://en.wikipedia.org/wiki/CAP_theorem): the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency (all nodes see the same data at the same time), Availability (every request receives a response about whether it succeeded or failed), and Partition tolerance (the system continues to operate despite arbitrary message loss). According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three. RDBMS products focus on CA, whereas NoSQL products focus on AP.
Cox Communications: a 128-node Hadoop cluster with home-grown distributed key-value storage built on Berkeley DB. They would have used Oracle NoSQL DB if it had been available 2-3 years ago.
This slide shows the master-slave architecture of Oracle NoSQL DB. The master receives the write and asynchronously replicates the data to the other replica nodes.
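The flow above can be sketched as a toy model: the master applies a write locally and acknowledges the client, while replication to the replicas happens asynchronously in the background. This is purely illustrative (class names and queue mechanics are mine, not the Oracle NoSQL DB implementation).

```python
# Toy sketch of asynchronous master-replica replication.
import queue
import threading

class Replica:
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

class Master:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.log = queue.Queue()
        threading.Thread(target=self._replicate, daemon=True).start()

    def put(self, key, value):
        self.data[key] = value      # committed locally on the master
        self.log.put((key, value))  # replication is deferred to background
        return "ack"                # client ack does not wait on replicas

    def _replicate(self):
        # Background thread ships each logged write to every replica.
        while True:
            key, value = self.log.get()
            for r in self.replicas:
                r.apply(key, value)
            self.log.task_done()

replicas = [Replica(), Replica()]
master = Master(replicas)
master.put("user/1", "alice")
master.log.join()  # for the demo, wait until replication has drained
```

The key point the sketch shows is that the client ack and the replica updates are decoupled; the real product additionally lets you tighten this with acknowledgement policies, covered later.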
Oracle NoSQL DB uses simple, understandable key-value pairs, simple get/insert/update/delete operations, and ACID transactions. This is different from SQL in an RDBMS, but the model and behavior are very familiar to application developers. Think of keys as a directory structure: multiple parts, allowing you to traverse the hierarchy. The Major Key determines where the data is stored (which shard). Keys (Major + minor) are unique, with only one value per unique key. The Minor Key allows you to have multiple records for a given Major Key. Keys are simple strings. The value is a byte string; it's anything you want it to be, and the application knows the structure and content of the value. Support for a flexible data serialization format will be available in future releases (Apache Avro, http://en.wikipedia.org/wiki/Apache_Avro).
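The "Major Key determines the shard" idea can be illustrated with a small sketch: only the major path is hashed to pick a shard, so records that share a major key always land together. The hash function and shard count here are illustrative, not the product's actual placement algorithm.

```python
# Hypothetical sketch: shard placement driven by the major key only.
import hashlib

N_SHARDS = 3

def shard_for(major_path):
    # Hash only the major components; the minor path plays no role.
    h = int(hashlib.md5("/".join(major_path).encode()).hexdigest(), 16)
    return h % N_SHARDS

# Two records for the same major key ("users", "bob") with different
# minor keys always map to the same shard.
k1 = (("users", "bob"), ("profile",))
k2 = (("users", "bob"), ("orders", "2012"))
assert shard_for(k1[0]) == shard_for(k2[0])
```

This co-location is what makes the single-major-key transaction guarantee (discussed later) possible: all the affected records live on one shard.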
This is basically a summary slide, highlighting the features of Oracle NoSQL Database, especially those we think set it apart from some of the other products on the market. General Purpose: what we mean here is that Oracle NoSQL DB is built as a general-purpose, scalable, highly reliable NoSQL database. Several of the open-source NoSQL databases on the market were built specifically to solve the technical problems of a given company (Voldemort was built by LinkedIn, Dynamo by Amazon, Bigtable by Google), which can tend to affect the technical direction and design decisions of those products. That is not the case with Oracle NoSQL Database. Reliable: unlike most NoSQL databases out there, which are inventing both storage and distributed data management, Oracle NoSQL Database uses Berkeley DB Java Edition for key-value storage and replication on the storage nodes. BDB has been running large production applications for many years and is a proven, reliable, scalable storage system.
Keep the cluster investment at work; get the most bang for your buck. Today: training needed, multiple management tools.
- Rapid, automatic, or rule-based single-click provisioning of Big Data clusters
- Measure the boost provided by clusters/grids to your business data processing capabilities
- Change your choice of cluster software at any point when it is not sufficiently delivering to your needs
- Manage the big data solution from a single cluster-management software umbrella
IT & System Administrators want:
- Consistent, easy-to-use provisioning, management & monitoring tools
- Less disruption in the stack; reuse of technology investments
- Extensibility: keep the same tooling when adding new big data technologies to the stack
- Reduced outage times
- Reduced time to scale & production
Cluster Analytics:
- Cross-cluster analytics
- Optimizations
- Self-healing capabilities
- Fail-safe handling of false negatives/positives
Advanced Profiling:
- Capability to "certify" cluster performance
- Job profiling: weeds out badly written code
Value-Added Features:
- Testing framework for Map-Reduce jobs: certify builds for production
Experienced advisors: an accelerated consulting and services leader for Big Data, headquartered in San Jose with offices in India. Expertise through architects: pioneers in distributed software engineering with both vertical and functional expertise, plus dedicated Innovation Labs. Excellence delivered through technology advances: an open-source and innovation product portfolio. Founded 1991; 1,300 strong. Leading Big Data since 2008. Chicago, NYC, Atlanta, Indore, Noida, Bangalore. Impetus provides Big Data thought leadership and services, creating new ways of analyzing data to gain key business insights across enterprises. Impetus' experience extends across the big data ecosystem, including Hadoop, NoSQL, NewSQL, MPP databases, machine learning, and visualization. Impetus offers a Quick Start program, Architecture Advisory Services, Proof of Concept, and Implementation.
Oracle NoSQL Database allows you to relax/configure the Consistency and Durability policies for a given operation. Durability is controlled by defining the Write Policy and the HA Acknowledgement Policy; you can increase write-transaction performance by relaxing the Durability constraints. The default is write-to-memory with Majority acknowledgement. Consistency is controlled by defining the Read Guarantees that you require from the system; you can increase read-transaction performance by relaxing the Consistency constraints. The default is None.
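To make the acknowledgement side of the Durability trade-off concrete, here is a small sketch of how an ack policy might translate into the number of nodes a write must wait for. The function and policy names are illustrative, not the actual Oracle NoSQL DB API.

```python
# Sketch: nodes a write waits for under different ack policies,
# given the shard's replication factor (master counts as one node).
def required_acks(policy, replication_factor):
    if policy == "ALL":
        return replication_factor          # every node must confirm
    if policy == "MAJORITY":
        return replication_factor // 2 + 1 # a simple majority
    if policy == "NONE":
        return 1                           # only the master itself
    raise ValueError("unknown policy: " + policy)

# With a replication factor of 3, the default Majority policy waits
# for 2 of the 3 nodes; relaxing to NONE waits only on the master.
assert required_acks("MAJORITY", 3) == 2
assert required_acks("NONE", 3) == 1
```

Fewer required acks means lower write latency but a larger window in which a node failure could lose the most recent writes, which is exactly the relaxation being described.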
We heard you: we have ACID transactions in Oracle NoSQL Database. You can think of a transaction as a single auto-commit API call. That API call can cover a single record, multiple records, or multiple operations, as long as all of the records share the same Major Key. However many records or operations are in that API call, they are all committed atomically (all or nothing). Because they share the same Major Key, all of the affected data resides on a single storage node, so we can guarantee the transactional semantics of the commit. We replicate that transaction to the replicas (copies of the data) as part of the transaction. Of course, not all operations are created equal: in some cases you may want operations that are not completely ACID. One of the benefits of NoSQL is that it relaxes transactional guarantees in order to provide faster throughput. Oracle NoSQL Database allows you to override the default and relax the ACID properties on a per-operation basis, so the application can specify the transactional behavior that is most appropriate.
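The single-major-key constraint can be modeled in a few lines: a multi-operation commit is accepted only when every record shares one major key (so everything lives on one shard), and it then applies all-or-nothing. This is a toy model of the rule, not the product's API.

```python
# Toy model: multi-op commits are allowed only within one major key.
class Store:
    def __init__(self):
        self.data = {}

    def execute(self, ops):
        """ops is a list of (major_key, minor_key, value) writes."""
        majors = {major for (major, minor, value) in ops}
        if len(majors) != 1:
            # Records on different shards cannot share one commit.
            raise ValueError("all operations must share one major key")
        staged = {(major, minor): value for (major, minor, value) in ops}
        self.data.update(staged)  # applied together: all or nothing
        return len(staged)

store = Store()
store.execute([("users/bob", "email", "b@x.com"),
               ("users/bob", "phone", "555-0100")])
```

Trying to mix, say, `users/bob` and `users/amy` in one `execute` call raises an error, mirroring the restriction described above.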
Elasticity refers to dynamic/online changes in a deployed store configuration. New storage nodes are added to a store to increase performance, reliability, or both.

Increase data capacity: a company's Oracle NoSQL Database application is now obtaining its data from several unplanned new sources. The utilization of the existing configuration is more than adequate to meet requirements, with one exception: they anticipate running out of disk space later this year. The company would like to add the needed disks to existing slots on the existing servers, establish mount points, and have Oracle NoSQL Database fully utilize the new disks along with the disks already in place, all while the system is up and running. After installing the new disks, the administrator defines a new topology, using the Administrator, with the new mount points and capacity values such that new replication nodes can be created on the existing storage nodes. The administrator can review the plan for errors and, when ready, deploy the new topology while Oracle NoSQL Database stays online and continues to serve the running application with CRUD operations.

Increase throughput: as a result of an unplanned corporate merger, the live Oracle NoSQL Database will see a substantial increase in write operations. The read/write mix of transactions will go from 50/50 to 85/15, and the new workload will exceed the I/O capacity of the available storage nodes. The company would like to add new hardware and have it be utilized by the existing Oracle NoSQL Database (KVStore) currently in place; and of course the application needs to remain available while this upgrade is occurring. With the new elasticity capabilities and topology planning, the administrator can add the new hardware and define a new topology with the new storage nodes.
The administrator can then look at the resulting topology (storage nodes, replication nodes, shards, etc.) to confirm it meets their requirements. Once satisfied with the new topology, they can deploy it in the background while the existing application continues to operate. As partitions/chunks of data are moved, they are made available to the live system.

Increase replication factor: a new requirement has been placed on an existing Oracle NoSQL Database to increase its overall availability by raising the replication factor, using new storage nodes added in a second geographic location. This is accomplished by adding at least one replication node for every existing shard; the current configuration has a replication factor of 3. While the system is live, the administrator changes the topology to define the new storage nodes and the new replication factor. Again, the administrator can validate and review the topology before deploying; as a side point, the administrator could validate several changes to evaluate alternatives and then decide which topology to deploy. Just as in the other scenarios, the data is automatically moved and partitions are made available as they are moved, as a background activity. Meanwhile, the KVStore continues to service the existing workload, starting to use the new replicas as they become available. Once the topology is deployed, a new replication node has been created and populated for each shard. We have increased availability by raising the replication factor, with the new storage nodes in another geographic location, and we have increased read throughput capability with the new replication node in each shard. The replication factor is now 4.
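The "partitions become available as they are moved" behavior can be sketched as a routing map that is updated one partition at a time, so the store keeps serving requests throughout the migration. The data structures here are illustrative assumptions, not the product's internals.

```python
# Sketch of online partition migration: the routing map is updated
# per partition, so the store stays live between moves.
def migrate(routing, partitions_to_move, target_shard):
    for p in partitions_to_move:
        routing[p] = target_shard   # partition usable as soon as it moves
        yield dict(routing)         # snapshot: store serving at each step

# Three partitions spread over two shards; move partition 1 to a
# newly added shard while the others keep serving where they are.
routing = {0: "shard1", 1: "shard1", 2: "shard2"}
steps = list(migrate(routing, [1], "shard3"))
```

At every intermediate step the map is complete, which is the property that lets CRUD traffic continue during the background move.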
Rebalance a configuration: a storage node has failed and must be replaced (the KVStore continues to run). The new hardware is a much more powerful machine (9 cores, 64 GB of memory compared to 8 GB, multiple 400 GB solid-state drives), making the deployment a heterogeneous hardware mix. The new hardware replaces the failed storage node: the system administrator adds the new storage node to the pool of available storage nodes and then migrates the old (failed) storage node to the new one. After successful migration (the KVStore continues to run), the failed storage node is deleted and all storage nodes are active again. Continuing to monitor the performance of the system and the existing topology, the administrator notices that some of the older storage nodes have 2 replication nodes on them, with high CPU/IO utilization and high latency, while the new, much faster storage node is underutilized. By using the new physical topology planning support available in this release, Oracle NoSQL Database will rebalance the configuration and redistribute the data; in other words, it will make optimal use of heterogeneous storage nodes. The new storage nodes will likely host multiple replication nodes, while many of the older systems may go from 2 to 1. The replication nodes are moved automatically. Again, this can all happen while the system is online and at the convenience of the company.
Data movement is:
• Idempotent: it can be run multiple times with the same result.
• Interruptible: you can interrupt it at any time and the KVStore will continue running. A company with a daily peak workload period may want to interrupt the data movement (part of deploying the new topology) and restart it after the peak period.
• Restartable: after an interruption, the movement can be restarted and picks up where it left off.
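Those three properties can be sketched with a movement plan that records completed partitions: re-running skips finished work (idempotent), the run can stop mid-plan (interruptible), and a later run finishes the remainder (restartable). This is a toy model, not the actual plan executor.

```python
# Toy model of an idempotent, interruptible, restartable move plan.
def run_plan(partitions, done, budget):
    """Move up to `budget` not-yet-done partitions; record progress."""
    moved = 0
    for p in partitions:
        if p in done:
            continue            # idempotent: completed work is skipped
        if moved == budget:
            break               # interruptible: stop partway through
        done.add(p)             # "move" the partition, record it
        moved += 1
    return done

done = set()
run_plan([1, 2, 3, 4], done, budget=2)   # interrupted after 2 moves
run_plan([1, 2, 3, 4], done, budget=10)  # restarted: finishes the rest
run_plan([1, 2, 3, 4], done, budget=10)  # re-run: same end state
```

The persisted `done` set is what makes restarting safe; real systems keep equivalent progress state so a peak-hours interruption loses no work.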
Why Avro? Avro is used in multiple products, such as Hadoop, and from multiple programming languages. Having a schema and serialization framework is advantageous when working with multiple programmers and with other products such as Hadoop.

Schema: with Avro, each value is associated with an Avro schema (created in JSON format), typically created by the application programmer. An advantage of using Avro is that serialized values can be stored in a space-efficient manner. Avro has a number of primitive data types, including boolean, int, long, float, and string.

Bindings: Oracle NoSQL Database supports multiple binding types. Generic: schemas are treated dynamically (not fixed at build time). Specific bindings (SpecificAvroBinding) have the advantage of creating a POJO (Plain Old Java Object) class with getter and setter methods for each field in the schema. JSON bindings (JsonAvroBinding) are easy to read or create and can interoperate with other programs that use JSON objects. Raw: low-level, with no serialization performed.

Schema evolution is important with large databases, where you can't simply update every key/value pair in the store. Different schemas (within constraints defined in the Avro specification) can be used when data is read or written: the schema used to read data does not need to be exactly the same as the one used to write it. For example, imagine we have a key/value record representing profile information for a user, and a new requirement to add an alternate email address. The field is added and a default value is established. From then on, whenever a new key/value pair is added or profile information is updated, the alternate email address is included. On reads (for example, displaying the profile information), the alternate email address may not have been populated yet, and that is fine: the default value can be displayed.
This allows complete flexibility in terms of providing the updated field over time.
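The profile example above can be sketched without the Avro library: a reader schema adds `alt_email` with a default, so old records written before the field existed still read cleanly. The schema representation here is a simplification of Avro's actual default-value rule.

```python
# Sketch of Avro-style schema evolution: reader-side defaults fill in
# fields that old records were written without.
def read_with_schema(record, reader_schema):
    """reader_schema is a list of (field_name, default) pairs."""
    out = {}
    for field, default in reader_schema:
        out[field] = record.get(field, default)  # default if absent
    return out

# An old record, written before alt_email existed:
old_record = {"name": "Pat", "email": "pat@example.com"}

# The new reader schema adds alt_email with a default value:
reader_schema = [("name", None),
                 ("email", None),
                 ("alt_email", "none@example.com")]

profile = read_with_schema(old_record, reader_schema)
```

No bulk rewrite of the store is needed: records pick up the new field lazily, whenever they are next written, exactly as the slide describes.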
A new streaming API for Large Objects (recommended for sizes from about 1 MB up to hundreds of GB); examples would be audio files, video files, and medical imaging. New methods were created on the KVStore handle (getLOB, putLOB, deleteLOB, putLOBIfAbsent, putLOBIfPresent). The major difference is the input stream used to chunk the Large Object: the smaller chunks can be stored across the KVStore (multiple shards), depending on size. In addition, the chunks are stored in parallel, so write/read operations are much faster.
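The chunking idea behind the LOB API can be sketched as splitting a stream into fixed-size pieces, each stored as its own record so the pieces can be spread across shards and written in parallel. The chunk size and key scheme here are illustrative assumptions, not the product's internal format.

```python
# Sketch of LOB chunking: a large stream becomes many small records.
import io

CHUNK_SIZE = 4  # tiny for demonstration; real chunks are far larger

def chunk_stream(stream, lob_key):
    """Split a binary stream into (lob_key, index) -> bytes records."""
    chunks = {}
    i = 0
    while True:
        piece = stream.read(CHUNK_SIZE)
        if not piece:
            break
        chunks[(lob_key, i)] = piece  # each chunk is its own record
        i += 1
    return chunks

chunks = chunk_stream(io.BytesIO(b"0123456789"), "video/42")
```

Reassembly is just concatenating the chunks in index order; because each chunk is independent, reads and writes of different chunks can proceed concurrently, which is where the speedup comes from.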
External table support allows you to access data in external sources as if it were a table in the Oracle relational database. Through Oracle's external table support, you can access Oracle NoSQL Database key/value pairs as if they were rows in Oracle Database. This allows you to issue SQL read statements such as SELECT and SELECT COUNT(*) where the results are obtained from Oracle NoSQL Database. Since SELECT statements can refer to multiple tables, a query can look at both Oracle NoSQL Database information and data that resides directly in the Oracle Database. It also means the data can be accessed via JDBC. Sample programs and javadoc are available. Event Processing: the cartridge will work with Oracle Event Processing.
From http://www.slideshare.net/jmusser/j-musser-apishotnotgluecon2012, slide 23
There's a web-based Admin GUI, which is a great way to get started. Most production sites with lots of nodes will probably use the CLI (command-line interface) to start/stop the system and use the GUI to check on status. The system keeps track of the status of the system and the various storage nodes, as well as the performance statistics and throughput for each node. In a future release of Oracle NoSQL Database, the administration functionality will also be available via Oracle Enterprise Manager.