Combat Cyber Threats with Cloudera Impala & Apache Hadoop

Justin Erickson | Director, Product Management, Cloudera
Wayne Wheeles | Analytic, Infrastructure and Enrichment Developer Cyber Security, Six3 Systems
July 2013

Agenda
What’s new in Impala?
• Impala recap
• Impala 1.1
• Authorization with Sentry
Cyber security with Impala
• Cyber security demo overview
• Working with WebProxy Data
• Working with Netflow Data
• IDS Amplification and Correlation “holy grail use case”
• Discussion and questions

Cloudera Impala
Interactive SQL for Hadoop
• Responses in seconds
• ANSI-92 standard SQL with Hive SQL
Native MPP Query Engine
• Purpose-built for low-latency queries
• Separate runtime from MapReduce
• Designed as part of the Hadoop ecosystem
Open Source
• Apache-licensed

Benefits of Impala
More & Faster Value from “Big Data”
• Interactive BI/analytics experience via SQL
• No delays from data migration
Flexibility
• Query across existing data
• Select best-fit file formats (Parquet, Avro, etc.)
• Run multiple frameworks on the same data at the same time
Cost Efficiency
• Reduce data movement, duplicate storage & compute
• 1% to 10% of the cost of an analytic DBMS
Full Fidelity Analysis
• No loss from aggregations or fixed schemas

Impala 1.1 (released July 23, 2013)
Sentry support
• Fine-grained authorization
• Role-based authorization
Support for views
Performance
• Parquet columnar performance
• Join order sorted by table size
• More efficient metadata refresh for larger installations
Additional SQL
• SQL-89 joins (in addition to existing SQL-92)
• LOAD function
• REFRESH command for JDBC/ODBC
Improved HBase support
• Binary types
• Caching configuration
©2013 Cloudera, Inc. All Rights Reserved.

Previous State of Authorization
Two Sub-Optimal Choices for SQL on Hadoop
Insecure Advisory Authorization
• Users can grant themselves permissions
• Intended to prevent accidental deletion of data
• Problem: Doesn’t guard against malicious users
HDFS Impersonation
• Data is protected at the file level by HDFS permissions
• Problem: File-level protection is not granular enough
• Problem: Not role-based

Sentry with CDH4.3 Hive and Impala 1.1
Secure Authorization
• Ability to control access to data and/or privileges on data for authenticated users
Fine-Grained Authorization
• Ability to give users access to a subset of data in a database
Role-Based Authorization
• Ability to create/apply templatized privileges based on functional roles
Multi-Tenant Administration
• Ability for central admin group to empower lower-level admins to manage security for each database/schema
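
The role-based, fine-grained model described above can be illustrated with a small Python sketch. This is a toy model only, not Sentry's API or policy format: the `Privilege` and `Policy` classes, the role and group names, and the `cyber` database are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    """A permitted action on one database object (hypothetical model)."""
    database: str
    table: str   # "*" means every table in the database
    action: str  # e.g. "SELECT" or "INSERT"


@dataclass
class Policy:
    """Maps groups to roles and roles to privileges."""
    role_privileges: dict = field(default_factory=dict)
    group_roles: dict = field(default_factory=dict)

    def grant(self, role, privilege):
        # Privileges attach to roles, never directly to users.
        self.role_privileges.setdefault(role, set()).add(privilege)

    def assign(self, group, role):
        self.group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, groups, database, table, action):
        """Return True if any role held by any of the groups permits the action."""
        for group in groups:
            for role in self.group_roles.get(group, ()):
                for p in self.role_privileges.get(role, ()):
                    if (p.database == database
                            and p.table in ("*", table)
                            and p.action == action):
                        return True
        return False


# Hypothetical roles: analysts may read one table, admins the whole database.
policy = Policy()
policy.grant("analyst", Privilege("cyber", "netflow", "SELECT"))
policy.grant("cyber_admin", Privilege("cyber", "*", "SELECT"))
policy.grant("cyber_admin", Privilege("cyber", "*", "INSERT"))
policy.assign("soc_analysts", "analyst")
policy.assign("soc_leads", "cyber_admin")

print(policy.is_allowed(["soc_analysts"], "cyber", "netflow", "SELECT"))
print(policy.is_allowed(["soc_analysts"], "cyber", "ids_alerts", "SELECT"))
```

In this sketch the analyst group can read only the `netflow` table (a subset of the database), while the admin role acts as a reusable template of privileges for a whole database.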

Part of an overall infosec landscape
Perimeter: guarding access to the cluster itself
• Technical concepts: authentication, network isolation
Data: protecting data in the cluster from unauthorized visibility
• Technical concepts: encryption, data masking
Access: defining what users and applications can do with data
• Technical concepts: permissions, authorization
Visibility: reporting on where data came from and how it’s being used
• Technical concepts: auditing, lineage
Components shown: Sentry | Kerberos | Oozie | Knox | Cloudera Navigator | Certified Partners
Available 7/23

Agenda – Cyber security with Impala

Impala Mission Demonstration Platform
[Figure: demonstration platform architecture]
• Labels: Application Server; Cloudera CDH 4 Cluster
• Hosts: sherpa4, sherpa3, sherpa2, sherpa1
• Service lists shown on the slide:
  • Cloudera Manager, HDFS, Impala, HBase, MR, Hive
  • HDFS, Impala, HBase, MR, Hive
  • HDFS (NN), Impala (State Store), HBase (RS), MR, Hue, Oozie, ZooKeeper, Hive
• Network labels: Organization Network, Gateway to Internet, Sensor
• Data feeds: Netflow, WebProxy, IDS

Demo Platform Data Sets
Webinar Data Sets
• Netflow Data
  • A flow is a single data-flow connection between two hosts, uniquely identified by its five-tuple (source address, source port, destination address, destination port, protocol).
  • http://tools.netsa.cert.org/silk/
• IDS/IPS Data
  • A device or software application that monitors network or system activities for malicious activity or policy violations and reports to a management station.
  • http://www.snort.org
• WebProxy Data
  • WebProxy records of requests made by users within the corporate domain.
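
As a minimal sketch of how the five-tuple identifies a flow, the Python below groups packet observations by that key and sums packets and bytes per flow. The `FiveTuple` type, the `aggregate_flows` helper, and the sample addresses are hypothetical and for illustration only; they are not taken from the demo or from SiLK.

```python
from collections import defaultdict
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """The five fields that together identify one flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP


def aggregate_flows(packets):
    """Sum packet and byte counts per five-tuple.

    `packets` is an iterable of (FiveTuple, byte_count) pairs.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)


# Hypothetical sample traffic: two packets of one web flow and one DNS packet.
web = FiveTuple("10.0.0.5", 49152, "192.0.2.10", 80, 6)
dns = FiveTuple("10.0.0.5", 53124, "192.0.2.53", 53, 17)
packets = [(web, 400), (dns, 80), (web, 1200)]
flows = aggregate_flows(packets)

for key, stats in flows.items():
    print(key, stats)
```

Because the key includes direction (source versus destination), this sketch counts the reply traffic of a connection as a separate flow.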
Enrichment Data Sets
• Geographic enrichment
  • Geo-location information for addresses
  • http://dev.maxmind.com/
• Blacklist Information
  • List of addresses identified as potential threats
  • http://www.autoshun.org/
• Whitelist Information
  • Addresses known to be located within the corporate network
• Statistical Cubes
  • Cubes built to provide statistical amplification for analysis
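
One way to picture blacklist and whitelist enrichment is the Python sketch below, which tags each flow record with flags derived from the two lists. It is an assumption-laden illustration: the sample addresses, the record fields, the `enrich` helper, and the choice to model the whitelist as network ranges are all hypothetical, and geographic enrichment is left out.

```python
import ipaddress

# Hypothetical sample data; real lists would come from the enrichment data sets.
BLACKLIST = {
    ipaddress.ip_address("203.0.113.7"),
    ipaddress.ip_address("198.51.100.23"),
}
# Assumption: the corporate whitelist is modeled as network ranges.
WHITELIST_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def enrich(record):
    """Return a copy of a flow record with blacklist/whitelist flags added."""
    src = ipaddress.ip_address(record["src_ip"])
    dst = ipaddress.ip_address(record["dst_ip"])
    enriched = dict(record)
    enriched["src_internal"] = any(src in net for net in WHITELIST_NETWORKS)
    enriched["dst_internal"] = any(dst in net for net in WHITELIST_NETWORKS)
    enriched["blacklisted"] = src in BLACKLIST or dst in BLACKLIST
    return enriched


records = [
    {"src_ip": "10.1.2.3", "dst_ip": "203.0.113.7", "bytes": 5120},
    {"src_ip": "10.1.2.4", "dst_ip": "192.0.2.80", "bytes": 900},
]

# Keep only records that touch a blacklisted address.
suspicious = [r for r in map(enrich, records) if r["blacklisted"]]
for r in suspicious:
    print(r)
```

On a cluster the same idea would more naturally be a SQL join between the flow table and the enrichment tables; the in-memory version here only shows the logic.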

Demonstration
Impala Mission Demonstration Platform

Why Impala for Cyber Security?
Cloudera Impala and HDFS are a great choice for cyber security:
• Offers one powerful and secure platform for structured and unstructured data.
• Provides the capability to store large amounts of data at an acceptable price point.
• Sentry provides even greater protection for your cyber security data.

Thank You
• Ask questions on the Q&A tab
• Recording will be available at cloudera.com
• After webinar, inquire at: info@cloudera.com
• Contact info:
  Email: sherpasurfing@gmail.com, impala-user@cloudera.org
  Twitter: @WayneWheeles, @JustinErickson, @Cloudera
Cloudera Impala: cloudera.com/impala
“Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.”
~Albert Einstein
Six3 Cyber Security Demo: https://github.com/sherpasurfing
