Databases in a Solid State World

How Exadata X3 and Other Database Systems
Leverage the Performance of Flash
Gwen Shapira, Senior Consultant
February, 2013
About Me
                     – Oracle ACE Director
                     – Member of Oak Table
                     – 14 years of IT

                     – Performance Tuning
                     – Troubleshooting
                     – Hadoop

                     – Presents, Blogs, Tweets
                     – @gwenshap


2          © 2013 Pythian
About Pythian
•   Recognized Leader:
    – Global industry-leader in remote database administration services and
      consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server
    – Work with over 250 multinational companies such as Forbes.com, Fox
      Sports, Nordion and Western Union to help manage their complex IT
      deployments
•   Expertise:
    – Pythian’s data experts are the elite in their field. We have the highest
      concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2
      Microsoft MVPs.
    – Pythian holds 7 Specializations under Oracle Platinum Partner
      program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
•   Global Reach & Scalability:
    – Around the clock global remote support for DBA and consulting, systems
      administration, special projects or emergency response

You Never Forget Your
    First SSD



Sh*t People Say about SSD:
• Too expensive
• Don’t use for writes
• Fast for reads
• Use SATA SSD
• Unreliable
• Used for REDO
• Type of SSD matters
• Use for random writes
• Use SSD in SAN
• Becomes slower over time
• Use PCI SSD
• Don’t use for REDO
• Only used in Exadata
• Is it same as Flash?
• Only Sun flash devices are supported




Solid State Disk
=
No Spinning
=
Low Latency Random IO



We are talking about: NAND FLASH
• As opposed to RAM Flash, which is rare but awesome
• SLC
  – One bit per cell (values: 0, 1)
  – High performance
• MLC
  – Two bits per cell (values: 00, 01, 10, 11)
  – High capacity

Will Talk About:
• IO Performance
• Using SSDs for
  Oracle
• How Exadata and
  ODA uses SSDs
• SSD devices
• Practice: Reading
  SSD Vendor Specs



Anatomy of an SSD
• Cell = 1 bit
• Page = 4K
• Block = 128 pages = 512K
• Plane = 1024 blocks = 512MB
• Planes are grouped into dies, which are grouped into packages
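
The sizes on this slide follow from simple multiplication. A minimal sketch of the arithmetic, assuming the example geometry shown here (real devices vary) and binary units (1MB = 1024K). The constant names are illustrative only:

```python
# Example geometry from the slide; actual NAND parts differ by vendor.
PAGE_KB = 4              # size of one page
PAGES_PER_BLOCK = 128    # pages in one erase block
BLOCKS_PER_PLANE = 1024  # blocks in one plane

block_kb = PAGE_KB * PAGES_PER_BLOCK            # 4K * 128 = 512K
plane_mb = block_kb * BLOCKS_PER_PLANE // 1024  # 512K * 1024 = 512MB

print(f"Block: {block_kb}K, Plane: {plane_mb}MB")
```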



The Big Catch:
We read and write pages
But delete blocks




IO Operations




Reads
• CPU registers – 0.3 ns (1 cycle)
• CPU Cache L1 – 1.2 ns
• CPU Cache L2 – 3.0 ns
• CPU Cache L3 – 12-24 ns
• Main Memory (RAM) – 60-100 ns
• SSD – 60,000 ns
• Magnetic Storage (“DISK”) – 3,000,000 ns
• SAN devices – ~15,000,000 ns
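
To make the gaps in this list easier to compare, the sketch below expresses each latency as a multiple of the SSD figure. The numbers are the ones quoted above (RAM uses the upper end of its range). The dictionary and function names are made up for illustration:

```python
# Read latencies in nanoseconds, taken from the list above.
# RAM uses the upper end of the 60-100 ns range.
LATENCY_NS = {
    "RAM": 100,
    "SSD": 60_000,
    "Disk": 3_000_000,
    "SAN": 15_000_000,
}

def relative_to(baseline):
    """Return each latency as a multiple of the baseline device's latency."""
    base = LATENCY_NS[baseline]
    return {name: ns / base for name, ns in LATENCY_NS.items()}

vs_ssd = relative_to("SSD")
for name, factor in vs_ssd.items():
    print(f"{name}: {factor:g}x the SSD latency")
```

With these figures a random read from disk costs about 50 SSD reads, and a SAN read about 250.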


What about throughput?
•    15K RPM SAS HDD – 120-200MB/s
•    PCIe SSD – 1-2GB/s
•    But … How many disks do you use?
•    Network bandwidth?
•    CPU Bus bandwidth?
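
The "how many disks" question can be put into numbers. A rough sketch, assuming 1GB/s = 1000MB/s and perfect striping across disks, which real arrays only approximate. The function name is hypothetical:

```python
import math

def hdds_needed(ssd_mb_per_s, hdd_mb_per_s):
    """Number of HDDs whose combined throughput matches one SSD."""
    return math.ceil(ssd_mb_per_s / hdd_mb_per_s)

# Ranges from the slide: HDD 120-200MB/s, PCIe SSD 1-2GB/s.
fewest = hdds_needed(1000, 200)  # slowest SSD vs fastest HDD
most = hdds_needed(2000, 120)    # fastest SSD vs slowest HDD

print(f"One PCIe SSD is matched by {fewest} to {most} striped HDDs")
```

A modest disk array can match a single SSD on sequential throughput, which is why the advantage of SSD shows up mainly in latency.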




Writes
• Writes on new SSD – 250,000 ns
• Similar to sequential write to disk


How much data can you write to
a new 250GB SSD?



Deletes
• Can’t overwrite data without deleting first
• Can only delete blocks of 128*4K pages
• To Overwrite a page:
     –   Read 127 pages
     –   Write 127 to a free block
     –   Delete old block
     –   Perform the write we originally requested
• Takes 2ms
• Each cell can only be written 100K times
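
The overwrite steps above can be sketched as a small simulation that counts the physical operations behind one logical page write. This is a simplified model of a full 128-page block with no controller optimizations. The function and counter names are illustrative, not any real firmware interface:

```python
PAGES_PER_BLOCK = 128

def overwrite_one_page(block, page_index, new_data):
    """Rewrite one page of a full block and count the physical operations."""
    counters = {"page_reads": 0, "page_writes": 0, "block_erases": 0}
    fresh = [None] * PAGES_PER_BLOCK      # a free, already-erased block
    for i, data in enumerate(block):
        if i == page_index:
            continue                      # the stale page is not copied
        counters["page_reads"] += 1       # read a surviving page
        fresh[i] = data
        counters["page_writes"] += 1      # write it to the free block
    counters["block_erases"] += 1         # delete the old block
    fresh[page_index] = new_data
    counters["page_writes"] += 1          # the write we originally requested
    return fresh, counters

old_block = [f"page-{i}" for i in range(PAGES_PER_BLOCK)]
new_block, counters = overwrite_one_page(old_block, 5, "updated")
print(counters)
```

One 4K logical write turns into 127 page reads, 128 page writes and one block erase in this worst case.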

The Controller
•    Over-provision SSDs
•    Maintain free lists
•    Delete and cleanup in background
•    Balance use of cells (Wear leveling)
•    RAM caching
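
As an illustration of the wear-leveling item, here is a toy policy that always erases and reuses the block with the fewest erases so far. Real controllers use far more elaborate (and proprietary) schemes; this is only a sketch of the idea, with hypothetical names:

```python
def pick_least_worn(erase_counts):
    """Return the index of the block with the fewest erases so far."""
    return min(range(len(erase_counts)), key=lambda i: erase_counts[i])

erase_counts = [0, 0, 0, 0]  # four blocks, all brand new
for _ in range(10):          # ten erase cycles
    victim = pick_least_worn(erase_counts)
    erase_counts[victim] += 1

print(erase_counts)
```

Because the least-worn block is always chosen, no block ever gets more than one erase ahead of the others.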




Consequences:
• Write Amplification
     – How much data is really written when we write 1MB
     – 1 means no overhead
     – The closer to 1 the better
• Benchmarks on new SSD are worthless
     – Run benchmarks long enough to run out of
       overprovisioned space
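
Write amplification is simply the ratio of data physically written to flash to data the host asked to write. A minimal sketch, where the worst case assumes that rewriting a single 4K page forces a whole 128-page block to be rewritten. The function name is illustrative:

```python
def write_amplification(host_kb, flash_kb):
    """Data physically written to flash divided by data the host wrote."""
    if host_kb <= 0:
        raise ValueError("host_kb must be positive")
    return flash_kb / host_kb

# Ideal: 1MB requested, 1MB written to flash.
ideal = write_amplification(host_kb=1024, flash_kb=1024)
# Worst case: one 4K page requested, a full 128-page block rewritten.
worst = write_amplification(host_kb=4, flash_kb=128 * 4)

print(ideal, worst)
```

Real devices land somewhere between these extremes, depending on how much free, pre-erased space the controller has left.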




Redo Logs
A: Redo log writes are sequential writes and
therefore won’t benefit from SSD

B: Log file sync times are critical to Oracle
performance. Therefore placing redo logs on SSD
will have dramatic impact on performance.




Don’t use SSD for redo if:
• You don’t have “log file sync” related
  performance problems
• You have dedicated disks for each redo log
• Even better if multiple disks, striped.
• Your SAN is well configured and has ample
  caching
• You have RAC and no shared SSDs




SSD can make Redo faster if:
• You are suffering from high “log file parallel
  write”
• And your storage admin won’t even discuss it
• Redo is on LUN shared with:
     – Redo from multiple databases
     – Other services (SAP, etc)
• Not enough cache on storage array
• Storage network is a bottleneck


Placing Data on SSD




Should you place data on SSD?
• SSD solves IO latency problems
• If “DB File Sequential Read” is not in your top 5
  wait events, you probably don’t need your data
  on SSD.
• If you don’t maximize RAM use for buffer cache
  – don’t get SSD (yet)
• If your CPU utilization is high, solve this first.




Not enough space?
•    Move most active segments
•    Random reads get most benefits from SSD
•    Active indexes with unique-scans
•    Fewer writes is better
•    AWR has IO statistics per segment
•    https://github.com/gwenshap/Oracle-DBA-Scripts/blob/master/SSD.sql
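
One way to act on these points when SSD space is limited is to rank segments by reads per GB and fill the SSD greedily. The sketch below is not the SSD.sql script linked above. The segment names and statistics are invented, and in practice the numbers would come from AWR segment statistics. It also ignores write volume, which the list above says should be weighed too:

```python
def choose_segments(segments, capacity_gb):
    """Greedily pick the segments with the most reads per GB that still fit."""
    ranked = sorted(segments, key=lambda s: s["reads"] / s["size_gb"], reverse=True)
    chosen, used_gb = [], 0
    for seg in ranked:
        if used_gb + seg["size_gb"] <= capacity_gb:
            chosen.append(seg["name"])
            used_gb += seg["size_gb"]
    return chosen

# Hypothetical per-segment statistics (physical reads over some interval).
segments = [
    {"name": "ORDERS", "size_gb": 120, "reads": 1_200_000},
    {"name": "ORDERS_PK", "size_gb": 10, "reads": 900_000},
    {"name": "CUSTOMERS_IX", "size_gb": 5, "reads": 300_000},
    {"name": "AUDIT_LOG", "size_gb": 80, "reads": 40_000},
]

on_ssd = choose_segments(segments, capacity_gb=50)
print(on_ssd)
```

With 50GB of SSD, the two small, heavily read indexes are selected, while the large table and the rarely read log stay on disk.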




Why Choose?
• SAN Devices that contain both HDD and SSD
• Smart controllers move most active data to SSD
  automatically.

• Pros: No need to choose and manually migrate
  data
• Cons: Your most active data will move without
  advance notice



Top Mistakes
• Using SSD for production and HDD for Standby
     – If production needs SSD…
     – Good chance that standby will fall behind


• Database Smart Flash Cache




Database Smart Flash Cache
• A block is read from disk into the SGA
• A block evicted from the SGA is written to the SSD flash cache by DBWR
• If the block is needed again, it is read from SSD
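
The flow on this slide behaves like a two-tier cache. The following toy model assumes plain LRU eviction in both tiers and removes a block from flash once it is promoted back to the SGA. Both are simplifications chosen for illustration and are not a description of Oracle's actual algorithm. The class and method names are hypothetical:

```python
from collections import OrderedDict

class TwoTierCache:
    """Toy model: SGA buffer cache backed by a flash cache, then disk."""

    def __init__(self, sga_blocks, flash_blocks):
        self.sga = OrderedDict()    # block_id -> True, oldest first
        self.flash = OrderedDict()
        self.sga_blocks = sga_blocks
        self.flash_blocks = flash_blocks

    def read(self, block_id):
        """Return where the block was found: 'sga', 'flash' or 'disk'."""
        if block_id in self.sga:
            self.sga.move_to_end(block_id)  # mark as most recently used
            return "sga"
        if block_id in self.flash:
            del self.flash[block_id]        # promoted back into the SGA
            source = "flash"
        else:
            source = "disk"
        self._load_into_sga(block_id)
        return source

    def _load_into_sga(self, block_id):
        self.sga[block_id] = True
        if len(self.sga) > self.sga_blocks:
            evicted, _ = self.sga.popitem(last=False)  # oldest SGA block
            self.flash[evicted] = True                 # written to flash
            if len(self.flash) > self.flash_blocks:
                self.flash.popitem(last=False)         # flash is full too

cache = TwoTierCache(sga_blocks=2, flash_blocks=2)
sources = [cache.read(block) for block in [1, 2, 3, 1, 2]]
print(sources)
```

The first three reads come from disk. The later re-reads of blocks 1 and 2 are served from flash because those blocks were evicted from the small SGA in the meantime.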




Database Smart Flash Cache
• Pros:
     – Automatically keeps active data in SSD
• Cons:
     –   Large overhead for managing cache, all taken from SGA
     –   Overhead for DBWR
     –   No benefit and some overhead for writes
     –   Only one SSD device


 Using Smart Flash Cache will make your IO faster
 than using just disks, but smartly placing data on
 SSD will be even faster.

Exadata has LOTS of SSD
•    Quarter rack has 3 storage cells
•    Each with 4 Sun Flash Accelerator F40
•    400GB * 4 * 3 = 4.8TB
•    21.5GB/s throughput
•    375,000 IOPS
•    Note that IB will limit you to 4GB/s per DB node
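
The capacity figure and the InfiniBand caveat can be checked with a few lines of arithmetic. The number of database nodes is an assumption (2) that does not appear on the slide. Everything else is taken from the bullets above:

```python
STORAGE_CELLS = 3          # quarter rack
CARDS_PER_CELL = 4         # Sun Flash Accelerator F40 cards per cell
CARD_GB = 400
FLASH_GB_PER_S = 21.5      # aggregate flash throughput
IB_GB_PER_S_PER_NODE = 4   # InfiniBand limit per DB node
DB_NODES = 2               # ASSUMPTION: not stated on the slide

total_flash_tb = STORAGE_CELLS * CARDS_PER_CELL * CARD_GB / 1000
# The database tier cannot receive more than the network delivers.
deliverable_gb_per_s = min(FLASH_GB_PER_S, DB_NODES * IB_GB_PER_S_PER_NODE)

print(f"{total_flash_tb}TB of flash, {deliverable_gb_per_s}GB/s reaches the DB nodes")
```

Under that assumption the network, not the flash, caps the data rate the database nodes see unless the storage cells filter data before sending it.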




Exadata Smart Flash Logging
• Redo log writes are written to disk and SSD
  together.
• Log sync is finished when one write is
  successful.
• Can’t Lose.
• Can’t try that at home
• This improves performance for redo when disks
  are busy with high throughput operations
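
Because the log sync completes as soon as either write succeeds, the effective latency of each redo write is the minimum of the two. A minimal sketch with invented latency samples (milliseconds), where the disk is usually fast but stalls occasionally:

```python
def log_write_latency(disk_ms, flash_ms):
    """The write is acknowledged when the faster of the two targets finishes."""
    return min(disk_ms, flash_ms)

# Made-up samples: disk is quick when idle but spikes when busy.
disk_ms = [0.5, 0.6, 12.0, 0.5, 25.0]
flash_ms = [1.0, 1.1, 1.0, 1.2, 1.0]

effective = [log_write_latency(d, f) for d, f in zip(disk_ms, flash_ms)]
print(effective)
print(max(disk_ms), max(effective))
```

In these samples the typical latency barely changes, but the worst case drops from 25ms to 1ms, which is the benefit described above.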



Exadata Smart Flash Cache
• Not same as DB Smart Flash Cache
• SSDs are on storage cells
• SSD on Exadata can also be used as ASM disks
  and not cache.




Exadata Smart Flash Cache
• Reading un-cached data:
     1. Un-cached data is read from disk first
     2. It is sent to the database
     3. It is then copied to the cache
[Diagram: Database, Cellsrv, Disks, SSD Cache]




Exadata Smart Flash Cache
• Cached reads:
     – Read from disk and SSD simultaneously
     – Whichever returns first
     – Effectively increase read throughput
     – Smart scans mostly read from disk
     – Except for objects using the “cell_flash_cache” KEEP clause
[Diagram: Database, Cellsrv, Disks, SSD Cache]




Exadata Smart Flash Cache
• Writes:
     – Write through cache
     – Writes go to disk first
     – Then copied to cache, sometimes
     – Indexes and tables with random IO
     – ALTER TABLE customers STORAGE (CELL_FLASH_CACHE KEEP)
[Diagram: Database, Cellsrv, Disks, SSD Cache]




Exadata Smart Flash Cache
• Writes:
     – Write back cache
     – Writes go to SSD first
     – Then copied to disk, eventually
[Diagram: Database, Cellsrv, Disks, SSD Cache]
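
The difference between the write-through and write-back modes comes down to which device must finish before the write is acknowledged. The toy model below illustrates that ordering only. It is not how cellsrv is implemented, and the class and method names are hypothetical:

```python
class FlashCacheModel:
    """Toy write path: write-through acks from disk, write-back from SSD."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.ssd = {}
        self.disk = {}
        self.dirty = set()  # keys on SSD that are not yet on disk

    def write(self, key, value):
        """Store the value and return the device that acknowledged it."""
        if self.write_back:
            self.ssd[key] = value   # SSD first
            self.dirty.add(key)     # disk copy happens later
            return "ssd"
        self.disk[key] = value      # write-through: disk first
        return "disk"

    def flush(self):
        """Copy dirty data from SSD to disk (the 'eventually' step)."""
        for key in sorted(self.dirty):
            self.disk[key] = self.ssd[key]
        self.dirty.clear()

write_through = FlashCacheModel(write_back=False)
ack_wt = write_through.write("block-1", "v1")

write_back = FlashCacheModel(write_back=True)
ack_wb = write_back.write("block-1", "v1")
on_disk_before_flush = "block-1" in write_back.disk
write_back.flush()
on_disk_after_flush = "block-1" in write_back.disk
print(ack_wt, ack_wb, on_disk_before_flush, on_disk_after_flush)
```

In write-back mode the data exists only on SSD until the flush, which is why the cache itself has to be protected against device failure.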




ODA and SSD
• “Four 2.5-inch 200 GB SAS-2 SLC SSDs
  per shelf for database redo logs”
• Allows multiple databases on ODA
• Reduces risk of disk bottlenecks




Interfaces
• SATA
     – 32 outstanding IO
     – 6Gb/s = 600MB/s
     – significant latency
• SAS
     – 256 outstanding IO
     – 6Gb/s = 600MB/s
     – Used on ODA shared
       storage
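
The "6Gb/s = 600MB/s" conversion is not a plain divide-by-eight. These 6Gb/s links use 8b/10b encoding, so only 8 of every 10 bits on the wire carry data. A short sketch of the calculation (the function name is illustrative):

```python
def payload_mb_per_s(line_rate_gbit):
    """Usable MB/s on a link that uses 8b/10b encoding."""
    line_bits = line_rate_gbit * 1_000_000_000
    data_bits = line_bits * 8 // 10      # 8 data bits per 10 bits on the wire
    return data_bits // 8 // 1_000_000   # bits -> bytes -> MB

print(payload_mb_per_s(6))  # SATA and SAS at 6Gb/s
```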



Interfaces
• PCIe
     – “Flash” “Accelerator”
     – Multiple 500 MB/s
       lanes
     – Low latency
     – Multiple SAS/SATA
       controllers on card
       for extra throughput
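
Using the 500MB/s per-lane figure above, a quick calculation shows how many lanes a card needs to sustain the 1-2GB/s quoted for PCIe SSDs earlier in the deck. This assumes 1GB/s = 1000MB/s and ignores protocol overhead. The function names are illustrative:

```python
import math

LANE_MB_PER_S = 500  # per-lane figure from the slide

def lanes_needed(target_mb_per_s):
    """Smallest number of lanes that can carry the target throughput."""
    return math.ceil(target_mb_per_s / LANE_MB_PER_S)

def card_mb_per_s(lanes):
    """Upper bound on throughput for a card with this many lanes."""
    return lanes * LANE_MB_PER_S

print(lanes_needed(1000), lanes_needed(2000), card_mb_per_s(8))
```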




Interfaces
• Fibre Channel
     – Use existing enterprise
       infrastructures
     – Shared storage
     – Usual SAN headache
     – Mandatory for RAC




Write latency lower than read?

Intel SSD 910: identical read/write latency?

RAMSAN




Quick Recap
• SSDs make random reads wicked fast
• Writes and deletes are complicated
• Place segments with many random reads on
  SSD
• Exadata uses Smart Flash Cache to increase
  throughput
• Not all SSDs are the same
• Read specs carefully


Thank you – Q&A
To contact us
        sales@pythian.com

        1-877-PYTHIAN

To follow us
        http://www.pythian.com/blog

        http://www.facebook.com/pages/The-Pythian-Group/163902527671

        @pythian

        http://www.linkedin.com/company/pythian

Kafka and Hadoop at LinkedIn Meetup
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
 

Ssd collab13

  • 1. Databases in a Solid State World: How Exadata X3 and Other Database Systems Leverage the Performance of Flash. Gwen Shapira, Senior Consultant, February 2013
  • 2. About Me
      – Oracle ACE Director
      – Member of Oak Table
      – 14 years of IT
      – Performance Tuning
      – Troubleshooting
      – Hadoop
      – Presents, Blogs, Tweets
      – @gwenshap
  • 3. About Pythian
      • Recognized Leader:
          – Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server
          – Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments
      • Expertise:
          – Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff (9, including 2 ACE Directors) and 2 Microsoft MVPs.
          – Pythian holds 7 Specializations under the Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
      • Global Reach & Scalability:
          – Around-the-clock global remote support for DBA and consulting, systems administration, special projects or emergency response
  • 4. You Never Forget Your First SSD
  • 5. Sh*t People Say about SSD:
      – Too expensive
      – Don’t use for writes
      – Fast for reads
      – Use SATA SSD
      – Unreliable
      – Used for REDO
      – Type of SSD matters
      – Use for random writes
      – Use SSD in SAN
      – Becomes slower over time
      – Use PCI SSD
      – Don’t use for REDO
      – Only used in Exadata
      – Is it same as Flash?
      – Only Sun flash devices are supported
  • 6. Solid State Disk = No Spinning = Low Latency Random IO
  • 7. We are talking about: NAND FLASH
      • As opposed to RAM Flash, which is rare but awesome
      • SLC
          – One bit per cell (states 0, 1)
          – High performance
      • MLC
          – Two bits per cell (states 00, 01, 10, 11)
          – High capacity
  • 8. Will Talk About:
      • IO Performance
      • Using SSDs for Oracle
      • How Exadata and ODA use SSDs
      • SSD devices
      • Practice: Reading SSD Vendor Specs
  • 9. Anatomy of an SSD
      • Cell: 1 bit
      • Page: 4K
      • Block: 128 pages = 512K
      • Plane: 1024 blocks = 512MB
      • Planes are grouped into dies, which are grouped into packages
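The sizes on the anatomy slide follow from simple multiplication. A minimal sketch of that arithmetic, using only the figures from the slide (real devices vary by vendor and generation, and the constant names are illustrative):

```python
# Geometry figures taken from the slide; treat them as one example layout,
# not as a property of every NAND device.
PAGE_BYTES = 4 * 1024        # one page = 4K
PAGES_PER_BLOCK = 128        # one block = 128 pages
BLOCKS_PER_PLANE = 1024      # one plane = 1024 blocks

block_bytes = PAGE_BYTES * PAGES_PER_BLOCK
plane_bytes = block_bytes * BLOCKS_PER_PLANE

print(block_bytes // 1024, "K per block")      # 512 K per block
print(plane_bytes // 1024**2, "MB per plane")  # 512 MB per plane
```

The block size matters most for what follows: it is the smallest unit that can be deleted.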
  • 10. The Big Catch: We read and write pages, but delete blocks
  • 11. IO Operations
  • 12. Reads
      • CPU registers – 0.3* ns (1 cycle)
      • CPU Cache L1 – 1.2* ns
      • CPU Cache L2 – 3.0* ns
      • CPU Cache L3 – 12-24 ns
      • Main Memory (RAM) – 60-100 ns
      • SSD – 60,000 ns
      • Magnetic Storage (“DISK”) – 3,000,000 ns
      • SAN devices – ~15,000,000 ns
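To put the read latencies in proportion, here is a small sketch that turns the slide's figures into ratios. The numbers are the slide's; the dictionary keys and the `slowdown` helper are illustrative names, and RAM uses the upper end of the 60-100 ns range.

```python
# Read latencies in nanoseconds, copied from the slide.
LATENCY_NS = {
    "RAM": 100,               # upper end of the 60-100 ns range
    "SSD": 60_000,
    "Magnetic disk": 3_000_000,
    "SAN": 15_000_000,
}

def slowdown(slower: str, faster: str) -> float:
    """How many times longer a read takes on `slower` than on `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]

print(slowdown("Magnetic disk", "SSD"))  # 50.0
print(slowdown("SAN", "SSD"))            # 250.0
print(slowdown("SSD", "RAM"))            # 600.0
```

By these figures an SSD read is about 50 times faster than a disk read, but still hundreds of times slower than RAM.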
  • 13. What about throughput?
      • 15K RPM SAS HDD – 120-200MB/s
      • PCIe SSD – 1-2GB/s
      • But … How many disks do you use?
      • Network bandwidth?
      • CPU Bus bandwidth?
  • 14. Writes
      • Writes on new SSD – 250,000 ns
      • Similar to sequential write to disk
      • How much data can you write to a new 250GB SSD?
  • 15. Deletes
      • Can’t overwrite data without deleting first
      • Can only delete blocks of 128*4K pages
      • To overwrite a page:
          – Read 127 pages
          – Write 127 to a free block
          – Delete old block
          – Perform the write we originally requested
      • Takes 2ms
      • Each cell can only be written 100K times
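The overwrite steps on the Deletes slide can be sketched as a toy model. This is only an illustration of the sequence the slide lists, assuming a block is a plain list of 128 pages; the function name and return values are made up for the example, and real controllers are far more sophisticated.

```python
PAGES_PER_BLOCK = 128  # from the slide: a block is 128 pages of 4K

def overwrite_page(block, page_index, new_data):
    """Toy model of overwriting one page in a full block.

    Returns (new_block, pages_read, pages_written, blocks_erased).
    """
    # 1. Read the 127 pages that are not being overwritten.
    survivors = [(i, page) for i, page in enumerate(block) if i != page_index]
    pages_read = len(survivors)

    # 2. Write those 127 pages to a free (already erased) block.
    new_block = [None] * PAGES_PER_BLOCK
    for i, page in survivors:
        new_block[i] = page
    pages_written = len(survivors)

    # 3. Delete (erase) the old block so it can be reused later.
    blocks_erased = 1

    # 4. Perform the write that was originally requested.
    new_block[page_index] = new_data
    pages_written += 1

    return new_block, pages_read, pages_written, blocks_erased

old_block = list(range(PAGES_PER_BLOCK))
new_block, reads, writes, erases = overwrite_page(old_block, 5, "new")
print(reads, writes, erases)  # 127 128 1
```

In this worst case a single-page update costs 128 page writes plus an erase, which is why the operation is so much slower than a write to a fresh device.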
  • 16. The Controller
      • Over-provision SSDs
      • Maintain free lists
      • Delete and cleanup in background
      • Balance use of cells (Wear leveling)
      • RAM caching
  • 17. Consequences:
      • Write Amplification
          – How much data is really written when we write 1MB
          – 1 means no overhead
          – The closer to 1 the better
      • Benchmarks on new SSD are worthless
          – Run benchmarks long enough to run out of overprovisioned space
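Write amplification is a ratio, so it is easy to compute once both byte counts are known. A minimal sketch, assuming you can obtain the bytes physically written to flash from some vendor-specific counter (the function and variable names here are hypothetical):

```python
def write_amplification(flash_bytes_written: int, host_bytes_written: int) -> float:
    """Bytes physically written to flash per byte the host asked to write."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written

MB = 1024 * 1024
PAGE = 4 * 1024

# Fresh device: writing 1MB lands as 1MB on flash, so there is no overhead.
ideal = write_amplification(1 * MB, 1 * MB)

# Worst case from the Deletes slide: updating one 4K page rewrites a
# whole 128-page block.
worst = write_amplification(128 * PAGE, 1 * PAGE)

print(ideal, worst)  # 1.0 128.0
```

A benchmark that ends before the over-provisioned space is used up only ever measures the `ideal` case.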
  • 18. Will Talk About:
      • IO Performance
      • Using SSDs for Oracle
      • How Exadata and ODA use SSDs
      • SSD devices
      • Practice: Reading SSD Vendor Specs
  • 19. Redo Logs
      – A: Redo log writes are sequential writes and therefore won’t benefit from SSD.
      – B: Log file sync times are critical to Oracle performance. Therefore placing redo logs on SSD will have a dramatic impact on performance.
  • 20. Don’t use SSD for redo if:
      • You don’t have “log file sync” related performance problems
      • You have dedicated disks for each redo log
      • Even better if multiple disks, striped.
      • Your SAN is well configured and has ample caching
      • You have RAC and no shared SSDs
  • 21. SSD can make Redo faster if:
      • You are suffering from high “log file parallel write”
      • And your storage admin won’t even discuss it
      • Redo is on LUN shared with:
          – Redo from multiple databases
          – Other services (SAP, etc)
      • Not enough cache on storage array
      • Storage network is a bottleneck
  • 22. Placing Data on SSD
  • 23. Should you place data on SSD?
      • SSD solves IO latency problems
      • If “DB File Sequential Read” is not in your top 5 wait events, you probably don’t need your data on SSD.
      • If you don’t maximize RAM use for buffer cache – don’t get SSD (yet)
      • If your CPU utilization is high, solve this first.
  • 24. Not enough space?
      • Move most active segments
      • Random reads get most benefits from SSD
      • Active indexes with unique-scans
      • Fewer writes is better
      • AWR has IO statistics per segment
      • https://github.com/gwenshap/Oracle-DBA-Scripts/blob/master/SSD.sql
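One way to act on the segment advice above is to rank segments by how read-heavy they are. The sketch below is a made-up heuristic over made-up numbers, not the logic of the linked SSD.sql script: it assumes per-segment physical read and write counts have already been pulled from AWR, and it favors segments with many reads and a small share of writes.

```python
from dataclasses import dataclass

@dataclass
class SegmentIO:
    name: str
    physical_reads: int
    physical_writes: int

def rank_for_ssd(segments, top_n=3):
    """Order segments by reads weighted by their read share (hypothetical score)."""
    def score(seg):
        total = seg.physical_reads + seg.physical_writes
        read_share = seg.physical_reads / total if total else 0.0
        return seg.physical_reads * read_share
    return sorted(segments, key=score, reverse=True)[:top_n]

# Illustrative figures only.
segments = [
    SegmentIO("ORDERS_PK", 900_000, 10_000),
    SegmentIO("ORDERS", 1_000_000, 800_000),
    SegmentIO("AUDIT_LOG", 5_000, 600_000),
    SegmentIO("CUSTOMERS_IDX", 400_000, 2_000),
]

candidates = [seg.name for seg in rank_for_ssd(segments)]
print(candidates)  # ['ORDERS_PK', 'ORDERS', 'CUSTOMERS_IDX']
```

The write-heavy `AUDIT_LOG` segment drops out, matching the "fewer writes is better" guidance.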
  • 25. Why Choose?
      • SAN Devices that contain both HDD and SSD
      • Smart controllers move most active data to SSD automatically.
      • Pros: No need to choose and manually migrate data
      • Cons: Your most active data will move without advance notice
  • 26. Top Mistakes
      • Using SSD for production and HDD for Standby
          – If production needs SSD…
          – Good chance that standby will fall behind
      • Database Smart Flash Cache
  • 27. Database Smart Flash Cache
      [Diagram: SGA, Disk, Flash Cache]
      • Block is read from disk into the SGA
      • Block evicted from SGA is written to SSD cache by DBWR
      • If block is needed, it is read from SSD
  • 28. Database Smart Flash Cache
      • Pros:
          – Automatically keeps active data in SSD
      • Cons:
          – Large overhead for managing cache, all taken from SGA
          – Overhead for DBWR
          – No benefit and some overhead for writes
          – Only one SSD device
      • Using Smart Flash Cache will make your IO faster than using just disks, but smartly placing data on SSD will be even faster.
  • 29. Will Talk About:
      • IO Performance
      • Using SSDs for Oracle
      • How Exadata and ODA use SSDs
      • SSD devices
      • Practice: Reading SSD Vendor Specs
  • 30. Exadata has LOTS of SSD
      • Quarter rack has 3 storage cells
      • Each with 4 Sun Flash Accelerator F40
      • 400GB * 4 * 3 = 4.8TB
      • 21.5GB/s throughput
      • 375,000 IOPS
      • Note that IB will limit you to 4GB/s per DB node
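The capacity and bandwidth figures on this slide can be checked with a few lines of arithmetic. All numbers come from the slide; the variable names and the "how many nodes would it take" question are illustrative assumptions, not a sizing method.

```python
import math

# Figures from the slide (Exadata X3 quarter rack).
STORAGE_CELLS = 3
FLASH_CARDS_PER_CELL = 4
GB_PER_CARD = 400
FLASH_THROUGHPUT_GB_S = 21.5
IB_LIMIT_PER_DB_NODE_GB_S = 4.0

total_flash_gb = STORAGE_CELLS * FLASH_CARDS_PER_CELL * GB_PER_CARD
print(total_flash_gb / 1000, "TB of flash")  # 4.8 TB of flash

# Share of the flash throughput a single database node can receive
# over InfiniBand, and how many nodes it would take to receive all of it.
single_node_share = IB_LIMIT_PER_DB_NODE_GB_S / FLASH_THROUGHPUT_GB_S
nodes_to_absorb_all = math.ceil(FLASH_THROUGHPUT_GB_S / IB_LIMIT_PER_DB_NODE_GB_S)
print(round(single_node_share, 2), nodes_to_absorb_all)  # 0.19 6
```

The gap suggests why the slide calls out the InfiniBand limit: the storage cells can read flash faster than one database node can receive the data.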
  • 31. Exadata Smart Flash Logging
      • Redo log writes are written to disk and SSD together.
      • Log sync is finished when one write is successful.
      • Can’t Lose.
      • Can’t try that at home
      • This improves performance for redo when disks are busy with high throughput operations
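Because the log sync completes when the first of the two writes succeeds, the effective latency is the minimum of the disk and flash latencies for each write. A toy sketch with invented latency samples (in milliseconds) shows how this trims the outliers; it models only the "first write wins" rule from the slide, not Exadata's implementation.

```python
def flash_logging_latency(disk_ms, flash_ms):
    """Per-write latency when each redo write goes to both targets
    and completes as soon as either one finishes."""
    return [min(d, f) for d, f in zip(disk_ms, flash_ms)]

# Invented samples: disk is usually fast (write cache) but spikes when busy,
# flash is usually fast but occasionally stalls.
disk_ms = [0.5, 0.6, 12.0, 0.5, 25.0]
flash_ms = [0.4, 0.9, 0.4, 3.0, 0.5]

combined = flash_logging_latency(disk_ms, flash_ms)
print(combined)                                    # [0.4, 0.6, 0.4, 0.5, 0.5]
print(max(disk_ms), max(flash_ms), max(combined))  # 25.0 3.0 0.6
```

Neither device alone avoids the spikes, but the combination does, as long as both are not slow at the same time.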
  • 32. Exadata Smart Flash Cache
      • Not same as DB Smart Flash Cache
      • SSDs are on storage cells
      • SSD on Exadata can also be used as ASM disks and not cache.
  • 33. Exadata Smart Flash Cache
      [Diagram: Database, Cellsrv, Disks, SSD Cache]
      • Reading un-cached data:
          1. Un-cached data is read from disk first
          2. Sent to the database
          3. and then copied to cache
  • 34. Exadata Smart Flash Cache
      [Diagram: Database, Cellsrv, SSD Cache, Disks]
      • Cached reads:
          – Read from disk and SSD simultaneously
          – Whichever returns first
          – Effectively increase read throughput
          – Smart scans mostly read from disk
          – Except for objects using “cell_flash_cache” KEEP clause.
  • 35. Exadata Smart Flash Cache
      [Diagram: Database, Cellsrv, Disks, SSD Cache]
      • Writes:
          – Write through cache
          – Writes go to disk first
          – Then copied to cache, sometimes
          – Indexes and tables with random IO
          – ALTER TABLE customers STORAGE (CELL_FLASH_CACHE KEEP)
  • 36. Exadata Smart Flash Cache
      [Diagram: Database, Cellsrv, Disks, SSD Cache]
      • Writes:
          – Write back cache
          – Writes go to SSD first
          – Then copied to disk, eventually
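The difference between the write-through and write-back behaviors described in the last two slides can be sketched with a toy cache. This is a simplified model with invented class and method names, not how cellsrv works: it always caches written blocks (the slide says "sometimes") and it flushes only when asked.

```python
class FlashCacheModel:
    """Toy model of a flash cache in front of disks."""

    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.disk = {}      # block_id -> data persisted on disk
        self.flash = {}     # block_id -> data held in the SSD cache
        self.dirty = set()  # blocks in flash not yet copied to disk

    def write(self, block_id, data):
        if self.write_back:
            # Write-back: the write lands on SSD first, disk comes later.
            self.flash[block_id] = data
            self.dirty.add(block_id)
        else:
            # Write-through: the write goes to disk first, then to cache.
            self.disk[block_id] = data
            self.flash[block_id] = data

    def flush(self):
        """Copy dirty blocks from SSD to disk ("eventually")."""
        for block_id in sorted(self.dirty):
            self.disk[block_id] = self.flash[block_id]
        self.dirty.clear()

    def read(self, block_id):
        if block_id in self.flash:
            return self.flash[block_id]
        data = self.disk[block_id]    # un-cached: read from disk first
        self.flash[block_id] = data   # then copy to cache
        return data

through = FlashCacheModel(write_back=False)
through.write(1, "row")
back = FlashCacheModel(write_back=True)
back.write(1, "row")
print(1 in through.disk, 1 in back.disk)  # True False
back.flush()
print(1 in back.disk)                     # True
```

In write-through mode the write is only as fast as the disk; in write-back mode it is as fast as the SSD, at the cost of tracking dirty blocks until they are flushed.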
  • 37. ODA and SSD
      • “Four 2.5-inch 200 GB SAS-2 SLC SSDs per shelf for database redo logs”
      • Allows multiple databases on ODA
      • Reduces risk of disk bottlenecks
  • 38. Will Talk About:
      • IO Performance
      • Using SSDs for Oracle
      • How Exadata and ODA use SSDs
      • SSD devices
      • Practice: Reading SSD Vendor Specs
  • 39. Interfaces
      • SATA
          – 32 outstanding IO
          – 6Gb/s = 600MB/s
          – Significant latency
      • SAS
          – 256 outstanding IO
          – 6Gb/s = 600MB/s
          – Used on ODA shared storage
  • 40. Interfaces
      • PCIe
          – “Flash” “Accelerator”
          – Multiple 500 MB/s lanes
          – Low latency
          – Multiple SAS/SATA controllers on card for extra throughput
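The "6Gb/s = 600MB/s" and "500 MB/s lanes" figures follow from 8b/10b line encoding, where every data byte travels as 10 bits on the wire. A short sketch of that conversion, assuming 6 Gb/s SATA/SAS links and PCIe 2.0 lanes at 5 GT/s; the four-controller and eight-lane card layouts are the ones used in the editor's notes, and the function name is illustrative.

```python
def usable_mb_per_s(line_rate_gbit: float, wire_bits_per_byte: int = 10) -> float:
    """Payload bandwidth of a link using 8b/10b encoding (10 wire bits per byte)."""
    return line_rate_gbit * 1000 / wire_bits_per_byte

sata_or_sas = usable_mb_per_s(6)  # 6 Gb/s link
pcie2_lane = usable_mb_per_s(5)   # one PCIe 2.0 lane at 5 GT/s
print(sata_or_sas, pcie2_lane)    # 600.0 500.0

# A PCIe card with four on-board SAS/SATA controllers, in an 8-lane slot.
controllers_total = 4 * sata_or_sas
slot_total = 8 * pcie2_lane
print(controllers_total / 1000, slot_total / 1000)  # 2.4 4.0
```

Under these assumptions the on-card controllers (2.4GB/s), not the PCIe slot (4GB/s), are the narrower stage.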
  • 41. Interfaces
      • Fiber
          – Use existing enterprise infrastructures
          – Shared storage
          – Usual SAN headache
          – Mandatory for RAC
  • 42. Will Talk About:
      • IO Performance
      • Using SSDs for Oracle
      • How Exadata and ODA use SSDs
      • SSD devices
      • Practice: Reading SSD Vendor Specs
  • 43. Write latency lower than read?
  • 44. Intel SSD 910: identical read/write latency?
  • 46. RAMSAN
  • 48. Quick Recap
      • SSDs make random reads wicked fast
      • Writes and deletes are complicated
      • Place segments with many random reads on SSD
      • Exadata uses Smart Flash Cache to increase throughput
      • Not all SSDs are the same
      • Read specs carefully
  • 49. Thank you – Q&A
      • To contact us: sales@pythian.com, 1-877-PYTHIAN
      • To follow us: http://www.pythian.com/blog, http://www.facebook.com/pages/The-Pythian-Group/163902527671, @pythian, http://www.linkedin.com/company/pythian

Editor's Notes

  1. http://www.dramexchange.com/service/faqs.aspx#c7
  2. SSD’s base memory unit is a cell, which holds 1 bit in SLC and 2 bits in MLC. Cells are organized in pages (usually 4k) and pages are organized in blocks (512K). Data can be read and written in pages, but is always deleted in blocks. This will become really important in a moment.
  3. Vendors don’t share write amplification numbers – but you can use APIs they sometimes provide to check how much data is written when you write 1M
  4. This means that write performance is throttled by disk which is why Exadata can do 60 reads for each write.
  5. Very very
  6. Very very
  7. Very very
  8. 4 * 6Gb/s = 4 * 600MB/s = 2.4GB/s; 8 * 500MB/s = 4GB/s