1. Databases in a Solid State World
How Exadata X3 and Other Database Systems
Leverage the Performance of Flash
Gwen Shapira, Senior Consultant
February, 2013
2. About Me
– Oracle ACE Director
– Member of Oak Table
– 14 years of IT
– Performance Tuning
– Troubleshooting
– Hadoop
– Presents, Blogs, Tweets
– @gwenshap
2 © 2013 Pythian
3. About Pythian
• Recognized Leader:
– Global industry-leader in remote database administration services and
consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server
– Work with over 250 multinational companies such as Forbes.com, Fox
Sports, Nordion and Western Union to help manage their complex IT
deployments
• Expertise:
– Pythian’s data experts are the elite in their field. We have the highest
concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2
Microsoft MVPs.
– Pythian holds 7 Specializations under Oracle Platinum Partner
program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
• Global Reach & Scalability:
– Around the clock global remote support for DBA and consulting, systems
administration, special projects or emergency response
5. Sh*t People Say about SSD:
• Too expensive
• Don’t use for writes
• Fast for reads
• Use SATA SSD
• Unreliable
• Used for REDO
• Type of SSD matters
• Use for random writes
• Use SSD in SAN
• Becomes slower over time
• Use PCI SSD
• Don’t use for REDO
• Only used in Exadata
• Is it same as Flash?
• Only Sun flash devices are supported
7. We are talking about: NAND FLASH
• As opposed to RAM Flash, which is rare but awesome
• SLC
– One bit per cell (0 or 1)
– High performance
• MLC
– Two bits per cell (00, 01, 10 or 11)
– High capacity
8. Will Talk About:
• IO Performance
• Using SSDs for
Oracle
• How Exadata and
ODA uses SSDs
• SSD devices
• Practice: Reading
SSD Vendor Specs
9. Anatomy of an SSD
• Cell = 1 bit (SLC) or 2 bits (MLC)
• Page = 4K
• Block = 128 pages = 512K
• Plane = 1024 blocks = 512MB
• Planes are grouped into dies, which are grouped into packages
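The sizes on this slide multiply out as follows. This is a minimal sketch that assumes the slide's figures (4K pages, 128 pages per block, 1024 blocks per plane); real devices vary, and the constant names are only illustrative.

```python
# Geometry figures taken from the slide; real devices differ.
PAGE_BYTES = 4 * 1024        # one page = 4K
PAGES_PER_BLOCK = 128        # one block = 128 pages
BLOCKS_PER_PLANE = 1024      # one plane = 1024 blocks

block_bytes = PAGE_BYTES * PAGES_PER_BLOCK
plane_bytes = block_bytes * BLOCKS_PER_PLANE

print(f"block = {block_bytes // 1024}K")            # block = 512K
print(f"plane = {plane_bytes // (1024 * 1024)}MB")  # plane = 512MB
```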
12. Reads
• CPU registers – 0.3 ns (1 cycle)
• CPU Cache L1 – 1.2 ns
• CPU Cache L2 – 3.0 ns
• CPU Cache L3 – 12-24 ns
• Main Memory (RAM) – 60-100 ns
• SSD – 60,000 ns
• Magnetic Storage (“DISK”) – 3,000,000 ns
• SAN devices ~ 15,000,000 ns
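To put the gaps in perspective, the latencies can be compared as ratios. A small sketch using the slide's approximate numbers; where the slide gives a range, the upper bound is used, which is an assumption made here for simplicity.

```python
# Approximate read latencies in nanoseconds, as quoted on the slide.
# Where the slide gives a range, the upper bound is used (an assumption).
READ_LATENCY_NS = {
    "CPU register": 0.3,
    "L1 cache": 1.2,
    "L2 cache": 3.0,
    "L3 cache": 24,
    "RAM": 100,
    "SSD": 60_000,
    "Disk": 3_000_000,
    "SAN": 15_000_000,
}

def slowdown(slower: str, faster: str) -> float:
    """How many times longer a read from `slower` takes than from `faster`."""
    return READ_LATENCY_NS[slower] / READ_LATENCY_NS[faster]

print(f"Disk vs SSD: {slowdown('Disk', 'SSD'):.0f}x")  # Disk vs SSD: 50x
print(f"SSD vs RAM: {slowdown('SSD', 'RAM'):.0f}x")    # SSD vs RAM: 600x
print(f"SAN vs SSD: {slowdown('SAN', 'SSD'):.0f}x")    # SAN vs SSD: 250x
```

An SSD read is far faster than a disk read, but still hundreds of times slower than RAM.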
13. What about throughput?
• 15K RPM SAS HDD – 120-200MB/s
• PCIe SSD – 1-2GB/s
• But … How many disks do you use?
• Network bandwidth?
• CPU Bus bandwidth?
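The "how many disks" question can be answered with simple division. A rough sketch, assuming 1GB/s = 1000MB/s and ignoring controller, network and bus limits; the function name is illustrative.

```python
import math

def hdds_to_match(ssd_mb_s: float, hdd_mb_s: float) -> int:
    """Number of striped HDDs needed to reach one SSD's sequential throughput."""
    return math.ceil(ssd_mb_s / hdd_mb_s)

# Most favourable case for disks: slow SSD (1GB/s) vs fast HDDs (200MB/s).
print(hdds_to_match(1000, 200))  # 5
# Least favourable case: fast SSD (2GB/s) vs slow HDDs (120MB/s).
print(hdds_to_match(2000, 120))  # 17
```

A modest disk array can match a single PCIe SSD on throughput, so latency is usually the stronger argument for flash.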
14. Writes
• Writes on new SSD – 250,000 ns
• Similar to sequential write to disk
How much data can you write to
a new 250GB SSD?
15. Deletes
• Can’t overwrite data without deleting first
• Can only delete blocks of 128*4K pages
• To Overwrite a page:
– Read 127 pages
– Write 127 to a free block
– Delete old block
– Perform the write we originally requested
• Takes 2ms
• Each cell can only be written 100K times
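The overwrite steps above can be sketched as a toy model that counts physical operations. This assumes the slide's 128-page block and represents a block as a plain Python list; it is not how any real controller is implemented.

```python
PAGES_PER_BLOCK = 128  # from the slide: a block is 128 pages of 4K

def overwrite_page(block: list, page_no: int, new_data: bytes) -> tuple:
    """Overwrite one page the slow way and count the physical operations.

    `block` is a list of page contents. Returns (new_block, stats).
    """
    stats = {"page_reads": 0, "page_writes": 0, "block_erases": 0}

    # 1. Read the 127 pages we are not changing.
    survivors = {}
    for i, data in enumerate(block):
        if i != page_no:
            survivors[i] = data
            stats["page_reads"] += 1

    # 2. Write those pages to a free (already erased) block.
    new_block = [None] * PAGES_PER_BLOCK
    for i, data in survivors.items():
        new_block[i] = data
        stats["page_writes"] += 1

    # 3. Erase the old block so it can be reused.
    block[:] = [None] * PAGES_PER_BLOCK
    stats["block_erases"] += 1

    # 4. Perform the write that was originally requested.
    new_block[page_no] = new_data
    stats["page_writes"] += 1
    return new_block, stats

old = [b"old"] * PAGES_PER_BLOCK
new, stats = overwrite_page(old, 5, b"new")
print(stats)  # {'page_reads': 127, 'page_writes': 128, 'block_erases': 1}
```

One requested page write turns into 128 physical page writes plus an erase, which is why an in-place overwrite is so much slower than a write to a fresh SSD.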
16. The Controller
• Over-provision SSDs
• Maintain free lists
• Delete and cleanup in background
• Balance use of cells (Wear leveling)
• RAM caching
17. Consequences:
• Write Amplification
– How much data is really written when we write 1MB
– 1 means no overhead
– The closer to 1 the better
• Benchmarks on new SSD are worthless
– Run benchmarks long enough to run out of
overprovisioned space
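Write amplification is a simple ratio. A minimal sketch, assuming you can obtain both byte counts (vendors expose them differently, if at all); the function and variable names are hypothetical.

```python
def write_amplification(physical_bytes: int, logical_bytes: int) -> float:
    """Bytes the flash actually wrote divided by bytes the host asked to write."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes

PAGE = 4 * 1024  # 4K page, as on the Anatomy slide

# Best case: one page requested, one page written.
print(write_amplification(1 * PAGE, 1 * PAGE))    # 1.0
# Worst case from the Deletes slide: one page requested,
# a whole 128-page block rewritten.
print(write_amplification(128 * PAGE, 1 * PAGE))  # 128.0
```

A benchmark on a new drive sees values near 1 because free blocks are plentiful; the ratio only rises once the over-provisioned space is used up.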
19. Redo Logs
A: Redo log writes are sequential writes and
therefore won’t benefit from SSD
B: Log file sync times are critical to Oracle
performance. Therefore placing redo logs on SSD
will have dramatic impact on performance.
20. Don’t use SSD for redo if:
• You don’t have “log file sync” related
performance problems
• You have dedicated disks for each redo log
• Even better if multiple disks, striped.
• Your SAN is well configured and has ample
caching
• You have RAC and no shared SSDs
21. SSD can make Redo faster if:
• You are suffering from high “log file parallel write”
• And your storage admin won’t even discuss it
• Redo is on LUN shared with:
– Redo from multiple databases
– Other services (SAP, etc)
• Not enough cache on storage array
• Storage network is a bottleneck
23. Should you place data on SSD?
• SSD solves IO latency problems
• If “DB File Sequential Read” is not in your top 5
wait events, you probably don’t need your data
on SSD.
• If you don’t maximize RAM use for buffer cache
– don’t get SSD (yet)
• If your CPU utilization is high, solve this first.
24. Not enough space?
• Move most active segments
• Random reads get most benefits from SSD
• Active indexes with unique-scans
• Fewer writes is better
• AWR has IO statistics per segment
• https://github.com/gwenshap/Oracle-DBA-Scripts/blob/master/SSD.sql
25. Why Choose?
• SAN Devices that contain both HDD and SSD
• Smart controllers move most active data to SSD
automatically.
• Pros: No need to choose and manually migrate
data
• Cons: Your most active data will move without
advance notice
26. Top Mistakes
• Using SSD for production and HDD for Standby
– If production needs SSD…
– Good chance that standby will fall behind
• Database Smart Flash Cache
27. Database Smart Flash Cache
[Diagram: SGA, Disk, Flash Cache]
• Block is read from disk into the SGA
• Block evicted from SGA is written to SSD cache by DBWR
• If block is needed, it is read from SSD
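The flow in this diagram can be modelled as a two-tier cache. This is a toy sketch, not Oracle's implementation: both tiers use plain LRU eviction, a block moves out of flash when it is read back into the SGA, and the class and attribute names are invented for illustration.

```python
from collections import OrderedDict

class TwoTierCache:
    """Toy model: an LRU buffer cache (the SGA) backed by an LRU flash cache."""

    def __init__(self, sga_blocks: int, flash_blocks: int, disk: dict):
        self.sga = OrderedDict()
        self.flash = OrderedDict()
        self.sga_blocks = sga_blocks
        self.flash_blocks = flash_blocks
        self.disk = disk
        self.reads = {"sga": 0, "flash": 0, "disk": 0}

    def read(self, block_id):
        if block_id in self.sga:                 # already in memory
            self.sga.move_to_end(block_id)
            self.reads["sga"] += 1
            return self.sga[block_id]
        if block_id in self.flash:               # if needed, read from SSD
            data = self.flash.pop(block_id)
            self.reads["flash"] += 1
        else:                                    # otherwise read from disk
            data = self.disk[block_id]
            self.reads["disk"] += 1
        self._cache_in_sga(block_id, data)
        return data

    def _cache_in_sga(self, block_id, data):
        self.sga[block_id] = data
        if len(self.sga) > self.sga_blocks:
            # Block evicted from the SGA is written to the flash cache.
            old_id, old_data = self.sga.popitem(last=False)
            self.flash[old_id] = old_data
            if len(self.flash) > self.flash_blocks:
                self.flash.popitem(last=False)

disk = {i: f"block-{i}" for i in range(10)}
cache = TwoTierCache(sga_blocks=2, flash_blocks=4, disk=disk)
for block_id in [0, 1, 2, 0, 1]:
    cache.read(block_id)
print(cache.reads)  # {'sga': 0, 'flash': 2, 'disk': 3}
```

With a working set slightly larger than the SGA, re-reads are served from flash instead of disk, which is the benefit the slide describes.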
28. Database Smart Flash Cache
• Pros:
– Automatically keeps active data in SSD
• Cons:
– Large overhead for managing cache, all taken from SGA
– Overhead for DBWR
– No benefit and some overhead for writes
– Only one SSD device
Using Smart Flash Cache will make your IO faster
than using just disks, but smartly placing data on
SSD will be even faster.
30. Exadata has LOTS of SSD
• Quarter rack has 3 storage cells
• Each with 4 Sun Flash Accelerator F40
• 400GB * 4 * 3 = 4.8TB
• 21.5GB/s throughput
• 375,000 IOPS
• Note that IB will limit you to 4GB/s per DB node
31. Exadata Smart Flash Logging
• Redo log writes are written to disk and SSD
together.
• Log sync is finished when one write is
successful.
• Can’t Lose.
• Can’t try that at home
• This improves performance for redo when disks
are busy with high throughput operations
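The "write to both, finish on the first success" idea can be illustrated with two concurrent writes. This is a toy sketch using threads and sleeps to stand in for device latency; it is not how Exadata implements the feature, and all names and latencies are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def write_redo(device: str, latency_s: float, payload: bytes) -> str:
    """Stand-in for a redo write; sleeps to simulate device latency."""
    time.sleep(latency_s)
    return device

def log_write(payload: bytes, disk_latency_s: float, flash_latency_s: float) -> str:
    """Issue the write to both devices; return whichever finished first."""
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [
        pool.submit(write_redo, "disk", disk_latency_s, payload),
        pool.submit(write_redo, "flash", flash_latency_s, payload),
    ]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # the slower write completes in the background
    return next(iter(done)).result()

# Disk is busy (slow), flash answers first.
print(log_write(b"redo", disk_latency_s=0.3, flash_latency_s=0.01))  # flash
# Flash is busy (slow), disk answers first.
print(log_write(b"redo", disk_latency_s=0.01, flash_latency_s=0.3))  # disk
```

The caller only ever waits for the faster of the two devices, so a temporarily busy disk no longer stalls the log sync.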
32. Exadata Smart Flash Cache
• Not same as DB Smart Flash Cache
• SSDs are on storage cells
• SSD on Exadata can also be used as ASM disks
and not cache.
33. Exadata Smart Flash Cache
• Reading un-cached data:
1. Un-cached data is read from disk first
2. Sent to the database
3. and then copied to cache
[Diagram: Database, Cellsrv, Disks, SSD Cache]
34. Exadata Smart Flash Cache
• Cached reads:
– Read from disk and SSD simultaneously
– Use whichever returns first
– Effectively increases read throughput
– Smart scans mostly read from disk
– Except for objects using “cell_flash_cache” KEEP clause.
[Diagram: Database, Cellsrv, Disks, SSD Cache]
35. Exadata Smart Flash Cache
• Writes:
– Write through cache
– Writes go to disk first
– Then copied to cache, sometimes
– Indexes and tables with random IO
– ALTER TABLE customers STORAGE (CELL_FLASH_CACHE KEEP)
[Diagram: Database, Cellsrv, Disks, SSD Cache]
36. Exadata Smart Flash Cache
• Writes:
– Write back cache
– Writes go to SSD first
– Then copied to disk, eventually
[Diagram: Database, Cellsrv, Disks, SSD Cache]
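The difference between the write-through and write-back slides comes down to which device the writer waits for. A simplified sketch with dictionaries standing in for SSD and disk; the class is hypothetical, and unlike the real write-through cache (which copies to SSD only "sometimes") this model always copies.

```python
class FlashCache:
    """Toy cache that can run in write-through or write-back mode."""

    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.ssd = {}
        self.disk = {}
        self.dirty = set()  # blocks on SSD that are not yet on disk

    def write(self, block_id, data) -> str:
        """Store a block; return the device the caller had to wait for."""
        if self.write_back:
            self.ssd[block_id] = data      # write goes to SSD first
            self.dirty.add(block_id)       # copied to disk later
            return "ssd"
        self.disk[block_id] = data         # write goes to disk first
        self.ssd[block_id] = data          # then copied to cache
        return "disk"

    def flush(self) -> int:
        """Copy dirty blocks from SSD to disk; return how many were copied."""
        copied = len(self.dirty)
        for block_id in self.dirty:
            self.disk[block_id] = self.ssd[block_id]
        self.dirty.clear()
        return copied

wt = FlashCache(write_back=False)
wb = FlashCache(write_back=True)
print(wt.write(1, "row"))  # disk
print(wb.write(1, "row"))  # ssd
print(wb.disk)             # {}
wb.flush()
print(wb.disk)             # {1: 'row'}
```

In write-through mode write latency is still bounded by disk; in write-back mode it is bounded by SSD, at the cost of tracking blocks that have not reached disk yet.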
37. ODA and SSD
• “Four 2.5-inch 200 GB SAS-2 SLC SSDs
per shelf for database redo logs”
• Allows multiple databases on ODA
• Reduces risk of disk bottlenecks
39. Interfaces
• SATA
– 32 outstanding IO
– 6Gb/s = 600MB/s
– significant latency
• SAS
– 256 outstanding IO
– 6Gb/s = 600MB/s
– Used on ODA shared
storage
40. Interfaces
• PCIe
– “Flash” “Accelerator”
– Multiple 500 MB/s
lanes
– Low latency
– Multiple SAS/SATA
controllers on card
for extra throughput
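The interface numbers can be checked with a little arithmetic. A sketch that assumes 8b/10b line encoding (10 bits on the wire per data byte), which is why 6Gb/s works out to 600MB/s; the counts of 4 SAS links and 8 PCIe lanes come from the editor's notes, and the function name is illustrative.

```python
def link_mb_per_s(gigabits_per_s: float, wire_bits_per_byte: int = 10) -> float:
    """Usable MB/s of a serial link, assuming 8b/10b encoding by default."""
    return gigabits_per_s * 1000 / wire_bits_per_byte

sata_or_sas = link_mb_per_s(6)
four_sas_links = 4 * sata_or_sas   # several controllers on one PCIe card
eight_pcie_lanes = 8 * 500         # slide: 500 MB/s per lane

print(sata_or_sas)        # 600.0
print(four_sas_links)     # 2400.0  (2.4GB/s)
print(eight_pcie_lanes)   # 4000    (4GB/s)
```

Four SAS links fit comfortably inside eight PCIe lanes, so on such a card the SAS controllers, not the PCIe bus, set the throughput ceiling.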
41. Interfaces
• Fiber
– Use existing enterprise
infrastructures
– Shared storage
– Usual SAN headache
– Mandatory for RAC
44. Intel SSD 910
identical read/write
latency?
48. Quick Recap
• SSDs make random reads wicked fast
• Writes and deletes are complicated
• Place segments with many random reads on
SSD
• Exadata uses Smart Flash Cache to increase
throughput
• Not all SSDs are the same
• Read specs carefully
49. Thank you – Q&A
To contact us
sales@pythian.com
1-877-PYTHIAN
To follow us
http://www.pythian.com/blog
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
http://www.linkedin.com/company/pythian
Editor's Notes
http://www.dramexchange.com/service/faqs.aspx#c7
SSD’s base memory unit is a cell, which holds 1 bit in SLC and 2 bits in MLC. Cells are organized in pages (usually 4K) and pages are organized in blocks (512K). Data can be read and written in pages, but is always deleted in blocks. This will become really important in a moment.
Vendors don’t share write amplification numbers, but you can use APIs they sometimes provide to check how much data is written when you write 1M.
This means that write performance is throttled by disk, which is why Exadata can do 60 reads for each write.
4 * 6Gb/s = 4 * 600MB/s = 2.4GB/s
8 * 500MB/s = 4GB/s