More IOPS Please
DRMC’s VMware View Implementation Using Nexenta

Keith Brennan
October 2011
Delano Regional Medical Center

S 156-bed community hospital in central California.
S Four satellite clinics.
S Only hospital in a 30-mile radius.
S Serves approximately 60,000 people spread over several communities.
S 80%+ of our patients are Medi-Cal or Medicare.
  S The government doesn’t pay well.
DRMC’s Clinical Staff
The Great Directive of 2009

S Need to deploy 150 new desktops in support of a Clinical Documentation implementation.
S Do it as cheaply as possible.
S Oh, by the way, you’re losing an FTE due to budget cuts.
“Never let a good crisis go to waste.” –Rahm Emanuel

S Used this “opportunity” to justify moving to VDI.
  S Users were resistant to using something other than a traditional desktop.
    S Perceived lack of freedom.
    S Perceived increase in “Big Brother.”

S Why I wanted the transition to VDI:
  S Ease of management.
  S We had a set, well-defined, integrated desktop experience.
  S Wanted a way to deliver the same experience in a controlled manner to a myriad of devices: iOS, Android, etc.
I Need Storage!

S My existing EMC CX500 was barely cutting it for 3 ESX hosts with a combined 32 VMs.
S Lots of people on the virtualization forums liked NetApp.
S NetApp had just published a white paper on a 750-desktop View deployment on a FAS2050a.
  S Near-normal desktop load times.
  S Seamless user experience.
Well, That’s Timely!

S The next week another vendor calls, letting me know that IBM is running a huge storage sale.
S It includes their N series of network-attached storage.
  S Rebadged NetApps.
S Three weeks later an N3600, a rebadged NetApp FAS2050a, arrives.
S It is set up identically to the VDI white paper’s setup.
Implementation Guidelines

S Linked clones are to be used whenever possible.
  S Ease of maintenance.
  S Ease of provisioning.
S No user data to be stored on the VMs.
S Significant patching shall be done through the Golden Image, and VMs will be re-provisioned using the updated image.
S AV will run on the VMs, but only in real-time scan mode. No scheduled system scans.
Initial Testing

S Two hosts with 25 VMs each.
  S One connected to the N3600 via iSCSI.
  S The other via NFS.
S Test lab of 25 thin clients.
S Good performance.
  S Equivalent to a desktop of the previous generation.
  S Quick user logins due to the VMs being always on and waiting.
  S The N3600 maintains low utilization.
  S NFS and iSCSI exhibit similar speed.
Go Live!

S Five additional ESX hosts are deployed.
  S Each hosts ~25 VMs.
  S The current setup gives me N+2 host redundancy.
S For the first week everything looks good.
S User complaints are primarily with the clinical application.
S The N3600 is handling it well, running at about 35% utilization.
  S ~1.5k IOPS of regular background chatter.
  S VMs report an average latency of 12ms. (A rough sanity check on these numbers follows below.)
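As a back-of-the-envelope check, the go-live numbers hang together. A minimal sketch in Python; the per-VM figure is derived from the observed totals, and the array’s IOPS ceiling is my assumption, not an IBM spec:

```python
# Rough sanity check on the go-live numbers. The per-VM chatter comes
# from observed totals; the array's IOPS ceiling is an assumption.
VM_COUNT = 175          # 7 ESX hosts x ~25 VMs each
BACKGROUND_IOPS = 1500  # observed steady-state array load

print(f"~{BACKGROUND_IOPS / VM_COUNT:.1f} IOPS of chatter per VM")  # ~8.6

# A FAS2050-class head with ~20 spindles is often ballparked around
# 4,000 random IOPS (assumption). That lines up with the reported
# ~35% utilization:
ASSUMED_CEILING = 4000
print(f"utilization: ~{BACKGROUND_IOPS / ASSUMED_CEILING:.0%}")  # ~38%
```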
Disaster!
For me they seem to happen in threes.

S The first AV engine update happens one week after go-live.
  S The AV server pushes it to all clients at once.
  S The simultaneous update of all the View VMs slows the SAN to a crawl for 3 hours.
  S Users complain that the virtual desktops are unusable.
  S Temporarily corrected the problem by only allowing the AV to update 3 machines at once (sketched below).
  S This worked like a champ until a dot-version update on the AV server a month later broke that setting.
    S Another 3-hour “downtime.”
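The batching fix amounts to something like the following sketch. This is hypothetical; `update_client()` stands in for whatever your AV console actually exposes, and in real life this lived in the console’s rollout settings rather than a script:

```python
# Hypothetical sketch of the workaround: push the AV engine update in
# small batches so the aggregate write burst stays within what the
# array can absorb. update_client() is a placeholder action.
import time

BATCH_SIZE = 3        # the setting that tamed the N3600
SETTLE_SECONDS = 300  # let the SAN drain between batches (assumption)

def update_client(name: str) -> None:
    print(f"pushing engine update to {name}")  # placeholder

def staggered_update(clients: list[str]) -> None:
    for i in range(0, len(clients), BATCH_SIZE):
        for c in clients[i:i + BATCH_SIZE]:
            update_client(c)
        if i + BATCH_SIZE < len(clients):
            time.sleep(SETTLE_SECONDS)

staggered_update([f"view-vm-{n:03d}" for n in range(175)])
```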
Disaster! (cont.)

S Three days later a helpdesk tech forces the simultaneous reprovisioning of 60 of the View VMs at once.
  S He was applying an application patch.
  S He was trained not to restart more than 5 VMs at once.
    S That obviously didn’t stick!
  S That was another hour of the SAN crawling.
  S Once again, users complained that the system was unusable during this time.
Disaster! (yet again)

S A .NET 3.5 service pack is approved for deployment.
S The SP is large: >100 MB.
S Set to deploy starting at 2am and only on restart.
  S At 04:15, four VMs restart within one minute of each other.
  S The N3600 starts to lag.
  S Users, seeing their systems running slow, decide to restart.
S At 5am I get the call regarding the issue.
S I immediately disabled the SP deployment.
  S It still took an hour for the N3600 to catch up.
My Users Aren’t Happy
What’s Going On???

S Oh $#!+…
  S General-use chatter is eating my bandwidth.
  S N3600 CPU utilization is now regularly above 50%.
  S Disk utilization rarely drops below 40%.
  S Average disk latency >18ms.
I Have a Problem

S I’m maxing out performance with just day-to-day operations.
S IBM has verified that the appliance is functioning properly.
  S In other words, this is all I’m going to get out of it.
  S Adding disks might help some, but it’s too costly!
    S An additional tray would be $15k!
    S SAS drives to populate it are almost $1k each!
    S I’d still have CPU limitations.
    S And NIC limitations (two 1 GbE links per head).
S Did I mention that I have no money left in the budget?
Nexenta to the Rescue

S I had just installed Nexenta Core for my home file server.
S Time to find some hardware:
  S Pulled a box out of the View cluster.
  S Installed six Intel SSDs.
  S Installed Nexenta Core. (Yeah, I know… EULA…)
  S Created the volume and shared it via NFS (a sketch of the build follows below).
  S The next day my poor brain figured out that I could have just done a Nexenta VM. D’oh!
S Over the next week I migrated half the virtual desktops over.
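The scratch build boiled down to a handful of ZFS commands. A minimal sketch, assuming a mirrored layout across the six SSDs; pool and device names are placeholders, not my actual configuration:

```python
# Minimal sketch of the scratch build: three mirrored pairs of SSDs,
# one filesystem, exported over NFS. Names here are placeholders.
import subprocess

def run(cmd: str) -> None:
    print(f"# {cmd}")
    subprocess.run(cmd.split(), check=True)

run("zpool create vdipool mirror c0t0d0 c0t1d0 "
    "mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0")
run("zfs create vdipool/view")           # dataset for the View datastore
run("zfs set sharenfs=on vdipool/view")  # share it to the ESX hosts
```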
It’s like Night and Day

S Average latency drops from 18ms to 2ms.
S Write throughput quadruples.
S Read throughput doubles.
S 20x improvement on 4K IOPS!
They Like Me Again!
Time for a Full Nexenta Implementation

S I was able to secure $45k of capital for the next year.
  S Normally this would just draw laughter when talking about storage.
S I also intend to replace the existing EMC.
  S Annual maintenance is too costly.
  S I despise the fact that I have to call them out every time I want to connect a new piece of hardware to it.
S Still some questioning from higher-ups on this whole open-storage thing.
Final Solution Hardware

S 2x Supermicro dual-Xeon servers with 96 GB RAM.
S 1x DataOn 1600 JBOD
  S Houses twenty-one 1 TB nearline SAS drives.
S 1x DataOn 1620 JBOD
  S Houses seventeen 300 GB 10K RPM SAS drives.
S 2x STEC ZeusRAM
S 8x 160 GB Intel 320 SSDs
Hardware Diagram
Why DataOn?

S Disk Shelf Manager
  S One thing Nexenta lacked was a way to monitor the JBODs.
  S How else would one of my techs know which drive to pull?
S Intuitive slot lighting.
S They’re responsive even after the sale is made!
Why Nexenta?

S It’s good to have on-demand support.
  S I am the only member of our technical staff with a basic understanding of storage architectures.
  S I like to have the ability to go on vacation from time to time!
S It’s good to have experts for unique problems.
S Regular, tested bug fixes.
S It’s always nice to have someone’s neck to wring!
The End Result

S 2ms latency.
S 500 MB/s reads.
S 200 MB/s writes.
S Happy users!
S Note: the benchmark was done on a production system with 175 active VMs.
To Dedup or Not to Dedup

S Dedup can give you huge storage savings.
  S I had a 14x dedup ratio on my VDI volume.
S Inline dedup saves on disk write I/O.
  S A write will still hit the ZIL, but it won’t be written to disk if it is determined to be duplicate data.
    S Instead of a 4+ KB write you get a sub-256-byte metadata write. (The arithmetic is sketched below.)
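That metadata-versus-block trade works out roughly like this. Round illustrative numbers, not measurements; only the 14x ratio is from my volume:

```python
# Back-of-the-envelope: with dedup ratio r, about (1 - 1/r) of logical
# blocks are already on disk and cost only a small metadata update.
DEDUP_RATIO = 14    # observed on the VDI volume
BLOCK_BYTES = 4096  # logical write size (illustrative)
META_BYTES = 256    # rough upper bound for the metadata touch

unique_fraction = 1 / DEDUP_RATIO
avg_physical = (unique_fraction * BLOCK_BYTES
                + (1 - unique_fraction) * META_BYTES)
print(f"avg physical bytes per 4 KB logical write: {avg_physical:.0f}")
print(f"write reduction: {1 - avg_physical / BLOCK_BYTES:.0%}")  # ~87%
```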
To Dedup or Not to Dedup

S RAM hog!
  S For good performance you need enough RAM to store the dedup table.
    S ZFS uses the ARC for this, which means you will have less room for cached data. (A sizing sketch follows below.)
S Potential for hash collision.
  S The odds are astronomical, but there is still a chance of data corruption.
S Dedup performance penalty.
  S Small IOPS suffer.
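Here is a rough DDT sizing sketch, plus the birthday-bound arithmetic behind “astronomical.” The ~320 bytes per entry is a commonly cited ballpark, and the pool and block sizes are assumptions for illustration:

```python
# 1) DDT sizing: ~320 bytes of in-core DDT per unique block (assumed
#    ballpark). Pool and block sizes below are illustrative.
POOL_USED_BYTES = 2 * 1024**4  # 2 TiB of unique data
AVG_BLOCK_BYTES = 8 * 1024     # 8 KiB average block size

unique_blocks = POOL_USED_BYTES // AVG_BLOCK_BYTES
ddt_ram_gib = unique_blocks * 320 / 1024**3
print(f"DDT wants ~{ddt_ram_gib:.0f} GiB of ARC")  # ~80 GiB

# 2) Collision odds: birthday bound for a 256-bit hash over n blocks,
#    p <= n^2 / 2^257.
p = unique_blocks**2 / 2**257
print(f"collision probability bound: {p:.1e}")     # ~3e-61
```

Note that even this modest example pool wants more DDT RAM than my entire 70 GB ARC, which is exactly why dedup is a RAM hog.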
Dedup Performance Penalty

[Benchmark charts: dedup enabled vs. no dedup]
Is Dedup Worth It?

S If you’re using a “Golden Image”: no.
  S The VMDC plugin provides great efficiency by storing only one copy of the Golden Image versus one for each pool of VMs.
  S Compression is virtually free and will do a good job of making up the difference in the “new” blocks.
  S Disk is cheap.
S If you’re doing a bunch of P2V desktop migrations: maybe.
  S If the desktops are poorly configured, or have other aspects that can cause excessive I/O, then no.
  S If the desktops are similar and large, then sure.
Compression

S Use it. Unless you’re running a five-year-old processor, there will be no noticeable performance hit.
  S On by default in Nexenta 3.1.
  S Compresses before the write, saving disk bandwidth! (See the sketch below.)
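The bandwidth saving is just the compression ratio: the spindles only see the compressed stream. Illustrative numbers, assuming a modest ratio on OS images:

```python
# Why compress-before-write saves disk bandwidth (assumed figures).
LOGICAL_WRITE_MBPS = 200  # what the hosts are pushing
COMPRESS_RATIO = 1.5      # assumed ratio on desktop OS images

physical_mbps = LOGICAL_WRITE_MBPS / COMPRESS_RATIO
print(f"disks only have to absorb ~{physical_mbps:.0f} MB/s")  # ~133
```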
Cache is Key!

S Between the 70 GB of ARC and 640 GB of L2ARC, the read cache is hit almost 98% of the time!
S This equates to sub-2ms average disk latency for the end user. (The arithmetic is sketched below.)
S Beats the crud out of the >15ms average latency of the N3600!
S Know your working set. You could get away with a much smaller cache, or need a much larger one.
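The latency claim is simple weighted arithmetic. The service times below are assumptions for illustration, not measurements:

```python
# Average read latency is the hit-rate-weighted mix of cache and disk
# service times (assumed values).
HIT_RATE = 0.98
CACHE_MS = 0.5   # ARC/L2ARC service time (assumption)
DISK_MS = 15.0   # spindle service time, roughly the old N3600 average

avg_ms = HIT_RATE * CACHE_MS + (1 - HIT_RATE) * DISK_MS
print(f"expected average read latency: {avg_ms:.2f} ms")  # ~0.79 ms
```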
Latency Under Stress
Gig-E vs TenGig-E
Gig-E vs TenGig-E

S Obvious differences in maximum throughput.
S Small-IOP differences are mainly attributable to network latency differences.
S If you’re stuck with Gig-E, use 802.3ad trunk groups.
  S Each flow is still stuck with ~100 MB/s of throughput, but no one ESX host will saturate the link for the rest. (See the sketch below.)
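The reasoning in numbers: LACP hashes each flow onto one member link, so a single host never exceeds ~1 Gb/s, but hosts spread across members. A sketch assuming four 1 GbE links at ~90% usable line rate:

```python
# 802.3ad: per-flow ceiling stays at one link, aggregate scales.
LINKS = 4
GBE_MBPS = 1000 / 8 * 0.9  # ~112 MB/s usable per 1 GbE link (assumption)

print(f"per-host ceiling:  ~{GBE_MBPS:.0f} MB/s")          # ~112
print(f"aggregate ceiling: ~{LINKS * GBE_MBPS:.0f} MB/s")  # ~450
```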
Gig-E vs TenGig-E – User Perspective

S Average time from the “Power On VM” command being issued until the user is able to log in:
  S 10 GbE: 23 seconds
  S 1 GbE: 32 seconds
S Time from when the user presses the “login” button until the desktop is ready to use:
  S 10 GbE: 5 seconds
  S 1 GbE: 9 seconds

*Windows 7, 2 vCPUs, 2 GB RAM, DRMC’s Standard Clinical Image
Final Thought – All-SSD Goodness

S For deployments of linked clones or VMs off of a Golden Image.
S Allows you to get rid of the L2ARC.
S Use a good ZIL device (STEC ZeusRAM, DDRdrive).
  S Allows for sequential writes to the SSDs in the pool.
    S Saves on write wear, which is an SSD killer. (A rough endurance estimate follows below.)
      S My first test box with the X25-M SSDs started suffering after about 3 months.
S If you want HA, you have to use SAS drives.
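A rough endurance estimate shows why a few months is plausible for a consumer MLC drive like the X25-M under this duty cycle. Every figure below is an assumption for illustration; a dedicated ZIL device helps precisely by turning random pool writes sequential and cutting the amplification factor:

```python
# Rough SSD endurance estimate (all figures assumed for illustration).
TBW_RATING_TB = 15   # ballpark rated endurance of a ~2010 consumer MLC SSD
AVG_WRITE_MBPS = 1.0 # per-SSD average write load over the day
WRITE_AMP = 2        # small random writes inflate actual NAND writes

nand_tb_per_day = AVG_WRITE_MBPS * WRITE_AMP * 86_400 / 1e6
days = TBW_RATING_TB / nand_tb_per_day
print(f"~{nand_tb_per_day:.2f} TB NAND writes/day -> "
      f"~{days:.0f} days to rated endurance")  # ~87 days, i.e. ~3 months
```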
Takeaway: Latency is Key!!

keith@drmc.com
661-721-5650
Feel free to contact me.
