
Posts Tagged ‘IOPS’

ThinIO Public Beta is go!

September 15, 2014

Let's get right to it!

Warm up your labs or fire up your golden images, ladies and gents: we're delighted to announce that ThinIO's brief public beta begins today!

This project has taught us some really interesting things about Windows IO: how Windows behaves, and how the hypervisor and storage behave underneath it. It often felt like a David vs. Goliath task as we (members of the community with a desire to simplify this issue) attempted to tackle one of the largest issues in our industry: storage bottlenecks for Windows desktops.

What's really unique about our approach is that there are no hardware lead times, no architecture changes needed and no external dependencies. ThinIO can be installed in seconds and the benefits are seen immediately.


ThinIO, taking a peek under the covers.

What a busy few weeks; Citrix Synergy already feels like a distant memory. We had a great trip and were dumbfounded by the interest and excitement shown by enthusiasts, customers and vendors around our ThinIO solution, with quite a few people insisting on seeing the inner mechanics and trying to break our demos to make sure the figures they saw were legit!

For those unfortunate enough to have missed Synergy or our webinar with Erik over at XenAppBlog, here's a little blog post you should find interesting, as I walk you through the inner workings of ThinIO and why it's so simple to deliver disk access at RAM speeds without any of the complexity.

 

What is ThinIO?

 

ThinIO is a filter driver which operates at block level, inline between Windows and the disk.

ThinIO sits in the operating system layer and can be used on Windows desktop operating systems or server-based computing models.

ThinIO delivers a greatly reduced IO footprint on your storage, while also speeding up core items like boot and login times. ThinIO also helps flatten the peaks your storage gets hit with during the busy periods of the day. Ultimately this allows you to size your storage for the average, as opposed to sizing for the worst-case scenario during peaks.
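To make that sizing point concrete, here's a bit of hedged napkin math in Python. The per-desktop figures are purely illustrative assumptions, not ThinIO measurements:

```python
# Napkin math: sizing backend storage for average vs. peak IOPS.
# All per-desktop figures below are illustrative assumptions only.
desktops = 500
avg_iops_per_desktop = 10    # assumed steady-state load
peak_iops_per_desktop = 50   # assumed boot/login-storm load

sized_for_average = desktops * avg_iops_per_desktop
sized_for_peak = desktops * peak_iops_per_desktop

print(f"Sized for average: {sized_for_average:,} IOPS")   # 5,000 IOPS
print(f"Sized for peak:    {sized_for_peak:,} IOPS")      # 25,000 IOPS
print(f"Peak sizing needs {sized_for_peak // sized_for_average}x the backend capability")
```

If the cache absorbs the peaks, the backend only needs to be sized for something close to the first number.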

How does it work?

 

When ThinIO starts up, it allocates a configurable cache of reserved RAM to perform its optimisations.

Being the last filter in the stack, ThinIO still allows Windows to perform its own optimisation of IO, delivering value by catching read and write IOs just as they hit the disk.

ThinIO interacts with block data as reads and writes traverse the cache. The first time a read is observed, the block is retrieved from disk and stored for future use, meaning any subsequent read of that block is served directly from cache.

But reads are boring, and everyone has a solution for read caching. ThinIO also treats this RAM cache as a storage area for write IO. Writes are committed to the cache almost instantly, and no IO is sent down to the disk while free space is available in the cache.
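As a rough mental model of that read and write path, here's a toy sketch in Python. This is not ThinIO's driver code; the block-level interface and the hard capacity check are assumptions made purely for illustration:

```python
# Toy model of a block-level RAM cache sitting between Windows and the disk.
class BlockCache:
    def __init__(self, disk, capacity_blocks):
        self.disk = disk                  # any object exposing read_block()/write_block()
        self.capacity = capacity_blocks   # hard cap, like the configurable cache size
        self.blocks = {}                  # lba -> block data held in RAM
        self.dirty = set()                # lbas written to RAM but not yet to disk

    def read(self, lba):
        if lba in self.blocks:            # subsequent reads come straight from RAM
            return self.blocks[lba]
        data = self.disk.read_block(lba)  # first read goes to disk...
        if len(self.blocks) < self.capacity:
            self.blocks[lba] = data       # ...and the block is kept for future use
        return data

    def write(self, lba, data):
        if lba in self.blocks or len(self.blocks) < self.capacity:
            self.blocks[lba] = data       # committed to RAM almost instantly
            self.dirty.add(lba)           # nothing is sent to disk while space remains
        else:
            self.disk.write_block(lba, data)  # cache full: fall through to the disk
```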

“But what if the machines run out of RAM?”

 

Well, I'm glad you asked! The cache in ThinIO is hard set at a value you configure, so RAM will never be taken from the cache to service other processes. But in the situation where the cache has become 100% volatile write data, ThinIO will begin to spill over to the local disk, allowing the virtual machine to continue to operate.

There's more: ThinIO actively manages the cache contents to keep them as relevant as possible. As the cache begins to fill, ThinIO's Lazy Page Writer identifies and flushes out blocks that have not been used frequently. This allows you to use a relatively small cache while still delivering the big numbers we'll discuss later.
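Continuing the toy model above, the eviction and spill-over behaviour might look roughly like this. ThinScale haven't published the real algorithm, so the least-recently-used policy and the high-watermark figure here are assumptions:

```python
from collections import OrderedDict

class SpillingCache:
    """Toy write-back cache that flushes cold, dirty blocks to disk as it
    fills up - a stand-in for the Lazy Page Writer idea, not the real thing."""

    def __init__(self, disk, capacity_blocks, high_watermark=0.9):
        self.disk = disk
        self.capacity = capacity_blocks
        self.high_watermark = high_watermark      # start flushing before RAM runs out
        self.blocks = OrderedDict()               # lba -> (data, dirty); order = recency

    def write(self, lba, data):
        self.blocks[lba] = (data, True)
        self.blocks.move_to_end(lba)              # mark as most recently used
        self._maybe_flush()

    def _maybe_flush(self):
        # As the cache fills, push the least recently used dirty blocks down to
        # disk so the VM keeps running instead of stalling when RAM is exhausted.
        while len(self.blocks) > self.capacity * self.high_watermark:
            lba, (data, dirty) = self.blocks.popitem(last=False)   # coldest block
            if dirty:
                self.disk.write_block(lba, data)                   # spill over to disk
```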

Designed to be foolproof:

 

ThinIO's GUI is foolproof: it's intuitive and gives you a really quick view of ThinIO in real time. It provides graphical representations of stats on reads, writes and cache usage, as well as an immediate view of the benefit ThinIO has delivered for the desktop.

The ThinIO console can also remotely connect to machines to ensure you don’t have to disturb the user while checking performance.

 

 

When the cache is enabled, ThinIO also has a real-time statistics window to help you identify disk patterns and cache performance.

 

 

Boot and application launch time optimization:

 

ThinIO has some really clever technology built in to optimise the Windows boot process and user experience.

During early testing, we observed just how inefficiently Windows uses its disk resources during the boot process. The same files are regularly requested over and over again on boot, and because these blocks are non-contiguous, seek penalties are inherent. Busy servers were requesting up to 80,000 read IOPS during boot and process start.

ThinIO's Read Ahead feature lets you teach Windows to be less of a storage monster. As the ThinIO cache is already aware of all the blocks needed to boot, or even to serve the user's first launch of their applications, Read Ahead allows the machine to boot with a preloaded cache of the required blocks, sorted contiguously.

When ThinIO starts up, it identifies the 'Read Ahead' configuration file and briefly pauses Windows while it reads the required blocks once, in a contiguous pattern across the disk. Once finished, Windows continues to boot, retrieving the majority of its block data directly from cache.
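Conceptually, that Read Ahead pass boils down to something like the sketch below. The real configuration file format is ThinScale's own and isn't shown here; this just illustrates the 'dedupe, sort, sweep once' idea:

```python
def read_ahead(disk, recorded_lbas, cache):
    """Prefetch every block recorded during the 'teaching' boot, in ascending
    disk order, so the reads sweep the disk once instead of seeking back and
    forth. Illustrative only - not ThinIO's implementation."""
    for lba in sorted(set(recorded_lbas)):      # dedupe + sort = one contiguous sweep
        cache[lba] = disk.read_block(lba)       # cache is any dict-like store in RAM
    # Windows then continues to boot with the hot blocks already sitting in RAM.
```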

In our testing, this delivered roughly 30% faster boot times while also reducing boot IOPS by over 80%. In the graphs below, we did a side-by-side comparison of the Windows start-up process with and without ThinIO.

In the GUI below you will see a machine with the ThinIO cache enabled but no Read Ahead configured. We achieve a good 40% reduction in IOPS on boot and login, which is not bad on its own, but we knew we could make it better:

So, after building a Read Ahead configuration by booting a machine, logging in, opening the core set of applications and committing the file, we see the following large improvement in IOPS saved and read cache hit rate:

 

So there you have it. By spending an additional 3-4 minutes with your golden image, you reduce nearly 30,000 IOPS to roughly 5,000 while also reducing boot times. Not only have you taken a lot of pressure off your storage; if you included your users' core application files as part of the Read Ahead configuration, their login speeds will get a really good boost and their application launch times become near instant.

Once the Read Ahead is complete, the driver will reuse the space occupied by data that is no longer needed for chattier read or write blocks, so configuring Read Ahead has zero impact on cache usage in the longer term.

Deploy, size, done.

 

Out of the box, ThinIO takes less than 5 minutes to install and configure, delivering immediate benefits. No hoping, trusting or praying that the hardware vendor's figures are correct. No SAN or LUN type requirements, no hardware lead time, no hypervisor requirements and no change needed to your architecture. Whether you are on premises or even in cloud SaaS / DaaS, ThinIO installs without any change.

 

Licensing:

 

ThinIO will ship with a 30-day grace period for you to test to your heart's content without any commitment. If ThinIO is not for you, it's just a matter of uninstalling it! Keeping with the spirit of the community, ThinIO will even have a free version available!

Ultimately, designing and deploying virtual desktops is difficult, and we really wanted to write a product that both delivers and is simple and easy to deploy. We feel we've absolutely hit the mark on this, and we look forward to opening the program to full deployment in the coming weeks.

Sounds great, how do I learn more?

 

Head on over to the ThinScale Technology web page and read more or register for the private beta.


ThinIO, here comes something incredible.


Well, we've been busy! Very, very busy. In the next week you will see the culmination of two years' work on a product we're about to release called ThinIO.

Cast your mind back, if you will, to some ramblings and napkin math I devised some time ago in my series on IOPS negation strategies:

IOPS, Shared Storage and a Fresh Idea. (Part 1)
IOPS, Shared Storage and a Fresh Idea. (Part 2)
IOPS, Shared Storage and a Fresh Idea. (Part 3)

In these posts a bunch of community guys (Barry Schiffer, Iain Brighton, Ingmar Verheij, Kees Baggerman, Remko Weijnen and Simon Pettit) devised a cunning plan during E2EVC to see if we could counter the monotony of IOPS and their devastating impact on virtual desktop implementations. We threw together a loose test scenario to demonstrate how the EWF (Enhanced Write Filter) from Microsoft's Windows Embedded Standard, combined with Citrix's XenServer IntelliCache, could deliver explosive performance and IO reduction statistics.

This blog series got way more attention than we could possibly have hoped for, and judging by Citrix's response of adding RAM caching with disk overflow to Citrix Provisioning Services… we were definitely listened to. At the end of the series, I alluded to a technology that could be leveraged to achieve some of this; while I was right, it has taken a long time to get right! With the help of our newest collaborator, David Coombes, this technology is very much alive and ready for use.

Here’s the kicker:

Next week at Citrix Synergy, we're dropping some big news for this market: we're releasing a product that will deliver insanely fast IOPS to any storage, utilising inexpensive RAM. With our product, no architecture change is required: no SAN volume dependencies, no expensive hardware upgrades and no hypervisor gotchas. ThinIO works with all major desktop virtualisation products like XenApp, XenDesktop, VDI in a Box, Microsoft Remote Desktop technologies and even VMware Horizon View!

ThinIO is just a simple installation and off you go. Not only will this product reduce and standardise IOPS while improving their speed, it will also dramatically optimise and reduce boot storms.

Register for XenAppBlog's webinar here, where we'll discuss how ThinIO works for the first time, or come visit us at Citrix Synergy (Booth 513) to celebrate the culmination of two years of work and learn how ThinIO is a performant, reliable and extremely cost-effective way to deliver a lightning-fast experience to your users while protecting your disk storage from grinding to a halt.

Watch this space.

Register for XenAppBlog's webinar with ThinScale Technology for the official launch of ThinIO.


On IOPS, shared storage and a fresh idea. (Part 2) GO GO Citrix Machine Creation Services.

October 10, 2012

Welcome back to part two of this little adventure exploring the RAM caching ability of our new favourite little file system filter driver, the Microsoft Enhanced Write Filter. If you missed part 1, you can find it here.

After the first post, my mailbox blew up with queries on how to do this, how the RAM cache weighs up against PVS, and how you can manipulate and “write out” to physical disk in a spill-over. So before I go any further, let's have a quick look at the EWF.


Deeper into the EWF:


As previously mentioned, the EWF ships in all Windows Embedded operating systems and also in Windows Thin PC. The EWF is a mini filter driver that redirects reads and writes to memory or the local file system, depending on where the file currently lives.

In this experiment we're using the RAMREG method of the EWF. Have a read of that link if you would like more information, but basically the EWF creates an overlay in RAM for the files to live in, and the configuration is stored in the registry.
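In overlay terms, the behaviour boils down to something like this toy Python model (the real EWF works at sector level inside the storage stack; the file-level view here is a simplification for illustration only):

```python
# Toy model of the EWF RAM overlay: writes land in RAM, reads prefer the
# overlay, and a reboot simply throws the overlay away.
class RamOverlay:
    def __init__(self, protected_volume):
        self.volume = protected_volume    # any object exposing read(path)
        self.overlay = {}                 # path -> data held only in RAM

    def write(self, path, data):
        self.overlay[path] = data         # never touches the protected volume

    def read(self, path):
        if path in self.overlay:          # changed since boot: serve from RAM
            return self.overlay[path]
        return self.volume.read(path)     # untouched: serve from the real disk

    def reboot(self):
        self.overlay.clear()              # the machine comes back clean
```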

Once the EWF is enabled and the machine rebooted, on next login (from an administrative command prompt) you can run ewfmgr -all to view the current status of the EWF:





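If you'd rather capture that status from a script than eyeball the console, a minimal sketch along these lines should work (it assumes an elevated session with ewfmgr.exe on the PATH; the string check is naive and the exact output wording varies by build):

```python
import subprocess

# Run ewfmgr -all from an elevated prompt and capture its output.
result = subprocess.run(["ewfmgr", "-all"], capture_output=True, text=True)
print(result.stdout)

# Naive check for the overlay state - adjust to match the output on your build.
print("Overlay appears enabled:", "ENABLED" in result.stdout.upper())
```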
So what happens if I write a large file to the local disk?

Well, basically this! Below I have an installer for SQL Express which is roughly 250 MB. I copied this file to the desktop from a network share, and as you can see below, the file has been stuck into the RAM overlay.





And that's pretty much it! Simple stuff for the moment. When I venture into the EWF's API in a later post, I'll show you how to write this file out to storage if we are reaching capacity, allowing us to spill to disk and address the largest concern currently surrounding Citrix Provisioning Server and RAM caching.


Next up to the block. Machine Creation Services…


I won't go into why I think this technology is cool; I covered that quite well in the previous post. But this was the first technology I planned to test with the EWF.

So I created a lab at home consisting of two XenServers and an NFS share with MCS enabled as  below:





Now before we get down to the nitty-gritty, let's remind ourselves of what our tests looked like before and after using the Enhanced Write Filter in this configuration, when booting, using and shutting down a single VM:


As with last time, black denotes writes and red denotes reads.

 

So, looking at our little experiment earlier, we pretty much killed write IOPS on this volume when using the Enhanced Write Filter driver. But the read IOPS (in red above) were still quite high for a single desktop.

And this is where I had hoped MCS would step into the fray. Even on the first boot with MCS enabled the read negation was notable:

 

Test performed on a newly built host.

 

At no point do reads hit the 400+ peak we saw in the previous test, but we still see spiky read IOPS as IntelliCache begins to read and store the image.

So now that we know what a first boot looks like, I kicked the test off again from a pre-cached device; this time the image should be cached on the local disk of the XenServer, as I've booted the image a number of times.

Below are the results of the pre cached test:

 

Holy crap…

 

Well, I was incredibly impressed with this result… But did it scale?

So next up, I added additional desktops to the pool to allow for 10 concurrent desktops (a constraint of my lab size).

 

 

Once all the desktops had been booted, I logged in 10 users and decided to send a mass shutdown command from Desktop Studio, with fairly cool results:

 

 

And what about the other way around: a 10 VM cold boot?

 

Pretty much IO-free boots and shutdowns!

 

So let's do some napkin math for a second.

 

Taking what we've seen so far, let's get down to the nitty-gritty and look at maximums and averages.

In the first test, minus EWF and MCS, from a single desktop, we saw:

  • Maximum Write IOPS: 238
  • Average Write IOPS: 24
  • Maximum Read IOPS: 613
  • Average Read IOPS: 83

And in the end result, with EWF and MCS, and even with the workload times 10, we saw the following:

  • Maximum Write IOPS: 26 (2,380*)
  • Average Write IOPS: 1.9 (240*)
  • Maximum Read IOPS: 34 (6130*)
  • Average Read IOPS: 1.3 (830*)

* Denotes original figures times 10
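As a quick sanity check on those figures, the percentage reductions work out roughly as follows (simple arithmetic over the numbers listed above):

```python
# (before, after) pairs: the asterisked single-desktop figures times 10 vs.
# the measured 10-desktop EWF + MCS results.
figures = {
    "Maximum Write IOPS": (2380, 26),
    "Average Write IOPS": (240, 1.9),
    "Maximum Read IOPS": (6130, 34),
    "Average Read IOPS": (830, 1.3),
}

for name, (before, after) in figures.items():
    reduction = (1 - after / before) * 100
    print(f"{name}: {before} -> {after} ({reduction:.1f}% reduction)")
# Every metric lands at roughly a 99% reduction.
```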

I was amazed by the results of two readily available technologies coupled together to tackle a problem in VDI that we are all aware of and regularly struggle with.

 

What about other, similar technologies?

Now, as VMware and Microsoft have their own technologies similar to MCS (CBRC and CSV Cache, respectively), I would be interested in seeing similar tests to see if their solutions can match the underestimated IntelliCache. So if you have a lab with either of these technologies, get in touch and I'll send you the method to test this.

 

What’s up next:

In the following two blog posts, I’ll cover off the remaining topics:

  • VDI in a Box.
  • EWF API for spill over
  • Who has this technology at their disposal?
  • Other ways to skin this cat.
  • How to recreate this yourself for your own test.



Last up, I wanted to address a few quick queries I received via email / comments:


“Will this technology approach work with Provisioning Server and local disk caching, allowing you to leverage PVS but spill to a disk write cache?”

No. The Provisioning Server filter driver has a higher altitude than poor EWF, so PVS grabs and deals with the write before EWF ever sees it.
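To illustrate why altitude matters, here's a deliberately simplified model of a filter stack. It's purely conceptual: real filter altitudes are numeric values assigned by Microsoft, and the two altitude numbers below are made up for the example:

```python
# Filters with a higher altitude sit above lower ones and see IO first.
# If the higher filter consumes a write, the lower filter never sees it.
filters = [
    {"name": "PVS write-cache filter", "altitude": 200, "consumes_writes": True},
    {"name": "EWF overlay filter",     "altitude": 100, "consumes_writes": True},
]

def send_write(description):
    for f in sorted(filters, key=lambda f: f["altitude"], reverse=True):
        print(f"{f['name']} sees: {description}")
        if f["consumes_writes"]:
            print(f"{f['name']} redirects the write; lower filters never see it")
            return
    print("write reaches the disk")

send_write("write to the vDisk write cache")
```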

“Couldn’t we just use a RAM disk on the hypervisor?”

Yes, maybe and not yet…

Not yet: with Citrix MCS and Citrix VDI in a Box, separating the write cache and identity disk from the LUN on which the image is hosted is a bit of a challenge.

Maybe: if using Hyper-V v3 with shared-nothing migration, you now have migration options for live VMs. This would allow you to move the WC / ID disks from one RAM cache to another.

Yes: if using Citrix Provisioning Server you could assign the WC to a local storage object on the host where the VM lives. This would be tricky with VMware ESXi and XenServer, but feel free to give it a try; Hyper-V, on the other hand, would be extremely easy, as many RAM disks are available online.

“Atlantis Ilio also has inline dedupe, offering more than just ram caching?”

True, and I never meant, even for a second, to say this technology from Atlantis was anything but brilliant. But with RAM caching on a per-VM basis, wouldn't VMware's transparent page sharing also deliver similar benefits, without the associated cost?

On E2E, Geek speak, IOPS, shared storage and a fresh idea. (Part 1)

October 5, 2012


Note: this is part 1 of a 4-post blog series.

While attending E2EVC Vienna recently, I found myself in the Citrix Geek Speak session. Although the opportunity to rub shoulders with fellow Citrix aficionados was a blast, I found myself utterly frustrated spending time talking about storage challenges in VDI.

The topic was interesting and informative, and while a few ideas were shared about solid state arrays and read IOPS negation with PVS or MCS, there really wasn't a vendor-based, one-size-fits-all solution to reduce both read and write IOPS without leveraging paid-for (and quite expensive, I may add) solutions like Atlantis ILIO, Fusion-io, etc. The conversation was tepid and deflating, and we moved on fairly quickly to another topic.

So we got to talking…

In the company of great minds and my good friends Barry Schiffer, Iain Brighton, Ingmar Verheij, Kees Baggerman, Remko Weijnen and Simon Pettit, we got talking over lunch about this challenge and the pros and cons of IO negation technologies.

Citrix Provisioning Server… 

We spoke about the pros and cons of Citrix Provisioning Server: namely that rather than actually reducing read IOPS, it merely shifts the pressure onto the network instead of the SAN or local resources in the hypervisor.

The RAM caching option for the differencing disk in Citrix Provisioning Server is really clever, but it's rarely utilised because the cache in RAM can fill up, and the inevitable blue screen will follow that event.

Citrix Provisioning Server also requires quite a bit of pre-work to get the PVS server configured and highly available, plus the education process of learning how to manage images with Provisioning Server; it's more of an enterprise tool.

Citrix Machine Creation Services…

We spoke about the pros and cons of Citrix Machine Creation Services and IntelliCache. This technology is much smarter about caching regular reads to a local device, but its adoption is mainly SMB, and the requirements for NFS and XenServer were a hard sell… particularly given a certain hypervisor's dominance to date.

Machine Creation Services is stupidly simple to deploy: install XenServer, stick the image on an NFS share (even Windows servers can host these)… bingo. Image changes are based on snapshots, so the educational process is a moot point.

But again, MCS only negates read IO, and this assumes you have capable disks under the hypervisor to run multiple workspaces from a single disk or array. It's also specific to XenDesktop, sadly, so no hosted shared desktop solution.

We mulled over SSD-based storage quite a lot with MCS, and agreed that MCS and IntelliCache could be leveraged on an SSD, but the write cache or differencing disk activity would be so write-intensive that it would not be long before the SSD started to wear out.

The mission Statement:

So all this got us thinking: without paying for incredibly expensive shared storage, redirecting the issue onto another infrastructure service, or adding to your current shared storage to squeeze out those precious few IOPS, what could be done with VDI and storage to negate this issue?

So we got back to talking about RAM:

RAM has been around for a long time, and it's getting cheaper as time goes by. We pile the stuff into hypervisors for shared server workloads. Citrix Provisioning Server chews it up for caching images server-side, and the PVS client even offers a RAM caching feature too, but at present it limits the customer to a Provisioning Server with XenDesktop or XenApp.

But what if we could leverage a RAM caching mechanism decoupled from a paid-for technology or Provisioning Server, one that's freely available and can be used in any hypervisor or technology stack, providing a catch-all, free and easy caching mechanism?

and then it hit me…

Eureka!

This technology has been around for years, cleverly tucked away in thin client architecture, and Microsoft have been gradually adding more and more functionality to this heavily underestimated driver…

Re-introducing, ladies and gents, the Microsoft Enhanced Write Filter.

The Microsoft Enhanced Write Filter has been happily tucked away in Microsoft's embedded operating systems since XP and is fairly unfamiliar to most administrators. It saves all disk writes to memory (except for specific chunks of the registry), resulting in the PC being clean each time it is restarted.

This technology has been mostly ignored unless you are particularly comfortable with thin client architectures. Interestingly, during my initial research for this mini project, I found that many car PC enthusiasts and early SSD adopters have been hacking this component out of Windows Embedded and installing it into mainstream Windows operating systems to cut down on wear and tear to flash or solid state disks.

Microsoft have been gradually building out this technology with each release, adding a powerful API to view cache hits and the contents of the cache, and even to write the cache out to disk as required: in an image update scenario… or, even better, when a RAM spill-over is about to happen…
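The spill-over idea that API enables could be sketched like this. The two helper functions are hypothetical placeholders standing in for whichever EWF API calls expose overlay usage and commit the overlay; they are not real EWF function names, and the limit and watermark values are assumptions:

```python
import time

# Hypothetical placeholders for the EWF API - NOT real function names.
def overlay_used_bytes() -> int:
    raise NotImplementedError("stand-in for the real EWF overlay-usage call")

def commit_overlay_to_disk() -> None:
    raise NotImplementedError("stand-in for the real EWF commit call")

OVERLAY_LIMIT = 1 * 1024**3   # e.g. a 1 GB RAM overlay (assumed figure)
HIGH_WATERMARK = 0.85         # spill before the overlay actually floods

def watch_overlay(poll_seconds=30):
    """Poll the overlay and write it out to disk before a RAM flood takes
    the VM down - the spill-over scenario described above."""
    while True:
        if overlay_used_bytes() >= OVERLAY_LIMIT * HIGH_WATERMARK:
            commit_overlay_to_disk()
        time.sleep(poll_seconds)
```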

If this technology is tried and trusted in the car PC world, could it be translated to VDI?…

To the lab!

Excited by my research thus far, I decided to take the plunge. I began by extracting the EWF driver and the registry keys necessary to run it from a copy of Windows Embedded Standard…

With a bit of trial, error, hair pulling and programming, I managed to get this driver into a clean Windows 7 x86 image. From here I disabled the driver and started testing, first without the driver.

(I'm not going to go into how to extract the driver yet; please check back later for a follow-up post)

Test A: XenDesktop,  no RAM caching

To start my testing, I went with a XenDesktop image hosted on XenServer without IntelliCache. I sealed the shared image on an NFS share hosted from a Windows server. I performed a cold boot, opened a few applications, then shut down again.

I tracked the read and write IOPS via Windows Perfmon on the shared NFS disk, and below are my findings:

(Black line denotes Write IOPS to shared storage)
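For anyone who wants to crunch the raw counters themselves, this is roughly how I'd reduce a Perfmon CSV export to maximum and average IOPS. The counter column names and the file name are assumptions; adjust them to whatever your data collector actually logged:

```python
import csv

# Example counter columns from a Perfmon CSV export - yours will differ.
READ_COL = "\\\\HOST\\PhysicalDisk(_Total)\\Disk Reads/sec"
WRITE_COL = "\\\\HOST\\PhysicalDisk(_Total)\\Disk Writes/sec"

def summarise(path, column):
    values = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                values.append(float(row[column]))
            except (KeyError, ValueError):
                continue                      # skip blank or malformed samples
    return max(values), sum(values) / len(values)

for label, column in [("Read", READ_COL), ("Write", WRITE_COL)]:
    peak, average = summarise("boot_test.csv", column)   # hypothetical export name
    print(f"{label} IOPS - max: {peak:.0f}, average: {average:.1f}")
```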

Test B: XenDesktop, with the Microsoft Enhanced Write Filter.

The results below pretty much speak for themselves…

(again, black line denotes write IOPS)

I couldn’t believe how much of a difference this little driver had made to the write IOPS…

Just for ease of comparison, here’s a side by side look at what exactly is happening here:

So where do we go from here:

Well, no matter what you do, Microsoft will not support this option and they’d probably sue the pants off me if I wrapped this solution up for download…

But this exercise in thinking outside of the box raised some really interesting questions… around other technologies and the offerings they already have in their stacks.

In my next few posts on this subject I'll cover the below topics at length and discuss my findings.

  • I'll be looking at how to leverage this with Citrix IntelliCache for truly IO-less desktops.
  • I’ll be looking at how this technology could be adopted by Microsoft for their MED-V stack.
  • I’ll be looking at one of my personal favourite technologies, VDI in a Box.
  • I'll be looking at how we could leverage this API for spill-over to disk in a cache flood scenario, or even a management appliance to control the spill-overs to your storage.

And finally, I'll be looking at how Citrix or Microsoft could quite easily combine two of their current solutions to provide an incredible offering.

And that’s it for now, just a little teaser for the follow up blog posts which I’ll gradually release before Citrix Synergy.

I would like to thank Barry Schiffer, Ingmar Verheij, Kees Baggerman and Remko Weijnen for their help and input on this mini project. It was greatly appreciated.