Marathon Technologies CTO drilled; everRun VM and Citrix's HA virtualization discussed

I chatted once again with Jerry, CTO at Marathon, and he explained the everRun VM release.

How does Marathon provide HA to today's loosely coupled virtualization solutions?

As Simon Crosby says, “everRun VM brings fault-tolerant computing to the masses.” Our new software transparently creates and manages what we call “protected virtual machines” running on redundant hosts in a XenServer resource pool. Disk data is mirrored synchronously to redundant storage and redundant networks are managed across the hosts. If a failure occurs in a disk or network device, everRun automatically reconfigures resources to permit the application to continue operating without interruption. And we’re bringing fault-tolerant computing to the masses by making it brain-dead simple. You can deploy everRun VM from bare metal in 30 minutes. I’m not kidding, it really is 30 minutes or less. And once it’s up and running we automate everything so that it practically runs itself.

What are we announcing with everRun VM? Is it a software appliance? A virtual machine?

everRun VM is loaded as a virtual appliance that we call the everRun Availability Manager. The Availability Manager runs within the XenServer environment as a purpose-built appliance that protects other virtual machines on the host. The Availability Manager establishes a tightly-coupled relationship to the virtual machines it protects. It resides in the data path between the virtual machine and the control domain, which handles I/O for the virtual machines.

This approach has three key advantages over using clustering or failover technologies for virtualization HA: the highest availability, the highest reliability, and extremely simple administration. everRun achieves these advantages through its unique software appliance architecture. High availability is accomplished using our ComputeThru technology. Device failures are automatically detected and managed by redirecting I/O to correctly operating redundant devices without losing I/O transactions or interrupting the application. That means that applications compute through I/O failures without interruption – that’s very different from a cluster technology or something like VMware HA.
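To make the redirect-on-failure idea concrete, here is a minimal Python sketch of a write path that mirrors each write to redundant devices and keeps going when one of them fails. It is an illustration only, not Marathon's implementation: the names `Device`, `MirroredVolume` and `DeviceFailure` are hypothetical, and a real product would also have to resynchronize a repaired device.

```python
class DeviceFailure(Exception):
    """Raised by a (hypothetical) device that can no longer service I/O."""


class Device:
    """Toy block device: stores blocks in a dict and can be marked unhealthy."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.blocks = {}

    def write(self, block, data):
        if not self.healthy:
            raise DeviceFailure(self.name)
        self.blocks[block] = data


class MirroredVolume:
    """Writes every block to all active mirrors; drops mirrors that fail."""

    def __init__(self, devices):
        self.devices = list(devices)

    def write(self, block, data):
        survivors = []
        for dev in self.devices:
            try:
                dev.write(block, data)
                survivors.append(dev)
            except DeviceFailure:
                # The failed mirror is removed from the active set; the
                # caller's write still succeeds on the remaining mirrors.
                pass
        if not survivors:
            raise DeviceFailure("all mirrors failed")
        self.devices = survivors
        return len(survivors)  # number of mirrors that hold the write
```

In this sketch the caller only sees an error if every mirror has failed, which is the behaviour the answer above describes as computing through an I/O failure.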

everRun’s reliability also comes by virtue of an architecture that yields much better and more accurate insight into the health and activity of the protected virtual machine. With access to the data path, everRun can rapidly identify errors in specific devices and recover operations in a discrete and responsive manner, rather than incurring a complete failover process for every class of error. Many cluster-class systems can fail just when you need them most. This happens when they attempt a failover and the devices on the surviving host are not operational or are incorrectly configured. In contrast, because everRun is actively operating redundant devices across two hosts, recovery is always assured: the device states are well known.

As far as simplicity goes, protecting a virtual machine is as simple as running a short wizard to describe what you want protected, and then you’re done. everRun controls virtual machine operations transparently – there’s no need to script or change the application. Once a virtual machine is protected, there’s nothing else to do. You can really just walk away.

If I am setting up a data center with Citrix XenServers, how will Marathon's everRun VM help in keeping my clients' data completely and always highly available?

Great question. By giving you redundant, fault-tolerant VMs, everRun VM prevents outages. And with synchronous mirroring of your network, storage and data, it prevents any data loss. Because it’s completely automated, it eliminates the admin errors that are so often the source of downtime. And finally, since everRun VM works over a wide area network, the servers in the pair or pool that we’re protecting can be located in different data centers. So even if a whole data center goes down, you still have 100% data protection and high availability.

Please tell us about the three levels of availability.

The first level of availability protection is what we refer to as Basic HA. This is basic failover functionality, similar to VMware HA, that fails the VM over to a second server if the first one has a problem. It’s a standard “best effort” recovery that is best suited for providing limited high availability for applications like file/print servers.

The second level of protection is what we refer to as component-level fault tolerance. At this level, if any I/O component such as a network adapter, storage adapter or storage device fails, everRun VM automatically reconfigures resources to permit the VM to continue operating without interruption. We think this is the level that companies will use to protect business-critical applications and others that they rely on, such as Microsoft Exchange, SQL Server and SharePoint.

The third level, for when you need “transaction protection” to ensure you don’t lose any transactions, is system-level fault tolerance. At this level the VM is running identically in lockstep on two hosts, so even if a host goes down, you still have continuous availability. We expect this will be reserved for protecting mission-critical applications. We will sell this third level as an add-on for everRun VM, called the LockStep Option.

The coolest thing about how we are doing these different levels is that everRun VM lets you select the availability protection for each VM. So you can match the right availability level for each application running in a VM. Why is that such a big deal? You get optimal availability price/performance because you only pay for the protection you need.
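As a rough illustration of per-VM selection, the three levels can be modelled as an enumeration with a policy that maps each VM to one level. This is a hypothetical Python sketch, not everRun's configuration format; the VM names and the `Protection` and `needs_lockstep` identifiers are invented for the example.

```python
from enum import Enum


class Protection(Enum):
    """The three availability levels described in the interview."""
    BASIC_HA = 1       # best-effort failover to a second server
    COMPONENT_FT = 2   # component-level fault tolerance (I/O devices)
    SYSTEM_FT = 3      # system-level fault tolerance (LockStep Option add-on)


# Hypothetical policy: each VM gets only the protection it needs.
policy = {
    "file-print": Protection.BASIC_HA,
    "exchange": Protection.COMPONENT_FT,
    "sharepoint": Protection.COMPONENT_FT,
    "trading-db": Protection.SYSTEM_FT,
}


def needs_lockstep(policy):
    """Return the VMs whose chosen level requires the LockStep add-on."""
    return sorted(vm for vm, level in policy.items()
                  if level is Protection.SYSTEM_FT)
```

With this example policy, only `trading-db` would need the add-on, which is the point of matching the level to the application.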

What is the pricing model?

everRun VM is available in two flavors. Companies that already have XenServer Enterprise can buy everRun VM for $2000 per physical server. If they don’t have a hypervisor installed, they can buy an integrated bundle from us that includes XenServer Enterprise Edition and everRun VM for $4500 per physical server. That’s for a perpetual license, by the way. We’ve briefed a lot of channel partners and potential customers on the product and the pricing, and more than once we’ve had them respond, “What’s the catch?” There is no catch. We’re just trying to make this so easy and inexpensive that companies don’t have to think too hard about adding it to XenServer.
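For a quick sense of what that means for a protected pair or pool, the arithmetic can be sketched in a few lines of Python. The function name `license_cost` is hypothetical, the prices are the two per-server figures quoted above, and the LockStep Option is left out because no price for it is given here.

```python
EVERRUN_ONLY = 2000  # USD per physical server, XenServer Enterprise already owned
BUNDLE = 4500        # USD per physical server, XenServer Enterprise + everRun VM


def license_cost(servers, has_xenserver_enterprise):
    """Perpetual-license cost in USD for the given number of physical servers."""
    if servers < 0:
        raise ValueError("servers must be non-negative")
    per_server = EVERRUN_ONLY if has_xenserver_enterprise else BUNDLE
    return servers * per_server
```

For a two-host pair, `license_cost(2, True)` gives 4000 and `license_cost(2, False)` gives 9000.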

What about geographically separated data centers? How does it work there?

everRun VM includes technology we developed for our other everRun products, which we call SplitSite. SplitSite enables geographically dispersed redundancy. Servers in the Citrix XenServer pair or pool can be located in different data centers or locations.

Our SplitSite software manages synchronization of the redundant XenServer servers over dual synchronization links that are standard gigabit Ethernet connections. With SplitSite, these synchronization links can be routed over a WAN. That means that even if a company gets hit with a site-wide outage, the applications that are protected by everRun VM are still available, and their data is still protected, at a separate location.

When will everRun VM be GA?

everRun VM is in advanced beta now and we expect it to be generally available by the end of April.

What about the LockStep Option? When can we expect that?

The LockStep Option is in development and scheduled for general availability in Q4 of this year. We’re looking forward to bringing that to market. But we think the component-level fault tolerance we’re delivering with everRun VM next month is going to fit the bill for the vast majority of businesses, and the vast majority of their applications.
