Friday 31 October 2008

What we learned from 1 million businesses in the cloud

The reliability of cloud computing has been a hot topic recently, partly because glitches in the cloud don't happen behind closed doors, as they do with traditional on-premises solutions for businesses. Instead, when a small number of cloud computing users have problems, it makes headlines. As with most things at Google, we are fanatical about measuring the availability of Gmail, and we thought it best to simply share our reliability metrics, which we measure as average uptime per user based on server-side error rates. We think this reliability metric lets you do a true side-by-side comparison with other solutions.

We measure every server request for every user, every moment of every day. Any millisecond delay is logged. Over the last year, Gmail has been available more than 99.9 percent of the time for everyone, both consumers and business users. The vast majority of people using Gmail have seen few issues, experienced no downtime, and continued to have a great Gmail experience, with the exception of an outage in August 2008. If you average all of this data together, including the August outage, across the entire Gmail service, aggregate downtime has been 10 to 15 minutes per month over the last year. That monthly average of 10 to 15 minutes represents small delays of a couple of seconds here and there. A very small number of people have unfortunately experienced service disruptions lasting a few minutes or a few hours. To those users, we are very sorry. We have also extended service level agreement credits to Google Apps Premier Edition customers.
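To relate the downtime figures to the availability percentage, here is a minimal Python sketch of the arithmetic. It is an illustration only, not a description of how Gmail's availability is actually measured. The 30-day month and the function names are assumptions made for the example.

```python
# Assumption: a 30-day month, i.e. 43,200 minutes. Real months vary slightly.
MINUTES_PER_MONTH = 30 * 24 * 60


def availability_percent(downtime_minutes, period_minutes=MINUTES_PER_MONTH):
    """Convert minutes of downtime in a period into percent availability."""
    return 100.0 * (1 - downtime_minutes / period_minutes)


def downtime_budget_minutes(availability_pct, period_minutes=MINUTES_PER_MONTH):
    """Convert a percent availability target into allowed downtime minutes."""
    return period_minutes * (1 - availability_pct / 100.0)


if __name__ == "__main__":
    # The two ends of the 10-15 minute monthly range quoted above.
    for minutes in (10, 15):
        print(f"{minutes} min/month -> {availability_percent(minutes):.3f}% available")
    # The downtime a 99.9 percent target would allow in the same period.
    print(f"99.9% allows {downtime_budget_minutes(99.9):.1f} min/month")
```

Under the 30-day assumption, 10 to 15 minutes of downtime per month works out to roughly 99.965 to 99.977 percent availability, while 99.9 percent corresponds to about 43 minutes of downtime per month.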

So how does greater than 99.9 percent reliability compare to more conventional approaches for business email? We asked some experts. Naturally, the normal caveats apply for on-premises solutions, since each individual business environment will vary, depending on server reliability, staff response time, and actual maintenance schedules for each application.

According to the research firm Radicati Group, companies with on-premises email solutions averaged from 30 to 60 minutes of unscheduled downtime and an additional 36 to 90 minutes of planned downtime per month. [1]
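The ranges quoted so far can be put on a common footing with a short Python sketch. It is a rough illustration under stated assumptions: a 30-day month, hypothetical helper names, and only the aggregate ranges given in the text. It does not attempt per-product comparisons, because no per-product downtime figures are listed here.

```python
# Assumption: a 30-day month, i.e. 43,200 minutes.
MINUTES_PER_MONTH = 30 * 24 * 60

# (low, high) minutes of downtime per month, as quoted in the text.
GMAIL_DOWNTIME = (10, 15)
ONPREM_UNSCHEDULED = (30, 60)
ONPREM_PLANNED = (36, 90)


def add_ranges(a, b):
    """Add two (low, high) ranges endpoint by endpoint."""
    return (a[0] + b[0], a[1] + b[1])


def availability_range(downtime_range, period_minutes=MINUTES_PER_MONTH):
    """Map a (low, high) downtime range to a (low, high) availability range.

    More downtime means lower availability, so the endpoints swap.
    """
    low, high = downtime_range
    return (
        100.0 * (1 - high / period_minutes),
        100.0 * (1 - low / period_minutes),
    )


if __name__ == "__main__":
    onprem_total = add_ranges(ONPREM_UNSCHEDULED, ONPREM_PLANNED)
    rows = [
        ("Gmail", GMAIL_DOWNTIME),
        ("On-premises, unscheduled only", ONPREM_UNSCHEDULED),
        ("On-premises, unscheduled + planned", onprem_total),
    ]
    for label, downtime in rows:
        lo, hi = availability_range(downtime)
        print(f"{label}: {downtime[0]}-{downtime[1]} min/month "
              f"({lo:.2f}% to {hi:.2f}% available)")
```

With these inputs, the combined on-premises range is 66 to 150 minutes per month, or roughly 99.65 to 99.85 percent availability over a 30-day month.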

Looking just at the unplanned outages that catch IT staff by surprise, these results suggest Gmail is twice as reliable as a Novell GroupWise solution and four times as reliable as a Microsoft Exchange-based solution that companies must maintain themselves. If you factor in the planned outages inherent in on-premises messaging platforms, Gmail is more than four times as reliable as a GroupWise solution and 10 times as reliable as an Exchange-based solution. And higher reliability translates to higher employee productivity. But this isn't the only way Google Apps helps businesses do more with their resources. Compared to the costs of Microsoft Exchange, IBM Lotus or Novell GroupWise (including software licensing, server expenses, and the labor associated with deploying, maintaining and upgrading them on a regular basis), Google Apps leaves companies with much more time and money to focus on their real business.

We are now extending what we've learned from Gmail to the other applications in Google Apps.

Today, we're announcing that we will extend the 99.9 percent service level agreement we offer Premier Edition customers on Gmail to Google Calendar, Google Docs, Google Sites, and Google Talk. We have been delivering high levels of reliability across all these products, so it makes sense to extend our guarantees to them.

More than 1 million businesses have selected Google Apps to run their business, and tens of millions of people use Gmail every day. With this type of adoption, a disruption of any size attracts a disproportionate amount of attention, even a minor one affecting fewer than 0.003% of Google Apps Premier Edition users, like the one a few weeks ago. We've made a series of commitments to improve our communications with customers during any outages, and we have an unwavering commitment to make all issues visible and transparent through our open user groups.

Google is one of the 1 million businesses that run on Google Apps, and any service interruption affects our users and our business; our engineers are also some of our most demanding customers. We understand the importance of delivering on the cloud's promise of greater security, reliability and capability at lower cost. We are hugely thankful to our customers who drive us to become better every day.

1. The Radicati Group, 2008. "Corporate IT Survey – Messaging & Collaboration, 2008-2009"