Tuesday, February 24, 2009

A Mid-Course Correction for SQL Data Services

The Azure Services folks have decided that SQL Data Services (SDS) needs more relational attributes at the expense of the “simplicity” policy espoused by the original SQL Server Data Services (SSDS) team. First news about the change in direction came at the MSDN Developer Conference’s visit to San Francisco on 2/23/2009 in conjunction with 1105 Media’s Visual Studio Live! conference at the Hyatt Regency.

Gavin Clarke reports in 'Full' SQL Server planned for Microsoft's Azure cloud in a 2/23/2009 midnight (GMT) post to The Register:

[Microsoft] told The Reg it's working to add as many features as possible from SQL Server to its fledgling Azure Services Platform cloud as quickly as possible, following feedback.

General manager [of] developer and platform evangelism Mark Hindsbro said Microsoft hoped to complete this work with the first release of Azure, currently available as a Community Technology Preview (CTP). But he added that some features might be rolled into subsequent updates to Azure. Microsoft has not yet given a date for the first version of Azure, which was released as a CTP last October.

"We are still getting feedback from ISVs for specific development scenarios they want. Based on feedback we will prioritize features and get that out first," he said.

"The aim is to get that in the same ship cycle of the overall Azure platform but it might be that some of it lags a little bit and comes short there after."

Hopefully, SDS hasn’t reached the point of no return.

Adding RDBMS Features Reverses Original Policy

Traditional relational databases don’t deliver the extreme scalability expected of cloud computing in general and Azure in particular. So SQL Server Data Services (SSDS) adopted an Entity-Attribute-Value (EAV) data model built on top of a customized version of SQL Server 2005 (not SQL Server 2008), as I reported in my Test-Drive SQL Server Data Services cover story for Visual Studio Magazine’s July 2008 issue.

SSDS architect Soumitra Sengupta posted Philosophy behind the design of SSDS and some personal thoughts to the S[S]DS Team blog on 6/26/2008. According to Soumitra, the first and foremost problems the team needed to solve were:

  1. Building a scale free, highly available consistent data service that is fault tolerant and self healing
  2. Building the service using low cost commonly available hardware
  3. Building a service that was also cheap to operate - lights out operation

The team favored simplicity at the expense of traditional relational database features, which potential users (such as me) expected.

Since we made an early decision to limit the number of hard problems we needed to solve, we decided that we would focus less on the features of the service but more on the quality of the service and the cost of standing up and running the service.  The less the service does we argued, the easier it would be for us to achieve our objectives.  In hindsight, this was probably one of the best decisions we made.  Istvan, Tudor and Nigel deserve special credit for keeping us focused on "less is better".

This policy resulted in schemaless EAV tables that offered flexible properties (property bags) in an Authority-Container-Entity (ACE) architecture. The architecture mystified .NET developers, who were then shifting their mindset from traditional SQL queries to .NET 3.5’s Language Integrated Query (LINQ) constructs and object/relational mapping with LINQ to SQL and the Entity Framework. SSDS offered SOAP and REST data access protocols with a very limited query syntax.

The SSDS folks believed the simplified ACE construct made it easy for developers who weren’t database experts to create data-driven applications that used SSDS instead of Amazon Web Services’ SimpleDB or the Google App Engine as a scalable data store in the cloud.
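For readers who haven’t worked with an EAV store, the Authority-Container-Entity hierarchy can be pictured with a small in-memory sketch. This is only an illustration of the data model, not the SSDS API: the class names, the put and query helpers, and the sample property names are all hypothetical, and the real service was accessed over SOAP or REST rather than through local objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

_MISSING = object()  # sentinel so an absent property never equals a queried value


@dataclass
class Entity:
    """A schemaless entity: an ID, a kind, and a bag of flexible properties."""
    entity_id: str
    kind: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Container:
    """Holds entities; entities in one container need not share a shape."""
    container_id: str
    entities: Dict[str, Entity] = field(default_factory=dict)

    def put(self, entity: Entity) -> None:
        """Insert or overwrite an entity, keyed by its ID."""
        self.entities[entity.entity_id] = entity

    def query(self, **equals: Any) -> List[Entity]:
        """Return entities whose properties match every name=value pair."""
        return [
            e for e in self.entities.values()
            if all(e.properties.get(k, _MISSING) == v for k, v in equals.items())
        ]


@dataclass
class Authority:
    """Top of the hierarchy: an authority owns named containers."""
    authority_id: str
    containers: Dict[str, Container] = field(default_factory=dict)

    def create_container(self, container_id: str) -> Container:
        """Return the named container, creating it on first use."""
        return self.containers.setdefault(container_id, Container(container_id))


# Hypothetical sample data: two "Book" entities with different property sets.
authority = Authority("example-authority")
books = authority.create_container("books")
books.put(Entity("1", "Book", {"Title": "Dune", "Pages": 412}))
books.put(Entity("2", "Book", {"Title": "Emma", "Publisher": "Penguin"}))

matches = books.query(Title="Dune")
no_pages = books.query(Pages=100)
```

The two entities carry different properties, which is the property-bag flexibility a fixed relational schema doesn’t offer. The cost is that any typing or relationship between entities has to be enforced by the application.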

Less Wasn’t Better

Apparently, “less” didn’t turn out to be “better” for the .NET developers who are Azure’s target audience. Microsoft promotes the Azure Services Platform’s ability to leverage developers’ Visual Studio 2008 expertise. VS 2008 is all about ADO.NET, object/relational mapping (O/RM), and integration with SQL Server 200x through the SqlClient classes. SSDS’s REST interface didn’t even align with the heavily promoted ADO.NET Data Services.

Gavin continues:

According to Hindsbro, partners want a full SQL Server database in the cloud. The current SQL Data Services (SDS), which became available last March, provides a lightweight and limited set of features. Prior to SDS, Microsoft's database service was called SQL Server Data Services.

"If you go there now you will find more rudimentary database experiences exposed. Not a lot of these apps would be interesting without a full database in the cloud, and that is coming," Hindsbro said.

He did not say what SQL Server features Microsoft would add to Azure, other than to say it'll include greater relational functionality.

Microsoft in a statement also did not provide specifics, but said it's "evolving SDS capabilities to provide customers with the ability to leverage a traditional RDBMS data model in a cloud-based environment. Developers will be able to use existing programming interfaces and apply existing investments in development, training, and tools to build their applications."

The pre-beta SDS restricts what users can do in a number of ways that make it hard to set up and manage and that limit its usefulness in large deployments.

Gavin’s last paragraph is an understatement, to be charitable.

Less Is Azure Table Services

Azure’s early testers are mystified by SDS’s overlap with Azure’s Table Service, which has a feature set that’s almost identical to SDS today, but is aligned with ADO.NET Data Services and its LINQ to REST queries.

Forum members ask questions such as “The confusion here is why are there two different kinds of storage. Are they different?  If so why and if not what is the relation?” in the Azure Forum’s Difference between Azure Storage and SDS Storage thread and the SDS Forum’s What Are the Advantages of SDS Over Table Storage Services with the StorageClient Helper Classes? thread. Microsoft’s standard answer is:

"SDS will provide scalable relational database as a service (today, Joins, OrderBy, Top...are supported) and as it evolves, we plan to support other features such as aggregates, schemas, fan-out queries, and so on.  SDS just like any other database also supports blobs.  SDS is for Unstructured, Semi, and Structured data, with a roadmap of having highly available relational capabilities."

Microsoft won’t reveal pricing for Azure services, but it’s clear that SDS is positioned as a value-added feature with premium per-hour and per-GB storage charges compared with prices for renting plain old tables (POTs).

Early RDBMS Feature Promises

The SDS team began promising more SQL Server features shortly after releasing the SQL Server Data Services (SSDS) invitation-only CTP on 3/5/2008 at the MIX08 conference. Primary examples were optional schemas, full-text indexing and search, blob data type, ORDER BY clauses for queries, cross-container queries, transactions, JOINs, TOP(n), simplified backup and restore, and alignment of the REST API with ADO.NET Data Services.

The team delivered blobs, pseudo-JOINs, ORDER BY, and Take (but not Skip) by PDC 2008 (late October 2008), when the Azure invitation-only CTP was released. My SQL Data Services (SDS) Update from the S[S]DS Team post of 10/27/2008 describes the new features in Sprint #5.

The SDS team will need to deliver all the promised features, and perhaps a few more, to justify a significant increase to service charges over those for Azure tables.

Competition from Amazon

In the meantime, Amazon Web Services (AWS) announced on 10/1/2008 that Amazon EC2 “will offer you the ability to run Microsoft Windows Server or Microsoft SQL Server … later this Fall.” My Amazon Adds SQL Server to Oracle and MySQL as EC2 Relational Database Management Systems post of 10/1/2008 has more details. Amazon announced support for IBM DB2 and Informix Dynamic Server in this IBM and AWS page on 2/11/2009.

EC2 currently supports Windows Server 2003 R2 and SQL Server 2005 Express and Standard editions. There’s no surcharge for the Express edition and the surcharges for the three instance types that offer SQL Server Standard edition are:

Instance Type          Surcharge/Hour   Surcharge/Year
Standard Large         US$ 0.60         US$  5,256
Standard Extra Large   US$ 1.20         US$ 10,512
High CPU Extra Large   US$ 1.20         US$ 10,512

Note that SQL Server Standard isn’t available for the Standard Small instance type, which costs US$ 0.375 per hour less than Standard Large. If you don’t need Standard Large’s added capacity, being forced onto it effectively increases the yearly surcharge by US$ 3,285 (US$ 0.375 × 8,760 hours).
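The yearly figures follow from the hourly rates if you assume an instance that runs around the clock for a 365-day year. Here’s a quick sketch of that arithmetic; the variable and function names are mine, and the rates are the ones quoted above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming continuous use and no leap day

# Hourly SQL Server Standard surcharges (US$) quoted above, by instance type.
sql_standard_surcharge_per_hour = {
    "Standard Large": 0.60,
    "Standard Extra Large": 1.20,
    "High CPU Extra Large": 1.20,
}


def yearly(rate_per_hour: float) -> float:
    """Convert an hourly rate to a yearly amount, rounded to cents."""
    return round(rate_per_hour * HOURS_PER_YEAR, 2)


yearly_surcharges = {
    name: yearly(rate) for name, rate in sql_standard_surcharge_per_hour.items()
}

# Extra yearly cost of running Standard Large when Standard Small would do.
small_vs_large_yearly = yearly(0.375)

for name, amount in yearly_surcharges.items():
    print(f"{name}: US$ {amount:,.0f} per year")
print(f"Standard Large over Standard Small: US$ {small_vs_large_yearly:,.0f} per year")
```

An instance that runs only part of the time would accrue proportionally less, so treat these as upper bounds for a single always-on server.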

You probably can’t beat AWS’s SimpleDB for low-cost usage and storage charges. Amazon now offers a simplified SQL subset for querying SimpleDB EAV tables.

Soumitra’s SQL Server Data Services (SSDS) is simple, but it is not SimpleDB post of 3/7/2008 claims that SSDS isn’t a SimpleDB-compete and concludes:

Underneath the hood, the service is running on SQL Server.  So the rich capabilities of our server software is all there.  We have chosen to expose a very simple slice of it for now.  As Nigel explained, we will be refreshing the service quite frequently as we understand our user scenarios better.  So you can expect to see more capabilities of the Data Platform to start showing up in our service over time.  What we announced here is just a starting point, our destination remains the extension of our Data Platform to the cloud.  I know you are asking "I need more details and a timeline".  As we on-board beta customers and get their feedback, we will be able to give you more details.

Whether or not SSDS is a SimpleDB-compete, I’m sure that the SDS Team would like to offer their product at a surcharge that’s competitive with Amazon’s for SQL Server Standard.

Silence Isn’t Golden

In the first few months of SSDS’s existence, the team posted frequently to the S[S]DS Team Blog, but went silent after PDC 2008. I mentioned the lack of communication in my The SQL Data Services Team’s Recent Silence Isn’t Golden post of 1/3/2009.

Jeff Currier replied in a comment:

We've been a bit more silent than usual because the features we've been focusing on have been more of a operational nature (and therefore not customer facing). This should explain the recent silence (along with the holidays).

That might be an explanation, but it isn’t a very satisfactory one.

Dave Robinson posted SQL Data Services – What’s with the silence? today, presumably in response to The Register’s article:

Just wanted to drop a quick note. People are starting to question what’s going on in the SDS world and why we have been so silent. Well, to be honest, we have been so silent because the entire team has been heads down adding some new exciting features that customers have been demanding. Last year at Mix we told the world about SDS. This time around we will be unveiling some new features that are going to knock your socks off. So, that’s it for now. Just wanted to let everyone the team is alive and well and super excited for the road ahead. We are 3 weeks away from Mix so hang on just a little bit longer. Trust me, it’s worth it.

I’d like to know why it’s “worth it” to wait for MIX09 to find out what’s in store for SDS and when we can finally expect it.

What new features are going to knock my socks off?

6 comments:

mamund said...

FWIW, I think this about-face is a mistake. A solid non-relational data store is necessary for cloud applications. Azure Table storage (as it exists today and for the near term) will not meet that need.

Anonymous said...

Someone in Microsoft's SQL Server team got a night sweat about licensing revenue and killed the cleverness happening in Azure.

You expect nothing more from Microsoft. Why do something different, when they can trot out old re-hashed stuff and serve that up? Things like CouchDB and Key/Value DBs with Map/Reduce functionality are where cloud apps are going.

Entity Framework is diabolical. Why would anyone want to use that when a simple REST API can get you the data you need?

Microsoft pander to the lowest common denominator too often. It's why Google are eating them up and Amazon is blitzing them in Cloud. You have to innovate and run the risk of leaving some people who can't keep up behind. So what if some .Net 3.5 developers are too stupid to use HTTP Verbs, it is technological natural selection.

Anonymous said...

Hi the December Anonymous guy returns.

This is all so predictable, but can Microsoft afford the luxury of resetting the Azure development clock just now as the Google and Amazon ships power up to full speed? Meanwhile Microsoft’s software engineers are left stumbling incoherently on the beach spanners in hand, or more likely knowing Microsoft’s structure the Azure SDS engineers will be throwing those spanners at a rival internal team.

Why doesn’t Microsoft accept it has arrived very late at the Cloud party and instead launch a V1.0 database offering that simply matches the feature set of Google’s Datastore, which is exactly what SDS should be?

The technology update to the existing SDS implementation, to achieve this, is not massive:

1 - Transactions within the scope of an Authority-Container, just like Google’s AppEngine.
2 - A simple sweet OR/M for VB.Net and C# that maps directly onto the functionality of SDS. Just like Google’s Datastore OR/M for AppEngine.
3 - Auto incrementing primary keys within Authority-Container scope.
4 - Declarative entity relationships modelled in code as per Google AppEngine entities.
5 - New-age cloud db query syntax built into the OR/M api. Just like Google has done with GQL.

There is no need to drop the entity/value storage concept just now, it works for Google’s Datastore and sidesteps the major engineering headache of rolling out a schema update into a 24x7 business app distributed around the planetary cloud.

Anonymous said...

Tricky one this!
As I understand it, SalesForce.com took Oracle and implemented a huge Property Bag in it to produce their extensible database that is the engine of their superb CRM application.

I thought that the SSDS team were on the same path, which boded very well for my plans to build a multi-tenanted SaaS application.

Now the Enterprise SQL Server community have got involved and look like they want to turn SDS into SQL Server.

Personally, I wouldn't confuse the two. If you need SQL Server in the cloud, why not just take a copy of SQL Server and put it there? I hope those guys do not ruin a promising Multi-Tenancy engine!

McGeeky said...

Microsoft's direction for Azure, as far as I understand, is based on customer feedback. Why would they ignore feedback from their customers? If the customers are calling for more enterprise features then that's what you give them.

Anonymous said...

Most of the posts here unfortunately miss the point. Microsoft has got 2 (I repeat TWO) cloud dbs. The one you need is Azure Storage. And yes, it does scale well and does support multi-tenancy and does allow all other things SDS provided and that is exactly the reason for this fork in direction which was long overdue.