Thursday, August 05, 2010

Windows Azure and Cloud Computing Posts for 8/4/2010+

A compendium of Windows Azure, Windows Azure Platform Appliance, SQL Azure Database, AppFabric and other cloud-computing articles.



Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:

To use the above links, first click the post’s title to display the post as a single page, and then navigate to the article you want.

Cloud Computing with the Windows Azure Platform was published 9/21/2009. Order today from Amazon or Barnes & Noble (in stock).

Read the detailed TOC here (PDF) and download the sample code here.

Discuss the book on its WROX P2P Forum.

See a short-form TOC, get links to live Azure sample projects, and read a detailed TOC of electronic-only chapters 12 and 13 here.

Wrox’s Web site manager posted on 9/29/2009 a lengthy excerpt from Chapter 4, “Scaling Azure Table and Blob Storage” here.

You can now freely download by FTP and save the following two online-only PDF chapters of Cloud Computing with the Windows Azure Platform, which have been updated for SQL Azure’s January 4, 2010 commercial release:

  • Chapter 12: “Managing SQL Azure Accounts and Databases”
  • Chapter 13: “Exploiting SQL Azure Database's Relational Features”

HTTP downloads of the two chapters are available at no charge from the book's Code Download page.

Azure Blob, Drive, Table and Queue Services

See the Black Hat Briefings USA 2010 distributed Grant Bugher’s Secure Use of Cloud Storage PDF whitepaper of July 2010 item in the Cloud Security and Governance section below; the whitepaper covers Windows Azure table storage and Amazon S3/SimpleDB security.

<Return to section navigation list> 

SQL Azure Database, Codename “Dallas” and OData

Wayne Walter Berry (@WayneBerry) posted Using SQL Azure for Session State on 8/4/2010:

Hypertext Transfer Protocol (HTTP) is a stateless protocol; the advantage of a stateless protocol is that web servers do not need to retain information about users between requests. However, in some scenarios web site developers want to maintain state between page requests to provide consistency to the web application. To create state from a stateless protocol, ASP.NET has the concept of a session which is maintained from the user’s first request to their last request for that visit to the web site.

By default, ASP.NET session state is maintained in the RAM of the running web server. However, Windows Azure is a stateless platform: web role instances have no durable local storage, and at any time the web role instance could be moved to a different server in the data center. When the web role instance is moved, the session state is lost. To have a perceived sense of state with a stateless protocol on a stateless web server, you need permanent server-side storage that persists even if the web role instance is moved. In this article I will discuss how to use SQL Azure to create persistent storage for ASP.NET session state in Windows Azure.

SQL Azure is a perfect fit for maintaining session state in Windows Azure, because there is already a SqlSessionStateStore, a Microsoft session state provider developed for on-premise SQL Server installations. The SQL Server provider was designed for local IIS installations spanning multiple web servers in a web farm that need to maintain the user’s state across machines.

Creating the Tables

If we are going to use the SqlSessionStateStore provider on Windows Azure against SQL Azure, we need to create the appropriate tables and stored procedures. Typically this would be done with the InstallSqlState.sql script that ships with the .NET Framework (or with Aspnet_regsql.exe and its -sstype option); however, this script doesn’t work for SQL Azure because of Transact-SQL differences. Instead we have to use a modified script (see the download at the bottom of this blog post).

Here are the instructions to create the database, stored procedures, and tables needed to store session state on SQL Azure:

  1. Download a modified Transact-SQL script called ASPStateInstall.sql that will create the ASPState database.
  2. Execute the ASPStateInstall.sql script from SQL Server Management Studio on the master database, read more about connecting to SQL Azure with SQL Server Management Studio here.
  3. Reconnect SQL Server Management Studio to the ASPState database that you just created.
  4. Execute the InstallSqlState.sql script from the download from SQL Server Management Studio on the ASPState database.
Modifying the web.config

The next thing to do is to modify the web.config so that Windows Azure uses SQL Azure as the storage for session state. Your web.config should look something like this:

<sessionState
  mode="SQLServer"
  sqlConnectionString="Server=tcp:...;Trusted_Connection=False;Encrypt=True;"
  cookieless="false"
  timeout="20"
  allowCustomSqlDatabase="true"
/>

Make sure to modify the sqlConnectionString to match the SQL Azure connection string from the SQL Azure Portal for the ASPState database. If you are trying this on an on-premise installation of IIS, the same modification to the web.config will work.
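
With the provider configured this way, page code consumes session state exactly as it would against an in-memory session; the provider serializes the values to the ASPState database between requests. Here is a minimal sketch (the page class and the CartItems key are hypothetical, not part of the original article):

using System;
using System.Collections.Generic;
using System.Web.UI;

// Hypothetical page; session usage is unchanged, but because mode="SQLServer"
// the values stored below must be serializable so the provider can persist
// them to the ASPState database in SQL Azure between requests.
public partial class Checkout : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Read the cart saved by an earlier request, or start a new one.
        List<string> cart = Session["CartItems"] as List<string>
            ?? new List<string>();

        cart.Add("SKU-12345");

        // Writing back to Session round-trips through SqlSessionStateStore,
        // so the state survives the web role instance being moved.
        Session["CartItems"] = cart;
    }
}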

Doing the Clean Up

When you install the ASP.NET SQL Session State Management provider with an on-premise SQL Server, the install creates a job that the SQL Server Agent executes to clean up the old session data. SQL Azure doesn’t have the concept of a SQL Server Agent; instead we can use a Windows Azure worker role to clean up the SQL Azure database. For more information see our SQL Server Agent blog series (Part 1, Part 2, and Part 3). The InstallSqlState.sql script that you ran to set up the database contains a DeleteExpiredSessions stored procedure. Trimming the expired sessions is as easy as calling this stored procedure from the worker role. Here is what the code looks like:

public override void Run()
{
    // This is a sample worker implementation. Replace with your logic.
    Trace.WriteLine("WorkerRole1 entry point called", "Information");

    while (true)
    {
        Thread.Sleep(60000);

        // Create a SqlConnection Class, the connection isn't established 
        // until the Open() method is called
        using (SqlConnection sqlConnection = new SqlConnection(
            ConfigurationManager.ConnectionStrings["ASPState"].
                ConnectionString))
        {
            try
            {
                // Open the connection
                sqlConnection.Open();

                SqlCommand sqlCommand = new SqlCommand(
                    "DeleteExpiredSessions", sqlConnection);

                sqlCommand.CommandType = 
                    System.Data.CommandType.StoredProcedure;

                sqlCommand.ExecuteNonQuery();
            }
            catch (SqlException)
            {
                // WWB: Don't Fail On SQL Exceptions, 
                // Just Try Again After the Sleep
            }
        }
    }
}

Make sure to add the ASPState connection string to the worker role’s app.config or the worker role will never completely initialize when you deploy to Windows Azure. Here is what it will look like:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <connectionStrings>
    <add name="ASPState" connectionString="Server=tcp:…;Trusted_Connection=False;Encrypt=True;"/>
  </connectionStrings>

If you cut and paste the code above, make sure to modify the connectionString attribute to match the SQL Azure connection string from the SQL Azure Portal.
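
One way to surface a missing or mistyped connection string early, instead of a worker role that never finishes initializing, is a quick connectivity check in OnStart. This is a sketch of my own, not part of the attached sample:

using System.Configuration;
using System.Data.SqlClient;
using System.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Fail fast, with a trace message, if the ASPState connection string
        // was not copied into app.config or points at the wrong server,
        // instead of letting the Run() loop silently retry forever.
        ConnectionStringSettings settings =
            ConfigurationManager.ConnectionStrings["ASPState"];

        if (settings == null)
        {
            Trace.WriteLine("ASPState connection string is missing", "Error");
            return false;
        }

        try
        {
            using (SqlConnection connection = new SqlConnection(settings.ConnectionString))
            {
                connection.Open();
            }
        }
        catch (SqlException ex)
        {
            Trace.WriteLine("ASPState connection check failed: " + ex.Message, "Error");
        }

        return base.OnStart();
    }
}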

Open attached file: AzureSession.zip


Dinakar Nethi’s four-page Sync Framework for SQL Azure whitepaper became available for download on 8/4/2010. From the summary:

SQL Azure Database is a cloud database service from Microsoft. SQL Azure provides Web-facing database functionality as a utility service. Cloud-based database solutions such as SQL Azure can provide many benefits, including rapid provisioning, cost-effective scalability, high availability, and reduced management overhead. This document is not intended to provide comprehensive information about Sync Framework. The intent is to provide best practices on synchronizing SQL Azure with SQL Server, and to supplement the information available at the links in the References section.

The whitepaper primarily addresses:

Guidelines for efficient scoping (see the scope-definition sketch after the list below):

    • Each scope has one thread allocated to it from the OS, so distributing the tables across multiple scopes will help parallelize the data migration
    • Put static tables, or tables that change at a very low rate, in one scope and reduce that scope's sync frequency
    • Group frequently changing tables in different scopes
    • Put logically related tables (Primary Key-Foreign Key dependency or logical dependency) in one scope
    • Scopes that are only read on the client should be marked as download only as this streamlines the sync workflow and decreases sync times
    • It is better to minimize the number of clients that are in each scope with the best case being a different scope for each client.  This minimizes contention on the server and is ideal for the hub-spoke case where all changes flow through a single server vs. being synced between clients
    • Initialize via snapshots vs. full initialization wherever possible to improve initial sync time by an order of magnitude
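
To make the scoping guidance concrete, here is a rough sketch of defining and provisioning a scope with the Sync Framework 2.x SqlSyncProvider classes. The table names are hypothetical, and constructor and namespace details vary slightly between Sync Framework releases, so treat this as an illustration of the guidelines rather than code from the whitepaper:

using System.Data.SqlClient;
using Microsoft.Synchronization;
using Microsoft.Synchronization.Data;
using Microsoft.Synchronization.Data.SqlServer;

class ScopeSetup
{
    static void Main()
    {
        using (SqlConnection localConn = new SqlConnection(
            "Server=(local);Database=Sales;Integrated Security=True;"))
        using (SqlConnection azureConn = new SqlConnection(
            "Server=tcp:...;Database=Sales;Trusted_Connection=False;Encrypt=True;"))
        {
            // Group logically related, frequently changing tables in one scope;
            // slow-changing lookup tables would go in a separate scope that is
            // synchronized less often, per the guidelines above.
            DbSyncScopeDescription ordersScope = new DbSyncScopeDescription("OrdersScope");
            ordersScope.Tables.Add(
                SqlSyncTableDescription.GetDescriptionForTable("Orders", localConn));
            ordersScope.Tables.Add(
                SqlSyncTableDescription.GetDescriptionForTable("OrderDetails", localConn));

            // Provision both databases with the change-tracking objects for the scope.
            new SqlSyncScopeProvisioning(localConn, ordersScope).Apply();
            new SqlSyncScopeProvisioning(azureConn, ordersScope).Apply();

            // Run one sync session; Direction is where the download-only advice applies.
            SyncOrchestrator orchestrator = new SyncOrchestrator();
            orchestrator.LocalProvider = new SqlSyncProvider("OrdersScope", localConn);
            orchestrator.RemoteProvider = new SqlSyncProvider("OrdersScope", azureConn);
            orchestrator.Direction = SyncDirectionOrder.DownloadAndUpload;
            orchestrator.Synchronize();
        }
    }
}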

For a detailed walkthrough of SQL Azure Data Sync, see my Synchronizing On-Premises and SQL Azure Northwind Sample Databases with SQL Azure Data Sync tutorial of 1/28/2010.


Frans Bouma compares LightSwitch and Visual Studio to [Squire] and Fender guitars in his Microsoft LightSwitch: a [Squire] which will never be a Fender post of 8/4/2010:

Yesterday, Microsoft announced a new Visual Studio tool: Microsoft LightSwitch. LightSwitch is a tool which allows you to create Line of Business (LoB) applications by using a visual tool, similar to Microsoft Access, although LightSwitch can also produce applications for the web and can pull data from various sources instead of its own built-in database.

Large companies like Microsoft develop many products which will never see the light of day or will die in the first weeks after being released, that's life. A successful product has to appeal to a large enough audience and that audience has to be willing to pay money for the product (if it costs money), otherwise the market for the product is too small, it won't bring in enough money to cover the development costs and things will go rough from there. It doesn't have to be that a product directly generates money; it can be that it generates money indirectly, for example because it stimulates its users to purchase additional products which cost more money and are from the same company, e.g. services, support, add-ons. Give a guy a car and he'll come back every day for gas.

What puzzles me with LightSwitch is: what's the target audience? Who is supposed to use this tool instead of another tool? Is this a tool to sell more Sharepoint licenses, more Azure licenses? I have no idea. The main problem is that there's some friction in the image of LightSwitch. Microsoft says LightSwitch is aimed at the tech-savvy non-developer who wants to create a LoB application without needing to hire a truck full of professional developers. In short: a tool for an amateur who wants to 'Do It Him/Herself'. The friction is in the level of knowledge a person apparently has to have: what's a database, what's a table, what's an entity, what's a screen, what's validation etc.. So is it really an amateur tool for amateurs or is it an amateur tool for professionals?

The 'Do It Yourself' remark is familiar: a lot of people try to fix things around the house themselves before they call in the pros, and sometimes they even succeed wonderfully. These 'do-it-yourself' people buy off-the-shelf cheap powertools to help them with the job and if you close your eyes a bit, the end result looks OK, as if a professional did the work. However, how many of those 'do-it-yourself' people will successfully install a full electrical circuit in the house, or create a new bathroom, with bath, plumbing, fancy mirrors etc.? Not many; they'll call the professionals, who have different tools and different skills and don't create a dangerous train-wreck.

I didn't want to compare LightSwitch to an el-cheapo power-drill, so I have chosen a different metaphor: an electrical guitar. A beginner will buy a beginner's guitar. A professional will buy a professional's guitar. Let's look at two brand examples: [Squire] and Fender. [Squire] is a brand from Fender actually and under that brand, Fender sells el-cheapo knock-offs of its expensive equipment, like the [T]elecaster and the [S]tratocaster. A [Squire Stratocaster] costs below 200 [E]uros, a Fender USA made [S]tratocaster costs 1400+ [E]uros. Why's that? They both have 6 strings, pick-ups (the 'elements' below the strings) and produce sound, and look almost the same: what's the difference?

As an amateur rock-guitarist, I can only try to describe the difference, but I hope it will show you what I mean. I played on el-cheapo guitars for some time, maybe 2 years or so, and one day I was offered the chance to play a couple of hours on a real Fender telecaster (which costs over 1300 [E]uros). I still can't believe the difference in sound that guitar made. It played like a dream, the sustain (the time a note continues to sound) was endless, the pickups were able to produce much deeper sound than I had ever heard from my el-cheapos. Did it make my own compositions at that time sound better (warmth, depth)? Yes absolutely. Did it make my compositions better? No. Did it make me a better guitar player? No.

An amateur guitarist will sound like an amateur guitarist, no matter the equipment. A professional guitarist will sound like a professional, no matter the equipment. Don't make the mistake that by using a more expensive guitar you suddenly are Jeff Kollman of Cosmosquad (one of the best guitarists in the world, see below): the notes you'll play perhaps sound better, but the overall music will still be at the amateur level.

Microsoft LightSwitch is a tool for amateurs to produce stuff amateurs will produce. It's a mistake to think the stuff produced with LightSwitch will be usable by professional developers later on to extend it / maintain it or will appeal to professionals. See LightSwitch as that el-cheapo [Squire] Telecaster: it looks like a real Fender Telecaster guitar, it produces guitar sound, but a professional will choose the real deal, for reasons a professional understands. Is that bad or arrogant? No: a professional is a professional and knows his/her field and has skills an amateur doesn't have and therefore doesn't understand. In these videos on Youtube (Part 1 | Part 2) (12 minutes combined) Jeff Kollman / Cosmosquad is interviewed and plays a Fender Telecaster in a custom tuning. It's very advanced stuff, but it shows what a professional can do with a tool for professionals.

In guitar-land things are pretty much settled down: amateurs use amateur material/tools, professionals use professional material/tools. In developer-land, let's see it the same way. The only fear I have is that in a few years' time, the world is 'blessed' with applications created by amateurs using a tool meant for amateurs and we professionals have to 'fix the problems'. You can't bend a [Squire] to become a Fender, it will stay a [Squire]: amateurs of the world, please do realize that.

C programmers similarly denigrated Visual Basic 1.0 when it arrived in May 1991. I wonder if Frans received an early copy of the Beta, which won’t be available to MSDN Subscribers until 8/23/2010. If not, his condemnation of the framework appears to me to be premature.


Gavin Clarke reported “Microsoft may submit its OData web data protocol for standards ratification, but seems eager to avoid the bruising it received on OOXML” in his OOXML and open clouds: Microsoft's lessons learned From conflation to inflation post of 8/3/2010 to The Register:

Microsoft may submit its OData web data protocol for standards ratification, but seems eager to avoid the bruising it received on OOXML.

Jean Paoli, Microsoft's interoperability strategy general manager and one of the co-inventors of the original XML whose work ultimately went into OOXML, told The Reg that Microsoft might submit OData to the W3C or OASIS.

OData is on a list of both Microsoft and non-Microsoft technologies that the company is touting as one answer to moving data between clouds and providing interoperability. Data interoperability is a major cause of concern as the initial euphoria of the cloud evaporates leaving a headache of practical concerns such as: how do I move my data to a new cloud should I choose?

Paoli, speaking after Microsoft unveiled its four principles of cloud interoperability recently, said Microsoft and the industry should reuse existing standards as much as possible to solve interoperability problems.

He also believes, though, that the cloud will create new situations in the next three to five years that people can't currently foresee and that'll need new standards to manage. "With the cloud there's new scenarios we don't know about," Paoli said.

The last time Microsoft got involved with portability of data it was about document formats, and things turned nasty given that Microsoft is the biggest supplier of desktop productivity apps with Office, and SharePoint is increasingly Microsoft's back-end data repository.

While open sourcers, IBM, Red Hat, Sun Microsystems and others lined up to establish the Open Document Format (ODF) as an official standard, Microsoft predictably went its own way.

Rather than open Office to ODF, Microsoft instead proposed Office Open XML (OOXML) in a standards battle that saw accusations flying that Microsoft had loaded the local standards voting processes to force through OOXML so it wouldn't have to fully open up.

Then there were the real-world battles, as government bodies began to mandate they'd only accept documents using ODF. Things came to a head in the cradle of the American revolution, Massachusetts, which declared for ODF but then also accepted OOXML following intense political lobbying by Microsoft, while the IT exec who'd made the call for ODF resigned his post.

The sour grapes of ODF ratification, followed by the bitter pills of local politics, left people feeling Microsoft had deliberately fragmented data openness to keep a grip through Office.

Paoli was once one of Microsoft's XML architects who designed the XML capabilities of Office 2003, the first version of Office to implement OOXML. Today he leads a team of around 80 individuals who work with other Microsoft product groups on interoperability from strategy to coding.

What lessons did Microsoft learn from OOXML that it can apply to pushing data portability in the cloud?

"I think collaboration is important in general and communication," Paoli said. "I think we did a very poor job of communications a long time ago and I think we need to communicate better. People did not understand what we were trying to do [on OOXML]."

This time Paoli said that Microsoft is going into the standards bodies and open-source communities to discuss ways of working together on cloud interoperability, identity, and application deployment. Results from conversations will be posted back to a new Microsoft site listing those four principles of cloud interoperability here.

On OData, Paoli was keen to point out how the technology uses the existing and widely accepted HTTP, JSON, and AtomPub. "We want to deepen the conversation with the industry," he said.
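
That reuse is easy to demonstrate: an OData feed is just an HTTP resource, so any HTTP stack can consume it without a Microsoft library. A minimal sketch (the service URL is a placeholder, not a specific Microsoft endpoint):

using System;
using System.IO;
using System.Net;

class ODataClient
{
    static void Main()
    {
        // Any OData service root works here; the query options ($top, $orderby)
        // are part of the OData URI conventions.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
            "http://example.com/catalog.svc/Products?$top=5&$orderby=Name");

        // Ask for JSON; omit the header (or use application/atom+xml) for AtomPub.
        request.Accept = "application/json";

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}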

For all the we're-all-in-this-together stuff, there's still a sense that Microsoft is promoting its own Azure cloud as much as trying to champion a common cause.

Microsoft's new site makes great play about how Windows Azure uses HTTP, SOAP, and REST, that you can mount and dismount server drives in Azure clouds using NTFS, and the availability of GUI tools for Eclipse and Windows Azure SDKs for Java and Ruby. On the upstanding-citizen side, the site also lists the standards bodies in which Microsoft is participating.

Yet, concerning OData, things have a decidedly Microsoft feel.

OData might be "open" but it's Microsoft products — SharePoint 2010, SQL Azure, Windows Azure table storage and SQL reporting services — that mostly expose data as OData.

IBM's WebSphere does, too, and PHP, Java, JavaScript, and the iPhone can consume OData — but it looks like you're mostly moving data between Microsoft's applications and cloud services.

One major user of OData is Netflix, whose entire catalog is available in OData — Netflix is a premier customer of Microsoft technology already, using Silverlight on its site.

In a world of circular logic, Microsoft needs to get more applications and services using OData to justify Azure's use of it, while OData is important to help sell more copies of SharePoint 2010 on the basis of interoperability with the cloud — Microsoft's cloud, specifically.

Does this mean that Microsoft has an agenda — and should it be trusted? Trust is something Microsoft always has to work hard to achieve thanks to its history and periodic outbursts on things like patents in open source — a community Microsoft is courting to support Azure.

Paoli says skeptics will always exist, but today Microsoft is part of the community through work on things like Stonehenge at the Apache Software Foundation (ASF).

"I'm very, very pragmatic. It reminds me of when I moved from France and was hired by Microsoft... everyone was asking me: 'Hey, wow, is Microsoft really into XML? I said 'yeah'. There was always some skepticism, and that was 14 years ago. We implemented XML — we helped created the basic standards, we had a lot of partners.

"The best approach is to work with people pragmatically and work with people on technology issues and just move on."

He reckons, too, that Microsoft has learned its lessons about dealing with open sourcers — people it's relying on to deploy PHP and Ruby apps on Azure. Microsoft's mistake in the past was to conflate Linux and open source products and the developer community — a community Paoli said Microsoft feels at home in. And we know how much Microsoft loves developers, developers, developers.

"We know the world is a mixed IT environment — this is really ingrained in our thinking," Paoli claimed.

It's early days for cloud and Microsoft's role in shaping it, but the strategy sounds different.

Ten years ago, before OOXML, Microsoft decided it would lead, with IBM, a push to shape the future of web services, a foundation of cloud, with the WS-* specs. WS-* proved inflexible, and developers moved on to better technologies. On OOXML, Microsoft led again but was left looking isolated and awkward.

Microsoft needs OData as much as it did the ideas behind WS-* and OOXML. This time, Microsoft seems to be searching for a subtler way to advance its cause.


John Alioto shows you Three ways to interact with SQL Azure … in this 8/3/2010 post (if screen captures are missing, click the placeholder, which returns an HTTP 404 Not Found error, then click back to display the capture):

There are several different tools that one can use to interact with a SQL Azure database. Each tool has scenarios that it is best for and an audience to whom it will appeal. Here are three, along with some thoughts on each.

Method #1: Tried and True, SSMS [2008 R2]

There has been plenty written on this already, so I’m not going to focus on it.  You download the tools, punch a hole in your SQL Azure firewall, and away you go …
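
As an aside on the firewall step: besides the portal, SQL Azure exposes firewall management as stored procedures on the master database, so the hole can also be punched from code. A rough sketch follows (server name, login, rule name, and IP range are placeholders; the very first rule usually has to come from the SQL Azure portal, since you need at least one allowed IP before you can connect at all):

using System.Data.SqlClient;

class FirewallSetup
{
    static void Main()
    {
        // Connect to the logical server's master database with the administrative login.
        using (SqlConnection connection = new SqlConnection(
            "Server=tcp:yourserver.database.windows.net;Database=master;" +
            "User ID=admin@yourserver;Password=...;Encrypt=True;"))
        {
            connection.Open();

            // sp_set_firewall_rule creates or updates a server-level rule that
            // allows the given IPv4 range to reach the SQL Azure server.
            using (SqlCommand command = new SqlCommand(
                "EXEC sp_set_firewall_rule N'Home office', '203.0.113.10', '203.0.113.10'",
                connection))
            {
                command.ExecuteNonQuery();
            }
        }
    }
}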

[screen capture]

This method is great for heavy-duty management and creation of databases.  It’s also great because it’s the tool we’re all most familiar with.  I’m just using the AdventureWorks sample for Azure which has a nice little installer and I can interact with my database as normal …

[screen capture]

Method #2: Project Codename “Houston”

The team over at SQL Azure Labs has created a very nice Silverlight tool they are calling Houston.  This is a lighter-weight tool than SSMS.  You can’t do all the database management that you can with the full Management Studio, but that’s okay, as this is a tool more targeted at developers (which I am, so that’s good!). You will see it has a great Silverlight interface that is easy to use (spinning cubes are hotness!).

[screen capture]

You can select multiple rowsets, click to zoom (ctrl-click to zoom out), save queries and more.  You just have to get out of the habit of using F5 to execute your query! :)

[screen capture]

Take a look at this blog post by my buddy Richard Seroter for a more detailed walkthrough. [See my Test Drive Project Houston CTP1 with SQL Azure post (updated 7/31/2010) for an even more detailed walkthrough.]

Method #3: Quadrant

Quadrant is a graphical tool for manipulating data.  In order to get Quadrant, you need to download and install the SQL Server Modeling CTP – November 2009 (as of this writing, check for updates depending on when you read this.)  Quadrant is a very different data manipulation experience.  It is simple, beautiful and powerful.

In order to connect to a SQL Azure database with Quadrant, you create a new Session (File->New Session).  You need to look under the “More” drop-down, as you must connect to SQL Azure with SQL Server Authentication.

[screen capture]

If all is well, you are greeted with a simple canvas upon which to manipulate your data.

[screen capture]

The first thing you will notice is that this is a WPF application, so you have great things like Mouse Wheel in/out for zoom.  You can simply open your explorer and start dragging tables onto the canvas.  It’s quite an amazing experience – watch some videos about the UI and I think you will quickly see just how compelling this experience can be.

[screen capture]

It remains to be seen, in my mind, who this application is for.  It certainly allows you to look at and interact with data in a different way than the other two – perhaps a bit more right-brained.

There you have it, three very simple, very powerful ways to interact with your SQL Azure databases.

I haven’t tried Quadrant with SQL Azure, but will. John says he was raised in the Bay Area and lives in the East Bay. I wonder if he’s a member of San Francisco’s [in]famous Alioto clan (mayor, supervisor, attorney, et al.)

Update 8/5/2010: Oops. Mary Jo Foley reported “Microsoft is dropping Quadrant, a tool originally slated to be part its data-modeling platform, which was originally codenamed Oslo, and is revising its plans for its M data-modeling language” in her Another piece of Microsoft's Oslo modeling puzzle disappears post to ZDNet’s All About Microsoft blog.


Brian Harry shares his take on LightSwitch in his Announcing Visual Studio LightSwitch! post of 8/3/2010:

image Today at VSLive!, Jason Zander announced a new Visual Studio product called LightSwitch.  It’s been in the works for quite some time now, as you might imagine.  Beta 1 of LightSwitch will be available on August 23rd – I’ll post again with a link as soon as I have it.  You can check out this link to learn more: http://www.microsoft.com/visualstudio/lightswitch

Basically LightSwitch is a new tool to make building business applications easier than ever before.  It allows you to build local or browser-hosted applications using Silverlight.  Your apps can run on premise or in the cloud.  In some ways, I draw an analogy with Microsoft Access in the sense that it is a radically simplified way to build business apps, enabling you to have your app up and running within minutes or hours.

However, there are some key innovations.  For one, your app is, by default, architected for the future – scalability and the cloud.  Further, when you hit the wall on the “simple” tool, which apps seem to do when the little departmental app suddenly becomes a smash hit, you have headroom.  LightSwitch IS a Visual Studio-based product.  You can “go outside the box” and bring the full power of Visual Studio and Expression to bear to build a bulletproof, scalable app without throwing everything out and starting over.

LightSwitch provides a heavily “data-oriented” application design paradigm.  It can consume and mash up external data in SharePoint or SQL (including SQL Azure) and provides a quick and easy way to build the UI around it.

It’s a really awesome way to get started, yet ensure that there’s no ceiling for your app.  If you find yourself automating a bunch of business processes, I strongly encourage you to give LightSwitch a try.

Jason’s blog has a nice walk through with a simple LightSwitch example: http://blogs.msdn.com/b/jasonz/archive/2010/08/03/introducing-microsoft-visual-studio-lightswitch.aspx.

Mike Taulty likened LightSwitch to Microsoft Access in his “Coming Soon” – Visual Studio LightSwitch post of 8/3/2010:

A new version of Visual Studio called “LightSwitch” was announced at the VS Live conference today by Jason Zander.

At the moment, I don’t have any deep details to write about here but the essence is around a productive tool for building business applications with Silverlight for both the browser and the desktop which takes in cloud options as well.

imageIt’s not necessarily targeted at every developer who’s building applications with .NET or Silverlight and undoubtedly there’s bound to be trade-offs that you make between [productivity/control] just as there are every time you adopt a [framework/toolset] but I think that this is a really interesting addition to Visual Studio.

It’s great to see Silverlight being used as the front-end here and ( as others have said ) in some ways the demos I’ve seen so far have a slight flavour of how Access was used to put together an app on top of a SQL store.

However, with the front end being Silverlight I can expect to run that cross-browser, cross-platform and in or out of the browser and, from the announcements, it looks like SQL, SharePoint or SQL Azure data storage are key scenarios, so I’ll perhaps ditch the “Access comparison” at that point :)

Either way – the best way to work it all out will be to try it out and, with that in mind, there’s a beta coming later in the month.

In the meantime, to get a few more details, I’ve found the following (in descending order from more detail to less detail):

<Return to section navigation list> 

AppFabric: Access Control and Service Bus

No significant articles today.

<Return to section navigation list>

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Gorka Sandowski resumes his cloud-logging series with Logs for Better Clouds - Part 9: Pay per Use of 8/4/2010:

Wow, quite a journey...

We spent time articulating how and why logs contribute to building Trust between Cloud Providers and customers, paving the way for smoother and cleaner relationships between clients and providers.

This time, let's look at another specific use case for logs, Pay per Use enforcement, and continue bringing clarity to, and removing opacity from, the Clouds.

Report for Billing Purposes - Pay per use
If you haven't read earlier parts of this "Logs for Better Clouds" series, we touched upon the reasons why pay-per-use is important in the context of Clouds.

Predictive rightsizing is a difficult if not impossible exercise that represents a barrier to entry for Cloud adoption. It implies either paying too much for unnecessary service, or risking Denial of Service when we need an extra oomph.

Pay per use requires a granularity so fine that it allows visibility into exactly which resources were consumed.  The Cloud Provider then charges for these resources, not more and not less.

Looking at a back to back scenario, an organization could then use these reports in order to charge back internal organizations based on their usage.  In case of dispute, further reports based on raw logs would be available to demonstrate specific usage.

Pay-per-Use is a promise that can be fulfilled through the use of logs and log reports.

Instead of having to deploy countless number of specialized tools to monitor, follow, track and report on these minute uses of virtual IT resources, Cloud Providers can rely on the ease of deployment, ease of use and accuracy of Log Management tools.

The figure below represents an actual report from one of LogLogic's customers that shows usage of an application on a per Business Unit (BU) basis for billing purposes.  The report was generated based on raw logs, these being available to clients via search features if the accuracy of the report needs to be validated and in case of dispute.

Figure 7 – Report showing actual usage for pay-per-use billing purposes

Another example is pay-per-use VM Vulnerability Management Cloud Providers. The billing charge could be based on the total number of vulnerability tests performed, which is a combination of the number of IPs tested and the number of tests performed for each IP address. It is not necessarily easy to calculate this: the number of vulnerability tests depends on the type of OS, hence on the IP address, as well as on the date and time at which these tests were performed, because the number of vulnerabilities changes over time and so does the number of tests.

In this scenario, a SaaS Provider properly managing logs will be able to charge their clients based on the exact number of vulnerability tests performed across the board, and provide reports to support the invoice generated.

In case of doubt, a client can always find out when an IP was scanned, and each of the tests performed through a report that details and singles out the corresponding logs.
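
As a sketch of the kind of roll-up such a report performs, the snippet below groups raw scan events by client and counts billable tests. The log record shape and per-test price are hypothetical illustrations, not LogLogic's format:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified record for one vulnerability test taken from the raw logs;
// real log schemas differ by provider and product.
class ScanLogEntry
{
    public string ClientId { get; set; }
    public string IpAddress { get; set; }
    public DateTime Timestamp { get; set; }
}

class UsageBillingReport
{
    const decimal PricePerTest = 0.002m;   // illustrative rate, not a real price list

    static void Main()
    {
        List<ScanLogEntry> entries = LoadEntries();   // parsed from the log store

        // One line per client: IPs scanned, tests run, and the resulting charge,
        // which is exactly the evidence a client would check against the invoice.
        var report = entries
            .GroupBy(e => e.ClientId)
            .Select(g => new
            {
                Client = g.Key,
                IpsScanned = g.Select(e => e.IpAddress).Distinct().Count(),
                TestsRun = g.Count(),
                Charge = g.Count() * PricePerTest
            });

        foreach (var line in report)
        {
            Console.WriteLine("{0}: {1} IPs, {2} tests, {3:C}",
                line.Client, line.IpsScanned, line.TestsRun, line.Charge);
        }
    }

    static List<ScanLogEntry> LoadEntries()
    {
        // Placeholder data source; in practice these rows come from the log platform.
        return new List<ScanLogEntry>();
    }
}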

Another example is when a SaaS CRM is able to charge enterprises for their exact usage, based on transactions and storage, users and reports, with log reports supporting all that data so that enterprises know they pay exactly for what they consumed, not more, not less.

There are countless examples of IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software/Security as a Service) Pay per Use opportunities.

No matter what you are asking a Provider to do for you, make sure that you get indisputable proof of your consumption in the form of reports based on logs.

<Return to section navigation list> 

Windows Azure Infrastructure

Jacques Bughin, Michael Chui, and James Manyika prefaced their Clouds, big data, and smart assets: Ten tech-enabled business trends to watch for the McKinsey Quarterly’s August 2010 issue on 4/3/2010 with “Advancing technologies and their swift adoption are upending traditional business models. Senior executives need to think strategically about how to prepare their organizations for the challenging new environment.” From the introduction:

Two-and-a-half years ago, we described eight technology-enabled business trends that were profoundly reshaping strategy across a wide swath of industries.1 We showed how the combined effects of emerging Internet technologies, increased computing power, and fast, pervasive digital communications were spawning new ways to manage talent and assets as well as new thinking about organizational structures.

Since then, the technology landscape has continued to evolve rapidly. Facebook, in just over two short years, has quintupled in size to a network that touches more than 500 million users. More than 4 billion people around the world now use cell phones, and for 450 million of those people the Web is a fully mobile experience. The ways information technologies are deployed are changing too, as new developments such as virtualization and cloud computing reallocate technology costs and usage patterns while creating new ways for individuals to consume goods and services and for entrepreneurs and enterprises to dream up viable business models. The dizzying pace of change has affected our original eight trends, which have continued to spread (though often at a more rapid pace than we anticipated), morph in unexpected ways, and grow in number to an even ten.2

The rapidly shifting technology environment raises serious questions for executives about how to help their companies capitalize on the transformation under way. Exploiting these trends typically doesn’t fall to any one executive—and as change accelerates, the odds of missing a beat rise significantly. For senior executives, therefore, merely understanding the ten trends outlined here isn’t enough. They also need to think strategically about how to adapt management and organizational structures to meet these new demands.

For the first six trends, which can be applied across an enterprise, it will be important to assign the responsibility for identifying the specific implications of each issue to functional groups and business units. The impact of these six trends—distributed cocreation, networks as organizations, deeper collaboration, the Internet of Things, experimentation with big data, and wiring for a sustainable world—often will vary considerably in different parts of the organization and should be managed accordingly. But local accountability won’t be sufficient. Because some of the most powerful applications of these trends will cut across traditional organizational boundaries, senior leaders should catalyze regular collisions among teams in different corners of the company that are wrestling with similar issues.

Three of the trends—anything-as-a-service, multisided business models, and innovation from the bottom of the pyramid—augur far-reaching changes in the business environment that could require radical shifts in strategy. CEOs and their immediate senior teams need to grapple with these issues; otherwise it will be too difficult to generate the interdisciplinary, enterprise-wide insights needed to exploit these trends fully. Once opportunities start emerging, senior executives also need to turn their organizations into laboratories capable of quickly testing and learning on a small scale and then expand successes quickly. And finally the tenth trend, using technology to improve communities and generate societal benefits by linking citizens, requires action by not just senior business executives but also leaders in government, nongovernmental organizations, and citizens.

Across the board, the stakes are high. Consider the results of a recent McKinsey Quarterly survey of global executives3 on the impact of participatory Web 2.0 technologies (such as social networks, wikis, and microblogs) on management and performance. The survey found that deploying these technologies to create networked organizations that foster innovative collaboration among employees, customers, and business partners is highly correlated with market share gains. That’s just one example of how these trends transcend technology and provide a map of the terrain for creating value and competing effectively in these challenging and uncertain times.

Register to continue.

Thomas Erl continues his SOA Principles and Patterns series with Cloud Computing, SOA and Windows Azure - Part 2 posted 8/4/2010:

Windows Azure Platform Overview
The Windows Azure platform is an Internet-scale cloud computing services platform hosted in Microsoft data centers. Windows tools provide functionality to build solutions that include a cloud services operating system and a set of developer services. The key parts of the Windows Azure platform are:

  • Windows Azure (application container)
  • Microsoft SQL Azure
  • Windows Azure platform AppFabric

The infrastructure and service architectures that underlie many of these native services (as well as cloud-based services in general) are based on direct combined application of Stateful Services [786] and Redundant Implementation [766]. This is made possible by leveraging several of the built-in extensions and mechanisms provided by the Windows Azure platform (as explained in this chapter and Chapter 16).

The Windows Azure platform is part of the Microsoft cloud, which consists of multiple categories of services:

  • Cloud-based applications: These are services that are always available and highly scalable. They run in the Microsoft cloud that consumers can directly utilize. Examples include Bing, Windows Live Hotmail, Office Live, etc.
  • Software services: These services are hosted instances of Microsoft's enterprise server products that consumers can use directly. Examples include Exchange Online, SharePoint Online, Office Communications Online, etc.
  • Platform services: This is where the Windows Azure platform itself is positioned. It serves as an application platform public cloud that developers can use to deploy next-generation, Internet-scale, and always available solutions.
  • Infrastructure services: There is a limited set of elements of the Windows Azure platform that can support cloud-based infrastructure resources.

Figure 3 illustrates the service categories related to the Windows Azure platform. Given that Windows Azure is itself a platform, let's explore it as an implementation of the PaaS delivery model.

Figure 3: A high-level representation of categories of services available in the Windows Azure cloud

The Windows Azure platform was built from the ground up using Microsoft technologies, such as the Windows Server Hyper-V-based system virtualization layer. However, the Windows Azure platform is not intended to be just another off-premise Windows Server hosting environment. It has a cloud fabric layer, called the Windows Azure Fabric Controller, built on top of its underlying infrastructure.

The Windows Azure Fabric Controller pools an array of virtualized Windows Server instances into a logical entity and automatically manages the following:

  • Resources
  • Load balancing
  • Fault-tolerance
  • Geo-replication
  • Application lifecycle

These are managed without requiring the hosted applications to explicitly deal with the details. The fabric layer provides a parallel management system that abstracts the complexities in the infrastructure and presents a cloud environment that is inherently elastic. As a form of PaaS, it also supports the access points for user and application interactions with the Windows Azure platform.

The Windows Azure platform essentially provides a set of cloud-based services that are symmetric with existing mainstream on-site enterprise application platforms (see Figure 4).

Figure 4: An overview of common Windows Azure platform capabilities

For example:

  • Storage services: A scalable distributed data storage system that supports many types of storage models, including hash map or table-like structured data, large binary files, asynchronous messaging queues, traditional file systems, and content distribution networks
  • Compute services: Application containers that support existing mainstream development technologies and frameworks, including .NET, Java, PHP, Python, Ruby on Rails, and native code.
  • Data services: Highly reliable and scalable relational database services that also support integration and data synchronization capabilities with existing on-premise relational databases
  • Connectivity services: These are provided via a cloud-based service bus that can be used as a message intermediary to broker connections with other cloud-based services and services behind firewalls within on-premise enterprise environments
  • Security services: Policy-driven access control services that are federation-aware and can seamlessly integrate with existing on-premise identity management systems
  • Framework services: Components and tools that support specific aspects and requirements of solution frameworks
  • Application services: Higher-level services that can be used to support application development, such as application and data marketplaces

All of these capabilities can be utilized individually or in combination.

Windows Azure (Application Container)
Windows Azure serves as the development, service hosting, and service management environment. It provides the application container into which code and logic, such as Visual Studio projects, can be deployed. The application environment is similar to existing Windows Server environments. In fact, most .NET projects can be deployed directly without significant changes.

A Windows Azure instance represents a unit of deployment, and is mapped to specific virtual machines with a range of variable sizes. Physical provisioning of the Windows Azure instances is handled by the cloud fabric. We are required only to specify, by policy, how many instances we want the cloud fabric to deploy for a given service.

We have the ability to manually start and shut down instances, and grow or shrink the deployment pool; however, the cloud fabric also provides automated management of the health and lifecycles of instances. For example, in the event of an instance failure, the cloud fabric would automatically shut down the instance and attempt to bring it back up on another node.

Windows Azure also provides a set of storage services that consumers can use to store and manage persistent and transient data. Storage services support geo-location and offer high durability of data by triple-replicating everything within a cluster and across data centers. Furthermore, they can manage scalability requirements by automatically partitioning and load balancing services across servers.
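
For instance, a minimal blob upload with the StorageClient library that shipped with the Windows Azure SDK of the time might look like the following sketch (account name, key, and container are placeholders; the replication described above is handled entirely by the storage service):

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlobExample
{
    static void Main()
    {
        // Placeholder account; in a role this is usually read from configuration.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=...");

        CloudBlobClient blobClient = account.CreateCloudBlobClient();

        // Containers and blobs are created through simple REST-backed calls;
        // durability (triple replication) is handled by the storage service itself.
        CloudBlobContainer container = blobClient.GetContainerReference("documents");
        container.CreateIfNotExist();

        CloudBlob blob = container.GetBlobReference("hello.txt");
        blob.UploadText("Hello from Windows Azure blob storage.");
    }
}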

Also supported by Windows Azure is a VHD-based deployment model as an option to enable some IaaS requirements. This is primarily geared for services that require closer integration with the Windows Server OS. This option provides more control over the service hosting environment and can better support legacy applications.

Services deployed within Windows Azure containers and made available via Windows Azure instances establish service architectures that, on the surface, resemble typical Web service or REST service implementations. However, the nature of the back-end processing is highly extensible and scalable and can be further subject to various forms of Service Refactoring [783] over time to accommodate changing usage requirements. This highlights the need for Windows Azure hosted services to maintain the freedom to be independently governed and evolved. This, in turn, places a greater emphasis on the balanced design of the service contract and its proper separation as part of the overall service architecture.

Specifically, it elevates the importance of the Standardized Service Contract (693), Service Loose Coupling (695), and Service Abstraction (696) principles that, through collective application, shape and position service contracts to maximize abstraction and cross-service standardization, while minimizing negative forms of consumer and implementation coupling. Decoupled Contract [735] forms an expected foundation for Windows Azure-hosted service contracts, and there will generally be the need for more specialized contract-centric patterns, such as Validation Abstraction [792], Canonical Schema [718], and Schema Centralization [769].

SQL Azure
SQL Azure is a cloud-based relational database service built on SQL Server technologies that exposes a fault-tolerant, scalable, and multi-tenant database service. SQL Azure does not exist as hosted instances of SQL Server. It also uses a cloud fabric layer to abstract and encapsulate the underlying technologies required for provisioning, server administration, patching, health monitoring, and lifecycle management. We are only required to deal with logical administration tasks, such as schema creation and maintenance, query optimization, and security management.

In addition to reliability and scalability improvements, SQL Azure's replication mechanism can be used to apply Service Data Replication [773] in support of the Service Autonomy (699) principle. This is significant, as individual service autonomy within cloud environments can often fluctuate due to the heavy emphasis on shared resources across pools of cloud-based services.

A SQL Azure database instance is actually implemented as three replicas on top of a shared SQL Server infrastructure managed by the cloud fabric. This cloud fabric delivers high availability, reliability, and scalability with automated and transparent replication and failover. It further supports load-balancing of consumer requests and the synchronization of concurrent, incremental changes across the replicas. The cloud fabric also handles concurrency conflict resolutions when performing bi-directional data synchronization between replicas by using built-in policies (such as last-writer-wins) or custom policies.

Because SQL Azure is built on SQL Server, it provides a familiar relational data model and is highly symmetric to on-premise SQL Server implementations. It supports most features available in the regular SQL Server database engine and can also be used with tools like SQL Server 2008 Management Studio, SQLCMD, and BCP, and SQL Server Integration Services for data migration.
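
Because failover between replicas can transiently drop a connection, client code written against SQL Azure commonly wraps its commands in a short retry loop. The following is a sketch of that general practice, not code from the book; the connection string and table name are placeholders:

using System;
using System.Data.SqlClient;
using System.Threading;

class ResilientQuery
{
    static void Main()
    {
        const int maxAttempts = 3;

        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                using (SqlConnection connection = new SqlConnection(
                    "Server=tcp:...;Trusted_Connection=False;Encrypt=True;"))
                using (SqlCommand command = new SqlCommand(
                    "SELECT COUNT(*) FROM Customers", connection))
                {
                    connection.Open();
                    Console.WriteLine("Customers: {0}", command.ExecuteScalar());
                    return;
                }
            }
            catch (SqlException)
            {
                // A dropped connection during a replica failover is transient;
                // back off briefly and retry, but give up after a few attempts.
                if (attempt == maxAttempts)
                {
                    throw;
                }
                Thread.Sleep(TimeSpan.FromSeconds(2 * attempt));
            }
        }
    }
}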

Windows Azure Platform AppFabric
In Chapter 7, as part of our coverage of .NET Enterprise Services, we introduced Windows Server AppFabric. This represents the version of AppFabric that is local to the Windows Server environment. Windows Azure platform AppFabric (with the word "platform" intentionally not capitalized) is the cloud-based version of AppFabric that runs on Windows Azure.

Windows Azure platform AppFabric helps connect services within or across clouds and enterprises. It provides a Service Bus for connectivity across networks and organizational boundaries, and an Access Control service for federated authorization as a service.

The Service Bus acts as a centralized message broker in the cloud to relay messages between services and service consumers. It has the ability to connect to on-premise services through firewalls, NATs, and over any network topology.

Its features include:

  • Connectivity using standard protocols and standard WCF bindings
  • Multiple communication models (such as publish-and-subscribe, one-way messaging, unicast and multicast datagram distribution, full-duplex bi-directional connection-oriented sessions, peer-to-peer sessions, and end-to-end NAT traversal)
  • Service endpoints that are published and discovered via Internet-accessible URLs
  • Global hierarchical namespaces that are DNS and transport-independent
  • Built-in intrusion detection and protection against denial-of-service attacks

The Windows Azure Service Bus conforms to the familiar Enterprise Service Bus [741] compound pattern, and focuses on realizing this pattern across network, security, and organizational domains. Service Bus also includes a service registry for the registration and discovery of service metadata, which allows for the application of Metadata Centralization [754] and emphasizes the need to apply the Service Discoverability (702) principle.
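
A rough sketch of exposing an on-premise WCF service through the Service Bus relay with the AppFabric SDK of the era follows. The namespace, contract, and service names are placeholders, and the shared-secret credential plumbing is only noted in a comment because its exact property names vary by SDK release:

using System;
using System.ServiceModel;
using Microsoft.ServiceBus;

[ServiceContract]
interface IEcho
{
    [OperationContract]
    string Echo(string text);
}

class EchoService : IEcho
{
    public string Echo(string text) { return text; }
}

class RelayHost
{
    static void Main()
    {
        // Public, DNS-based address in the Service Bus namespace; consumers
        // connect here and the relay bridges back through the local firewall/NAT.
        Uri address = ServiceBusEnvironment.CreateServiceUri("sb", "mynamespace", "echo");

        ServiceHost host = new ServiceHost(typeof(EchoService));
        host.AddServiceEndpoint(typeof(IEcho), new NetTcpRelayBinding(), address);

        // Note: each relay endpoint also needs a TransportClientEndpointBehavior
        // carrying the namespace's shared-secret (issuer name/key) credentials;
        // the exact property names depend on the SDK version, so they are omitted here.

        host.Open();
        Console.WriteLine("Listening on " + address);
        Console.ReadLine();
        host.Close();
    }
}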

Access Control acts as a centralized cloud-based security gateway that regulates access to cloud-based services and Service Bus communications, while integrating with standards-based identity providers (including enterprise directories such as Active Directory and online identity systems like Windows Live ID). Access Control and other Windows Azure-related security topics are covered in Chapter 17.

Unlike Windows Azure and SQL Azure, which are based on Windows Server and SQL Server, Access Control Service is not based on an existing server product. It uses technology included in Windows Identity Foundation and is considered a purely cloud-based service built specifically for the Windows Azure platform environment.

Summary of Key Points

  • The Windows Azure platform is primarily a PaaS deployed in a public cloud managed by Microsoft.
  • Windows Azure platform provides a distinct set of capabilities suitable for building scalable and reliable cloud-based services.
  • The overall Windows Azure platform further encompasses SQL Azure and Windows Azure platform AppFabric.

This excerpt is from the book, "SOA with .NET & Windows Azure: Realizing Service-Orientation with the Microsoft Platform", edited and co-authored by Thomas Erl, with David Chou, John deVadoss, Nitin Ghandi, Hanu Kommapalati, Brian Loesgen, Christoph Schittko, Herbjörn Wilhelmsen, and Mickie Williams, with additional contributions from Scott Golightly, Daryl Hogan, Jeff King, and Scott Seely, published by Prentice Hall Professional, June 2010, ISBN 0131582313, Copyright 2010 SOA Systems Inc. For a complete Table of Contents please visit: www.informit.com/title/0131582313

See Windows Azure and Cloud Computing Posts for 7/27/2010+ for Part 1 of this series. For a complete list of the co-authors and contributors, see the end of that article.


Lori MacVittie (@lmacvittie) observed An impassioned plea from a devops blogger and a reality check from a large enterprise highlight a growing problem with devops evolutions – not enough dev with the ops to set up her  Will DevOps Fork? post of 8/4/2010 to F5’s DevCentral blog:

John E. Vincent offered a lengthy blog post on a subject near and dear to his heart recently: devops. His plea was not to be left behind as devops gains momentum and continues to barrel forward toward becoming a recognized IT discipline. The problem is that John, like many folks, works in an enterprise. An enterprise in which not only do legacy and traditional solutions require a bit more ingenuity to integrate, but the imposition of regulations also breaks devops' ability to rely solely on script-based solutions to automate operations.

The whole point of this long-winded post is to say "Don't write us off". We know. You're preaching to the choir. It takes baby steps and we have to pursue it in a way that works with the structure we have in place. It's great that you're a startup and don't have the legacy issues older companies have. We're all on the same team. Don't leave us behind.

John E. Vincent, “No operations team left behind - Where DevOps misses the mark

But it isn’t just legacy solutions and regulations slowing down the mass adoption of cloud computing and devops, it’s the sheer rate of change that can be present in very large enterprise operations.

Scripted provisioning is faster than the traditional approach and reduces technical and human costs, but it is not without drawbacks. First, it takes time to write a script that will shift an entire, sometimes complex workload seamlessly to the cloud. Second, scripts must be continually updated to accommodate the constant changes being made to dozens of host and targeted servers.

"Script-based provisioning can be a pretty good solution for smaller companies that have low data volumes, or where speed in moving workloads around is the most important thing. But in a company of our size, the sheer number of machines and workloads we need to bring up quickly makes scripts irrelevant in the cloud age," said Jack Henderson, an IT administrator with a national transportation company based in Jacksonville, Fla.

More aggressive enterprises that want to move their cloud and virtualization projects forward now are looking at more advanced provisioning methods.

Server provisioning methods holding back cloud computing initiatives

Scripting, it appears, just isn’t going to cut it as the primary tool in devops toolbox. Not that this is any surprise to those who’ve been watching or have been tackling this problem from the inevitable infrastructure integration point of view. In order to accommodate policies and processes specific to regulations and simultaneously be able to support a rapid rate of change something larger than scripts and broader than automation is going to be necessary.

You are going to need orchestration, and that means you’re going to need integration. You’re going to need Infrastructure 2.0.

AUTOMATED OPERATIONS versus DATA CENTER ORCHESTRATION

In order to properly scale automation along with a high volume of workload you need more than just scripted automation. You need collaboration and dynamism, not codified automation and brittle configurations. What scripts provide now is the ability to configure (and update) the application deployment environment and – if you’re lucky – a piece of the application network infrastructure on an individual basis using primarily codified configurations. What we need is twofold. First, we need to be able to integrate and automate all applicable infrastructure components. This becomes apparent when you consider the number of network infrastructure components upon which an application relies today, especially those in a highly virtualized or cloud computing environment.

Second is the ability to direct a piece of infrastructure to configure itself based on a set of parameters and known operational states – at the time it becomes active. We don’t want to inject a new configuration every time a system comes up; we want to modify on the fly, to adapt in real-time, to what’s happening in the network, in the application delivery channel, in the application environment. While it may be acceptable (and it isn’t in very large environments but may be acceptable in smaller ones) to use a reset approach, i.e. change and reboot/reload a daemon to apply those changes, this is not generally an acceptable approach in the network. Other applications and infrastructure may be relying on that component and rebooting/resetting the core processes will interrupt service to those dependent components. The best way to achieve the desired goal – real-time management – is to use the APIs provided to do so.
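
As a purely illustrative sketch of that API-driven, change-in-place approach (the endpoint, payload, and credentials below are hypothetical, not any particular vendor's control-plane API): the component is adjusted through a call while it keeps serving traffic, rather than having its configuration re-pushed and its processes reloaded.

using System;
using System.IO;
using System.Net;
using System.Text;

class PoolMemberUpdate
{
    static void Main()
    {
        // Hypothetical management endpoint on a load balancer / application delivery
        // controller; real devices expose their own REST or SOAP control-plane APIs.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
            "https://adc.example.com/api/pools/web-pool/members");
        request.Method = "POST";
        request.ContentType = "application/json";
        request.Credentials = new NetworkCredential("ops-user", "secret");

        // Add a newly provisioned instance to the pool at the moment it becomes
        // active: no configuration re-push, no daemon restart.
        byte[] body = Encoding.UTF8.GetBytes(
            "{ \"address\": \"10.0.0.42\", \"port\": 80, \"enabled\": true }");
        request.ContentLength = body.Length;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine("Pool updated: " + response.StatusCode);
        }
    }
}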

On top of that we need to be able to orchestrate a process. And that process must be able to incorporate the human element if necessary, as may be the case with regulations that require approvals or “sign-offs”. We need solutions that are based on open standards and that integrate with one another in such a way as to make it possible to arrive at a solution that can serve an organization of any size, any age, and any stage in the cloud maturity model.

Right now devops is heading down a path that relegates it to little more than automation operators, which is really not all that much different from what the practitioners were before. Virtualization and cloud computing have simply raised their visibility due to the increased reliance on automation as a means to an end. But treating automated operations as the end goal completely eliminates the “dev” in “devops” and ignores the concept of an integrated, collaborative network that is not a second-class citizen but a full-fledged participant in the application lifecycle and deployment process. That concept is integral to the evolution of highly virtualized implementations toward a mature, cloud-based environment that can leverage services whether they are local, remote, or a combination of both – or that change from day to day or hour to hour based on business and operational conditions and requirements.

DEVOPS NEEDS to MANAGE INFRASTRUCTURE not CONFIGURATIONS

The core concept behind Infrastructure 2.0 is collaboration between all applicable constituents – from the end user to the network to the application infrastructure to the application itself, and from the provisioning systems to the catalog of services to the billing systems. It’s collaborative and dynamic, which means adaptive and able to change the way in which policies are applied – from routing to switching to security to load balancing – based on the application and its right-now needs. Devops is – or should be – about enabling that integration. If that can be done with a script, great. But the reality is that a single script or set of scripts that focus on the automation of components rather than systems and architectures is not going to scale well and will instead end up contributing to the diseconomy of scale that was and still is the primary driver behind the next-generation network.

Scripts are also unlikely to address the very real need to codify the processes that drive an enterprise deployment, and they do not take into consideration the very real possibility that a new deployment may need to be “backed out” if something goes wrong. Scripts are too focused on managing configurations and not focused enough on managing the infrastructure. It is the latter that will ultimately provide the most value and the means by which the network will be elevated to a first-class citizen in the deployment process.

Infrastructure 2.0 is the way in which organizations will move from aggregation to automation and toward the liberation of the data center based on full-stack interoperability and portability. It is a services-based infrastructure whose components can be combined to form a dynamic control plane, one that allows infrastructure services to be integrated into the processes required not just to automate and ultimately orchestrate the data center, but to do so in a way that scales along with the implementation.

Collaboration and integration will require development; there’s no way to avoid that. This should be obvious from the reliance on APIs (Application Programming Interfaces), a term which, if updated to reflect today’s terminology, might better be called an ADI (Application Development Interface). Devops needs to broaden past “ops” and start embracing the “dev” as a means to integrate and enable the collaboration necessary across the infrastructure to allow the maturation of emerging data center models to continue. If the ops in devops isn’t balanced with dev, it’s quite possible that, like many cross-discipline roles within IT, the concept of devops may have to fork in order to continue moving virtualization and cloud computing down its evolutionary path.

<Return to section navigation list> 

Windows Azure Platform Appliance 

David Linthicum claims “Data centers will spend double on server hardware by 2014 to power private clouds, and mobile usage will also boost server investments” in a preface to his Why cloud adoption is driving hardware growth, not slowing it post of 8/4/2010 to InfoWorld’s Cloud Computing blog:

The move to cloud computing is driving significant spending on data center hardware to support businesses' private cloud initiatives, says IDC. In fact, private cloud hardware spending will dwarf public cloud hardware spending, IDC predicts. IDC also forecasts that server hardware revenue for public cloud computing will grow from $582 million in 2009 to $718 million in 2014, and server hardware revenue for the larger private cloud market will grow from $2.6 billion to $5.7 billion in the same period.
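A quick back-of-the-envelope check on those IDC figures (the arithmetic below is mine, not IDC's or Linthicum's) puts the implied compound annual growth rate at roughly 4% for public cloud server revenue and roughly 17% for private cloud over 2009–2014:

using System;

class CloudHardwareCagr
{
    // Compound annual growth rate between a start and end value over n years.
    static double Cagr(double start, double end, int years)
    {
        return Math.Pow(end / start, 1.0 / years) - 1.0;
    }

    static void Main()
    {
        // Figures (in millions of dollars) quoted from the IDC forecast above, 2009 to 2014.
        Console.WriteLine("Public cloud server revenue CAGR:  {0:P1}", Cagr(582, 718, 5));   // ~4.3%
        Console.WriteLine("Private cloud server revenue CAGR: {0:P1}", Cagr(2600, 5700, 5)); // ~17.0%
    }
}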

The growth in private cloud computing hardware revenue is not surprising. Survey after survey has shown that enterprises moving to cloud computing are looking to move to private clouds first, which means many new boxes of servers are showing up in the lobby to build these private clouds. That said, I suspect some of these so-called private clouds are just relabeled traditional data centers and won't have many built-in cloud computing features beyond simple virtualization. Cloudwashing comes to the data center.

An irony in all this is that cloud computing may drive up the number of servers in the data center, even though many organizations are looking to cloud computing to reduce the hardware footprint in that same area. But building all those new cloud services, both public and private, means building the platforms to run them. Ultimately, the adoption of cloud computing could diminish the number of servers deployed in proportion to the number of users served, but that won't happen until the late 2010s or even early 2020s.

Also driving this server growth is the increase in mobile platforms and applications, which are almost always based in the cloud. In all likelihood, mobile may drive much of the server hardware growth in the next two years.

Finally, the number of VC dollars driving new cloud computing startups will boost server sales in 2011 and beyond. Although many startups will use existing clouds for their infrastructure, such as the offerings from Amazon.com and Google, I suspect a significant number of the differentiated startups will have their own data center and server farms.

Just when you thought it was time to sell your hardware stocks due to the rise of cloud computing, the trend upends your expectations.

<Return to section navigation list> 

Cloud Security and Governance

Chris Hoff (@Beaker) makes suggestions in his If You Could Have One Resource For Cloud Security… post of 8/4/2010:

I got an interesting tweet sent to me today that asked a great question. I thought about this and it occurred to me that while I would have liked to have answered that the Cloud Security Alliance Guidance was my first choice, I think the most appropriate answer is actually the following:

“Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance” by Tim Mather, Subra Kumaraswamy, and Shahed Latif is an excellent overview of the issues (and approaches to solutions) for cloud security and privacy. Pair it with the CSA and ENISA guidance and you’ve got a fantastic set of resources.

I’d also suggest George Reese’s excellent book “Cloud Application Architectures: Building Applications and Infrastructure in the Cloud.”

I suppose it’s only fair to disclose that I played a small part in reviewing/commenting on both of these books prior to their being published.


Black Hat Briefings USA 2010 distributed Grant Bugher’s Secure Use of Cloud Storage PDF whitepaper of July 2010, which covers Windows Azure Table storage and Amazon S3/SimpleDB security. From the Executive Summary and Introduction:

Executive Summary

Cloud storage systems like those offered by Microsoft Windows Azure and Amazon Web Services provide the ability to store large amounts of structured or unstructured data in a way that promises high levels of availability, performance, and scalability. However, just as with traditional data storage methods such as SQL-based relational databases, the interfaces to these data storage systems can be exploited by an attacker to gain unauthorized access if they are not used correctly.

The tabular data services offered by cloud providers make use of query strings that are subject to SQL Injection-like attacks, while the XML transport interfaces of these systems are themselves subject to injection in some circumstances.

In addition, cloud-based databases can still be used for old attacks like persistent cross-site scripting. Using cloud services to host public and semi-public files may introduce new information disclosure vulnerabilities. Finally, owners of applications must ensure that the application’s cloud storage endpoints are adequately protected and do not allow unauthorized access by other applications or users.

Luckily for developers, modern development platforms offer mitigations that can make the use of cloud services much safer.

Conducting database access via frameworks like Windows Communication Foundation and SOAP toolkits can greatly reduce the opportunity for attacks, and cloud service providers themselves are beginning to offer multifactor authentication and other protections for the back‐end databases. Finally, traditional defense‐in‐depth measures like input validation and output encoding remain as important as ever in the new world of cloud‐based data.

Introduction

This paper covers background on cloud storage, an overview of database attacks, and specific examples using two major cloud storage APIs – Windows Azure Storage and Amazon S3/SimpleDB – of exploitable and non‐exploitable applications. The purpose of this paper is to teach developers how to safely leverage cloud storage without creating vulnerable applications.

Grant Bugher is Lead Security Program Manager, Online Services Security and Compliance, Global Foundation Services, Microsoft Corporation.
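Bugher's paper walks through Azure- and S3-specific examples; the fragment below is only a generic ASP.NET-flavored sketch of two of the defense-in-depth measures named in the summary above – parameterized data access instead of string-built queries, and output encoding before user-supplied data is rendered. The connection string, table and column names are placeholders, not taken from the whitepaper.

using System.Data.SqlClient;
using System.Web;

public static class SafeDataAccess
{
    // Parameterized query: the user-supplied value travels as data, never as
    // part of the query text, so it cannot rewrite the query itself.
    public static string LookupDisplayName(string connectionString, string userId)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT DisplayName FROM Profiles WHERE UserId = @userId", connection))
        {
            command.Parameters.AddWithValue("@userId", userId);
            connection.Open();
            object result = command.ExecuteScalar();
            return result == null ? null : (string)result;
        }
    }

    // Output encoding: user-stored values are HTML-encoded before being written
    // into a page, which blunts persistent cross-site scripting.
    public static string RenderDisplayName(string displayName)
    {
        return HttpUtility.HtmlEncode(displayName);
    }
}

The same two habits apply whether the backing store is SQL Azure, Windows Azure Table storage or SimpleDB: keep untrusted input out of the query text, and encode it on the way back out.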


Bruce Maches posted Validation of Public Cloud Infrastructure: Satisfying FDA Requirements While Balancing Risk Vs. Reward to HPC in the Cloud’s Behind the Cloud blog on 8/3/2010:

In a prior post I provided an overview of the 21 CFR Part 11 validation guidelines and the impact of these requirements on the validation of public cloud infrastructure services. The main thrust of that post was discussing how current industry validation practices are a potential impediment to the full-scale adoption of cloud computing in the life sciences, especially in regard to the use of public cloud infrastructure-related services for applications coming under Part 11 guidelines. See my May 5th post for additional background. I have received some feedback on potential approaches to validating public cloud-based applications and thought I would provide some additional thoughts here.

The key word here is trust – if I am an FDA auditor, how do I know I can trust the installation, operation and output of a particular system? The actual Part 11 compliance process for any application includes the hardware, software, operational environment and support processes for the system itself. This allows an IT group to answer the following questions:

  • Can I prove the entire system (hardware, software) was installed correctly?
  • Can I prove the system is operating correctly?
  • Can I prove the system is performing correctly to meet the user requirements as stated in the Design Qualification documents?
  • Can I prove that the system environment is properly maintained by people with the requisite skills and that all changes are being properly documented?

The validation of public cloud offerings revolves primarily around the first and last questions above. How do I ensure that the overall environment was designed, implemented and maintained per Part 11 guidelines? If a life science company wanted to leverage public cloud computing for validated applications, it would have to take a hard look at the risks vs. rewards and develop a strategy for managing those risks while ensuring that the advantages of leveraging the public cloud can still be realized.

There are several steps a company can take to start down this path. The initial step would be to develop an internal strategy and supporting processes for how the organization plans to meet the Installation and Operational Qualification (IQ & OQ) portions of the Part 11 guidelines in a cloud environment. The strategy would be incorporated into the overall Validation Master Plan (VMP). This plan is the first stop for any auditor, as it spells out the organization's overall validation strategy as to what systems will require validation and how that validation will be performed. For validating public cloud, the VMP would need to address at a minimum such topics as:

  • The actual hardware (IQ) piece: since a server serial number is not available, what documentation of the system's physical and operating environment is acceptable?
  • What level of data center (e.g., SAS 70 Type II) is approved for use by the organization for public cloud applications, and how is that certification proven?
  • What documentation can the cloud vendor provide describing how they developed and implemented the data center environment?
  • What training and certification documents are available for the vendor personnel who will be managing/maintaining the environment?
  • How detailed and accurate are the vendor's change management records and processes?

Any organization creating this type of strategy would have to assess its appetite for potential risk and balance that against the gains and cost savings that are part of the promise of cloud computing. There are no hard and fast rules on how this can be done, as every organization is unique.

Another piece of the puzzle is the OS and associated software being deployed. This portion of the environment can be easily validated by building a pre-qualified image with a documented IQ. This image can then be loaded up into the cloud as needed and used over and over again. A company can build a whole library of these pre-validated images and have them available for quick deployment, which drastically cuts down the time it takes to bring a new environment on-line. There are a number of vendors who are building these pre-qualified images, providing choices in the OS (Windows, Linux), databases, and other portions of the software environment.
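As a minimal illustration of the kind of evidence such a pre-qualified image strategy leans on – the file path and record format here are hypothetical, not drawn from Maches' article or any vendor's tooling – an IQ record might capture little more than a checksum and a timestamp for the approved image:

using System;
using System.IO;
using System.Security.Cryptography;

class ImageQualificationRecord
{
    static void Main()
    {
        // Hypothetical path to the pre-qualified machine image.
        string imagePath = @"C:\images\prequalified-base.vhd";

        using (var stream = File.OpenRead(imagePath))
        using (var sha256 = SHA256.Create())
        {
            byte[] hash = sha256.ComputeHash(stream);
            string fingerprint = BitConverter.ToString(hash).Replace("-", "");

            // Append a simple audit line; a real VMP would dictate the record's
            // format, storage location, and the approval workflow around it.
            File.AppendAllText(@"C:\images\iq-log.txt",
                string.Format("{0:u}  {1}  SHA256={2}{3}",
                    DateTime.UtcNow, Path.GetFileName(imagePath), fingerprint,
                    Environment.NewLine));
        }

        Console.WriteLine("IQ fingerprint recorded.");
    }
}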

As I have mentioned in several prior posts, the possibilities for leveraging cloud computing in the life sciences are potentially enormous. From speeding up drug research and discovery, allowing for the rapid deployment of new systems, and providing the compute power required by resource-hungry scientific applications, to cutting costs and migrating legacy applications into the cloud, there are myriad ways that life science CIOs can leverage cloud environments, both public and private. As with any change, part of the CIO’s responsibility is to make sure that the organization has a clear and well thought out strategy for incorporating cloud computing into its overall IT strategic direction.

Bruce Maches is a former Director of Information Technology for Pfizer’s R&D division, current CIO for BRMaches & Associates and a contributing editor for HPC in the Cloud.

<Return to section navigation list> 

Cloud Computing Events

Eric Nelson (@ericnel) posted a Call for speakers for UK Windows Azure Platform online conference on 20th of September on 8/4/2010:

I have decided to try to put together a top-notch online conference delivered predominantly by UK-based speakers, inspired by the very enjoyable and useful community-driven MVC Conference http://mvcconf.com/ (see my write-up).

What I need now is top-notch session proposals from folks who know what they are talking about and ideally are UK-based. Which means – you! Possibly :-)

The plan is:

  • The conference will take place 10am to 5pm UK time, delivered using Live Meeting and recorded for on-demand access
  • Two “rooms” (or three if we have loads of speakers/sessions!)
    • which equates to 10 (or 15) sessions of 55 minutes with 5-minute breaks between
  • One room will be more about introducing the Windows Azure Platform (lap around etc.), likely predominantly MS delivered.
    • Aim is to help developers new to Azure.
  • Second room is about detailed topics, learning, tips etc. Delivered by a mix of MS, community and early adopters.
    • For developers who already understand the basics
  • + panel Q&A
  • + virtual goody bag
  • + ?
  • for FREE

What I am after is folks with strong knowledge of the Windows Azure Platform to propose a session (or two). All I need at this stage is a very short draft session proposal, ideally by this Thursday (as I’m on holiday next week). This is a short proposal – I just need a one-liner so I can start to think about what works overall for the day. Please send proposals to eric . nelson AT microsoft . com (without the spaces).

If you are not UK-based, you are still welcome to propose a session, but you need to be awake during UK time and speak good English.

And the wonderful bit is… you can present it from the comfort of your home or office :-)

P.S. I’m sure I don’t need to say it, but even if you don’t fancy speaking, do block off the 20th now as it promises to be a great day.


The International Supercomputing Conference on 8/4/2010 announced that the ISC Cloud’10 Conference to Help Attendees See the Cloud More Clearly will be held 10/28 to 10/29/2010 in Frankfurt, Germany:

The organizers of the International Supercomputing Conference (ISC), building on their 25 years of leadership and expertise in supercomputing events, introduce the inaugural ISC Cloud’10 conference to be held October 28-29, 2010 in Frankfurt, Germany.

The ISC Cloud’10 conference will focus on computing and data intensive applications, the resources they require in a computing cloud, and strategies for implementing and deploying cloud infrastructures. Finally, the conference will shed light on how cloud computing will impact HPC.

“Although cloud computing is already being addressed at other conferences, there is definitely a need for a dedicated international cloud computing event at which both researchers and industry representatives can share ideas and knowledge related to compute and data intensive services in the cloud,” said ISC Cloud General Chair Prof. Dr. Wolfgang Gentzsch, an international expert in Grids who now brings his expertise to the clouds.

Questions like “How will cloud computing benefit my organization?”, “What are the roadblocks I have to take into account (and remove)?”, “What kind of applications and services are suitable for clouds?”, “How does virtualization impact performance?”, “Will clouds replace supercomputers and Grids?”, “Can clouds finally help us deal with our data deluge?”, and “Can they replace our internal data vaults?” will be addressed and answered at the conference.

ISC Cloud attendees will have a chance to participate in discussions, receive useful help for making decisions (taking cost-benefit, security, economics and other issues into account), clear up misconceptions, identify the limitations of cloud computing and, last but no less important, make contacts to facilitate decisions in this key field of IT.

Participants will also enjoy outstanding opportunities to network with leading minds from around the world.

The key topics of this conference are:

• Cloud computing models: private, public and hybrid clouds
• Virtualization techniques for the data center
• Perspective of scientific cloud computing
• Migrating large data sets into a cloud
• Trust in cloud computing: security and legal aspects
• Cloud computing success stories from industry and research; and
• Lessons learnt and recommendations for building your cloud or using cloud services.

Among the renowned speakers are Kathy Yelick from Lawrence Berkeley National Laboratory, USA; Dan Reed from Microsoft; Matt Wood from Amazon and John Barr from the 451 Group on Cloud Computing. The presentations are available here.

The Details

ISC Cloud ’10 will be held Thursday, October 28 and Friday, October 29, at the Frankfurt Marriott Hotel. Participation will be limited and early registration is encouraged.

Registration will open on August 16. For more information, visit the ISC Cloud website.

About ISC Cloud’10

Organized by Prof. Hans Meuer and his Prometeus Team, ISC Cloud’10 brings together leading experts in cloud computing from around the world presenting valuable information about their own experience with designing, building, managing and using clouds, in a collegial atmosphere.

Subscribe to our newsletter to find out more about our conference and about cloud computing. Plus, join our Facebook group to share your thoughts and follow us on Twitter for the latest updates.

<Return to section navigation list> 

Other Cloud Computing Platforms and Services

See Black Hat Briefings USA 2010 distributed Grant Bugher’s Secure Use of Cloud Storage PDF whitepaper of July 2010, which covers Windows Azure Table storage and Amazon S3/SimpleDB security in the Cloud Security and Governance section above.

Stephen O’Grady asked AWS: Forget the Revenue, Did You See the Margins? in this 8/4/2010 Redmonk post:

Two days ago, two analysts from UBS – Brian Pitz and Brian Fitzgerald – projected Amazon Web Services revenues at $500 million. Many were disappointed, expecting more from the widely acknowledged market leader: half a billion dollars is approximately what Microsoft spent per datacenter pre-2010.

Those who would focus on the actual revenue figure, however, are likely to miss the more important margin numbers.

It has been long assumed that cloud computing – at least as currently practiced by the lower-value-add Infrastructure-as-a-Service practitioners – is a low-margin business by enterprise infrastructure standards. Larger systems players such as HP and IBM have shown little appetite for the public cloud market in part because of this, depending on who you talk to. One senior executive I spoke with from a large systems vendor two years ago was blunt in his assessment of the prospects for a public cloud offering: “I don’t want to be in the hosting business.” The implication being that hosting offered insufficient margins.

Certainly nothing in Amazon’s pricing, from launch through to today, has seemed to contradict this conventional wisdom. True, the actual cost of full-time EC2 instances exceeded competitive offerings from traditional hosts, but the dynamic consumption of AWS servers would presumably erode the moderate margin Amazon could realize. Half a month of a $72 server is worth less than a full month of a $40 server, and so on.

Except that it apparently isn’t.

At the OSCON Cloud Summit, I delivered a presentation on cloud lock-in. The concept as it relates to margins was simple: margins at the foundational layers were slimmer, which was spurring the development of various platform services which, besides creating the potential for lock-in, would theoretically provide higher margins. It’s standard technology value-add thinking: if I can charge $10 for a basic, bare-bones server, I should be able to charge $20 for a platform in which you don’t worry about servers, capacity planning and such any longer. That the market has largely rejected platform services in favor of more elemental infrastructure building blocks doesn’t change the basic economic assumptions being made.

My presentation was, I believe, generally well received. The reactions, both on Twitter and in person following my talk, were positive and question oriented. With one notable exception.

James Watters argued vigorously that I was underestimating, substantially, Amazon’s margins.

WOW @sogrady couldn’t have been more wrong when comparing the margins of HP to EC2; EC2 much higher than HP average 25%.

Partially the disconnect is that I hadn’t meant to imply a comparison of cloud margins to those of HP generally. The intent was rather to contrast typical cloud margins with those of enterprise technology businesses, where ideal margins generally begin at 40%. But as it turns out, James was right and I was wrong, irrespective of that framing error.

According to UBS, Amazon Web Services gross margins for the years 2006 through 2014 are 47%, 48%, 48%, 49%, 49%, 50%, 50.5%, 51%, 53%. Granted, this is an analyst projection. And the inherent risk of projecting four years out in a volatile market is acknowledged.
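Taking those UBS estimates at face value, the arithmetic is straightforward (the calculation below is mine, not a figure from the UBS note): roughly $500 million of revenue at a gross margin near 49% implies gross profit in the neighborhood of a quarter of a billion dollars.

using System;

class AwsMarginCheck
{
    static void Main()
    {
        double estimatedRevenue = 500e6; // UBS AWS revenue estimate, ~$500 million
        double grossMargin = 0.49;       // UBS gross margin estimate for 2009

        // Gross profit = revenue x gross margin (~$245 million on these inputs).
        Console.WriteLine("Implied gross profit: {0:C0}", estimatedRevenue * grossMargin);
    }
}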

But even should we trim the figures liberally, the fact is that the margins that Amazon is realizing on basic infrastructure services are substantial. For context, look at the income statements for a Cisco, an IBM or an HP.

If this is true, most of what we’ve believed about Amazon’s business – that it was in fact a high-volume, low-margin business – is wrong. And if that’s wrong, it changes the way we must evaluate the cloud industry and the attendant economic opportunities. Revenue is a function of volume and margin. The volume, with respect to the cloud, is not a concern for me. The margin always has been. If that concern can be erased through combinations of automation, efficiencies and scale, then the economics of the cloud look even brighter than they did before. The current market size may portend less upside than we’ve historically seen from technology sectors because it’s more significantly driven by volume than in years past, but I have few concerns about the market potential long term.

Which is good news for the industry as a whole, I think. Sometimes it’s good to be wrong.

Todd Hoff posted Dremel: Interactive Analysis of Web-Scale Datasets - Data as a Programming Paradigm to the High Scalability blog on 8/4/2010:

If Google was a boxer then MapReduce would be a probing right hand that sets up the massive left hook that is Dremel, Google's—scalable (thousands of CPUs, petabytes of data, trillions of rows), SQL based, columnar, interactive (results returned in seconds), ad-hoc—analytics system. If Google was a magician then MapReduce would be the shiny thing that distracts the mind while the trick goes unnoticed. I say that because even though Dremel has been around internally at Google since 2006, we have not heard a whisper about it. All we've heard about is MapReduce, clones of which have inspired entire new industries. Tricky.

Dremel, according to Brian Bershad, Director of Engineering at Google, is targeted at solving BigData class problems:

While we all know that systems are huge and will get even huger, the implications of this size on programmability, manageability, power, etc. are hard to comprehend. Alfred noted that the Internet is predicted to be carrying a zettabyte (10^21 bytes) per year in just a few years. And growth in the number of processing elements per chip may give rise to warehouse computers having 10^10 or more processing elements. To use systems at this scale, we need new solutions for storage and computation.

How Dremel deals with BigData is described in this paper, Dremel: Interactive Analysis of Web-Scale Datasets, which is the usual high-quality technical paper from Google on the architecture and ideas behind Dremel. To learn more about the motivation behind Dremel you might want to take a look at The Frontiers of Data Programmability, a slide deck from a keynote speech given by Dremel paper co-author Sergey Melnik.

Why is a paper about Dremel out now? I assume it's because Google has released BigQuery, a web service that enables you to do interactive analysis of massively large datasets, which is based on Dremel. To learn more about what Dremel can do from an analytics perspective, taking a look at BigQuery would be a good start.

You may be asking: Why use Dremel when you have MapReduce? I think Kevin McCurley, a Google Research Scientist, answers this nicely:

The first step in research is to form a speculative hypothesis. The real power of Dremel is that you can refine these hypotheses in an interactive mode, constantly poking at massive amounts of data. Once you come up with a plausible hypothesis, you might want to run a more complicated computation on the data, and this is where the power of MapReduce comes in. These tools are complementary, and together they make a toolkit for rapid exploratory data intensive research.

So, Dremel is a higher level of abstraction than MapReduce and it fits as part of an entire data slice and dice stack. Another pancake in the stack is Pregel, a distributed graph processing engine. Dremel can be used against raw data, like log data, or together with MapReduce, where MapReduce is used to select a view of the data for deeper exploration.

Dremel occupies the interactivity niche because MapReduce, at least for Google, isn't tuned to return results in seconds. Compared to MapReduce, Dremel's query latency is two orders of magnitude faster. MapReduce is "slow" because it operates on records spread across a distributed file system comprised of many thousands of nodes. To see why, take an example of search clicks. Whenever you search and click on a link from the results, Google will store all the information it can about your interaction: placement on the page, content around the link, browser type, time stamp, geolocation, cookie info, your ID, query terms, and anything else they can make use of. Think about the hundreds of millions of people clicking on links all day every day. Trillions of records must be stored. This data is stored, in one form or another, in a distributed file system. Since that data is spread across thousands of machines, to run a query requires something like MapReduce, which sends little programs out to the data and aggregates the results through intermediary machines. It's a relatively slow process that requires writing a computer program to process the data. Not the most accessible or interactive of tools.

Instead, with Dremel, you get to write a declarative SQL-like query against data stored in a read-only columnar format that is very efficient for analysis. It's possible to write queries that analyze billions of rows, terabytes of data, trillions of records—in seconds.
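To give a feel for the shape of such a query – the table and field names below are invented for illustration and are not taken from the Dremel paper or BigQuery's documentation, and it's wrapped in a C# string constant only to match the other snippets above – a Dremel-style interactive analysis over click records might look like this:

using System;

static class DremelStyleQuery
{
    // Illustrative only: an aggregation of the sort Dremel/BigQuery answers in
    // seconds by scanning just the referenced columns of a columnar store.
    public const string ClickAnalysis = @"
        SELECT   country,
                 COUNT(*)             AS clicks,
                 COUNT(DISTINCT user) AS distinct_users
        FROM     search_clicks
        WHERE    click_time >= '2010-08-01'
        GROUP BY country
        ORDER BY clicks DESC
        LIMIT    20;";

    static void Main()
    {
        Console.WriteLine(ClickAnalysis);
    }
}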

Others think MapReduce is not inherently slow; the slowness is just Google's implementation. The difference is that Google has to worry about the entire lifecycle of data, namely handling incredibly high write rates, not just how to query already extracted and loaded data. In the era of BigData, data is partitioned and computation is distributed; bridges must be built to cross that gap.

It's interesting to see how the Google tool-chain seems to realize many of the ideas found in Frontiers of Data Programmability, which talks about a new paradigm where data management is not considered just as a storage service, but as a broadly applicable programming paradigm. In that speech the point is made that the world is full of potential data-driven applications, but it's still too difficult to develop them. So we must:

  • Focus on developer productivity 
  • Broaden the notion of a “database developer” to target the long tail of developers

Productivity can be increased and broadened by using something called Mapping-Driven Data Access.

Unsurprisingly, Google has created a data-as-a-programming paradigm for its own internal use. Some of that stack is also openish. The datastore layer is available through Google App Engine, as is the mapper part of MapReduce. BigQuery opens up the Dremel functionality. The open source version of the data stack is described in Meet the Big Data Equivalent of the LAMP Stack, but there doesn't appear to be a low-latency, interactive data analysis equivalent to Dremel...yet. The key insight for me has been to consider data as a programming paradigm and to see where a tool like Dremel fits in that model.


Mary Jo Foley reported on 8/4/2010 Salesforce pays Microsoft to settle patent infringement suit. From the summary:

Microsoft announced on August 4 that it has settled its patent infringement case with Salesforce. While the terms of the agreement aren’t being disclosed, “Microsoft is being compensated by Salesforce.com,” according to a Microsoft press release.

<Return to section navigation list> 
