Friday, January 14, 2011

Windows Azure and Cloud Computing Posts for 1/14/2011+

A compendium of Windows Azure, Windows Azure Platform Appliance, SQL Azure Database, AppFabric and other cloud-computing articles.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:

To use the above links, first click the post’s title to display the single article you want to navigate.


Azure Blob, Drive, Table and Queue Services

No significant articles today.


<Return to section navigation list> 

SQL Azure Database and Reporting

No significant articles today.


<Return to section navigation list> 

MarketPlace DataMarket and OData

No significant articles today.


<Return to section navigation list> 

Windows Azure AppFabric: Access Control and Service Bus

No significant articles today.


<Return to section navigation list> 

Windows Azure Virtual Network, Connect, RDP and CDN

No significant articles today.


<Return to section navigation list> 

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Adron Hall (@adronbh) delivered Gritty Technical Info on Windows Azure Web Roles in a 1/14/2011 post:

This is a follow-up to the previous blog entry I wrote pertaining to Windows Azure Roles. I wanted to cover the bases on the various technical aspects of creating a Windows Azure Web Role & Worker Role in Visual Studio 2010. Without interruption, let’s just dive right in. Start Visual Studio 2010 and create a new project: File, New, and then Project will open the New Project dialog.

Windows Azure Project

Select a cloud template type and name your project. Click OK, and the New Windows Azure Project dialog will appear so you can select the role types to include.

Windows Azure Project Templates

Select an ASP.NET MVC Web Application, name it appropriately, and then click OK. When prompted for a test project, select yes and click OK. When the solution finishes generating from the chosen templates, there will be a SampleWebRole ASP.NET MVC Web Application, a test project titled SampleWebRole.Tests, and a Windows Azure Project titled Windows Azure Web Role Sample.

Solution

After that, run the application to assure that the Development Fabric and other parts of the web application start up appropriately.

With the web application still running, click the Development Fabric icon in the Windows 7 taskbar’s notification area and select Show Compute Emulator UI.

Show Compute Emulator UI

The Windows Azure Compute Emulator will display. Expand the Service Deployments tree until you can see each individual instance (the little green lights should be showing). Figure 4.6 shows this tree opened with one of the instances selected to view the status trace.

Windows Azure Compute Emulator

Press Shift+F5 to stop the web application. In Solution Explorer, right-click SampleWebRole under the Windows Azure Web Role Sample project and select Properties.

Properties for SampleWebRole

Under the Configuration tab of the SampleWebRole properties, set the Instance Count to 6 and the VM Size to Extra Large.

Windows Azure Instance Properties
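Behind that Properties dialog, the two settings land in the cloud project’s XML files: the VM size belongs to the role definition (ServiceDefinition.csdef) and the instance count belongs to the service configuration (ServiceConfiguration.cscfg). A minimal sketch of the relevant fragments after the change, with namespaces and unrelated elements trimmed; the element and attribute names follow the Windows Azure SDK schema, but treat the exact shape as illustrative rather than a copy of the generated files:

```xml
<!-- ServiceDefinition.csdef (fragment): the VM size is set on the role -->
<ServiceDefinition name="WindowsAzureWebRoleSample">
  <WebRole name="SampleWebRole" vmsize="ExtraLarge">
    <Endpoints>
      <InputEndpoint name="Endpoint1" protocol="http" port="80" />
    </Endpoints>
  </WebRole>
</ServiceDefinition>

<!-- ServiceConfiguration.cscfg (fragment): the instance count is set per role -->
<ServiceConfiguration serviceName="WindowsAzureWebRoleSample">
  <Role name="SampleWebRole">
    <Instances count="6" />
  </Role>
</ServiceConfiguration>
```

Editing these files by hand and changing the role’s Properties pages are interchangeable; the emulator reads the same configuration either way.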

Now press F5 to run the web application again in the Windows Azure development fabric. The Windows Azure Compute Emulator (if it is closed, right-click the status icon to launch it again) will now display each of the six instances launching under the SampleWebRole.

Windows Azure Compute Emulator

Click one of the green lights to show that instance’s status in the primary window area.

Windows Azure Compute Instance 2

When you select a specific instance, its status is displayed. The instance displayed in figure 4.10 has a number of events recorded by the diagnostics, MonAgentHost, and runtime components. This particular instance had gone through a rough start. During the lifecycle of a Windows Azure Web, Worker, or CGI Role, a number of similar events can occur.

Read through the first few lines. These lines show that another agent was running, which could be any number of things that conflicted with this web role starting up cleanly. Eventually the web role was able to start up appropriately, as shown in the runtime lines stating that OnStart() is called and then completes, with Run() executing next.

Reading further through the diagnostics, the web role eventually requests a shutdown and then prepares for that shutdown pending the exit of the parent process 6924.

These types of events are commonplace when reviewing the actions a web role goes through; generally, don’t get too alarmed by any particular set of messages. As long as the role has green lights on the instances, things are going swimmingly. When the lights change to purple or red, it is important to really start paying attention to the diagnostics.

Windows Azure Worker Roles

In the next blog entry (Part II) what I want to show is how to add a worker role and how to analyze the activities within that role. The worker role is somewhat different from a web role. The primary difference is that a worker role is built around background compute work, while a web role is built around serving web requests. Think of the worker role as something similar to a Windows Service, which runs continuously to execute jobs and other processes, often back-end processes. A web role is built to host Silverlight and web applications such as ASP.NET or ASP.NET MVC.
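Both role types share the same entry-point contract that produces the OnStart/Run messages seen in the compute emulator trace above. A minimal worker-role sketch follows; the class name, logging, and sleep interval are illustrative, but the RoleEntryPoint overrides are the hooks the Windows Azure runtime actually calls:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical worker role entry point; the fabric discovers it because the
// class derives from RoleEntryPoint in the worker role project.
public class SampleWorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // One-time initialization. Returning true tells the fabric the instance
        // started successfully; this shows up as the OnStart() called/completed
        // pair in the emulator's status trace.
        Trace.WriteLine("SampleWorkerRole OnStart");
        return base.OnStart();
    }

    public override void Run()
    {
        // Run() should not return while the role is healthy; a typical worker
        // loops here, pulling work from a queue or running scheduled jobs.
        while (true)
        {
            Trace.WriteLine("SampleWorkerRole heartbeat");
            Thread.Sleep(TimeSpan.FromSeconds(30));
        }
    }

    public override void OnStop()
    {
        // Called when the fabric requests a shutdown, as seen near the end of
        // the diagnostic trace above.
        Trace.WriteLine("SampleWorkerRole OnStop");
        base.OnStop();
    }
}
```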

Part II [will be] published on Monday the 17th.


Bill Zack posted Hands-on Labs on Moving Applications to the Cloud to the Ignition Showcase blog on 1/14/2011:

Based on the recently published Microsoft Patterns and Practices guide Developing Applications for the Cloud on the Microsoft Windows Azure™ Platform, these individual labs correlate with the chapters of the guide, and demonstrate or expand upon the techniques shown in the guide.


The labs are designed to enhance your understanding of the Windows Azure platform through practical, hands-on work with the complete, sample Surveys application developed by the fictitious Tailspin company.

See here and/or here to download them. 

Thanks to Tejaswi Redkar, Architect, Worldwide Services Community Lead – Windows Azure for calling this to our attention.


Microsoft Learning added a Windows Azure – Training Portal page on 1/14/2011:

Windows Azure – Training Portal

Find training resources for Windows Azure

Windows Azure

Windows Azure is a flexible cloud-computing (Internet-based computing) platform that allows you to shift your focus from managing and maintaining physical servers to solving business problems and addressing customer needs online. Without the upfront investment in an expensive infrastructure, you pay only for the services that you use. You can scale up when you need capacity, and pull back when you don’t. Microsoft handles all the updates and maintenance of the platform, with more than 99.9 percent uptime.

Take full advantage of the platform's powerful features by acquiring the right training. Explore the Windows Azure training and certification options offered by Microsoft Learning.

Special offers

Check this section often for new and limited-time special offers for training on Windows Azure.

Windows Azure 30-day free pass

Receive a 30-day free pass offer to use the Windows Azure platform. No purchase is required; just enter the following promotion code: MSL001. (Available only in the United States.)

Secure a Second Shot for your Microsoft Certification exam (Prometric.com)

Whether you are just beginning to explore the benefits of Microsoft Certification, or you need to pass just one more exam to achieve a Microsoft Certified IT Professional (MCITP) certification, the Second Shot offer (regular exam price plus 15 percent) allows you to retake a Microsoft IT professional, developer, or Microsoft Dynamics exam if you do not pass it on the first attempt.

Learning Plans

A Learning Plan is a collection of training resources that are organized into an interactive path that focuses on your specific learning needs.

Created by Microsoft experts, Learning Plans save you time by organizing the most relevant training content into a self-guided path. Resources include online courses, instructor-led classes, books, articles, and more.


Certification

Microsoft Certification is one of the best ways to validate your skills with a technology. Certification helps you demonstrate your understanding of a product’s features and inspires confidence.

Make sure hiring managers and project leads notice you. By earning a Microsoft Certification on Microsoft Visual Studio and the Windows Azure platform, you help secure an industry-recognized validation of your technical knowledge and your ability to perform critical developer roles.

View upcoming certification exams:

  • Exam 70-583: PRO: Designing and Developing Windows Azure Applications (available in February 2011)


Classroom training

Classroom training is designed to help you build expertise by using world-class learning content. The authorized source for Microsoft training, Microsoft Learning Partners are uniquely positioned to help you develop expertise on the latest Microsoft technology through classroom training; online, instructor-led training; and facilitated, blended learning solutions.

View the class syllabus for the Microsoft Learning course on Windows Azure, and search for a Microsoft Learning Partner near you.

  • Course 50466A: Windows Azure Solutions with Microsoft Visual Studio 2010 (available late 2011)


Learning Snacks

If you want to learn more about the Windows Azure platform but are short on time, try Microsoft Silverlight Learning Snacks. Learning Snacks are short, interactive presentations about popular topics that are created by Microsoft Learning experts. Each Snack is delivered by using innovative Microsoft Silverlight technology and includes various media, such as animations and recorded demos. At the end of each presentation, you can view more Snacks, learn more about the topic, or visit a related website.

Note: To view Learning Snacks, you need to install Microsoft Silverlight and enable pop-up windows.

Secure Your Second Shot

Receive a Second Shot on your Azure exam

Visual Studio 2010 Certification Paths

Download the Visual Studio 2010 certification paths

Visual Studio 2010 Certification Path Roadmap*

*Includes a certification on Windows Azure

Additional Windows Azure Resources


The Windows Azure Team posted a Real World Windows Azure: Interview with Nicklas Andersson, Chief Technology Officer at eCraft; Peter Löfgren, Project Manager at eCraft; and Jörgen Westerling, Chief Communications Officer at eCraft case study on 1/13/2011:

The Real World Windows Azure series spoke to Nicklas Andersson, Chief Technology Officer at eCraft; Peter Löfgren, Project Manager at eCraft; and Jörgen Westerling, Chief Communications Officer at eCraft about using the Windows Azure platform to deliver cloud-based website solutions for the company's customers:

MSDN: Can you give us a quick summary of what eCraft does and who you serve?

Andersson: We are based in Finland and Sweden, and we help companies around the globe integrate IT systems with specific business practices by providing consulting and IT services, and developing customized, easy-to-use interfaces that give our customers access to powerful business systems and software.

For example, a part of our business is set up to help our small to midsized manufacturing- and energy-industry customers use Microsoft Dynamics NAV software to manage business processes such as financial administration, manufacturing, distribution, customer relationships, and e-commerce.

MSDN: Was there a particular challenge you were trying to overcome that led you to develop solutions that use cloud computing?

Andersson: We wanted to begin developing and offering our own software to work with Microsoft Dynamics NAV. However, the customized software that we had built for our larger customers had required significant investments in hardware, and our Microsoft Dynamics NAV customers tend to be smaller companies. They are often averse to high costs associated with buying, operating, and managing new software and hardware on-premises.

At the same time, we had to offer these often fast-growing companies the flexibility to scale solutions up quickly, so we began looking for ways to deliver solutions as Internet-based services, rather than as software that customers needed to install and manage themselves.

Westerling: Windows Azure was clearly the most cost-effective alternative. The other services offered virtual machines in the cloud that are still yours to manage. But we could use Windows Azure to actually build a true multitenant solution. Then, we could use the Windows Azure framework itself to achieve the scalability we wanted without additional servers to manage, virtual or not.

MSDN:  Can you describe the solution that you developed? Which components of the Windows Azure platform did you use? 

Andersson: In 2010, we were contacted by a customer called PowerStation Oy that wanted to run a cluster of webshops that it could use to sell ecologically responsible office supplies and energy-saving products online. We took the opportunity to develop a Microsoft Dynamics NAV multitenant webshop integrated with computing and storage resources supplied through Windows Azure. The databases are managed with Microsoft SQL Azure, and the webshop links to Microsoft Dynamics NAV through the AppFabric Service Bus in the Windows Azure platform.

A key design goal was to keep the solution as broadly applicable as possible. We use Windows Azure to deliver a Microsoft Dynamics NAV webshop that works no matter what you sell on the web. For instance, PowerStation does not manufacture the products it sells, but a company could link the webshop to the manufacturing module in Microsoft Dynamics NAV and it would work just as well.

Figure 1: eCraft developed a Microsoft Dynamics NAV multitenant webshop integrated with computing and storage resources supplied through Windows Azure.

MSDN: How will using Windows Azure help eCraft deliver more advanced solutions to its customers?

Löfgren: We used Windows Azure to build a service that young, fast-growing companies can use to not only cut costs, but focus on their business, sell more products, and make more money. We've already begun using Windows Azure to develop more offerings, including an Ideation Process Management tool, a sales management and tracking tool, and a parts-ordering webshop for manufacturers. With Windows Azure, we can deliver services to our customers faster, and remove a lot of the cost, complexity, and uncertainty that's often associated with adopting a new solution.

Andersson: By using Windows Azure, we are saving up to 70 percent of what we would have spent to operate the Microsoft Dynamics NAV-based webshop on-premises or in a Finnish data center. When we calculate the number of customers we expect for the solution over the next two years, we expect to save more than U.S.$750,000. And by delivering the solution as a service through Windows Azure, we can save our customers up to U.S.$50,000 in hardware and other startup costs. 

Read the full story at: http://www.microsoft.com/casestudies/casestudy.aspx?casestudyid=4000008842

To read more Windows Azure customer success stories, visit: www.windowsazure.com/evidence

To read more about eCraft and PowerStation, visit: www.ecraft.com and www.powerstation.fi

Could the author(s) add one or two more “eCraft”s to the post title somehow? Sounds like extreme SEO to me.


<Return to section navigation list> 

Visual Studio LightSwitch

No significant articles today.


<Return to section navigation list> 

Windows Azure Infrastructure

Eric Nelson (@ericnel) reported FREE Windows Azure Platform Compute and Storage through the Cloud Essentials Pack for Partners on 1/14/2011:

It can be difficult to find something to look forward to in January – but this year it was a little easier as a) I got lots of great Xbox 360 games and b) the Windows Azure Platform element of the Cloud Essentials Pack for Microsoft Partner Network partners went live.

I have previously explained what the Cloud Essentials Pack is and how you can access it – but at the time I couldn’t share the details of the Windows Azure Platform element.

The Windows Azure Platform element is now available. It gives you each month, for FREE:

Windows Azure:

  • 750 hours of extra small compute instance
  • 25 hours of small compute instance
  • 3GB of storage and 250,000 storage transactions

SQL Azure:

  • 1 SQL Azure Web Edition database (5GB)

Windows Azure AppFabric:

  • AppFabric with 100,000 Access Control transactions and 2 Service Bus connections

Plus:

  • Data Transfer:  3GB in and 6GB out

(More details of the offer)

To activate this offer

You need to:

My Windows Azure Compute Extra-Small VM Beta Now Available in the Cloud Essentials Pack and for General Use post of 1/9/2011 provides many illustrations and more detail.


Mark Bower explained How Cloud Computing Changes the Economics of Software Architecture in a 1/13/2011 post:

I’ve been thinking a lot about architecting software applications for the cloud lately – particularly for Windows Azure, as that’s the platform we have chosen to build Connectegrity’s SaaS solution on.

Lots has been written about the impact of PaaS and IaaS services like Amazon Web Services and Windows Azure on software architecture. I’ve seen plenty of commentary arguing that architects need to change the way they design systems to consider the platform billing model and on-going costs.

But is that really a change? In my opinion, it’s no different to what we architects have always done – only in the past the cost considerations were different. It was about numbers of servers, software licences, software versions etc.

I believe the cloud computing model changes our approach in a much more fundamental way.

I see the shift that is happening right now as the modern equivalent of what happened when Windows went from 16-bit to 32-bit. 

Freed from the memory limitations of 16-bit computing we all stopped optimizing our Windows code, as there were simply so many system resources available to play with… it was effectively limitless.

Cloud computing platforms bring the same philosophical shift to web applications.

Yes, I could spend time architecting for the billing model. I could spend money getting programmers to performance tune their code to reduce billing charges. But that doesn’t mean I should.

Let me ask you this: Why spend $50 an hour getting a programmer to tune their code, when instead I can pay Microsoft or Amazon another $50 a month and throw another web front end at the problem? Then I can have my developers doing something much more useful to the business: adding new product features more quickly than the competition so that I can sell more and make more revenue.

And that is the true economics of cloud computing.

Mark is co-founder and Chief Technology Officer at Connectegrity – a provider of technology solutions for legal, accounting and other professional service firms.


<Return to section navigation list> 

Windows Azure Platform Appliance (WAPA), Hyper-V and Private Clouds

Klint Finley reported Touring Texas with the Bloggers Part 2: Smart Power Distribution, a Little Cloud History and More first-hand in a 1/13/2011 post to the ReadWriteCloud blog:

image This week I attended HP ISS Tech Day at Hewlett-Packard's Houston facility along with several other bloggers. In part one we talked a bit about the definition of cloud computing and toured the POD-Works facility for manufacturing private clouds. In part two we'll look at HP's technologies for building private clouds, including Intelligent Power Discovery and Virtual Connect. We'll also take a brief look at HP's original private cloud offering.

Intelligent Power Discovery

HP cites its expertise in power management as a key advantage to its manufactured data centers that it ships in containers to customers. But even if you don't want to have HP ship you a pre-built data center, you can take advantage of its Intelligent Power Discovery technology.

Intelligent Power Discovery is the name for the combination of HP's highly efficient server power supplies, its power distribution units and its power management software. The system is designed to make it easy to add new servers to a data center and automatically adjust power allotment. Using HP's software you can manage power allocation and find overheating servers. You can use HP's remote server management console iLO to manage practically all of your power requirements.

Notably, all of this happens automatically. Sensors are built into all the necessary cables to make monitoring and reporting as simple as possible.

BladeSystem Matrix

BladeSystem Matrix is the core of HP's "converged infrastructure" strategy. It's a framework that integrates servers, storage, networking and software and can be used as the foundation for building private clouds. Its primary software offering is the Matrix Operating Environment, which includes templates for deploying virtualized servers. HP gives customers the option to choose between Citrix, Microsoft and VMware for virtualization and includes templates for fully configured servers for common products from companies like Oracle, Microsoft and SAP. For example, as part of the demo a Microsoft Exchange server was deployed from a template in just a few clicks.

BladeSystem Matrix was preceded by the HP Utility Data Center (UDC) in 2001, back when cloud computing was still referred to as utility computing. UDC was discontinued in 2004. CNET's Gordon Haff wrote in 2009 that UDC was ahead of its time, expensive and tied to proprietary HP software. Haff wrote that BladeSystem Matrix is much more rooted in open standards and components than UDC was.

Matrix competes with other converged systems such as Cisco's Unified Computing System.

Virtual Connect

Virtualizing servers can put an excessive I/O load on the host server since those servers will be handling many more concurrent connections. HP Virtual Connect attempts to solve this problem while simultaneously reducing network infrastructure complexity. This IDC white paper offers the best explanation of Virtual Connect I could find.

Virtual machine environments typically require six to eight physical network cards per server. Virtual Connect creates virtual network cards that look just the same as physical network cards to hypervisors. One physical network card can support four virtual network cards. This doesn't just reduce the need for additional physical network cards, it also reduces the number of switches and cables required to support all those cards.

Virtual Connect can also provide bandwidth throttling on the fly. Let's say you have a server that needs more bandwidth than one gigabit. Traditionally, that server would need a 10 gigabit network card, even if it doesn't actually need 10 gigabits of capacity. With Virtual Connect, you could create multiple virtual network cards using the same physical 10 gigabit network cards and split the bandwidth between them any way you want. For example, you could create four virtual network cards: one network card with five gigabits of bandwidth, two network cards with two gigabits of bandwidth each and one network card with one gigabit of bandwidth.

Virtual Connect competes with other I/O virtualization appliances such as Dell's FlexAddress.

Conclusion

HP is serious about helping customers build private clouds as simply as possible with the lowest possible total cost of ownership. Whether you want them to build something for you or build it yourself, HP has all the products and services required - from building and shipping entire data centers, to cutting power bills to virtualizing both servers and network infrastructure. HP seems to be the leader in each of the technologies we looked at, but there is plenty of competition. It will be exciting to see how the industrialization of physical infrastructure and virtualization of everything else transforms data centers in the next few years.

Disclosure: HP is a ReadWriteWeb sponsor, and paid for Klint Finley's travel and accommodations to attend HP ISS Tech day.

I’m surprised HP didn’t promote the WAPA hardware they touted at the Microsoft Partners Summit last year.


<Return to section navigation list> 

Cloud Security and Governance


No significant articles today.


<Return to section navigation list> 

Cloud Computing Events

Channel9’s Cloud Cover page contains links to 34 Cloud Cover episodes starting with 2/19/2010. The latest is a 00:07:33 Cloud Cover (34) Hard Hat Edition segment with Wade Wegner (@WadeWegner, left) and Steve Marx (@smarx, right):


Join Wade and Steve each week as they cover the Microsoft cloud. You can follow and interact with the show at @cloudcovershow.

In this episode, Wade and Steve couldn't resist the urge to visit the Channel 9 studio while it's under construction.  While deftly avoiding the danger of falling studio equipment, Wade and Steve:

  • Showcase the Channel9 studio construction
  • Lament Ryan's departure from Microsoft and introduce Wade as the new co-host
  • Announce that the Windows Azure Platform Evangelism Team is looking for a new Technical Evangelist
  • Talk about Cloud Cover shows coming up in the near future (teaser: guest appearance by Mark Russinovich)

Show Links:


<Return to section navigation list> 

Other Cloud Computing Platforms and Services

Sam Diaz reported Google Apps makes a new promise: No downtime in a 1/14/2011 post to ZDNet’s Between the Lines blog:

Anyone buying into a Web-based service knows about the SLA - the service level agreement. That’s where the Web company makes a promise about uptime, the amount of time that the service will be up and running without any service disruption.

In most cases, there’s a clause in the agreement that allows for scheduled downtime for maintenance. Now, Google - in an effort to further set itself apart from competitors - is removing that downtime clause from its customers’ SLAs.

From here on out, any downtime will be counted and applied toward the customer’s SLA. In addition, the company is amending the SLA so that any intermittent downtime is counted as well, eliminating the previous provision that any downtime less than 10 minutes was not counted. In a blog post[*], Google Enterprise Product Management Director Matthew Glotzbach wrote:

People expect email to be as reliable as their phone’s dial tone, and our goal is to deliver that kind of always-on availability with our applications… In 2010, Gmail was available 99.984 percent of the time, for both business and consumer users. 99.984 percent translates to seven minutes of downtime per month over the last year. That seven-minute average represents the accumulation of small delays of a few seconds, and most people experienced no issues at all.
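The arithmetic behind that claim is easy to verify; here is a minimal C# sketch (the 30.44-day average month is my assumption, not Google's stated measurement basis):

```csharp
using System;

class UptimeCheck
{
    static void Main()
    {
        // 99.984 percent uptime measured over an average month (~30.44 days).
        double uptimePercent = 99.984;
        double minutesPerMonth = 30.44 * 24 * 60;
        double downtimeMinutes = minutesPerMonth * (100 - uptimePercent) / 100;

        // Prints roughly 7.0, matching the "seven minutes per month" figure.
        Console.WriteLine("Downtime per month: {0:F1} minutes", downtimeMinutes);
    }
}
```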

And, for those wondering how the downtime compares to on-premises email - specifically for Exchange customers - Google says that seven minutes compares “very favorably” and points to research by the Radicati Group that suggests that Gmail is 46 times more available than Exchange.

The company said that data about Microsoft’s BPOS cloud offering was unavailable but that service notifications showed 113 incidents last year, with 74 planned outages and 33 days with planned downtime.

* The blog post Sam quoted was Matthew Glotzbach’s Destination: Dial Tone -- Getting Google Apps to 99.99% of 1/14/2011. There’s no indication that I could find on Friday that Google plans to extend the new Google Apps SLA’s terms to Google App Engine, which now has no SLA (per Is Google App Engine Enterprise-Ready? by JohannaBeatrice), or Google App Engine for business (App Engine for Business SLA Draft). However, I expect Microsoft’s Office 365 will match Google Apps’ new SLA ultimately.


A more interesting question to me is: Will Google’s new terms for downtime measurement trickle into the IaaS and PaaS sectors and become the default for major players like Amazon Web Services and the Windows Azure Platform?

CloudFail.net “aggregates RSS feeds from leading cloud services providers to bring you a unified place to find information about outages and maintenance (pre-planned outages), as they happen.” CloudFail.net syndicates notifications from Google App Engine and Windows Azure, among others.


Stephen O’Grady (@sogrady) asked and answered What Factors Justify the Use of Apache Hadoop? in a 1/13/2011 post to his RedMonk blog:

The question posed at this week’s San Francisco Hadoop User Group is a common one: “what factors justify the use of an Apache Hadoop cluster vs. traditional approaches?” The answer you receive depends on who you ask.

Relational database authors and advocates have two criticisms of Hadoop. First, that most users have little need for Big Data. Second, that MapReduce is more complex than traditional SQL queries.

Both of these criticisms are valid.

In a post entitled “Terabytes is not big data, petabytes is,” Henrik Ingo argued that the gigabytes and terabytes I referenced as Big Data did not justify that term. He is correct. Further, it is true that the number of enterprises worldwide with petabyte scale data management challenges is limited.

MapReduce, for its part, is in fact challenging. Challenging enough that there are two separate projects (Hive and Pig) that add SQL-like interfaces as a complement to the core Hadoop MapReduce functionality. Besides being more accessible, SQL skills are an order of magnitude more common from a resource availability standpoint.

Hadoop supporters, meanwhile, counter both of those concerns.

It was Hadoop sponsor Cloudera, in fact, that originally coined the term “Medium Data” as an acknowledgement that data complexity was not purely a function of volume. As Bradford Cross put it:

Companies do not have to be at Google scale to have data issues. Scalability issues occur with less than a terabyte of data. If a company works with relational databases and SQL, they can drown in complex data transformations and calculations that do not fit naturally into sequences of set operations. In that sense, the “big data” mantra is misguided at times…The big issue is not that everyone will suddenly operate at petabyte scale; a lot of folks do not have that much data. The more important topics are the specifics of the storage and processing infrastructure and what approaches best suit each problem.

Big Data, like NoSQL, has become a liability in most contexts. Setting aside the lack of a consistent definition, the term is of little utility because it is single-dimensional. Larger dataset sizes present unique computational challenges. But the structure, workload, accessibility and even location of the data may prove equally challenging.

We use Hadoop at RedMonk, for example, to attack unstructured and semi-structured datasets without the overhead of an ETL step to insert them into a traditional relational database. From CSV to XML, we can load in a single step and begin querying.

There are a variety of options for data mining at the scale we practice it. From basic grep to the Perl CPAN modules Henrik points to, there are many tools that would provide us with similar capabilities. Why Hadoop? Because the ecosystem is growing, the documentation is generally excellent, our datasets are unstructured and, yes, it can attack Big Data. And because while our datasets – at least individually – do not constitute Big Data, they are growing rapidly.

Nor have we had to learn MapReduce. The Hadoop ecosystem at present is rich enough already that we have a variety of front end options available, from visual spreadsheet metaphors (Big Sheets) to SQL-style queries (Hive) with a web UI (Beeswax). No Java necessary.

Brian Aker’s comparison of MapReduce to an SUV in Henrik’s piece is apt; whether you’re a supporter of Hadoop or not, curiously. Brian’s obviously correct that a majority of users will use a minority of its capabilities. Much like SUVs and their owners.

While the overkill of an SUV is offset by its higher fuel costs and size, however, the downside to Hadoop usage is less apparent. Its single node performance is merely adequate and the front ends are immature relative to the tooling available in the relational database world, but the build out around the core is improving by the day.

When is Hadoop justified? For petabyte workloads, certainly. But the versatility of the tool makes it appropriate for a variety of workloads beyond quote unquote big data. It’s not going to replace your database, but your database isn’t likely to replace Hadoop either.

Different tools for different jobs, as ever.

Disclosure: Cloudera is a RedMonk customer.


<Return to section navigation list> 
