October | 2009 | Data in Beta

Monthly Archives: October 2009

MapReduce Online! (and some gimmes)

October 18, 2009 – 7:22 pm
Posted in database, gimmes, map reduce, parallelism, research
Tagged hadoop, HOP, online aggregation, stream queries
Comments (10)

Hadoop MapReduce is a batch-processing system. Why? Because that’s the way Google described their MapReduce implementation.

But it doesn’t have to be that way. Introducing HOP: the Hadoop Online Prototype [updated link to final NSDI ’10 version]. With modest changes to the structure of Hadoop, we were able to convert it from a batch-processing system to an interactive, online system that can provide features like “early returns” from big jobs and continuous data stream processing, while preserving the simple MapReduce programming and fault-tolerance models popularized by Google and Hadoop. And by the way, it exposes pipeline parallelism that can even make batch jobs finish faster. This is a project led by Tyson Condie, in collaboration with folks at Berkeley and Yahoo! Research.
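To make the “early returns” idea concrete, here is a toy sketch in Python. It is not HOP code and does not use any Hadoop API; the function names (`map_words`, `online_reduce`) and the `snapshot_every` parameter are made up for illustration. The only point is that when the map side streams records to the reduce side instead of handing over one finished batch, the reducer can publish running snapshots of the answer before the job completes.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) pairs one at a time rather than as a finished batch."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def online_reduce(
    pairs: Iterable[Tuple[str, int]], snapshot_every: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Reduce step: consume pairs as they arrive and yield (records_seen, counts).

    A snapshot is yielded every `snapshot_every` records (hypothetical knob),
    and a final one is yielded at the end if the last snapshot was not final.
    """
    counts: Counter = Counter()
    seen = 0
    for key, value in pairs:
        counts[key] += value
        seen += 1
        if seen % snapshot_every == 0:
            yield seen, dict(counts)  # early, possibly incomplete answer
    if seen == 0 or seen % snapshot_every != 0:
        yield seen, dict(counts)  # final answer


if __name__ == "__main__":
    lines = ["a b a", "b a c"]
    for seen, snapshot in online_reduce(map_words(lines), snapshot_every=4):
        print(seen, snapshot)
```

Because both steps are generators, the reducer starts working as soon as the first pair is produced, which is the pipelining flavor the post describes. A real system also has to deal with partitioning, spilling to disk, and fault tolerance, all of which this sketch ignores.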

NetDB 2009: Declare Your Declarativity!

Thanks to Boon Thau Loo and Stefan Saroiu for a very interesting workshop on Networking Meets Databases (NetDB), and especially for inviting a high-octane panel to debate the success and directions of Declarative Networking.

The panel members included:

Fred Baker, Cisco
Joe Hellerstein, Berkeley
Eddie Kohler, UCLA and Meraki
Arvind Krishnamurthy, U Washington
Petros Maniatis, Intel Research
Timothy Roscoe, ETH Zurich

Butler Lampson made numerous comments from the audience and, given his insight and stature, was viewed by most as something of an additional panelist.

I was happy to see a very vigorous debate! Lots of interesting points made, no punches pulled. My slides are posted here, and include an ad hoc manifesto for how to move forward.

I Do Declare: Pocket-Sized Paxos and 2PC

Headline: We now have a robust declarative implementation of MultiPaxos with leader election, which is radically simpler than most existing implementations. It’s compact, surprisingly readable (as Paxos implementations go!) and live. It forms a key part of our Boom Analytics implementation of a high-availability Hadoop File System.

Maybe more interesting are the lessons we learned about how distributed protocols and declarative languages go together, and the design patterns that emerged. We’re using this to ground the design of our new language, code-named Lincoln. A paper on the topic is being presented this Wednesday at NetDB 2009, after SOSP.
