I am spoiled — I get to work with a brilliant bunch of students and colleagues. They’ve been doing some really amazing research recently, and I’m happy to report that they’re getting some of the recognition they deserve:

I’ve blogged about all these projects before, and since most of them are in their early stages, I fully expect there will be more to report in the future. Meanwhile, it’s nice to see these folks getting recognized for their work, and it will be interesting to get some feedback at the conferences.

Congrats, folks!

We’ve been working on declarative programming for distributed systems for about six years now: starting with routing protocols, then network overlays, query optimizers, and sensor network stacks, and more recently scalable analytics and consensus protocols.

Over that time, we’ve struggled to find a useful middle ground between the pure logic roots of classical declarative languages like Datalog and the practical needs of real systems managing state across networks. Our compromises over the years allowed us to move forward, build real things, and learn many lessons. But they also led to some semantic confusion, as noted in papers by colleagues at Max Planck and AT&T.

Well, no more. We recently released a tech report on Dedalus, a formal logic language that can serve as a clean foundation for declarative programming going forward.  The Dedalus work is fairly theoretical, but having tackled it we’re in a strong position to define an approachable and appealing language that will let programmers get their work done in distributed environments. That’s the goal of our Bloom language.

The key insight in Dedalus is roughly this:

Time is essential; space is a detail.
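Roughly speaking, Dedalus is Datalog in which every fact carries a timestep: deductive rules derive facts within a timestep, inductive rules (written `@next`) carry facts into the following timestep, and asynchronous rules deliver facts at some later, unspecified timestep to model communication. One consequence is that state does not persist on its own; a fact survives into the next timestep only if a rule says so. The Python sketch below models just that persistence idiom for a single relation. The relation names `p`, `p_ins`, `p_neg` and the function `evaluate` are hypothetical, and asynchronous rules are not modeled.

```python
from typing import Dict, FrozenSet, Set


def evaluate(
    initial: Set[str],
    ins: Dict[int, Set[str]],
    neg: Dict[int, Set[str]],
    steps: int,
) -> Dict[int, FrozenSet[str]]:
    """Contents of relation p at timesteps 0 .. steps-1.

    Modeled rules (hypothetical names, Dedalus-like notation):
      p(X)@next :- p(X), notin p_neg(X).   % persist unless deleted
      p(X)@next :- p_ins(X).               % inserts take effect next step
    ins[t] and neg[t] hold the p_ins / p_neg facts present at timestep t.
    """
    p: Dict[int, FrozenSet[str]] = {0: frozenset(initial)}
    for t in range(steps - 1):
        # Only facts explicitly carried forward by a rule reach timestep t + 1.
        kept = {x for x in p[t] if x not in neg.get(t, set())}
        p[t + 1] = frozenset(kept | ins.get(t, set()))
    return p


history = evaluate({"a", "b"}, ins={1: {"c"}}, neg={1: {"a"}}, steps=4)
for t in sorted(history):
    print(t, sorted(history[t]))
# 0 ['a', 'b']
# 1 ['a', 'b']
# 2 ['b', 'c']
# 3 ['b', 'c']
```

Deleting `a` at timestep 1 only makes it disappear at timestep 2: in this model, a change to state is always visible in the next timestep, never in the current one.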


It’s official: the programming language for the BOOM project, until now code-named Lincoln, is named Bloom.

I didn’t intend to post about Bloom until it was cooked, but two things happened this week that changed my plans.  The first was the completion of a tech report on Dedalus, our new logic language that forms the foundation of Bloom.  The second was more of a surprise: Technology Review decided to run an article on our work, and Bloom was the natural way to talk about it.

More soon on our initial Dedalus results.

Papers are being solicited for the ACM’s new Symposium on Cloud Computing (SOCC), and they’re due soon: January 15. Both research and industrial papers are welcome. The folks involved (present company excepted) are really strong, and we expect to have some very interesting invited speakers as well. Interesting enough to entice folks to Indianapolis!

The best parts of these smaller symposia are the give-and-take of people in the room talking about each other’s work. So send in your best ideas and plan to come.

More info, including the call for papers, at http://research.microsoft.com/socc2010

Hadoop MapReduce is a batch-processing system. Why? Because that’s the way Google described their MapReduce implementation.

But it doesn’t have to be that way. Introducing HOP: the Hadoop Online Prototype. With modest changes to the structure of Hadoop, we were able to convert it from a batch-processing system to an interactive, online system that can provide features like “early returns” from big jobs, and continuous data stream processing, while preserving the simple MapReduce programming and fault tolerance models popularized by Google and Hadoop.  And by the way, it exposes pipeline parallelism that can even make batch jobs finish faster.  This is a project led by Tyson Condie, in collaboration with folks at Berkeley and Yahoo! Research.
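To make the idea of "early returns" concrete, here is a minimal single-process sketch of online aggregation: map output streams straight into a reducer, which periodically emits a snapshot of its running answer instead of waiting for all input. This is not HOP's code or API; `map_words` and `online_sum` are hypothetical names, and real HOP pipelines data between separate map and reduce tasks across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Word-count map function: emit (word, 1) for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def online_sum(
    pairs: Iterable[Tuple[str, int]], snapshot_every: int
) -> Iterator[Dict[str, int]]:
    """Reduce pipelined map output, yielding running per-key sums.

    An early snapshot is yielded after every `snapshot_every` records; the
    last snapshot yielded is always the complete answer.
    """
    if snapshot_every <= 0:
        raise ValueError("snapshot_every must be positive")
    totals: Dict[str, int] = defaultdict(int)
    seen = 0
    for key, value in pairs:
        totals[key] += value
        seen += 1
        if seen % snapshot_every == 0:
            yield dict(totals)  # copy, so later updates don't alter it
    if seen == 0 or seen % snapshot_every != 0:
        yield dict(totals)  # final answer, unless it was just emitted above


# Map output is consumed as it is produced rather than materialized first,
# so partial answers appear before the job finishes.
for snapshot in online_sum(map_words(["a b a", "b c", "a"]), snapshot_every=3):
    print(snapshot)
# {'a': 2, 'b': 1}
# {'a': 3, 'b': 2, 'c': 1}
```

Because the reducer is a generator pulling from the mapper, map and reduce work overlap; that same pipelining is what can let batch jobs finish sooner.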


Thanks to Boon Thau Loo and Stefan Saroiu for a very interesting workshop on Networking Meets Databases (NetDB), and especially for inviting a high-octane panel to debate the successes and future directions of Declarative Networking.

The panel members included:

  • Fred Baker, Cisco
  • Joe Hellerstein, Berkeley
  • Eddie Kohler, UCLA and Meraki
  • Arvind Krishnamurthy, U Washington
  • Petros Maniatis, Intel Research
  • Timothy Roscoe, ETH Zurich

Butler Lampson made numerous comments from the audience, and given his insight and stature was viewed by most as something of an additional panelist.

I was happy to see a very vigorous debate! Lots of interesting points made, no punches pulled. My slides are posted here, and include an ad hoc manifesto for how to move forward.


Headline: We now have a robust declarative implementation of MultiPaxos with leader election, which is radically simpler than most existing implementations. It’s compact, surprisingly readable (as Paxos implementations go!), and live. It forms a key part of our BOOM Analytics implementation of a high-availability Hadoop File System.

Maybe more interesting are the lessons we learned about how distributed protocols and declarative languages go together, and the design patterns that emerged.  We’re using this to ground the design of our new language, code-name Lincoln.  A paper on the topic is being presented this Wednesday at NetDB 2009, after SOSP.
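The declarative rules themselves aren't reproduced here. For readers who haven't seen Paxos, a conventional imperative sketch of single-decree Paxos shows the invariants any implementation, declarative or not, has to enforce. It is not the BOOM implementation: `Acceptor` and `propose` are hypothetical names, messages are modeled as direct method calls, and there is no networking, failure handling, or leader election. MultiPaxos runs a sequence of such instances and lets a stable leader skip the first phase.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    """One single-decree Paxos acceptor (imperative sketch)."""
    promised: int = -1                           # highest ballot promised
    accepted: Optional[Tuple[int, Any]] = None   # last (ballot, value) accepted

    def on_prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, Any]]]:
        """Phase 1: promise to ignore lower ballots; report any earlier accept."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def on_accept(self, ballot: int, value: Any) -> bool:
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run both phases for one ballot; return the chosen value, or None."""
    majority = len(acceptors) // 2 + 1
    replies = [a.on_prepare(ballot) for a in acceptors]
    granted = [prior for ok, prior in replies if ok]
    if len(granted) < majority:
        return None
    # Safety rule: adopt the value from the highest-numbered earlier accept.
    priors = [p for p in granted if p is not None]
    if priors:
        value = max(priors, key=lambda p: p[0])[1]
    acks = sum(a.on_accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, ballot=1, value="x"))  # x
print(propose(acceptors, ballot=2, value="y"))  # x (earlier choice preserved)
print(propose(acceptors, ballot=1, value="z"))  # None (stale ballot)
```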



For the last year or so, my team at Berkeley — in collaboration with Yahoo Research — has been undertaking an aggressive experiment in programming.  The challenge is to design a radically easier programming model for infrastructure and applications in the next computing platform: The Cloud.  We call this the Berkeley Orders Of Magnitude (BOOM) project: enabling programmers to develop OOM bigger systems in OOM less code.

To kick this off we built something we call BOOM Analytics [link updated to new version]: a clone of Hadoop and HDFS built largely in Overlog, a declarative language we developed some years back for network protocols. BOOM Analytics is just as fast and scalable as Hadoop, but radically simpler in its structure. As a result we were able, with amazingly little effort, to turbocharge our incarnation of the elephant with features that would be enormous upgrades to Hadoop’s Java codebase. Two of the fanciest are:

I just heard through the Berkeley grapevine about the BashReduce effort at Last.fm: MapReduce in 126 lines of bash script! Awesome. I’m sure it doesn’t do X, Y and Z. So ask yourself: do you need X? Y? Z? Maybe instead you want V and W. Maybe you should roll your own tool.

Makes you think.
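Part of why a version that small is plausible: the core of MapReduce is just map, group by key, reduce. Here is a single-machine sketch in Python. It is not BashReduce's actual code; `map_reduce`, `words`, and `total` are hypothetical names, and keys are assumed to be mutually sortable. Sorting stands in for the shuffle, much as `sort` would in a shell pipeline.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Iterable, List, Tuple


def map_reduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> List[Tuple[Any, Any]]:
    """Single-machine MapReduce: map, shuffle (sort by key), reduce."""
    pairs = [kv for record in records for kv in mapper(record)]
    pairs.sort(key=itemgetter(0))  # the shuffle: bring equal keys together
    return [
        (key, reducer(key, [v for _, v in group]))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


def words(line: str) -> Iterable[Tuple[str, int]]:
    """Word-count mapper: emit (word, 1) for every word in the line."""
    return ((w, 1) for w in line.split())


def total(_key: str, counts: List[int]) -> int:
    """Word-count reducer: sum the counts for one word."""
    return sum(counts)


print(map_reduce(["to be or", "not to be"], words, total))
# [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Everything a production system layers on top (partitioning across machines, scheduling, fault tolerance) is the X, Y and Z you may or may not need.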

I was intrigued last week by the confluence of two posts:

  • Owen O’Malley and Arun Murthy of Yahoo’s Hadoop team posted about sorting a petabyte using Hadoop on 3,800 nodes.
  • Curt Monash posted that eBay hosts a 6.5-petabyte Greenplum database on 96 nodes.

Both impressive. But wildly different hardware deployments. Why? It’s well known that Hadoop is tuned for availability, not efficiency. But does it really need 40x as many machines as eBay’s Greenplum cluster? How did smart folks end up with such wildly divergent numbers?
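The back-of-the-envelope arithmetic behind the 40x, using only the figures quoted in the two posts and assuming decimal petabytes (1 PB = 10^15 bytes). Keep in mind the two numbers measure different things, data sorted in one job versus data stored in a database, so bytes per node is only a rough proxy for how hard each machine is working.

```python
# Figures as quoted in the two posts; decimal units assumed (1 PB = 10**15 bytes).
PB = 10**15
hadoop_nodes, hadoop_bytes = 3_800, 1 * PB         # petabyte sort (one job's input)
greenplum_nodes, greenplum_bytes = 96, 6.5 * PB    # eBay's hosted database

node_ratio = hadoop_nodes / greenplum_nodes
hadoop_per_node = hadoop_bytes / hadoop_nodes
greenplum_per_node = greenplum_bytes / greenplum_nodes

print(f"node ratio: {node_ratio:.1f}x")
print(f"Hadoop sort: {hadoop_per_node / 10**9:.0f} GB per node")
print(f"Greenplum: {greenplum_per_node / 10**12:.1f} TB per node")
print(f"data per node: {greenplum_per_node / hadoop_per_node:.0f}x")
# node ratio: 39.6x
# Hadoop sort: 263 GB per node
# Greenplum: 67.7 TB per node
# data per node: 257x
```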
