For the last year or so, my team at Berkeley — in collaboration with Yahoo Research — has been undertaking an aggressive experiment in programming. The challenge is to design a radically easier programming model for infrastructure and applications in the next computing platform: The Cloud. We call this the Berkeley Orders Of Magnitude (BOOM) project: enabling programmers to develop OOM bigger systems in OOM less code.
To kick this off we built something we call BOOM Analytics [link updated to Eurosys10 final version]: a clone of Hadoop and HDFS built largely in Overlog, a declarative language we developed some years back for network protocols. BOOM Analytics is just as fast and scalable as Hadoop, but radically simpler in its structure. As a result we were able — with amazingly little effort — to turbocharge our incarnation of the elephant with features that would be enormous upgrades to Hadoop’s Java codebase. Two of the fanciest are:
- High availability: Hadoop has a single point of failure at its HDFS master (“name”) node. BOOM Analytics provides hot-standby name node failover, courtesy of a concise Overlog implementation of MultiPaxos.
- Scale-Out: Hadoop has two chokepoints for scalability at its master nodes, which must run on a single box. BOOM Analytics provides name node scaleout via data partitioning of the namespace. When the name node gets full, you just buy more machines and repartition. And this composes with the previous feature: each partition enjoys high availability via MultiPaxos.
The whole effort of building BOOM Analytics took 12 months for 4 PhD students — including the time taken to write an Overlog interpreter from scratch. The scaleout feature took one grad-student developer a day to implement. One day.
BOOM Analytics is serious distributed fu. But it really turned out to be pretty simple with the right programming model.
Why did we do this? What’s next? Well, you can read the tech report for details. Just a couple notes here. First, we have no designs on displacing Hadoop. BOOM Analytics is intended as an example of the power of declarative programming. If it has any utility for the Hadoop community it would probably be as a rapid prototyping environment for new features. The other point is that we’re not trying to sell Overlog as the “right” language for anything. In fact, a key motivation for building BOOM Analytics was to gain some practical experience in order to design a better language. We are calling that language Lincoln. I look forward to talking about it more soon.