We were happy to find out this week that our BOOM project and Bloom language have been selected by Technology Review magazine as one of the TR10, their “annual list of the emerging technologies that will have the biggest impact on our world.” This was news to us: we knew they were going to run an article, but weren’t aware of the TR10 distinction. Pretty neat.
I’ve been getting a lot of questions about the project and language since the article launched. So while folks are paying attention, here’s a quick FAQ on what the project is all about and where it stands.
Q: What is BOOM?
BOOM is the name of a research project based at Berkeley, which seeks to enable programmers to build Orders Of Magnitude bigger systems in O.O.M. less code. Our focus is on enabling developers to easily harness the power of many computers at once, e.g. in the setting of cloud computing. We maintain a website for BOOM here with research papers, talk slides, and information about the team.
Q: What is the Bloom language?
Bloom is the name of the language we are developing in the BOOM project. It is a high-level declarative language targeted at programming distributed and parallel computers, for environments like datacenters and cloud computing. Bloom is based on the observation that although most programmers find parallel and distributed programming difficult, they find data-parallelism (e.g., SQL or MapReduce) fairly easy. But data-parallel programming is usually confined to data analysis tasks. The goal for Bloom is to take data-parallelism as a kernel, and extend it to a general-purpose language that programmers will find powerful but natural.
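To make the data-parallel style concrete, here is a minimal sketch in plain Python of a MapReduce-flavored word count. This is not Bloom code; the function names (map_phase, reduce_phase, word_count) are made up for illustration. The point is only that each line is processed independently and the partial results are merged, so the work could in principle be spread across many machines without changing the program's logic.

```python
from collections import Counter
from functools import reduce

def map_phase(line):
    # Each input line is handled independently of every other line,
    # which is what makes this step easy to run in parallel.
    return Counter(line.lower().split())

def reduce_phase(left, right):
    # Merging partial counts is associative and commutative,
    # so the merge order does not affect the result.
    return left + right

def word_count(lines):
    # Hypothetical driver: runs sequentially here, but the same two
    # functions could be handed to a parallel runtime.
    return reduce(reduce_phase, map(map_phase, lines), Counter())

counts = word_count(["let a million flowers bloom", "bloom and boom"])
```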
Q: What is the plan for Bloom, and what is its status?
Our goal for Bloom is that it grow into a generally useful, open-source language environment for distributed programming. We hope to release an initial version of Bloom and its runtime late this year. (Academic researchers are notoriously bad at predicting ship dates, so check back frequently. That said, our research team has clocked a lot of hours building both commercial and open-source software, and we do take pride in building real things.)
Q: What is Dedalus and how does it relate to Bloom?
The formal basis of Bloom is a temporal logic programming language called Dedalus, described here. Dedalus simplifies many challenges in specifying and analyzing programs, in ways that we’re still discovering. But Dedalus is not intended for general use as a practical programming language. Dedalus grew out of our earlier work on Declarative Networking, including a language called Overlog, which was in turn based on a language from database theory called Datalog. (If you’re curious about the origin of the name Dedalus, see footnote 1 in the technical report.)
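For readers who haven't met Datalog, here is a rough sketch of the flavor of that family of languages, written in plain Python rather than in Dedalus, Overlog, or Bloom syntax. It evaluates the classic two-rule "path" program (shown in the comments) by repeatedly applying the rules until no new facts appear. The function name transitive_closure and the link/path relation names are assumptions chosen for illustration, and the naive fixpoint loop is just one simple way to evaluate such rules.

```python
def transitive_closure(links):
    """Naively evaluate a two-rule Datalog-style program to a fixpoint.

    path(X, Y) :- link(X, Y).
    path(X, Z) :- link(X, Y), path(Y, Z).
    """
    # First rule: every link is a path.
    path = set(links)
    while True:
        # Second rule: join link and path on the shared variable Y.
        derived = {(x, z) for (x, y) in links for (y2, z) in path if y == y2}
        new_facts = derived - path
        if not new_facts:
            # Fixpoint reached: applying the rules adds nothing new.
            return path
        path |= new_facts

reachable = transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})
```

The program says what a path is, not how to search for one; the evaluation strategy is left to the runtime, which is the property the declarative languages above build on.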
Q: What is the killer app for Bloom?
Well, that’s the billion-dollar question, isn’t it! When a new platform appears (think of PCs, or smartphones), it usually takes time and tools for killer apps to emerge. The Cloud is a new platform in many ways. Bloom is intended to make it easy for creative programmers to write code that harnesses the unique features of the Cloud. So we don’t have a snappy answer for the one killer app. Instead, our goal is to provide a language suited to the platform and then, as the saying goes…let a million flowers Bloom.
Q: What do BOOM and Bloom have to do with Hadoop?
In the early days of the BOOM project, we wanted to convince ourselves our ideas had some practical merit. To that end, we wanted to build a big and recognizable piece of cloud software in Overlog, a predecessor language to Bloom. We found the MapReduce phenomenon pretty fascinating, so as a first effort we built what we call BOOM Analytics, which reimplements large portions of Hadoop in Overlog. We used Overlog to build a complete, API-compliant reimplementation of the Hadoop Distributed File System. We also gave Hadoop’s MapReduce implementation a “brain transplant”, replacing its scheduler with a much smaller and cleaner one implemented in Overlog. Finally, we showed how easy it was to extend our edition of HDFS with complex new features unavailable in traditional HDFS, including master scale-out and hot-standby in the case of master node failures. This is described in more detail in a paper from EuroSys 2010. The Overlog code for BOOM Analytics is available, as is a standalone version of our Paxos package.
Q: So BOOM Analytics compares Overlog to MapReduce?
No! It compares Overlog to Java. That is, we didn’t run Overlog on top of the Hadoop infrastructure. We rebuilt the Hadoop infrastructure out of Overlog. BOOM Analytics provides a MapReduce engine that is roughly equivalent to (and based on) Hadoop.
Q: Who is working on BOOM and Bloom?
The project is based at UC Berkeley, in close collaboration with researchers at Yahoo! Research. We also keep in contact with colleagues at Microsoft Research and IBM Research who fund our work. A list of the main researchers is on the BOOM website.