
It’s official: the name of the programming language for the BOOM project is Bloom.

I didn’t intend to post about Bloom until it was cooked, but two things happened this week that changed my plans.  The first was the completion of a tech report on Dedalus, our new logic language that forms the foundation of Bloom.  The second was more of a surprise: Technology Review decided to run an article on our work, and Bloom was the natural way to talk about it.

More soon on our initial Dedalus results.

Papers are being solicited for the ACM’s new symposium on cloud computing (SOCC) — and they’re due pretty soon, January 15. Both research and industrial papers are welcome. The folks involved (present company excepted) are really strong, and we expect to have some very interesting invited speakers as well. Interesting enough to entice folks to Indianapolis!

The best part of these smaller symposia is the give-and-take among people in the room talking about each other’s work. So send in your best ideas and plan to come.

More info, including the call for papers, is at http://research.microsoft.com/socc2010

Thanks to Boon Thau Loo and Stefan Saroiu for a very interesting workshop on Networking Meets Databases (NetDB), and especially for inviting a high-octane panel to debate the success and directions of Declarative Networking.

The panel members included:

  • Fred Baker, Cisco
  • Joe Hellerstein, Berkeley
  • Eddie Kohler, UCLA and Meraki
  • Arvind Krishnamurthy, U Washington
  • Petros Maniatis, Intel Research
  • Timothy Roscoe, ETH Zurich

Butler Lampson made numerous comments from the audience and, given his insight and stature, was viewed by most as something of an additional panelist.

I was happy to see a very vigorous debate!  Lots of interesting points made, no punches pulled.  My slides are posted here, and include an ad hoc manifesto for how to move forward.


For the last year or so, my team at Berkeley — in collaboration with Yahoo Research — has been undertaking an aggressive experiment in programming.  The challenge is to design a radically easier programming model for infrastructure and applications in the next computing platform: The Cloud.  We call this the Berkeley Orders Of Magnitude (BOOM) project: enabling programmers to develop OOM bigger systems in OOM less code.

To kick this off we built something we call BOOM Analytics [link updated to new version]: a clone of Hadoop and HDFS built largely in Overlog, a declarative language we developed some years back for network protocols.  BOOM Analytics is just as fast and scalable as Hadoop, but radically simpler in its structure.  As a result we were able, with amazingly little effort, to turbocharge our incarnation of the elephant with features that would be enormous upgrades to Hadoop’s Java codebase.  Two of the fanciest are:


[Update 11/5/2009: the first paper on Usher will appear in the ICDE 2010 conference.]

Data quality is a big, ungainly problem that gets too little attention in computing research and the technology press. Databases pick up “bad data” (errors, omissions, inconsistencies of various kinds) all through their lifecycle: from initial data entry/acquisition, through data transformation and summarization, and through integration of multiple sources.

While writing a survey for the UN on the topic of quantitative data cleaning, I got interested in the dirty roots of the problem: data entry. This led to our recent work on Usher [11/5/2009: Link updated to final version], a toolkit for intelligent data entry forms, led by Kuang Chen.


Relational databases are for structured data, right? And free text lives in the world of keyword search?

Well.  

Another paper we recently finished up was on Declarative Information Extraction in a Probabilistic Database System.  In a nutshell (as my buddy Minos is wont to say), this is about

  1. automatically converting free text into structured data,
  2. using a state-of-the-art machine learning technique (Conditional Random Fields), which is
  3. coded up in a few lines of SQL that integrates with the rest of your query processing.
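The post doesn't show the SQL itself, so as a rough illustration of what CRF-based extraction computes, here is a minimal Python sketch of Viterbi decoding, a standard way to find the most likely label sequence under a linear-chain CRF. Everything in it is an assumption for illustration only: the address-segmentation task, the label set (`NUM`, `STREET`, `CITY`), the hand-set `TRANSITION` weights, and the hypothetical `emission()` feature function stand in for a trained model. It is also plain procedural Python, not the declarative SQL the paper describes.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for segmenting a street address into fields.
LABELS = ["NUM", "STREET", "CITY"]

# Hypothetical, hand-set transition weights: score for label a followed by label b.
# A real CRF would learn these from labeled training text.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("NUM", "NUM"): -2.0, ("NUM", "STREET"): 2.0, ("NUM", "CITY"): -1.0,
    ("STREET", "NUM"): -2.0, ("STREET", "STREET"): 1.0, ("STREET", "CITY"): 1.5,
    ("CITY", "NUM"): -2.0, ("CITY", "STREET"): -1.0, ("CITY", "CITY"): 1.0,
}


def emission(token: str, label: str) -> float:
    """Hypothetical per-token feature score for assigning `label` to `token`."""
    if token.isdigit():
        return 3.0 if label == "NUM" else -1.0
    if token.lower() in {"st", "ave", "rd", "way"}:
        return 2.0 if label == "STREET" else -1.0
    return 0.0  # no evidence either way; transitions decide


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model.

    Decoding only needs unnormalized scores; the CRF's partition function
    matters only when computing probabilities.
    """
    if not tokens:
        return []
    # best[t][y]: best score of any labeling of tokens[0..t] that ends in label y
    best: List[Dict[str, float]] = [{y: emission(tokens[0], y) for y in LABELS}]
    # back[t][y]: the previous label on that best path
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev, score = max(
                ((p, best[t - 1][p] + TRANSITION[(p, y)]) for p in LABELS),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emission(tokens[t], y)
            back[t][y] = prev
    # Pick the best final label, then follow back-pointers to the start.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    print(viterbi(["2600", "Bancroft", "Way", "Berkeley"]))
    # → ['NUM', 'STREET', 'STREET', 'CITY']
```

Here "Bancroft" carries no feature evidence of its own; it ends up labeled `STREET` because the transition weights favor a street name after a number and before a suffix like "Way". A trained CRF would use far richer features and learned weights.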

This is Daisy Wang’s baby, and it’s really cool.  She’s achieved a convergence where free text, relational data and statistical models all come together in an elegant and very practical way.  


Some of my colleagues and I have a pretty nifty idea that we’ve submitted to SIGMOD 2009.  But I’m not gonna tell you what it is.  Because my friends at SIGMOD won’t let me.

Blogs democratize publication and reduce delays and friction in scientific dialog. Right?  A small step toward Open Notebook Science.  Too bad SIGMOD has moved in the opposite direction with double-blind reviewing, bottling up ideas for months at a time. Actually, they’d really prefer I didn’t tell you about other ideas I’m playing with that I might submit next year.  Then you might guess who I am!  As if my work didn’t give me away.
