One more post on MapReduce and parallel SQL, this time for the folks at O’Reilly Radar.
Just for the record, I think MapReduce is fine, but not especially interesting technology. The thing is, the “teachable moment” it presents is really great stuff, because it is bringing people toward data-centric parallel programming. So it’s good for the data-centric research business in general, and especially for data-centric approaches to parallelism.
I.e. chum in the water for our research on Lincoln…
One thing I plan to do here is jot down ideas I don’t have time to work on myself. Here’s the first installment in what will hopefully be a running series of “Research Gimme’s”. Anybody who wants to run with this, I’d love to hear what you’re up to.
So… who’s going to re-examine Online Aggregation in the Hadoop context? Goodness knows it’d be useful. It will require moving Hadoop beyond a slavish implementation of the Google MapReduce paper. That’s got to be a good thing… Here’s the start of the program:
here … and it smells like the Internet out there in the comment thread. Go figure.
The first of two invited posts at GigaOm is up. These are not researchy; they’re intended to be informative to a broad audience. They describe the state of affairs in data parallelism, and some of the reasons why this is an increasingly hot topic.
This started out as an exercise for Greenplum, a company I advise that sells a massively parallel DBMS based on PostgreSQL. I’ve been helping them with their recent launch of a MapReduce interface to their system. That’s been an interesting project. I’ll write about it more soon.
Along the way, they asked if I’d write a blog post for them about parallelism, SQL and MapReduce to put things into perspective. I sat down to write a few paragraphs on the subject and ended up with a seven-page essay. That was too long for a blog post, so I just turned it into a Tech Report (a.k.a. a white paper in industrial terms). We excerpted it for GigaOm to run in a couple of posts. The original is more nuanced and playful, but hey, blogging isn’t about 7-page essays. I’ll try to control myself here too, and stick with a few paragraphs per post. And if that causes me to write more tech reports, so be it; I’ll link them in.