I attended the Greenplum customer advisory board meeting this week, including a public briefing in San Francisco for analysts and potential customers. The Greenplum folks asked me to speak at the briefing about parallelism and analytics in the large, outside the scope of Greenplum per se. I cooked up a little slide deck for the occasion on why and whither parallelism and analytics. It's a familiar story: the future is parallel, and the practical future is dataflow parallelism. (Familiar, yes, but told with some nice Flickr clip-art and approachable analogies.)
The big aha moment for me occurred during our panel discussion, which included Luke Lonergan from Greenplum, Roger Magoulas from O’Reilly, and Brian Dolan from Fox Interactive Media (which runs MySpace among other web properties).
Roger talked about using MapReduce to extract structured entities from text, in order to run tech trend analyses over billions of rows of online job postings. Brian (a mathematician by training) talked about implementing conjugate gradient and Support Vector Machines in parallel SQL to support “hypertargeting” for advertisers. I mentioned how Jonathan Goldman at LinkedIn was using SQL and MapReduce to run graph algorithms for social network analysis.
And it occurred to me: this is definitely not your mom’s database workload. This isn’t even recognizably data warehousing or “Business Intelligence”. There was no mention of OLAP or cubes or drill-downs during the panel. These guys are doing pretty sophisticated algorithmics. And they’re implementing it atop a parallel database not because the database is a jail they’re locked inside of, but because the database provides the right data-centric parallel compute platform.
As I listened to them, it struck me that there’s a real opportunity for the high-end database vendors, along with the Hadoop boosters, to turn the corner from being mere data managers to being the high-end programming environment for all tasks that scale. High-end computing is data-centric, and if you can make the algorithmicists productive at that scale, you win.