As mentioned in my previous post, Peter Alvaro turned in his PhD thesis a month back, and is now in full swing as a professor at UC Santa Cruz. In the midst of that nifty academic accomplishment, he succeeded in taking the last chapter of his thesis from our BOOM project out of the ivory tower and into use at Netflix. Peter’s collaborators at Netflix recently blogged about the use of his ideas alongside their (very impressive) testing infrastructure, famously known as ChaosMonkey, a.k.a. the Netflix Simian Army. This also generated some press, vaguely inappropriate headline and all. Their work raised the flag on five potential failure scenarios in a single request interface. That’s nice. Even nicer is what would have happened without the ideas from Peter’s research:
Brute force exploration of this space would take 2^100 iterations (roughly 1 with 30 zeros following), whereas our approach was able to explore it in ~200 experiments.
It always feels good when your good ideas carve 28 zeroes of performance off the end of standard practice.
I encourage you to read the Netflix blog post, as well as Peter’s paper on the research prototype, Molly. Or you can watch his RICON 2015 keynote on the subject.
It’s been a while since I’ve taken the time to write a blog post here. If there’s one topic that deserves a catchup post in the last few months, it’s the end of an era for my former students Peter Alvaro and Peter Bailis—henceforth Professor Peter A of UC Santa Cruz, and Professor Peter B of Stanford. Each of them officially turned in their dissertation in December. Both spanned an impressive range of theory, practice and genuine applicability. There’s tons of good stuff in each document. Here’s a bit of an overview with references for those of you who might not be diving in to read them cover-to-cover.
Peter Alvaro was a pillar of my BOOM research project from its inception. His thesis is entitled Data-centric Programming for Distributed Systems, and it covers a beautiful arc of results:
- It starts with his insightful exploration of commit and consensus protocols in a declarative language, and his collaboration on the BOOM Analytics work that build a ridiculously high-function HDFS clone in ridiculously few lines of code and hours of developer time
- It also includes his foundational design of the Dedalus logic for distributed programming that has become a touchstone for the database theory community, in addition to our team
- and his contributions to the Bloom language including the core semantics and many pragmatic features
- It covers in depth his work on the Blazes system for analyzing eventual consistency at the level of program semantics both for Bloom and for dataflow languages like Storm, and automatically synthesizing coordination code where needed including a high-performance solution called sealing
- and finally it presents his work on Lineage Driven Fault Injection (LDFI) and the Molly prototype, which extracted new benefits from declarative programming in large-scale testing, and was recently adapted for use at Netflix.
The thesis leaves out a bunch of additional work he did at Berkeley, including contributions to the much-cited MapReduce Online effort, and his work on distributed system testing with BloomUnit. But what I’ll remember most from his graduate years is the team-teaching we did on Programming the Cloud, where we used our work on Bloom to get undergraduates learning the fundamentals of distributed systems via live coding. This was without question the most creative and audacious teaching I’ve been involved with, and it worked surprisingly well thanks in large part to Peter’s hard work and more importantly his warm and thoughtful spirit. I’m excited to see Peter A teaching it again this coming quarter at UC Santa Cruz.
Peter Bailis’ thesis is called Coordination Avoidance in Distributed Databases, and it’s a timely tour de force of fertile ideas found in what many considered a picked-over wasteland—transaction processing. Peter’s thesis includes a range of big ideas married to practical observations, including:
- An empirical level-set on the costs of coordination in modern distributed databases.
- The notion of Invariant Confluence, which attacks the distributed database problem of consistency without coordination by taking Church-Rosser graphs and applying them to databases with invariants.
- An analysis of Invariant Confluence in the wild, via mining Github repos with Ruby on Rails apps to determine how “real” programmers tradeoff application constraints and database constraints. Not only did Peter do the legwork here to understand what programmers do, he brought it home to force us all to ask the questions of why they do what they do, and how the push and pull of technical communities can lead to better outcomes.
- A new and very sensible (if you’re into that kind of thing) weak isolation level for transactions called Read Atomic, with a range of efficient implementations for distributed systems via RAMP protocols.
Peter B’s thesis also leaves out a range of important work he did at Berkeley, including the popular PBS statistical-empirical explanation of why NoSQL stores seem to work, his bolt-on causal consistency work and analysis, the initial design of the Velox model-serving system with colleagues in the AMPLab, and his popular shaming of the SQL transaction world by exposing how few SQL systems provide ACID transactions by default (or at all). I remember with gratitude how Peter took on half the work of teaching graduate databases at Berkeley (the first offering in years!) while I was deeply involved in running Trifacta. And finally, it has been a bracing dose of research and academic politics having him join in the latest edition of Readings in Database Systems; he did it with grace and intelligence.
Without question the best part of teaching at Berkeley is the students you get to work with. Peter & Peter: it has been a great pleasure. I suspect that being colleagues could be even more fun. Good to have you both still in town!