It’s been about 6 years now that we’ve been working on declarative programming for distributed systems — starting with routing protocols, then network overlays, query optimizers, sensor network stacks, and more recently scalable analytics and consensus protocols.
Through that time, we’ve struggled to find a useful middle ground between the pure logic roots of classical declarative languages like Datalog, and the practical needs of real systems managing state across networks. Our compromises over the years allowed us to move forward, build real things, and learn many lessons. But they also led to some semantic confusion — as noted in papers by colleagues at Max Planck and AT&T.
Well, no more. We recently released a tech report on Dedalus, a formal logic language that can serve as a clean foundation for declarative programming going forward. The Dedalus work is fairly theoretical, but having tackled it we’re in a strong position to define an approachable and appealing language that will let programmers get their work done in distributed environments. That’s the goal of our Bloom language.
The key insight in Dedalus is roughly this:
Time is essential; space is a detail.
Said differently, the important aspect of distributed programs is not that they run on physically dispersed hardware, but rather that the programmer needs to think carefully about the time-oriented issues that result –particularly delays and atomicity. By focusing on time as a main consideration in the language, we enable and encourage the programmer to focus on the semantic issues that affect correctness. Issues of spatial layout are set aside where they belong: as performance details that need to be addressed as part of tuning, or managed by an optimizer.
This outlook allowed us to solve the two key semantic problems in our prior work:
- modeling persistence and state update: it may surprise some people that Datalog, the favored language of database theory, had no way to model updates and deletes. By incorporating time into the Dedalus logic, updates and deletes can be captured as deduction of logically timestamped (versioned) data.
- modeling delay and failure: using a formal notion of non-determinism in logic, we capture network delay and component failure as non-determinism in the derivation of timestamp. More simply, we have the programmer distinguish between state that is derived now, next, or asynchronously. Asynch derivations may be delayed arbitrarily — even infinitely, in the case of failure — and the program semantics capture that possibility formally.
Having incorporated these issues into a formal logic, we are able to extend years of theoretical research on Datalog to the case of live, distributed systems that evolve over time. We are also hopeful that we can use this as a “PODS-PODC” bridge: a linkage between database theory and distributed systems theory.
If you’ve read this far, please read the tech report. And let us know your thoughts.