Choreographies meet Communication Failures
The paradigm of choreographic programming promises a simple approach to the programming of concurrent and distributed systems: write the collective communication behaviour for a system of processes as a choreography, and then the local programs for these processes are automatically compiled by a provably-correct procedure known as endpoint projection. This promise has prompted substantial theoretical research, but to date the programming of protocols that can deal with realistic communication failures in a distributed network remains elusive.
We here present a preliminary report on work which proposes the first theory of choreographic programming that captures failures related to communication taken from the literature of distributed systems: processes can send or receive fewer messages than they should (send and receive omission), and the network can fail at transporting messages (omission failure). Inevitably, this realism introduces a sophisticated semantics where a choreography may have many possible executions depending on what failures occur at runtime. To deal with this complexity, we carefully design a language that supports two ways of recovering simplicity. First, arbitrarily-sophisticated recovery strategies can be wrapped in procedures and be offered as syntactic sugar. Developers can therefore choose to operate at a low level, where they can deal with failures directly, or at a high level where they use `off-the-shelf’ solutions. Second, we offer a static analysis that guarantees a choreography to be robust wrt failures. A robust choreography implements at-most-once delivery regardless of any failure, and exactly-once delivery in the absence of omission failures.