July 3, 2018
Nuno Machado
Developers typically rely on log data to reason about the runtime behavior of distributed systems. Unfortunately, the inherently distributed nature and complexity of such systems often lead to multiple independent logs, scattered across different physical machines, containing thousands or millions of entries whose causal relationships are unclear. This makes log analysis a tedious, time-consuming, and potentially inconclusive task.
In this talk, I will present Falcon, a tool aimed at making the analysis of distributed system logs practical and effective. Falcon seamlessly combines several distinct logging sources and generates a visual space-time diagram of a distributed execution without requiring custom instrumentation. To preserve event causality, even when the data is collected from independent, unsynchronized machines, Falcon introduces a novel happens-before symbolic formulation and relies on an off-the-shelf constraint solver to obtain a coherent event schedule. Our case study with the popular distributed coordination service Apache ZooKeeper shows that Falcon eases the analysis of complex distributed protocols and helps bridge the gap between protocol design and implementation.
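To give a feel for what a happens-before ordering over unsynchronized logs involves, here is a minimal sketch. It is not Falcon's symbolic formulation: the log format, the event names, and the functions `happens_before_pairs` and `coherent_schedule` are assumptions made up for illustration, and a plain topological sort from the Python standard library stands in for the off-the-shelf constraint solver. Only two kinds of constraints are modeled: events within one process keep their log order, and a message send precedes its matching receive.

```python
from graphlib import TopologicalSorter

# Hypothetical per-process logs: (event id, kind, message id).
# Each list is in local log order; timestamps across machines are never compared.
LOGS = {
    "node_a": [("a1", "SND", "m1"), ("a2", "RCV", "m2")],
    "node_b": [("b1", "RCV", "m1"), ("b2", "SND", "m2")],
}


def happens_before_pairs(logs):
    """Return (earlier, later) pairs implied by program order and messaging."""
    pairs = []
    sends, receives = {}, {}
    for events in logs.values():
        # Program order: consecutive events of one process are ordered.
        for (prev, _, _), (nxt, _, _) in zip(events, events[1:]):
            pairs.append((prev, nxt))
        for event_id, kind, msg in events:
            if kind == "SND":
                sends[msg] = event_id
            elif kind == "RCV":
                receives[msg] = event_id
    # Message order: a send precedes the matching receive.
    for msg, snd in sends.items():
        if msg in receives:
            pairs.append((snd, receives[msg]))
    return pairs


def coherent_schedule(logs):
    """Order all events so that every happens-before pair is respected."""
    sorter = TopologicalSorter()
    for events in logs.values():
        for event_id, _, _ in events:
            sorter.add(event_id)
    for earlier, later in happens_before_pairs(logs):
        sorter.add(later, earlier)  # `later` depends on `earlier`
    # Raises graphlib.CycleError if the constraints contradict each other.
    return list(sorter.static_order())


schedule = coherent_schedule(LOGS)
print(schedule)
```

In this toy input the constraints admit only one total order, so the schedule is fully determined. With real logs many orders are usually valid, which is one reason a constraint solver that can encode richer rules is a more natural fit than a plain sort.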