
CRISP: Critical Path Analysis for Microservice Architectures

November 18, 2021 / Global
Figure 1: Computation DAG formed during parallel program execution. The nodes represent weighted units of serial work, and the edges represent dependencies. The longest weighted path through the DAG is called the critical path. The parallel computation cannot finish faster than the length of the critical path, so any reduction in execution time must come from shortening the critical path.
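To make the definition in Figure 1 concrete, here is a minimal Python sketch that computes the longest weighted path through a small DAG. The function name `critical_path`, the node names, and the weights are all hypothetical and chosen only for illustration; this is not the CRISP implementation, only one common way to find a longest path by visiting nodes in topological order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(weights, deps):
    """Return (length, path) of the longest weighted path in a DAG.

    weights: node -> cost of that unit of serial work
    deps:    node -> set of nodes that must finish before it starts
    Assumes every node mentioned in deps also appears in weights.
    """
    graph = {node: deps.get(node, ()) for node in weights}
    finish = {}  # earliest possible finish time of each node
    prev = {}    # predecessor that determines that finish time
    for node in TopologicalSorter(graph).static_order():
        # The slowest dependency decides when this node can start.
        best = max(graph[node], key=lambda p: finish[p], default=None)
        prev[node] = best
        finish[node] = weights[node] + (finish[best] if best is not None else 0)

    # Walk back from the node that finishes last to recover the path.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


# Hypothetical example: "b" and "c" run in parallel after "a"; "d" waits for both.
weights = {"a": 2, "b": 3, "c": 1, "d": 4}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(critical_path(weights, deps))  # (9, ['a', 'b', 'd'])
```

In this example, speeding up "c" changes nothing because it is not on the critical path, while shaving time off "a", "b", or "d" shortens the whole computation.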
Figure 2: Large, industry-scale microservice interactions result in complex call graphs (above) and timelines (below) when traced and viewed through Jaeger UI.
Figure 3: A Jaeger microservice trace timeline visualized as a parallel computing DAG. The nodes of the DAG are the horizontal boxes, elongated to show their execution time. The critical path through the DAG is shown in red.
Figure 4: An example of finding the critical path in a labeled Jaeger trace.
Figure 5: A Jaeger trace where a child span has a small overlap with its adjacent sibling span.
Figure 6: A Jaeger trace where the child span overflows the end time of its parent span.
Figure 7: The child spans of Figure 6 are truncated to the finish time of the parent.
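The truncation shown in Figures 6 and 7 can be sketched in a few lines of Python. The `Span` class and the `truncate_to_parent` function below are hypothetical names for illustration and do not reflect Jaeger's data model or CRISP's code. Dropping a child that starts after its parent has already finished is an assumption of this sketch; the captions only describe clipping the end time.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, simplified span: times are integer microseconds."""
    name: str
    start: int
    end: int
    children: list = field(default_factory=list)


def truncate_to_parent(span):
    """Clip every descendant so that it ends no later than its parent."""
    kept = []
    for child in span.children:
        if child.start >= span.end:
            # Assumption of this sketch: a child that begins after the
            # parent finished cannot affect the parent's latency, so drop it.
            continue
        child.end = min(child.end, span.end)
        truncate_to_parent(child)  # grandchildren are clipped to the new end
        kept.append(child)
    span.children = kept
    return span


# Hypothetical trace: "child" and "grandchild" overflow the parent's end time.
root = Span("parent", 0, 100, [
    Span("child", 60, 130, [Span("grandchild", 90, 140)]),
    Span("late", 110, 120),
])
truncate_to_parent(root)
print([(c.name, c.start, c.end) for c in root.children])  # [('child', 60, 100)]
```

Clipping top-down means a grandchild is bounded by its parent's already truncated end time, so no span in the tree extends past the root.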
Figure 8: Flame graph showing a breakdown of services on the critical path. This visualization is easier to digest and investigate than the default call graph and timeline views in Jaeger.
Figure 9: A critical path summary view (HTML) consisting of an interactive heat map and flame graphs.
Figure 10: Violin plots of the exclusive execution time of a critical path operation “Op-M” on two different CPUs. The slower-clock CPU-A results in ~15% higher latency than the faster-clock CPU-B. Moreover, the slower-clock CPU-A results in 1.5x higher tail latency.
Milind Chabbi

Milind Chabbi is a Staff Researcher in the Programming Systems Research team at Uber. He leads research initiatives across Uber in the areas of compiler optimizations, high-performance parallel computing, synchronization techniques, and performance analysis tools to make large, complex computing systems reliable and efficient.

Chris Zhang

Chris Zhang is a Software Engineer at Uber. He previously interned at Uber on the Programming Systems team, where he worked on GOCC and Critical Path Analysis of Microservice Traces. His research interests include computer architecture, compilers, and JIT optimization.

Murali Krishna Ramanathan

Murali Krishna Ramanathan is a Senior Staff Software Engineer who leads multiple code quality initiatives across Uber engineering. He is the architect of Piranha, a refactoring tool that automatically deletes code related to stale feature flags. His interests include building tooling to address software development challenges with feature flagging, automated code refactoring and developer workflows, and automated test generation for improving software quality.

Posted by Milind Chabbi, Chris Zhang, Murali Krishna Ramanathan