
A.V. Mirgorodskiy and B.P. Miller (2008)

Diagnosing Distributed Systems with Self-Propelled Instrumentation

Lecture Notes in Computer Science, 5346.

We present a three-part approach for diagnosing bugs and performance problems in production distributed environments. First, we introduce a novel execution monitoring technique that dynamically injects a fragment of code, the agent, into an application process on demand. The agent inserts instrumentation ahead of the control flow within the process and propagates into other processes, following communication events, crossing host boundaries, and collecting a distributed function-level trace of the execution. Second, we present an algorithm that separates the trace into user-meaningful activities called flows. This step simplifies manual examination and enables automated analysis of the trace. Finally, we describe our automated root cause analysis technique that compares the flows to help the analyst locate an anomalous flow and identify a function in that flow that is a likely cause of the anomaly. We demonstrate the effectiveness of our techniques by diagnosing two complex problems in the Condor distributed scheduling system.

by Jennifer Harris last modified 2009-04-20 19:32