A.V. Mirgorodskiy and B.P. Miller (2008)

Diagnosing Distributed Systems with Self-Propelled Instrumentation

Lecture Notes in Computer Science, 5346.

Abstract

We present a three-part approach for diagnosing bugs and performance problems in production distributed environments. First, we introduce a novel execution monitoring technique that dynamically injects a fragment of code, the agent, into an application process on demand. The agent inserts instrumentation ahead of the control flow within the process and propagates into other processes, following communication events, crossing host boundaries, and collecting a distributed function-level trace of the execution. Second, we present an algorithm that separates the trace into user-meaningful activities called flows. This step simplifies manual examination and enables automated analysis of the trace. Finally, we describe our automated root cause analysis technique that compares the flows to help the analyst locate an anomalous flow and identify a function in that flow that is a likely cause of the anomaly. We demonstrate the effectiveness of our techniques by diagnosing two complex problems in the Condor distributed scheduling system.

URL ftp://ftp.cs.wisc.edu/paradyn/papers/Mirgorodskiy08DistDiagnosis.pdf

by Jennifer Harris — last modified 2009-04-20 19:32