Brim-abstract

by John Mellor-Crummey last modified 2011-08-01 14:27

Improving the Scalability of the TotalView Debugger using TBON-FS and proc++

Michael Brim (University of Wisconsin)
John DelSignore (Rogue Wave Software)

ABSTRACT

A common requirement among various tools and middleware is performing process control and inspection on a distributed group of processes. In prior work, we introduced group file operations, a simple, intuitive interface to scalable group operations on distributed files that can be easily adopted by existing tools and middleware or used to create new scalable tools. Group file operations avoid the linear costs typically associated with dealing with a large file space by extending existing file system abstractions and operations with group semantics that eliminate iterative access. We also developed the TBON-FS distributed file system that leverages a tree-based overlay network to provide scalable group operations on files from thousands of independent file servers. Recently, we have developed proc++, a new synthetic file system that provides control and inspection of groups of processes and threads. In this talk, we report on our ongoing effort to use group file operations, TBON-FS, and proc++ to improve the scalability of TotalView, the most widely used commercial debugger for HPC systems. We report the performance benefits achieved when using the modified debugger on parallel applications with up to 49,152 processes on a Cray XT5 system, and discuss our observations on building tools that can scale for use on upcoming systems containing millions of processor cores and (potentially) billions of debugging targets.
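As a rough illustration of why group semantics can avoid linear front-end cost, the sketch below contrasts a front end that reads every file itself with a simulated tree in which each node gathers results from a bounded number of children. This is not the TBON-FS or group file operations API; the function names, the `fanout` parameter, and the in-memory "readers" are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def iterative_read(readers: Sequence[Callable[[], str]]) -> List[str]:
    """Front end reads each member itself: len(readers) sequential operations."""
    return [read() for read in readers]


def group_read(readers: Sequence[Callable[[], str]],
               fanout: int = 4) -> Tuple[List[str], int]:
    """Simulate one group read over a tree with the given fan-out.

    Each simulated tree node concatenates the results of at most `fanout`
    children, so the number of aggregation levels grows logarithmically
    with the number of readers. Returns (results, number_of_levels).
    """
    # Leaves: each (hypothetical) file server performs its own local read.
    level = [[read()] for read in readers]
    depth = 0
    while len(level) > 1:
        # Each parent merges the result lists of up to `fanout` children.
        level = [sum(level[i:i + fanout], [])
                 for i in range(0, len(level), fanout)]
        depth += 1
    return (level[0] if level else []), depth


# Hypothetical stand-ins for per-process files on 64 file servers.
readers = [lambda i=i: f"state of process {i}" for i in range(64)]
results, depth = group_read(readers, fanout=4)
```

With 64 simulated readers and a fan-out of 4, the data passes through three aggregation levels, whereas the iterative version performs 64 reads from a single point. In a real overlay network the children of each node would also run in parallel, which this sequential simulation does not capture.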


CScADS Collaborators include:

Rice University ANL UCB UTK WISC