P. Beckman, K. Iskra, K. Yoshii, S. Coghlan, and A. Nataraj (2008)
Benchmarking the Effects of Operating System Interference on Extreme-Scale Parallel Machines
Cluster Computing, 11(1):3-16.
We investigate operating system noise, which we identify as one of the main reasons for a
lack of synchronicity in parallel applications. Using a microbenchmark, we measure the noise
on several contemporary platforms and find that, even with a general-purpose operating system,
noise can be limited if certain precautions are taken. We then inject artificially generated
noise into a massively parallel system and measure its influence on the performance of collective
operations. Our experiments indicate that on extreme-scale platforms, the performance is
correlated with the largest interruption to the application, even if the probability of such an interruption
on a single process is extremely small. We demonstrate that synchronizing the noise
can significantly reduce its negative influence.
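
The scaling claim above can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not the benchmark or analysis used in the paper: it assumes each process is independently interrupted with probability p during one collective operation, that every interruption lasts the same time, and that the collective finishes only when the slowest process does. The function names and parameters are hypothetical.

```python
import math


def prob_any_interrupted(p: float, n_procs: int) -> float:
    """Probability that at least one of n_procs processes is interrupted.

    Assumes independent interruptions, each with probability p (0 <= p <= 1).
    """
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_procs * math.log1p(-p))


def expected_collective_time(base: float, detour: float,
                             p: float, n_procs: int) -> float:
    """Expected duration of one synchronizing collective in this toy model.

    base   : noise-free duration of the collective (hypothetical parameter)
    detour : length of a single interruption (hypothetical parameter)
    The slowest process sets the pace, so one interruption anywhere
    delays the whole operation by `detour`.
    """
    return base + detour * prob_any_interrupted(p, n_procs)


if __name__ == "__main__":
    # A per-process probability of 1e-5 is negligible on one process
    # but becomes likely once enough processes take part.
    for n in (1, 1_000, 100_000):
        print(n, prob_any_interrupted(1e-5, n))
```

In this model, perfectly synchronized noise corresponds to all processes being interrupted at the same moment, so the delay is paid with probability p rather than with the much larger probability computed for n_procs independent processes. That is consistent with the abstract's observation that synchronizing the noise reduces its influence, although the real mechanism is more involved than this sketch.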