Continuous Profiling: Where have all the cycles gone?

Monika Rauch Henzinger
DEC-Research, Palo Alto
Monday, 25 August 97
45 - FB14


Processors are getting faster (600 MHz and climbing) and issue widths
are increasing (4- and 8-way becoming common), yet application
performance is not keeping pace. On large commercial applications,
average CPI (cycles-per-instruction) numbers may be as high as 4 or 5.
With 8-way issue, a CPI of 5 means that only 1 issue slot in every 40
is being put to good use!
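
The arithmetic behind that figure can be checked with a short sketch.
The function name below is illustrative and not part of any profiling
tool; it assumes every cycle offers a full set of issue slots and that
each retired instruction counts as exactly one useful slot.

```python
from fractions import Fraction

def issue_slot_utilization(cpi, issue_width):
    """Fraction of issue slots doing useful work (illustrative helper).

    One instruction takes `cpi` cycles on average, and each cycle offers
    `issue_width` slots, so one useful slot is filled out of every
    cpi * issue_width slots.
    """
    return 1 / (Fraction(cpi) * issue_width)

# 8-way issue at a CPI of 5, as in the text above
print(issue_slot_utilization(5, 8))  # prints 1/40
```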

It is common to blame such problems on memory, and many applications
do spend many cycles waiting for memory. But other problems, such as
branch mispredicts, also waste cycles. Whatever the general causes, to
improve the performance of a particular application one needs to know
which instructions are stalling and why.

The Digital Continuous Profiling Infrastructure provides an efficient
and accurate way of answering such questions. It samples various
events (cycles, instruction-cache misses, branch mispredicts, etc.) at a high rate
(every 62K cycles, or about 5200 samples per second on a 333-MHz
processor). Samples are then processed by a suite of analysis tools
that accurately characterize where the time is being spent in complex
workloads, from the fraction of cycles spent in each executable image
to the CPI for each instruction and the reasons for any static or
dynamic stalls.
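
As a rough sanity check on the sampling rate quoted above, the sketch
below divides the clock frequency by the sampling period. The helper
name is hypothetical, and the reading of "62K" is an assumption: the
abstract does not say whether it means 62 * 1024 or 62,000 cycles, so
both are computed. Either reading gives a rate of a little over 5000
samples per second, in the neighborhood of the quoted figure.

```python
def samples_per_second(clock_hz, cycles_per_sample):
    """Average sample rate for a fixed sampling period (illustrative helper)."""
    return clock_hz / cycles_per_sample

CLOCK_HZ = 333e6  # 333-MHz processor, from the text

# Assumption: "62K" read as 62 * 1024 cycles
rate_binary = samples_per_second(CLOCK_HZ, 62 * 1024)

# Assumption: "62K" read as 62,000 cycles
rate_decimal = samples_per_second(CLOCK_HZ, 62_000)

print(round(rate_binary), round(rate_decimal))
```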

In this talk I will focus on the design of our profiling system,
including the techniques used to achieve efficient data collection and
the analysis methods that determine the CPI for each instruction. I
will also discuss our plans for building profile-driven optimizations.


(This is joint work with Jennifer Anderson (DEC WRL), Lance Berc, Jeff
Dean (DEC WRL), Sanjay Ghemawat, Shun-Tak Leung, Dick Sites (Adobe),
Mark Vandevoorde, Carl Waldspurger, and Bill Weihl.)

All interested parties are cordially invited to the talk.

The faculty of the Department of Computer Science

