It is common to blame such problems on memory, and in fact many
applications spend many cycles waiting for memory. But other problems
-- e.g., branch mispredicts -- also waste cycles, and regardless of
the general causes, if one hopes to improve the performance of a
particular application, one needs to know which instructions are
stalling and why.
The Digital Continuous Profiling Infrastructure provides an efficient
and accurate way of answering such questions. It samples various
events (cycles, instruction-cache misses, branch mispredicts, etc.) at a high rate
(every 62K cycles, or about 5200 samples per second on a 333-MHz
processor). Samples are then processed by a suite of analysis tools
that accurately characterize where the time is being spent in complex
workloads, from the fraction of cycles spent in each executable image
to the CPI for each instruction and the reasons for any static or
dynamic stalls.
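
As a rough check on the numbers above, the following Python sketch
computes the sampling rate implied by the stated period and shows the
simplest kind of aggregation, the fraction of samples that fall in each
executable image. It is an illustration only, not the infrastructure's
actual code: it assumes "62K" means 62 * 1024 cycles, and the image
names, program-counter values, and function names are hypothetical.

```python
from collections import Counter

CLOCK_HZ = 333_000_000     # 333-MHz processor, as stated in the text
PERIOD_CYCLES = 62 * 1024  # "62K cycles"; assumes K = 1024


def samples_per_second(clock_hz, period_cycles):
    """Average number of samples taken per second of execution."""
    return clock_hz / period_cycles


def cycle_fractions(samples):
    """Fraction of cycle samples attributed to each image.

    `samples` is an iterable of (image_name, pc) pairs; this record
    layout is a hypothetical one chosen for the example.
    """
    counts = Counter(image for image, _pc in samples)
    total = sum(counts.values())
    return {image: n / total for image, n in counts.items()}


rate = samples_per_second(CLOCK_HZ, PERIOD_CYCLES)

# Hypothetical samples: two in image_a, one each in image_b and image_c.
example = [("image_a", 0x1000), ("image_a", 0x1004),
           ("image_b", 0x2000), ("image_c", 0x3000)]
fractions = cycle_fractions(example)

print(f"{rate:.0f} samples/second")
print(fractions)
```

Under the K = 1024 assumption the rate comes out a little above 5200
samples per second, consistent with the figure quoted above. Per-image
fractions are the coarsest view; the per-instruction CPI analysis
mentioned in the text needs more than raw sample counts and is not
attempted here.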
In this talk I will focus on the design of our profiling system,
including the techniques used to achieve efficient data collection and
the analysis methods that determine the CPI for each instruction. I
will also discuss our plans for building profile-driven optimizations.