It is always desirable to deliver a software product to the end user with
as few defects as possible. To that end, it is important to know which
components warrant more effort, in other words which components are more
likely to contain defects. Such components should be considered "risky"
and examined more carefully.
Our approach uses tokens derived from source code and aims to identify
relations between source code and the risk of components, as measured by
post-release failures.
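
As a rough illustration of the idea, and not the method used in this work, the sketch below extracts identifier-like tokens from each component and computes, for each token, the fraction of components containing it that had at least one post-release failure. The tokenizer, the function names (tokenize, token_failure_rates), and the sample data are all hypothetical and chosen only for the example.

```python
import re
from collections import defaultdict

# Assumption: a "token" is any identifier-like word; a real tokenizer
# would be language-aware.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source):
    """Return the set of identifier-like tokens in a source string."""
    return set(TOKEN_RE.findall(source))


def token_failure_rates(components):
    """Map each token to the share of components containing it that failed.

    components: iterable of (source_text, post_release_failure_count).
    """
    seen = defaultdict(int)    # components containing the token
    failed = defaultdict(int)  # of those, components with >= 1 failure
    for source, failures in components:
        for tok in tokenize(source):
            seen[tok] += 1
            if failures > 0:
                failed[tok] += 1
    return {tok: failed[tok] / seen[tok] for tok in seen}


# Made-up sample data: (source text, post-release failures).
components = [
    ("import net; net.open(url)", 3),
    ("import net; net.close()", 1),
    ("import math; math.sqrt(x)", 0),
]
rates = token_failure_rates(components)
```

In this toy data, every component that mentions net failed after release, so that token gets the highest rate. A realistic study would replace the raw rate with a statistical model and validate it on held-out releases.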