Benchmarking Points-to Analysis
Abstract: Points-to analysis is a static program analysis that, simply put, computes which objects created at certain points of a given program might show up at which other points of the same program. In particular, it computes possible targets of a call and possible objects referenced by a field. Such information is essential input to many client applications in optimizing compilers and software engineering tools.Comparing experimental results with respect to accuracy and performance is required in order to distinguish the promising from the less promising approaches to points-to analysis. Unfortunately, comparing the accuracy of two different points-to analysis implementations is difficult, as there are many pitfalls in the details. In particular, there are no standardized means to perform such a comparison, i.e, no benchmark suite - a set of programs with well-defined rules of how to compare different points-to analysis results - exists. Therefore, different researchers use their own means to evaluate their approaches to points-to analysis. To complicate matters, even the same researchers do not stick to the same evaluation methods, which often makes it impossible to take two research publications and reliably tell which one describes the more accurate points-to analysis.In this thesis, we define a methodology on how to benchmark points-to analysis. We create a benchmark suite, compare three different points-to analysis implementations with each other based on this methodology, and explain differences in analysis accuracy.We also argue for the need of a Gold Standard, i.e., a set of benchmark programs with exact analysis results. Such a Gold Standard is often required to compare points-to analysis results, and it also allows to assess the exact accuracy of points-to analysis results. Since such a Gold Standard cannot be computed automatically, it needs to be created semi-automatically by the research community. We propose a process for creating a Gold Standard based on under-approximating it through optimistic (dynamic) analysis and over-approximating it through conservative (static) analysis. With the help of improved static and dynamic points-to analysis and expert knowledge about benchmark programs, we present a first attempt towards a Gold Standard.We also provide a Web-based benchmarking platform, through which researchers can compare their own experimental results with those of other researchers, and can contribute towards the creation of a Gold Standard.
This dissertation MIGHT be available in PDF-format. Check this page to see if it is available for download.