Automatic Collection Selection using Machine Learning

Abstract: Most recent programming languages include a collection framework as part of their standard library (or runtime). Examples are Java, C#, Python and Ruby. The Java Collection Framework provides a number of collection classes, some of which implement the same abstract data type, which makes them interchangeable. Developers can therefore choose between several functionally equivalent options. Since collections have different performance characteristics, and may be allocated in thousands of programs locations, the choice of collection has an important impact on performance. Unfortunately, programmers often make sub-optimal choices when picking their collections.In this licentiate thesis, we consider the problem of building automated tooling, which would help the developer choose among several collection implementations. We consider an existing tool called Brainy, which targets C++, and adapt it to the Java context. In doing so, we investigate how to synthesize benchmarks and analyze their behavior to create training data for automated classification. We propose one new generative model for collection benchmarks and present the challenges that porting JBrainy to Java entails. Finally, we compare JBrainy's suggestions versus greedy search, on five well known benchmarks. Our investigations show that JBrainy's suggestions were almost as effective than those of greedy search in minimizing the running time of programs. However, we also find that Brainy's benchmark synthesis methods do not apply well to the Java context, since they introduce some significant biases.

  CLICK HERE TO DOWNLOAD THE WHOLE DISSERTATION. (in PDF format)