Guarding the Boundary: Information Flow Tracking in the Presence of Libraries

Abstract: Libraries are prevalent in modern software development, and they pose a significant security challenge: how can we ensure that sensitive data is not leaked through them? This is the first question of the thesis. We propose the use of information-flow control and develop a principled approach to information-flow tracking in libraries, even when they are written in a language that does not support information-flow control. In this approach, library functions are given unlabel and relabel models, which describe how values are unlabeled and relabeled when marshaled between the labeled program and the library. These models are combined with lazy marshaling to handle structured data such as lists and records, higher-order functions, and references. Modern browsers can be modified through browser extensions, which have special privileges and can, e.g., modify the DOM. As extensions can be intrusive, it is in a webpage's interest to know which extensions are installed in a browser. The second question of the thesis is whether a webpage can determine which extensions are installed in the browser. We conduct a large-scale study to determine how many extensions are detectable from a webpage based on the extensions' resources, showing that over 50% of the top 1000 Chrome extensions can be detected, and to measure how many of the Alexa top 100,000 webpages employ this detection technique.
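
To make the idea of unlabel and relabel models concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes a two-point label lattice ("L" for public, "H" for secret) and a deliberately conservative relabel model that joins the labels of all arguments. The names `Labeled`, `unlabel_all`, `relabel_join`, and `wrap` are hypothetical, and lazy marshaling of structured data is not covered.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# Assumed two-point lattice: "L" (public) sits below "H" (secret).
ORDER = {"L": 0, "H": 1}


def join(a: str, b: str) -> str:
    """Least upper bound of two labels in the assumed lattice."""
    return a if ORDER[a] >= ORDER[b] else b


@dataclass(frozen=True)
class Labeled:
    """A value paired with its security label."""
    value: Any
    label: str


def unlabel_all(args: Sequence[Labeled]) -> Tuple[List[Any], List[str]]:
    """Hypothetical unlabel model: strip labels, remembering them for later."""
    return [a.value for a in args], [a.label for a in args]


def relabel_join(result: Any, labels: Sequence[str]) -> Labeled:
    """Hypothetical relabel model: the result may depend on every argument."""
    label = "L"
    for l in labels:
        label = join(label, l)
    return Labeled(result, label)


def wrap(lib_fn: Callable[..., Any],
         unlabel=unlabel_all,
         relabel=relabel_join) -> Callable[..., Labeled]:
    """Wrap a label-unaware library function with unlabel/relabel models."""
    def wrapped(*args: Labeled) -> Labeled:
        raw, labels = unlabel(args)            # crossing into the library
        return relabel(lib_fn(*raw), labels)   # crossing back out
    return wrapped


# The built-in max stands in for a library function that knows nothing of labels.
labeled_max = wrap(max)
result = labeled_max(Labeled(3, "L"), Labeled(7, "H"))
```

Here `result` carries the label "H" because one of the inputs was secret. A more precise model for a specific library function could be passed in place of `relabel_join`.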
