Intelligent data acquisition for drug design through combinatorial library design

Abstract: A recurring problem in machine learning methods for drug discovery is the need for standardized data. Methods for producing new data exist, and there is interest in using them, but material and budget constraints make it desirable that each iteration of data production is as efficient as possible. In this thesis, we present two papers detailing methods for different problems in selecting which data to produce. We investigate Active Learning for models that use the margin in model decisiveness as a measure of model uncertainty to guide data acquisition. We demonstrate that the models perform better with Active Learning than with random acquisition of data, independent of machine learning model and starting knowledge. We also study the multi-objective optimization problem of combinatorial library design. Here we present a framework that could process the output of generative models for molecular design and return an optimized library design. The results show that the framework successfully optimizes a library based on molecule availability, which the framework also attempts to identify using retrosynthesis prediction. We conclude that the next step in intelligent data acquisition is to combine the two methods and create a library design model that uses the information from previous libraries to guide subsequent designs.
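As an illustration of the margin-based acquisition mentioned in the abstract, the following is a minimal sketch and not the implementation used in the thesis. It assumes that the margin is the difference between the two highest predicted class probabilities and that the candidates with the smallest margin are acquired first. The function names margin_scores and select_batch and the example probabilities are hypothetical.

```python
import numpy as np

def margin_scores(probs):
    """Margin per candidate: highest minus second-highest class probability.

    Assumption: this is one common definition of the margin; the thesis
    may define model decisiveness differently.
    """
    probs = np.asarray(probs, dtype=float)
    # Sort each row ascending; the last two columns are the two largest values.
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def select_batch(probs, batch_size):
    """Return indices of the batch_size candidates with the smallest margin."""
    margins = margin_scores(probs)
    # A small margin means the model is undecided, so those come first.
    order = np.argsort(margins, kind="stable")
    return order[:batch_size].tolist()

# Hypothetical predicted probabilities for four unlabeled molecules
# in a binary active/inactive classification task.
pool_probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
])

picked = select_batch(pool_probs, batch_size=2)
print(picked)
```

In an Active Learning loop under these assumptions, the selected molecules would be measured, added to the training set, and the model retrained before the next selection round. Random acquisition, the baseline named in the abstract, would instead draw the indices uniformly from the pool.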
