Discovery of Chemical Probes through Structure-based Virtual Screening of Vast Compound Databases

Abstract: Bioactive molecules have traditionally been discovered through labor-intensive screening methods in which individual compounds are tested against specific protein targets or cells to identify those that produce the desired biological effect. However, these approaches have significant limitations. First, the number of molecules that can be tested in a standard laboratory is restricted, and the acquisition and curation of these compounds come at a high cost. Second, these methods are time-consuming because each compound must be tested individually, and they are confined to small libraries with very limited chemical space coverage. In contrast, structure-based virtual screening can rapidly predict a molecule's interaction with a target protein, allowing for the evaluation of enormous libraries of chemical substances. Furthermore, this approach is not restricted to physically available molecules and can be extended to virtual compounds. Commercial chemical space has recently grown exponentially and currently contains several billion molecules that can be readily synthesized and delivered for experimental testing within weeks. Despite the enormous potential of these databases for drug discovery, they also pose new challenges, and effective strategies are required to explore ultralarge libraries. The goal of this thesis was to develop and apply novel strategies for exploring ultralarge chemical libraries using structure-based virtual screening. Publication I summarizes best practices for large-scale virtual screening and benchmarking protocols for molecular docking calculations. Publication II describes a docking screen of several hundred million lead-like molecules against the SARS-CoV-2 main protease, leading to promising starting points for the development of coronavirus inhibitors. The binding modes predicted by docking were confirmed experimentally by X-ray crystallography.
After several rounds of optimization, nanomolar broad-spectrum inhibitors with antiviral effects against coronaviruses in cell models were discovered. Manuscript III demonstrates how machine learning can be used to accelerate virtual screening campaigns. Classification models were trained on docking scores to identify promising molecules in ultralarge libraries relevant to the protein target of interest. The classification algorithms reduced a multi-billion-scale library to a subset of high-confidence candidates with improved docking scores. Manuscript IV focuses on large-scale fragment docking to identify compounds binding to 8-oxoguanine glycosylase 1 and on efficiently optimizing them into potent inhibitors. The docking scoring function correctly predicted the binding modes of the experimental hits, and optimization led to submicromolar inhibitors with anti-inflammatory and anti-cancer effects in cell models. Publication V presents how docking of tailored virtual libraries of nature-inspired macrocycles led to potent disruptors of the KEAP1-Nrf2 complex. The results of this thesis highlight that large-scale virtual screening is a powerful tool for discovering ligands of a wide variety of drug targets.
