Unsupervised Learning of Biomolecular Dynamics with Multi-Modal Data

Abstract: The functioning of cells critically depends on the dynamics of biomolecular systems, such as proteins or nucleic acids. Biophysical experiments and Molecular Dynamics (MD) simulations are the primary techniques for modeling and understanding the kinetics and thermodynamics of biomolecules. Despite their shared focus on molecular dynamics, the two approaches often lead to differing conclusions because of computational or observational limitations. Combining them in multi-modal models yields a more accurate kinetic and thermodynamic understanding of these systems, since each approach compensates for the weaknesses of the other. However, this integration presents its own set of challenges due to the differences in resolution and timescales between experimental data and MD simulations. In this thesis, we explore the reconciliation of simulation data with experimental evidence as well as the potential of machine learning (ML) to alleviate some of MD’s fundamental problems. By incorporating experimental constraints, we demonstrate that integrative kinetic models are more accurate with respect to the “true” ensemble while retaining atomic-level detail. Additionally, we discuss ML’s evolving role in the analysis of MD simulations and its potential as an independent method for sampling molecular conformations. The work concludes by highlighting current limitations and future directions for these integrative approaches, and it proposes potential remedies that would allow ML models to achieve enhanced accuracy and generalizability across different chemical spaces, physical conditions, and timescales. These approaches have the potential to provide deeper insights into the complex dynamics of biomolecules, with profound implications for drug design and our understanding of biological processes.
