Understanding the World Through Code

Funded through the NSF Expeditions in Computing Program

Organic Chemistry

Many functions of small molecules and proteins depend on their 3D structure. Deep learning techniques that predict these structures, such as AlphaFold, are revolutionizing structural biology. The prediction of binding structures, however, has seen less progress than protein folding because data are limited and the combinatorial space of possible binding structures is enormous. To solve these problems effectively, it is therefore crucial to use scientific insight to build the right degrees of freedom and symmetries into the models.

With this in mind, we have developed a novel SE(3)-equivariant approach that detects binding pockets and superimposes them using deep learning and the Kabsch algorithm, which allows binding structures to be predicted very quickly. Applying this technique to protein-protein and protein-small molecule binding led to the development of EquiDock [ganea2021independent-ours] and EquiBind [stark2022equibind-ours]. Both methods perform competitively with established physics-based methods while being orders of magnitude faster, which makes them promising for large-scale screening applications.

However, most biological structures are not rigid but exhibit some degree of flexibility. Modeling the full distribution of possible molecular poses, and therefore molecular flexibility, is a very challenging task that the scientific community has worked on for decades because of its critical applications in understanding biological mechanisms and developing new drugs. With this goal in mind, we started by modeling the structural flexibility of small molecules and developed torsional diffusion [jing2022torsional-ours], a deep generative model that generates conformers by iteratively refining a molecule's torsion angles on the torsional manifold, the space of all its torsion angles.
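To make the idea of a torsional degree of freedom concrete, the following is a minimal NumPy sketch of the underlying geometric operation only: measuring a dihedral angle and changing it by rotating one side of the molecule about a bond. It is not the torsional diffusion model itself. The function names `dihedral` and `rotate_about_bond`, and the four-atom toy geometry, are illustrative assumptions.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

def rotate_about_bond(coords, i, j, moving, delta):
    """Rotate the atoms listed in `moving` by `delta` radians about the
    axis pointing from atom i to atom j (Rodrigues' rotation formula).
    Returns a new coordinate array; the input is left unchanged."""
    coords = np.array(coords, dtype=float)  # copy
    axis = coords[j] - coords[i]
    axis = axis / np.linalg.norm(axis)
    rel = coords[moving] - coords[j]
    cos_d, sin_d = np.cos(delta), np.sin(delta)
    rotated = (rel * cos_d
               + np.cross(axis, rel) * sin_d
               + np.outer(rel @ axis, axis) * (1.0 - cos_d))
    coords[moving] = rotated + coords[j]
    return coords

# Hypothetical four-atom chain with an initial dihedral of 0 radians.
toy_coords = np.array([[1.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0],
                       [1.0, 0.0, 1.0]])
twisted = rotate_about_bond(toy_coords, 1, 2, [3], np.pi / 3)
```

In this sketch, changing the torsion angle moves only the atoms on one side of the bond and leaves bond lengths untouched, which is why a molecule's flexibility can be described with far fewer variables than its full set of Cartesian coordinates.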
Intuitively, our model incorporates chemical and physical knowledge by reasoning about the structure in 3D space, where forces operate, while restricting the diffusion to torsion angles, where most of the molecule’s flexibility lies. Empirically, torsional diffusion generates better conformer ensembles than previous machine learning methods and is the first ML framework to outperform state-of-the-art cheminformatics methods. We are now extending this work to model larger and more complex molecular systems.

Finally, we have also developed novel deep generative models to directly design new materials [xie2021crystal-ours] and proteins [trippe2022diffusion-ours]. Protein design relies on techniques that learn the mechanisms of protein function through the chemistry and physics of binding and folding, and then use that knowledge to build new proteins with novel functions. Through our collaborations, we will experimentally validate our ability to design proteins with structures and functions never before seen in nature.
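For readers unfamiliar with the Kabsch algorithm named in the binding-structure work above, the sketch below shows the textbook version: given two sets of corresponding 3D points, it finds the rotation and translation that superimpose one onto the other with minimal squared error. This is a generic NumPy illustration, not the EquiDock or EquiBind code; the function name `kabsch` and the synthetic point sets are assumptions made for the example.

```python
import numpy as np

def kabsch(P, Q):
    """Return (R, t) minimizing sum_i ||R @ P[i] + t - Q[i]||^2
    for two (N, 3) arrays of corresponding points."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Synthetic check: rotate and shift random points, then recover the transform.
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 3))
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
targets = points @ R_true.T + t_true
R_est, t_est = kabsch(points, targets)
```

Because the solution comes from a single singular value decomposition of a 3x3 matrix, the superposition is computed in closed form, with no iterative search over poses.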