Understanding the World Through Code

Funded through the NSF Expeditions in Computing Program

Organic Chemistry

A central problem in organic chemistry is engineering molecules with particular desirable properties or, at the very least, predicting the properties a given molecule will have. In recent years, deep learning has made tremendous strides on both problems. Modern techniques developed by the team of co-PIs Barzilay and Jaakkola have demonstrated remarkable abilities in both property prediction and molecular optimization. However, deep learning approaches also have important limitations. First, they are extremely data intensive, which limits their application to domains where very large amounts of data are available. Second, they operate as black boxes; from a scientific standpoint, we would like to identify the specific substructures or functional-group descriptions that caused a particular molecule to score highly on the desired property. The promise of neurosymbolic models is that they can better incorporate expert knowledge, which would make them useful in settings where data is difficult to gather.