Wednesday, September 14, 2022

CCMB Seminar: Andrew White, Ph.D.

4:00 PM to 5:00 PM

Forum Hall, 4th Floor, Palmer Commons Building

"Deep learning for sequence design with a few data points"

Abstract

Deep learning has begun a renaissance in chemistry and materials. We can devise and fit models to predict molecular properties in a few hours and deploy them in a web browser. We can create novel generative models that were previously PhD theses in an afternoon. In my group, we’re exploring deep learning in peptides. We are focused on two major problems: interpretability and data scarcity. Now that we can make deep learning models to predict any molecular property ad nauseam, what can we learn? I will discuss our recent efforts on interpreting deep learning models through symbolic regression and counterfactuals. Data scarcity is a common problem in biochemistry: how can we learn new properties without significant experimental expense? One method is the judicious choice of experiments, which can be done with active learning. Another approach is self-supervised learning and constraining symmetries, which both try to exploit structure in data. I will cover recent progress in these areas.

Andrew D. White

Associate Professor of Chemical Engineering (University of Rochester)

My group uses experiments, molecular simulations, and machine learning to design new materials. Experiments answer the essential question of whether and how well a material works for a particular application. Molecular simulation provides the molecular insight into why a material works. Machine learning provides the tool to optimize a material so that it works best. Members of my group apply these three techniques to craft new materials for biomedical devices and lithium-ion batteries. One of the main classes of materials we study is peptides, which are derived from the constituent amino acids that make up proteins. Peptides have great chemical diversity yet can be controlled on the near-atomic scale.