ML and Molecules Reading Group

Machine learning and AI have transformed scientific data analysis. But the experiments that generate data are usually taken as given. Better algorithmic control of scientific experiments could lead to better learning, enabling the next generation of scientific AI.

This semester, we will study the fundamental theory of experimental design. We’ll read foundational statistical ideas about how to quantify the amount of information an experiment provides. We’ll study emerging ML methods and algorithms for optimizing experimental designs. And we’ll look at how these ideas are beginning to be applied in science, with a focus on protein design and engineering. Overall, our aim is to provide a background for researchers interested in advancing the frontier of ML-guided experimentation.

Thursdays 1-2:30pm, Room 207-222 (DTU Chemistry). First meeting: September 25.

Please sign up here for the mailing list (scheduling announcements, etc.)

Reading list:

Introduction and overview.
Rainforth et al. 2024. Modern Bayesian Experimental Design https://projecteuclid.org/journals/statistical-science/volume-39/issue-1/Modern-Bayesian-Experimental-Design/10.1214/23-STS915.short
Lindley’s foundational paper.
D.V. Lindley. 1956. On a Measure of the Information Provided by an Experiment. https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-27/issue-4/On-a-Measure-of-the-Information-Provided-by-an-Experiment/10.1214/aoms/1177728069.full
Experimental design for protein design.
Romero et al. 2012. Navigating the protein fitness landscape with Gaussian processes. https://www.pnas.org/doi/10.1073/pnas.1215251110
Advances in experimental design for proteins.
Frey et al. 2025. Lab-in-the-loop therapeutic antibody design with deep learning. https://www.biorxiv.org/content/10.1101/2025.02.19.639050v2
Optimizing designs with stochastic gradients.
Foster et al. 2020. A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments. https://proceedings.mlr.press/v108/foster20a.html
Building training datasets with large scale screens.
Krishnan et al. 2025. A generative deep learning approach to de novo antibiotic design. https://www.cell.com/cell/abstract/S0092-8674(25)00855-4
Optimal information gathering for prediction.
Smith et al. 2023. Prediction Oriented Bayesian Active Learning. https://proceedings.mlr.press/v206/bickfordsmith23a/bickfordsmith23a.pdf
Compressed screens.
Liu et al. 2024. Scalable, compressed phenotypic screening using pooled perturbations. https://www.nature.com/articles/s41587-024-02403-z
Stochastic variational design for compressed experiments.
Grover and Ermon. Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization. https://proceedings.mlr.press/v89/grover19a.html
Optimal nonparametric design and theory.
Huszar and Duvenaud. 2012. Optimally-Weighted Herding is Bayesian Quadrature. https://www.auai.org/uai2012/papers/213.pdf

About ML & Molecules reading group: ML & Molecules is a new reading group organized by Asst. Prof. Eli Weinstein (DTU Chemistry) and Assoc. Prof. Jes Frellsen (DTU Compute). It focuses on fundamental probabilistic machine learning and its intersection with the molecular sciences. Everyone in the group presents a paper, in a rotating schedule. It is open to students, postdocs, faculty and staff. It is not a formal course, but can optionally be taken as a project course, for which you will receive course credit.

Contact: enawe [at] dtu.dk