ML & Molecules Reading Group

Machine learning and AI have transformed scientific data analysis, but the experiments that generate the data are usually taken as given. Better algorithmic control of scientific experiments could lead to better learning, enabling the next generation of scientific AI.

This semester, we will study the fundamental theory of experimental design. We’ll start from foundational statistical work on quantifying the information an experiment provides. We’ll then study emerging ML methods and algorithms for optimizing experimental designs. And we’ll look at how these ideas are beginning to be applied in science, with a focus on protein design and engineering. Overall, our aim is to give researchers the background they need to advance the frontier of ML-guided experimentation.

Thursdays 1-2:30pm, Room 207-222 (DTU Chemistry). First meeting: September 25.

Please sign up here for the mailing list (scheduling announcements, etc.).

Reading list:

  1. Introduction and overview.
    Rainforth et al. 2024. Modern Bayesian Experimental Design https://projecteuclid.org/journals/statistical-science/volume-39/issue-1/Modern-Bayesian-Experimental-Design/10.1214/23-STS915.short

  2. Lindley’s foundational paper. (A short statement of his criterion follows the reading list.)
    D.V. Lindley. 1956. On a Measure of the Information Provided by an Experiment. https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-27/issue-4/On-a-Measure-of-the-Information-Provided-by-an-Experiment/10.1214/aoms/1177728069.full

  3. Experimental design for protein design. (A toy code sketch of this approach follows the reading list.)
    Romero et al. 2013. Navigating the protein fitness landscape with Gaussian processes. https://www.pnas.org/doi/10.1073/pnas.1215251110

  4. Advances in experimental design for proteins.
    Frey et al. 2025. Lab-in-the-loop therapeutic antibody design with deep learning. https://www.biorxiv.org/content/10.1101/2025.02.19.639050v2

  5. Optimizing designs with stochastic gradients. (One of the key bounds is sketched after the reading list.)
    Foster et al. 2020. A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments. https://proceedings.mlr.press/v108/foster20a.html

  6. Building training datasets with large-scale screens.
    Krishnan et al. 2025. A generative deep learning approach to de novo antibiotic design. https://www.cell.com/cell/abstract/S0092-8674(25)00855-4

  7. Optimal information gathering for prediction.
    Bickford Smith et al. 2023. Prediction-Oriented Bayesian Active Learning. https://proceedings.mlr.press/v206/bickfordsmith23a/bickfordsmith23a.pdf

  8. Compressed screens.
    Liu et al. 2024. Scalable, compressed phenotypic screening using pooled perturbations. https://www.nature.com/articles/s41587-024-02403-z

  9. Stochastic variational design for compressed experiments.
    Grover and Ermon. 2019. Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization. https://proceedings.mlr.press/v89/grover19a.html

  10. Optimal nonparametric design and theory.
    Huszár and Duvenaud. 2012. Optimally-Weighted Herding is Bayesian Quadrature. https://www.auai.org/uai2012/papers/213.pdf
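
Notes on selected readings (our own sketches for orientation, not material from the papers themselves):

On reading 2: Lindley scores a design ξ by the expected information gain, i.e., the mutual information between the parameters θ and the experimental outcome y. In the notation of Rainforth et al.,

    \mathrm{EIG}(\xi) = \mathbb{E}_{p(\theta)\,p(y \mid \theta, \xi)}\left[ \log \frac{p(\theta \mid y, \xi)}{p(\theta)} \right] = H[p(\theta)] - \mathbb{E}_{p(y \mid \xi)}\left[ H[p(\theta \mid y, \xi)] \right],

the prior entropy of θ minus the expected posterior entropy after observing the outcome.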
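
On reading 3: below is a toy Python sketch of GP-guided variant selection with an upper-confidence-bound rule, in the spirit of the paper. The one-hot featurization, RBF kernel, hyperparameters, and fake assay values are placeholder assumptions of ours, not the authors’ choices.

    # Toy sketch of GP-guided variant selection via an upper confidence
    # bound. All modeling choices below are illustrative placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    AA = "ACDEFGHIKLMNPQRSTVWY"

    def one_hot(seq):
        """Flattened per-position one-hot encoding of an amino-acid sequence."""
        x = np.zeros((len(seq), len(AA)))
        for i, a in enumerate(seq):
            x[i, AA.index(a)] = 1.0
        return x.ravel()

    def rbf(X1, X2, lengthscale=2.0):
        """Squared-exponential kernel between the rows of X1 and X2."""
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    def gp_posterior(X_train, y_train, X_test, noise=1e-2):
        """Exact GP posterior mean and standard deviation at X_test."""
        K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
        L = np.linalg.cholesky(K)
        Ks = rbf(X_train, X_test)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        mu = Ks.T @ alpha
        v = np.linalg.solve(L, Ks)
        var = 1.0 - (v**2).sum(axis=0)  # k(x, x) = 1 for this kernel
        return mu, np.sqrt(np.maximum(var, 0.0))

    # Candidate library: random point mutants of a short parent sequence.
    parent = "MKTAYIAKQR"
    def mutate(seq):
        i = rng.integers(len(seq))
        return seq[:i] + AA[rng.integers(len(AA))] + seq[i + 1:]

    candidates = [mutate(parent) for _ in range(200)]
    X = np.stack([one_hot(s) for s in candidates])

    # Pretend the first 10 candidates have already been assayed.
    measured = np.arange(10)
    y = rng.normal(size=len(measured))  # fake fitness measurements

    mu, sd = gp_posterior(X[measured], y, X)
    ucb = mu + 2.0 * sd                 # explore/exploit trade-off
    ucb[measured] = -np.inf             # skip variants already measured
    print("next variant to assay:", candidates[int(np.argmax(ucb))])

The paper itself develops richer kernels and batch selection strategies; the sketch only conveys the shape of the model/select/measure loop.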
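
On reading 5: the key object there is a variational lower bound on EIG(ξ) whose stochastic gradients with respect to ξ can be estimated by Monte Carlo, so that the design can be optimized directly by SGD. One such bound, prior contrastive estimation (PCE), draws θ_0, …, θ_L ~ p(θ) and y ~ p(y | θ_0, ξ) and evaluates

    \mathcal{L}_{\mathrm{PCE}}(\xi) = \mathbb{E}\left[ \log \frac{p(y \mid \theta_0, \xi)}{\frac{1}{L+1} \sum_{\ell=0}^{L} p(y \mid \theta_\ell, \xi)} \right] \le \mathrm{EIG}(\xi),

which tightens as L grows and admits reparameterized gradients for continuous designs.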

About the ML & Molecules reading group: ML & Molecules is a new reading group organized by Asst. Prof. Eli Weinstein (DTU Chemistry) and Assoc. Prof. Jes Frellsen (DTU Compute). It focuses on fundamental probabilistic machine learning and its intersection with the molecular sciences. Everyone in the group presents a paper on a rotating schedule. The group is open to students, postdocs, faculty, and staff. It is not a formal course, but it can optionally be taken as a project course for course credit.

Contact: enawe [at] dtu.dk