ML & Molecules Reading Group

Machine learning and AI have transformed scientific data analysis. But the experiments that generate data are usually taken as given. Better algorithmic control of scientific experiments could lead to better learning, enabling the next generation of scientific AI.

This semester, we will study the fundamental theory of experimental design. We’ll read foundational statistical ideas about how to quantify the amount of information an experiment provides. We’ll study emerging ML methods and algorithms for optimizing experimental designs. And we’ll look at how these ideas are beginning to be applied in science, with a focus on protein design and engineering. Overall, our aim is to provide a background for researchers interested in advancing the frontier of ML-guided experimentation.

Thursdays 1-2:30pm, Room 207-222 (DTU Chemistry). First meeting: September 25.

Please sign up here for the mailing list (scheduling announcements, etc.).

Reading list (subject to change).

Part 1: Introduction and Overview

  1. A recent review. Rainforth et al. 2024. Modern Bayesian Experimental Design https://projecteuclid.org/journals/statistical-science/volume-39/issue-1/Modern-Bayesian-Experimental-Design/10.1214/23-STS915.short

  2. Lindley’s foundational paper. D.V. Lindley. 1956. On a Measure of the Information Provided by an Experiment. https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-27/issue-4/On-a-Measure-of-the-Information-Provided-by-an-Experiment/10.1214/aoms/1177728069.full Recommended additional reading: Chaloner & Verdinelli https://projecteuclid.org/journals/statistical-science/volume-10/issue-3/Bayesian-Experimental-Design-A-Review/10.1214/ss/1177009939.full
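As a one-line preview of Lindley's paper (standard modern notation, not the paper's original): the criterion is the expected information gain (EIG) of a design ξ, which equals the mutual information between parameters θ and data y.

```latex
% Lindley's design criterion: expected reduction in posterior entropy,
% equivalently the mutual information between parameters and data.
\mathrm{EIG}(\xi)
  = \mathbb{E}_{p(y \mid \xi)}\!\left[ H[p(\theta)] - H[p(\theta \mid y, \xi)] \right]
  = \mathbb{E}_{p(\theta)\,p(y \mid \theta, \xi)}\!\left[ \log \frac{p(y \mid \theta, \xi)}{p(y \mid \xi)} \right]
  = I(\theta; y \mid \xi).
```

A Bayes-optimal design maximizes EIG over the design space; the intractable marginal p(y | ξ) inside the logarithm is what the estimation methods in Part 2 approximate.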

Part 2: ML Methods

  1. Stochastic gradient based design. Foster et al. 2020. A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments. https://arxiv.org/abs/1911.00294

  2. Improved numerics and estimators. Goda et al. 2021. Unbiased MLMC stochastic gradient-based optimization of Bayesian experimental designs. https://arxiv.org/abs/2005.08414

  3. Incorporating generative models. Iollo et al. 2025. Bayesian Experimental Design via Contrastive Diffusions. https://arxiv.org/abs/2410.11826


  4. Kernel methods and statistical discrepancies. Huszár & Duvenaud. 2012. Optimally-weighted herding is Bayesian quadrature. https://arxiv.org/abs/1204.1664
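To make the estimation problem in these papers concrete, here is a minimal sketch of the plain nested Monte Carlo (NMC) EIG estimator, the baseline that the stochastic-gradient and MLMC methods above improve on. The linear-Gaussian toy model is our own illustrative choice (not taken from any of the papers); it is useful because the EIG has a closed form, 0.5·log(1 + d²/σ²), against which the estimate can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0  # known observation noise (assumed for this toy example)

def log_lik(y, theta, d):
    """Gaussian log-likelihood log N(y; d * theta, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - d * theta) ** 2 / (2 * sigma**2)

def nmc_eig(d, n_outer=2000, n_inner=2000):
    """Nested Monte Carlo estimate of EIG(d) = E[log p(y|theta,d) - log p(y|d)]."""
    theta = rng.standard_normal(n_outer)                  # theta_i ~ p(theta)
    y = d * theta + sigma * rng.standard_normal(n_outer)  # y_i ~ p(y | theta_i, d)
    # Inner Monte Carlo: log p(y_i | d) ~= log (1/M) sum_j p(y_i | theta_ij, d),
    # computed stably via the log-sum-exp trick.
    theta_inner = rng.standard_normal((n_outer, n_inner))
    ll = log_lik(y[:, None], theta_inner, d)
    m = ll.max(axis=1, keepdims=True)
    log_marg = m[:, 0] + np.log(np.exp(ll - m).mean(axis=1))
    return float(np.mean(log_lik(y, theta, d) - log_marg))

# Larger |d| separates the likelihoods more, so the experiment is more informative.
for d in (0.5, 1.0, 2.0):
    analytic = 0.5 * np.log(1 + d**2 / sigma**2)
    print(f"d={d}: NMC EIG ~ {nmc_eig(d):.3f} (analytic {analytic:.3f})")
```

The inner expectation over fresh prior samples is exactly the expensive, biased step that motivates the variational bounds in Foster et al. and the multilevel estimators in Goda et al.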

Part 3: Emerging Applications

  1. Foundational work on ML-guided protein design. Romero et al. 2013. Navigating the protein fitness landscape with Gaussian processes. https://www.pnas.org/doi/10.1073/pnas.1215251110

  2. Recent large scale systems for protein therapeutics design. Frey et al. 2025. Lab-in-the-loop therapeutic antibody design with deep learning. https://www.biorxiv.org/content/10.1101/2025.02.19.639050v2

Part 4: Theory

  1. G-designs and their generalizations. Agarwal et al. 2024. The Non-linear F-Design and Applications to Interactive Learning. https://openreview.net/pdf?id=MMMHufVc2v

  2. Christoffel functions and their applications. Adcock et al. 2023. CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions. https://arxiv.org/pdf/2306.00945

About ML & Molecules reading group: ML & Molecules is a new reading group organized by Asst. Prof. Eli Weinstein (DTU Chemistry) and Assoc. Prof. Jes Frellsen (DTU Compute). It focuses on fundamental probabilistic machine learning and its intersection with the molecular sciences. Everyone in the group presents a paper on a rotating schedule. The group is open to students, postdocs, faculty and staff. It is not a formal course, but it can optionally be taken as a project course, for which you will receive course credit.

Contact: enawe [at] dtu.dk