Eli N. Weinstein

I am an assistant professor of chemistry at the Technical University of Denmark (DTU). I am a member of the European Laboratory for Learning and Intelligent Systems (ELLIS), the Pioneer Centre for AI in Copenhagen, and the Center for Translational Protein Design (CTPD) at DTU. I also serve part-time as Director of Machine Learning Research at Jura Bio, a genomic medicines startup.

I work on fundamental machine learning methodology for molecules. I develop probabilistic and generative methods to steer chemical synthesis and high-throughput screening, and to learn from natural experiments in humans and the environment. I aim to advance our basic understanding of how to learn about the molecular world, from both an applied and a theoretical perspective.

Previously, I was a postdoctoral research scientist with David Blei in the Data Science Institute at Columbia University. I received my PhD in Biophysics from Harvard University in 2022, advised by Debora Marks and Jeff Miller, as a Hertz Foundation Fellow. I received my A.B. in Chemistry and Physics with highest honors from Harvard in 2016, working with Adam Cohen.

Please reach out if you are interested in working with me or collaborating.

Open position: I am recruiting a postdoc to work on generative modeling and experimental design for combating antimicrobial resistance. Apply here.
Manufacturing-aware generative model architectures enable biological sequence design and synthesis at petascale
Eli N. Weinstein*, Mattia G. Gollub*, Andrei Slabodkin*, Kerry Dobbs*, Xiao-Bing Cui*, Cameron L. Gardner, Ryan J. Grant, Kristina Gurung, Amira Bailey, Alan N. Amin, George M. Church, Elizabeth B. Wood
Nature Biotechnology, 2026
A method that reduces the cost of synthesizing proteins designed by a generative model by as much as a trillion-fold. Best paper award (top 4) at MoML 2024.
Hierarchical causal models
Eli N. Weinstein, David M. Blei
Journal of Machine Learning Research, 2026
Methods to draw causal inferences from nested datasets, e.g., datasets with many molecules per person.
Non-identifiability and the blessings of misspecification in models of molecular fitness and phylogeny
Eli N. Weinstein*, Alan N. Amin*, Jonathan Frazer, Debora S. Marks
NeurIPS 2022 (Oral)
Analysis of the fundamental limits of generative sequence models for protein evolution, and biophysical/statistical reasons for their empirical success as fitness estimators.
Lifting biomolecular data acquisition
Eli N. Weinstein*, Andrei Slabodkin*, Mattia G. Gollub*, Kerry Dobbs*, Xiao-Bing Cui*, Fang Zhang, Kristina Gurung, Elizabeth B. Wood
arXiv, 2025
Accelerated learning on large scale screens using generative library models
Eli N. Weinstein*, Andrei Slabodkin*, Mattia G. Gollub*, Elizabeth B. Wood
Forthcoming at AISTATS 2026
Adaptive nonparametric perturbations of parametric Bayesian models
Bohan Wu*, Eli N. Weinstein*, Sohrab Salehi, Yixin Wang, David M. Blei
Forthcoming in JMLR 2026

Machine Learning for Molecules

Our research is on fundamental machine learning methodology for molecules. We develop probabilistic and generative methods to steer chemical synthesis and high-throughput screening, and to learn from natural experiments in humans and the environment.

We aim to advance our basic understanding of how to learn about the molecular world. How can we estimate molecules' effects on complex biological systems? How can we design and synthesize molecules with desired properties? How can we traverse chemical space to find useful molecules?

We investigate these questions from both an applied and a theoretical perspective.

Variational synthesis, Weinstein et al., Nature Biotechnology 2026.

One major line of research is on the co-design of experiments and inference algorithms. We develop new ML methods to actively steer chemical synthesis, optimize high-throughput screening, and improve model training. This tight integration of laboratory data generation and algorithmic development accelerates molecular discovery. We apply these techniques to design and discover therapeutics, enzymes and more.

Our work intersects with experimental design, generative modeling, Bayesian optimization, and laboratory automation. We are especially motivated by a desire to understand how the unique properties of molecular systems might let us overcome the limits of conventional ML experimental design and optimization. We develop algorithms that exploit stochasticity and heterogeneity at the molecular level to pack more information into experiments. We use these techniques to overcome the unique learning challenges of molecular discovery, such as rare properties, sparse structure-activity relationships, and discrete combinatorial spaces.

Blessings of misspecification, Weinstein et al., NeurIPS 2022.

A second major line of research learns from natural experiments, outside the controlled environment of the laboratory. This includes large-scale evolutionary processes, environmental samples and patient-derived data. Molecules made in nature often have properties that are difficult to reproduce synthetically, such as the bioactivity of natural products or the function of proteins in the human body. We develop ML methods that can learn from natural experiments to design molecules with properties that are challenging to test in the laboratory, such as therapeutic safety and efficacy.

Our work intersects with generative models, causal inference, and human and evolutionary biology. Generative models have proven to be powerful tools for learning the constraints on molecules imposed by evolution, but their inferences are biased, imperfect and incomplete. We investigate methods that can exploit the scale and diversity of natural experiments to learn the true range of molecular possibilities, and to align generative models of molecules with underlying biology. We aim to systematically bridge generalization gaps between the laboratory and the outside world, such as the valley of death between preclinical experiments and clinical trials in humans.

*Equal contribution  ·  Google Scholar

Steering Laboratory Experiments
Lifting biomolecular data acquisition
Eli N. Weinstein*, Andrei Slabodkin*, Mattia G. Gollub*, Kerry Dobbs*, Xiao-Bing Cui*, Fang Zhang, Kristina Gurung, Elizabeth B. Wood
arXiv, 2025
A method to pack and extract more information from high-throughput experiments, using variational synthesis. Based on an extension of compressed sensing, from learning vectors to learning functions parameterized by neural networks.
Accelerated learning on large scale screens using generative library models
Eli N. Weinstein*, Andrei Slabodkin*, Mattia G. Gollub*, Elizabeth B. Wood
Forthcoming at AISTATS 2026
A method to scale up models of protein function via co-design of data collection and inference algorithms, taking advantage of variational synthesis to accelerate learning.
Manufacturing-aware generative model architectures enable biological sequence design and synthesis at petascale
Eli N. Weinstein*, Mattia G. Gollub*, Andrei Slabodkin*, Kerry Dobbs*, Xiao-Bing Cui*, Cameron L. Gardner, Ryan J. Grant, Kristina Gurung, Amira Bailey, Alan N. Amin, George M. Church, Elizabeth B. Wood
Nature Biotechnology, 2026
A method that reduces the cost of synthesizing proteins designed by a generative model by as much as a trillion-fold. Best paper award (top 4) at MoML 2024.
Optimal design of stochastic DNA synthesis protocols based on generative sequence models
Eli N. Weinstein, Alan N. Amin, Will Grathwohl, Adeline Kassler, Jean Disset, Debora S. Marks
AISTATS 2022
An experimental design method to efficiently manufacture samples from generative models in the real world.
Co-evolution of interacting proteins through non-contacting and non-specific mutations
David Ding, Anna G Green, Boyuan Wang, Thuy-Lan Vo Lite, Eli N. Weinstein, Debora S Marks, Michael T Laub
Nature Ecology and Evolution, 2022
One of the first deep mutational scans of a protein-protein interaction.
Learning from Natural Experiments
Hierarchical causal models
Eli N. Weinstein, David M. Blei
Journal of Machine Learning Research, 2026
Methods to draw causal inferences from nested data.
Estimating the causal effects of T cell receptors
Eli N. Weinstein, Elizabeth B. Wood, David M. Blei
arXiv, 2024
Estimating the effect of T cells with a specific TCR on patient outcomes, using hierarchical causal models.
ProGen2: exploring the boundaries of protein language models
Erik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani
Cell Systems, 2023
Exploration of very large scale generative sequence models trained on massive sequence datasets.
Non-identifiability and the blessings of misspecification in models of molecular fitness and phylogeny
Eli N. Weinstein*, Alan N. Amin*, Jonathan Frazer, Debora S. Marks
NeurIPS 2022 (Oral)
Analysis of the fundamental limits of generative sequence models for protein evolution. Oral presentation at NeurIPS.
A structured observation distribution for generative biological sequence prediction and forecasting
Eli N. Weinstein, Debora S. Marks
ICML 2021
An observation distribution for biological sequences enabling regression from covariates to sequences, applied to forecasting viral antigen evolution.
Probabilistic Foundations
Adaptive nonparametric perturbations of parametric Bayesian models
Bohan Wu*, Eli N. Weinstein*, Sohrab Salehi, Yixin Wang, David M. Blei
Forthcoming in JMLR 2026
A technique to robustify Bayesian models, guarding against model misspecification while preserving data efficiency.
Biological sequence kernels with guaranteed flexibility
Alan N. Amin, Debora S. Marks*, Eli N. Weinstein*
Journal of Machine Learning Research, 2025
Analysis of the flexibility of kernels for biological sequences, with fixes enabling consistent nonparametric regression and two-sample tests. Best student paper, NESS 2023.
Bayesian Empirical Bayes: Simultaneous Inference from Probabilistic Symmetries
Bohan Wu, Eli N. Weinstein, David M. Blei
arXiv, 2025
Methods to reconstruct latent variables from complex structured data, leveraging group theory.
A kernelized Stein discrepancy for biological sequences
Alan N. Amin, Eli N. Weinstein*, Debora S. Marks*
ICML 2023
A new discrepancy for biological sequence distributions, usable for measuring goodness-of-fit or sample quality of generative sequence models.
Bayesian data selection
Eli N. Weinstein, Jeffrey W. Miller
Journal of Machine Learning Research, 2023
A technique to discover which aspects of a data set a Bayesian model can explain. Best student paper, NESS 2021.
A generative nonparametric Bayesian model for whole genomes
Alan N. Amin*, Eli N. Weinstein*, Debora S. Marks
NeurIPS 2021
A scalable nonparametric model for biological sequences with established asymptotic consistency and convergence rate.
PhD Thesis
Generative Statistical Methods for Biological Sequences
Eli N. Weinstein
Harvard University, 2022
Older
All-optical electrophysiology for high-throughput functional characterization of a human iPSC-derived motor neuron model of ALS
Evangelos Kiskinis*, Joel M Kralj*, Peng Zou*, Eli N. Weinstein*, Hongkang Zhang, Konstantinos Tsioras, Ole Wiskow, J Alberto Ortega, Kevin Eggan, Adam E Cohen
Stem Cell Reports, 2018
A high-throughput screening platform for measuring electrophysiological properties of human neurons.
Genetically targeted all-optical electrophysiology with a transgenic Cre-dependent optopatch mouse
Shan Lou, Yoav Adam, Eli N. Weinstein, Erika Williams, Katherine Williams, Vicente Parot, Nikita Kavokine, Stephen Liberles, Linda Madisen, Hongkui Zeng, Adam E Cohen
Journal of Neuroscience, 2016
A transgenic mouse for optical electrophysiology studies in genetically defined subsets of cells.
Eli N. Weinstein
Assistant Professor, DTU Chemistry

Affiliations: DTU Department of Chemistry · Jura Bio

Office: Building 206, Room 258, Kemitorvet, 2800 Kgs. Lyngby

Email: enawe [at] dtu.dk

Brief bio (plain text)

Kasper Krunderup Jakobsen
DTU Chemistry

Email: kkrja [at] kemi.dtu.dk

I am always looking for motivated students, postdocs, and collaborators interested in probabilistic machine learning and its applications to the molecular sciences. Please reach out if you are interested: enawe [at] dtu.dk

Open position: I am recruiting a postdoc to work on generative modeling and experimental design for combating antimicrobial resistance. Apply here.
ML & Molecules Reading Group, Spring 2026

The ML & Molecules reading group covers fundamental machine learning methodology and its intersection with the molecular sciences. This semester we are covering two major topics: first, the interface of machine learning and simulation, and second, experimental design and planning in the time domain.

Thursdays 1–2:30 pm, Building 207 Room 222 (DTU Chemistry). First meeting: February 5.
Sign up for the mailing list

Simulators

Large-scale computing has transformed scientific discovery, both through physical simulation and through machine learning. Yet these two forms of scientific computing sit in an uneasy relationship, operating with distinct logics and methodologies. We will investigate the interface between machine learning and simulation.

1. Feb. 5: Fast sampling from energy-based models for molecular systems.
Noé et al. 2019. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. link
2. Feb. 12: Learning energy-based models.
Hyvärinen. 2005. Estimation of non-normalized statistical models by score matching. link
3. Feb. 19: Learning, sampling and simulating trajectories with energy-based models for molecular systems.
Plainer et al. 2025. Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models. link
4. Feb. 26: Learning in implicit generative models.
Mohamed and Lakshminarayanan. 2016. Learning in implicit generative models. link
5. Mar. 5: Learning energy-based models for molecular systems by learning forces.
Kabylda et al. 2025. Molecular Simulations with a Pretrained Neural Network and Universal Pairwise Force Fields. link
6. Mar. 12: Inference in implicit models with simulation based inference.
Papamakarios and Murray. 2016. Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation. link
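
As background for item 2, score matching fits an unnormalized (energy-based) model by matching score functions rather than likelihoods, which sidesteps the normalizing constant. A standard statement of the objective, in my own notation rather than that of the readings, is:

```latex
% Score matching (Hyvärinen, 2005): fit an unnormalized model
% p_\theta(x) \propto \exp(-E_\theta(x)) without computing its
% normalizing constant, by matching scores s_\theta = \nabla_x \log p_\theta.
J(\theta)
  = \mathbb{E}_{p_{\mathrm{data}}(x)}
    \left[ \tfrac{1}{2} \lVert s_\theta(x) \rVert^2
         + \operatorname{tr} \nabla_x s_\theta(x) \right]
```

Up to an additive constant independent of \theta, this equals the Fisher divergence \tfrac{1}{2}\,\mathbb{E}\lVert s_\theta(x) - \nabla_x \log p_{\mathrm{data}}(x)\rVert^2, obtained by integration by parts under mild boundary conditions; this is the link between items 2, 3, and 5.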

Experimental Design in Time

Current machine learning techniques for molecular discovery typically rely on separate controlled experiments on different molecules. These techniques are often bottlenecked by throughput. We study alternative experimental designs that test the effects of different interventions by turning them on and off over time. In these experiments, the time dynamics of the system set the speed of information gain. To understand these dynamics, we also study biological systems' responses to perturbations.

1. Mar. 19: Adaptive experiments in time.
Glynn et al. 2020. Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach. link
2. Mar. 26: N-of-1 experiments.
Liang and Recht. 2023. Randomization Inference When N Equals One. link
3. Apr. 16: Time dynamics of cells and populations.
Schiebinger et al. 2019. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. link
4. Apr. 23: Time dynamics of cells in response to perturbation.
Lorch et al. 2026. Latent Causal Diffusions for Single-Cell Perturbation Modeling. link
About ML & Molecules: ML & Molecules is a reading group organized by Asst. Prof. Eli Weinstein (DTU Chemistry) and Assoc. Prof. Jes Frellsen (DTU Compute). It focuses on fundamental probabilistic machine learning and its intersection with the molecular sciences. Everyone in the group presents a paper, in a rotating schedule. It is open to students, postdocs, faculty and staff. It is not a formal course, but can optionally be taken as a project course for course credit.
Contact: enawe [at] dtu.dk
ML & Molecules Reading Group, Fall 2025

Machine learning and AI have transformed scientific data analysis. But the experiments that generate data are usually taken as given. Better algorithmic control of scientific experiments could lead to better learning, enabling the next generation of scientific AI.

This semester, we studied the fundamental theory of experimental design. We read foundational statistical ideas about how to quantify the amount of information an experiment provides, studied emerging ML methods and algorithms for optimizing experimental designs, and looked at how these ideas are being applied in science, with a focus on protein design and engineering.
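
The quantity that runs through this list can be stated in one line. In my own notation (a standard formulation, not copied from any single reading), Lindley's expected information gain of a design d is the mutual information between the parameter and the outcome:

```latex
% Expected information gain (Lindley, 1956): mutual information
% between parameter \theta and outcome y under design d.
\mathrm{EIG}(d)
  = \mathbb{E}_{p(\theta)\, p(y \mid \theta, d)}
    \left[ \log \frac{p(y \mid \theta, d)}{p(y \mid d)} \right],
\qquad
p(y \mid d) = \int p(y \mid \theta, d)\, p(\theta)\, \mathrm{d}\theta .
```

Optimal Bayesian design maximizes EIG(d); many of the papers below differ mainly in how they estimate, bound, or optimize this intractable expectation.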

Thursdays 1–2:30 pm, Room 207-222 (DTU Chemistry). First meeting: September 25.
Sign up for the mailing list

Reading List

1. Introduction and overview.
Rainforth et al. 2024. Modern Bayesian Experimental Design. link
2. Lindley's foundational paper.
D.V. Lindley. 1956. On a Measure of the Information Provided by an Experiment. link
3. Experimental design for protein design.
Romero et al. 2012. Navigating the protein fitness landscape with Gaussian processes. link
4. Advances in experimental design for proteins.
Frey et al. 2025. Lab-in-the-loop therapeutic antibody design with deep learning. link
5. Optimizing designs with stochastic gradients.
Foster et al. 2020. A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments. link
6. Building training datasets with large scale screens.
Krishnan et al. 2025. A generative deep learning approach to de novo antibiotic design. link
7. Optimal information gathering for prediction.
Smith et al. 2023. Prediction Oriented Bayesian Active Learning. link
8. Compressed screens.
Liu et al. 2024. Scalable, compressed phenotypic screening using pooled perturbations. link
9. Stochastic variational design for compressed experiments.
Grover and Ermon. 2019. Uncertainty Autoencoders. link
10. Information for optimization.
Russo and Van Roy. 2014. Learning to Optimize via Information-Directed Sampling. link
11. Optimal nonparametric design and theory.
Huszár and Duvenaud. 2012. Optimally-Weighted Herding is Bayesian Quadrature. link
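
The marginal likelihood inside the expected information gain is what makes EIG intractable, and the simplest baseline the estimation papers above improve on is a nested Monte Carlo estimator. Below is a minimal, self-contained sketch on a toy linear-Gaussian model (the model, function names, and parameters are my own illustrative choices, not from any listed paper); in this conjugate case the exact answer 0.5·log(1 + d²/σ²) is available as a check.

```python
import numpy as np

def eig_nmc(design, n_outer=2000, n_inner=2000, sigma=1.0, seed=0):
    """Nested Monte Carlo estimate of expected information gain (EIG)
    for a toy model: theta ~ N(0, 1), y | theta ~ N(design * theta, sigma^2).
    EIG(d) = E[ log p(y | theta, d) - log p(y | d) ]."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_outer)                        # outer prior draws
    y = design * theta + sigma * rng.standard_normal(n_outer)   # simulated outcomes
    # Log-likelihood of each simulated y under the theta that generated it.
    log_lik = -0.5 * ((y - design * theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    # Inner prior draws approximate the marginal p(y | d) = E_theta'[ p(y | theta', d) ].
    theta_in = rng.standard_normal(n_inner)
    diff = (y[:, None] - design * theta_in[None, :]) / sigma
    log_marg = (
        -np.log(sigma * np.sqrt(2 * np.pi))
        + np.log(np.exp(-0.5 * diff**2).mean(axis=1))
    )
    return (log_lik - log_marg).mean()

if __name__ == "__main__":
    # For this conjugate model the exact EIG is 0.5 * log(1 + d^2 / sigma^2),
    # so the estimator can be checked against the closed form.
    for d in (0.5, 1.0, 2.0):
        print(f"d={d}: NMC EIG ~ {eig_nmc(d):.3f}, exact {0.5 * np.log(1 + d**2):.3f}")
```

Note that nested Monte Carlo is consistent but biased upward at finite inner-sample size, and its cost is quadratic in the number of samples; the stochastic-gradient and amortized bounds in items 5 and 9 are two responses to exactly this.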