Eli N. Weinstein

I am an assistant professor of chemistry at the Technical University of Denmark (DTU). I am a member of the European Laboratory for Learning and Intelligent Systems (ELLIS), the Pioneer Centre for AI in Copenhagen, and the Center for Translational Protein Design (CTPD) at DTU. I also serve part-time as Director of Machine Learning Research at Jura Bio, a genomic medicines startup.

I work on fundamental machine learning methodology for molecules. I develop probabilistic and generative methods to steer chemical synthesis and high-throughput screening, and to learn from natural experiments in humans and the environment. I aim to advance our basic understanding of how to learn about the molecular world, from both an applied and a theoretical perspective.

Previously, I was a postdoctoral research scientist with David Blei in the Data Science Institute at Columbia University. I received my PhD in Biophysics from Harvard University in 2022, advised by Debora Marks and Jeff Miller, as a Hertz Foundation Fellow. I received my A.B. in Chemistry and Physics with highest honors from Harvard in 2016, working with Adam Cohen.

Please reach out if you are interested in working with me or collaborating.

Open position: I am recruiting a postdoc to work on generative modeling and experimental design for combating antimicrobial resistance. Apply here.
Manufacturing-aware generative model architectures enable biological sequence design and synthesis at petascale
Eli N. Weinstein*, Mattia G. Gollub*, Andrei Slabodkin*, Kerry Dobbs*, Xiao-Bing Cui*, Cameron L. Gardner, Ryan J. Grant, Kristina Gurung, Amira Bailey, Alan N. Amin, George M. Church, Elizabeth B. Wood
Nature Biotechnology, 2026
A method that reduces the cost of synthesizing proteins designed by a generative model by as much as a trillion-fold. Best paper award (top 4) at MoML 2024.
Hierarchical causal models
Eli N. Weinstein, David M. Blei
Journal of Machine Learning Research, 2026
Methods to draw causal inferences from nested datasets, e.g., datasets with many molecules per person.
Non-identifiability and the blessings of misspecification in models of molecular fitness and phylogeny
Eli N. Weinstein*, Alan N. Amin*, Jonathan Frazer, Debora S. Marks
NeurIPS 2022 (Oral)
Analysis of the fundamental limits of generative sequence models for protein evolution, and biophysical/statistical reasons for their empirical success as fitness estimators.
Lifting biomolecular data acquisition
Eli N. Weinstein*, Andrei Slabodkin*, Mattia G. Gollub*, Kerry Dobbs*, Xiao-Bing Cui*, Fang Zhang, Kristina Gurung, Elizabeth B. Wood
arXiv, 2025
Accelerated learning on large scale screens using generative library models
Eli N. Weinstein*, Andrei Slabodkin*, Mattia G. Gollub*, Elizabeth B. Wood
Forthcoming at AISTATS 2026
Adaptive nonparametric perturbations of parametric Bayesian models
Bohan Wu*, Eli N. Weinstein*, Sohrab Salehi, Yixin Wang, David M. Blei
Forthcoming in JMLR 2026

Machine Learning for Molecules

Our research is on fundamental machine learning methodology for molecules. We develop probabilistic and generative methods to steer chemical synthesis and high-throughput screening, and to learn from natural experiments in humans and the environment.

We aim to advance our basic understanding of how to learn about the molecular world. How can we estimate molecules' effects on complex biological systems? How can we design and synthesize molecules with desired properties? How can we traverse chemical space to find useful molecules?

We investigate these questions from both an applied and a theoretical perspective.

Variational synthesis, Weinstein et al., Nature Biotechnology 2026.

One major line of research is on the co-design of experiments and inference algorithms. We develop new ML methods to actively steer chemical synthesis, optimize high-throughput screening, and improve model training. This tight integration of laboratory data generation and algorithmic development accelerates molecular discovery. We apply these techniques to design and discover therapeutics, enzymes and more.

Our work intersects with experimental design, generative modeling, Bayesian optimization, and laboratory automation. We are especially motivated by a desire to understand how the unique properties of molecular systems might let us overcome the limits of conventional ML experimental design and optimization. We develop algorithms that exploit stochasticity and heterogeneity at the molecular level to pack more information into experiments. We use these techniques to overcome the unique learning challenges of molecular discovery, such as rare properties, sparse structure-activity relationships, and discrete combinatorial spaces.

Blessings of misspecification, Weinstein et al., NeurIPS 2022.

A second major line of research learns from natural experiments, outside the controlled environment of the laboratory. This includes large-scale evolutionary processes, environmental samples and patient-derived data. Molecules made in nature often have properties that are difficult to reproduce synthetically, such as the bioactivity of natural products or the function of proteins in the human body. We develop ML methods that can learn from natural experiments to design molecules with properties that are challenging to test in the laboratory, such as therapeutic safety and efficacy.

Our work intersects with generative models, causal inference, and human and evolutionary biology. Generative models have proven to be powerful tools for learning the constraints on molecules imposed by evolution, but their inferences are biased, imperfect and incomplete. We investigate methods that can exploit the scale and diversity of natural experiments to learn the true range of molecular possibilities, and to align generative models of molecules with underlying biology. We aim to systematically bridge generalization gaps between the laboratory and the outside world, such as the valley of death between preclinical experiments and clinical trials in humans.

*Equal contribution  ·  Google Scholar

Steering Laboratory Experiments
Lifting biomolecular data acquisition
Eli N. Weinstein*, Andrei Slabodkin*, Mattia G. Gollub*, Kerry Dobbs*, Xiao-Bing Cui*, Fang Zhang, Kristina Gurung, Elizabeth B. Wood
arXiv, 2025
A method to pack and extract more information from high-throughput experiments, using variational synthesis. Based on an extension of compressed sensing, from learning vectors to learning functions parameterized by neural networks.
Accelerated learning on large scale screens using generative library models
Eli N. Weinstein*, Andrei Slabodkin*, Mattia G. Gollub*, Elizabeth B. Wood
Forthcoming at AISTATS 2026
A method to scale up models of protein function via co-design of data collection and inference algorithms, taking advantage of variational synthesis to accelerate learning.
Manufacturing-aware generative model architectures enable biological sequence design and synthesis at petascale
Eli N. Weinstein*, Mattia G. Gollub*, Andrei Slabodkin*, Kerry Dobbs*, Xiao-Bing Cui*, Cameron L. Gardner, Ryan J. Grant, Kristina Gurung, Amira Bailey, Alan N. Amin, George M. Church, Elizabeth B. Wood
Nature Biotechnology, 2026
A method that reduces the cost of synthesizing proteins designed by a generative model by as much as a trillion-fold. Best paper award (top 4) at MoML 2024.
Optimal design of stochastic DNA synthesis protocols based on generative sequence models
Eli N. Weinstein, Alan N. Amin, Will Grathwohl, Adeline Kassler, Jean Disset, Debora S. Marks
AISTATS 2022
An experimental design method to efficiently manufacture samples from generative models in the real world.
Co-evolution of interacting proteins through non-contacting and non-specific mutations
David Ding, Anna G Green, Boyuan Wang, Thuy-Lan Vo Lite, Eli N. Weinstein, Debora S Marks, Michael T Laub
Nature Ecology and Evolution, 2022
One of the first deep mutational scans of a protein-protein interaction.
Learning from Natural Experiments
Hierarchical causal models
Eli N. Weinstein, David M. Blei
Journal of Machine Learning Research, 2026
Methods to draw causal inferences from nested data.
Estimating the causal effects of T cell receptors
Eli N. Weinstein, Elizabeth B. Wood, David M. Blei
arXiv, 2024
Estimating the effect of T cells with a specific TCR on patient outcomes, using hierarchical causal models.
ProGen2: exploring the boundaries of protein language models
Erik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani
Cell Systems, 2023
Exploration of very large scale generative sequence models trained on massive sequence datasets.
Non-identifiability and the blessings of misspecification in models of molecular fitness and phylogeny
Eli N. Weinstein*, Alan N. Amin*, Jonathan Frazer, Debora S. Marks
NeurIPS 2022 (Oral)
Analysis of the fundamental limits of generative sequence models for protein evolution. Oral presentation at NeurIPS.
A structured observation distribution for generative biological sequence prediction and forecasting
Eli N. Weinstein, Debora S. Marks
ICML 2021
An observation distribution for biological sequences enabling regression from covariates to sequences, applied to forecasting viral antigen evolution.
Probabilistic Foundations
Adaptive nonparametric perturbations of parametric Bayesian models
Bohan Wu*, Eli N. Weinstein*, Sohrab Salehi, Yixin Wang, David M. Blei
Forthcoming in JMLR 2026
A technique to robustify Bayesian models, guarding against model misspecification while preserving data efficiency.
Biological sequence kernels with guaranteed flexibility
Alan N. Amin, Debora S. Marks*, Eli N. Weinstein*
Journal of Machine Learning Research, 2025
Analysis of the flexibility of kernels for biological sequences, with fixes enabling consistent nonparametric regression and two-sample tests. Best student paper, NESS 2023.
Bayesian Empirical Bayes: Simultaneous Inference from Probabilistic Symmetries
Bohan Wu, Eli N. Weinstein, David M. Blei
arXiv, 2025
Methods to reconstruct latent variables from complex structured data, leveraging group theory.
A kernelized Stein discrepancy for biological sequences
Alan N. Amin, Eli N. Weinstein*, Debora S. Marks*
ICML 2023
A new discrepancy for biological sequence distributions, usable for measuring goodness-of-fit or sample quality of generative sequence models.
Bayesian data selection
Eli N. Weinstein, Jeffrey W. Miller
Journal of Machine Learning Research, 2023
A technique to discover which aspects of a data set a Bayesian model can explain. Best student paper, NESS 2021.
A generative nonparametric Bayesian model for whole genomes
Alan N. Amin*, Eli N. Weinstein*, Debora S. Marks
NeurIPS 2021
A scalable nonparametric model for biological sequences with established asymptotic consistency and convergence rate.
PhD Thesis
Generative Statistical Methods for Biological Sequences
Eli N. Weinstein
Harvard University, 2022
Older
All-optical electrophysiology for high-throughput functional characterization of a human iPSC-derived motor neuron model of ALS
Evangelos Kiskinis*, Joel M Kralj*, Peng Zou*, Eli N. Weinstein*, Hongkang Zhang, Konstantinos Tsioras, Ole Wiskow, J Alberto Ortega, Kevin Eggan, Adam E Cohen
Stem Cell Reports, 2018
A high-throughput screening platform for measuring electrophysiological properties of human neurons.
Genetically targeted all-optical electrophysiology with a transgenic Cre-dependent optopatch mouse
Shan Lou, Yoav Adam, Eli N. Weinstein, Erika Williams, Katherine Williams, Vicente Parot, Nikita Kavokine, Stephen Liberles, Linda Madisen, Hongkui Zeng, Adam E Cohen
Journal of Neuroscience, 2016
A transgenic mouse for optical electrophysiology studies in genetically defined subsets of cells.
Eli N. Weinstein
Assistant Professor, DTU Chemistry

Affiliations: DTU Department of Chemistry · Jura Bio

Office: Building 206, Room 258, Kemitorvet, 2800 Kgs. Lyngby

Email: enawe [at] dtu.dk

Brief bio (plain text)

Kasper Krunderup Jakobsen
DTU Chemistry

Email: kkrja [at] kemi.dtu.dk

I am always looking for motivated students, postdocs, and collaborators interested in probabilistic machine learning and its applications to the molecular sciences. Please reach out if you are interested: enawe [at] dtu.dk

Open position: I am recruiting a postdoc to work on generative modeling and experimental design for combating antimicrobial resistance. Apply here.
ML & Molecules Reading Group, Spring 2026

The ML & Molecules reading group covers fundamental machine learning methodology and its intersection with the molecular sciences. This semester we are covering two major topics: first, the interface of machine learning and simulation, and second, experimental design and planning in the time domain.

Thursdays 1–2:30 pm, Building 207 Room 222 (DTU Chemistry). First meeting: February 5.
Sign up for the mailing list

Simulators

Large-scale computing has transformed scientific discovery, both through physical simulation and through machine learning. Yet these two forms of scientific computing sit in an uneasy relationship, operating with distinct logics and methodologies. We will investigate the interface between machine learning and simulation.

1. Feb. 5: Fast sampling from energy-based models for molecular systems.
Noé et al. 2019. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. link
2. Feb. 12: Learning energy-based models.
Hyvärinen. 2005. Estimation of non-normalized statistical models by score matching. link
3. Feb. 19: Learning, sampling and simulating trajectories with energy-based models for molecular systems.
Plainer et al. 2025. Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models. link
4. Feb. 26: Learning in implicit generative models.
Mohamed and Lakshminarayanan. 2016. Learning in implicit generative models. link
5. Mar. 5: Learning energy-based models for molecular systems by learning forces.
Kabylda et al. 2025. Molecular Simulations with a Pretrained Neural Network and Universal Pairwise Force Fields. link
6. Mar. 12: Inference in implicit models with simulation based inference.
Papamakarios and Murray. 2016. Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation. link
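
As background for item 2, score matching fits an unnormalized (energy-based) model by matching score functions rather than likelihoods, which sidesteps the normalizing constant. A standard statement of the objective, in my own notation rather than that of the readings, is:

```latex
% Score matching (Hyvärinen, 2005): fit an unnormalized model
% p_\theta(x) \propto \exp(-E_\theta(x)) without computing its
% normalizing constant, by matching scores s_\theta = \nabla_x \log p_\theta.
J(\theta)
  = \mathbb{E}_{p_{\mathrm{data}}(x)}
    \left[ \tfrac{1}{2} \lVert s_\theta(x) \rVert^2
         + \operatorname{tr} \nabla_x s_\theta(x) \right]
```

Up to an additive constant independent of \theta, this equals the Fisher divergence \tfrac{1}{2}\,\mathbb{E}\lVert s_\theta(x) - \nabla_x \log p_{\mathrm{data}}(x)\rVert^2, obtained by integration by parts under mild boundary conditions; this is the link between items 2, 3, and 5.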

Experimental Design in Time

Current machine learning techniques for molecular discovery typically rely on separate controlled experiments on different molecules. These techniques are often bottlenecked by throughput. We study alternative experimental designs that test the effects of different interventions by turning them on and off over time. In these experiments, the time dynamics of the system set the speed of information gain. To understand these dynamics, we also study biological systems' responses to perturbations.

1. Mar. 19: Adaptive experiments in time.
Glynn et al. 2020. Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach. link
2. Mar. 26: N-of-1 experiments.
Liang and Recht. 2023. Randomization Inference When N Equals One. link
3. Apr. 16: Time dynamics of cells and populations.
Schiebinger et al. 2019. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. link
4. Apr. 23: Time dynamics of cells in response to perturbation.
Lorch et al. 2026. Latent Causal Diffusions for Single-Cell Perturbation Modeling. link
About ML & Molecules: ML & Molecules is a reading group organized by Asst. Prof. Eli Weinstein (DTU Chemistry) and Assoc. Prof. Jes Frellsen (DTU Compute). It focuses on fundamental probabilistic machine learning and its intersection with the molecular sciences. Everyone in the group presents a paper, in a rotating schedule. It is open to students, postdocs, faculty and staff. It is not a formal course, but can optionally be taken as a project course for course credit.
Contact: enawe [at] dtu.dk
ML & Molecules Reading Group, Fall 2025

Machine learning and AI have transformed scientific data analysis. But the experiments that generate data are usually taken as given. Better algorithmic control of scientific experiments could lead to better learning, enabling the next generation of scientific AI.

This semester, we studied the fundamental theory of experimental design. We read foundational statistical ideas about how to quantify the amount of information an experiment provides, studied emerging ML methods and algorithms for optimizing experimental designs, and looked at how these ideas are being applied in science, with a focus on protein design and engineering.
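
The quantity that runs through this list can be stated in one line. In my own notation (a standard formulation, not copied from any single reading), Lindley's expected information gain of a design d is the mutual information between the parameter and the outcome:

```latex
% Expected information gain (Lindley, 1956): mutual information
% between parameter \theta and outcome y under design d.
\mathrm{EIG}(d)
  = \mathbb{E}_{p(\theta)\, p(y \mid \theta, d)}
    \left[ \log \frac{p(y \mid \theta, d)}{p(y \mid d)} \right],
\qquad
p(y \mid d) = \int p(y \mid \theta, d)\, p(\theta)\, \mathrm{d}\theta .
```

Optimal Bayesian design maximizes EIG(d); many of the papers below differ mainly in how they estimate, bound, or optimize this intractable expectation.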

Thursdays 1–2:30 pm, Room 207-222 (DTU Chemistry). First meeting: September 25.
Sign up for the mailing list

Reading List

1. Introduction and overview.
Rainforth et al. 2024. Modern Bayesian Experimental Design. link
2. Lindley's foundational paper.
D.V. Lindley. 1956. On a Measure of the Information Provided by an Experiment. link
3. Experimental design for protein design.
Romero et al. 2012. Navigating the protein fitness landscape with Gaussian processes. link
4. Advances in experimental design for proteins.
Frey et al. 2025. Lab-in-the-loop therapeutic antibody design with deep learning. link
5. Optimizing designs with stochastic gradients.
Foster et al. 2020. A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments. link
6. Building training datasets with large scale screens.
Krishnan et al. 2025. A generative deep learning approach to de novo antibiotic design. link
7. Optimal information gathering for prediction.
Smith et al. 2023. Prediction Oriented Bayesian Active Learning. link
8. Compressed screens.
Liu et al. 2024. Scalable, compressed phenotypic screening using pooled perturbations. link
9. Stochastic variational design for compressed experiments.
Grover and Ermon. 2019. Uncertainty Autoencoders. link
10. Information for optimization.
Russo and Van Roy. 2014. Learning to Optimize via Information-Directed Sampling. link
11. Optimal nonparametric design and theory.
Huszár and Duvenaud. 2012. Optimally-Weighted Herding is Bayesian Quadrature. link
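
The marginal likelihood inside the expected information gain is what makes EIG intractable, and the simplest baseline the estimation papers above improve on is a nested Monte Carlo estimator. Below is a minimal, self-contained sketch on a toy linear-Gaussian model (the model, function names, and parameters are my own illustrative choices, not from any listed paper); in this conjugate case the exact answer 0.5·log(1 + d²/σ²) is available as a check.

```python
import numpy as np

def eig_nmc(design, n_outer=2000, n_inner=2000, sigma=1.0, seed=0):
    """Nested Monte Carlo estimate of expected information gain (EIG)
    for a toy model: theta ~ N(0, 1), y | theta ~ N(design * theta, sigma^2).
    EIG(d) = E[ log p(y | theta, d) - log p(y | d) ]."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_outer)                        # outer prior draws
    y = design * theta + sigma * rng.standard_normal(n_outer)   # simulated outcomes
    # Log-likelihood of each simulated y under the theta that generated it.
    log_lik = -0.5 * ((y - design * theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    # Inner prior draws approximate the marginal p(y | d) = E_theta'[ p(y | theta', d) ].
    theta_in = rng.standard_normal(n_inner)
    diff = (y[:, None] - design * theta_in[None, :]) / sigma
    log_marg = (
        -np.log(sigma * np.sqrt(2 * np.pi))
        + np.log(np.exp(-0.5 * diff**2).mean(axis=1))
    )
    return (log_lik - log_marg).mean()

if __name__ == "__main__":
    # For this conjugate model the exact EIG is 0.5 * log(1 + d^2 / sigma^2),
    # so the estimator can be checked against the closed form.
    for d in (0.5, 1.0, 2.0):
        print(f"d={d}: NMC EIG ~ {eig_nmc(d):.3f}, exact {0.5 * np.log(1 + d**2):.3f}")
```

Note that nested Monte Carlo is consistent but biased upward at finite inner-sample size, and its cost is quadratic in the number of samples; the stochastic-gradient and amortized bounds in items 5 and 9 are two responses to exactly this.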