Eli N. Weinstein

I’m a postdoctoral research scientist working with David Blei at Columbia University. I also lead machine learning research at Jura Bio, a genomic medicines startup. Broadly, I work in the fields of probabilistic machine learning and Bayesian statistics, and their application to biophysics and genomics.

I received my PhD in Biophysics from Harvard University in 2022, advised by Debora Marks and working closely with Jeff Miller, as a Hertz Foundation Fellow. I received my A.B. in Chemistry and Physics with highest honors from Harvard in 2016, working with Adam Cohen.

My research ranges from theoretical machine learning to applied computational biology. I am especially motivated by foundational methodological questions in computational biology, particularly in synthetic biology. If you would like to learn about some of my recent work, I recommend my papers on hierarchical causal models or variational DNA synthesis.

Email: ew2760 [at] columbia.edu

Publications and preprints

*Equal contribution (first or last author)

General machine learning methodology

Eli N. Weinstein, David M. Blei. Hierarchical causal models. 2024. In submission. paper. code. talk.
We develop methods to draw causal inferences from nested data.

Eli N. Weinstein, Jeffrey W. Miller. Bayesian data selection. Journal of Machine Learning Research. 2023. (IBM Student Paper Award at the New England Statistics Symposium 2021, Contributed talk at the Your Model is Wrong workshop, NeurIPS 2021.) paper. code. talk.
We develop a technique to discover the aspects of a data set that a Bayesian model can explain, and those it cannot.

Methodology for biological sequences

Alan N. Amin, Eli N. Weinstein*, Debora S. Marks*. Biological Sequence Kernels with Guaranteed Flexibility. 2023. In submission. (Alan won the MassMutual Student Research Award at the New England Statistics Symposium 2023 for this work.) paper.
We analyze the flexibility of kernels for biological sequences. We find problems with many popular kernels, and propose fixes.

Alan N. Amin, Eli N. Weinstein*, Debora S. Marks*. A Kernelized Stein Discrepancy for Biological Sequences. International Conference on Machine Learning (ICML). 2023. paper.
We develop a new discrepancy for biological sequence distributions. It can be used (for instance) to measure the goodness-of-fit of a generative sequence model, or the quality of samples drawn from a generative sequence model.

Eli N. Weinstein*, Alan N. Amin*, Jonathan Frazer, Debora S. Marks. Non-identifiability and the blessings of misspecification in models of molecular fitness and phylogeny. Advances in Neural Information Processing Systems (NeurIPS). 2022. Oral presentation. paper. talk.
We analyze the fundamental limits of what generative sequence models can learn about protein evolution, and propose biophysical and statistical reasons for their empirical success as fitness estimators.

Eli N. Weinstein, Alan N. Amin, Will Grathwohl, Daniel Kassler, Jean Disset, Debora S. Marks. Optimal design of stochastic DNA synthesis protocols based on generative sequence models. Artificial Intelligence and Statistics (AISTATS). 2022. paper. code. talk.
We develop an experimental design strategy for efficiently synthesizing samples from generative sequence models in the laboratory (variational synthesis). At Jura Bio, we are using this method to build high-quality, large-scale libraries for therapeutic discovery.

Alan N. Amin*, Eli N. Weinstein*, Debora S. Marks. A generative nonparametric Bayesian model for whole genomes. Advances in Neural Information Processing Systems (NeurIPS). 2021. paper. code. talk.
We develop a scalable nonparametric model for biological sequences, and establish its asymptotic consistency and convergence rate. We also apply this model to develop goodness-of-fit and two-sample tests for biological sequences.

Eli N. Weinstein, Debora S. Marks. A structured observation distribution for generative biological sequence prediction and forecasting. International Conference on Machine Learning (ICML). 2021. paper. Pyro code. Edward2 code. talk.
We develop an observation distribution for biological sequences, which enables (for instance) regression from covariates to sequences. We apply this model to develop a generative forecast of viral antigen evolution.

Collaborations / applications

Erik Nijkamp, Jeffrey Ruffolo, Eli N Weinstein, Nikhil Naik, Ali Madani. ProGen2: Exploring the Boundaries of Protein Language Models. Cell Systems. 2023. paper
The team explored very large-scale generative sequence models, trained on massive sequence datasets. The results they found were in line with the theory developed in our previous paper. I contributed by helping to interpret and explain these results.

David Ding, Anna G Green, Boyuan Wang, Thuy-Lan Vo Lite, Eli N Weinstein, Debora S Marks, Michael T Laub. Co-evolution of interacting proteins through non-contacting and non-specific mutations. Nature Ecology and Evolution. 2022. paper
The team performed one of the first deep mutational scans of a protein-protein interaction. I helped build the Bayesian model used to analyze the data.

Evangelos Kiskinis*, Joel M Kralj*, Peng Zou*, Eli N Weinstein*, Hongkang Zhang, Konstantinos Tsioras, Ole Wiskow, J Alberto Ortega, Kevin Eggan, Adam E Cohen. All-optical electrophysiology for high-throughput functional characterization of a human iPSC-derived motor neuron model of ALS. Stem Cell Reports. 2018. paper
The team developed a high-throughput screening platform for measuring the electrophysiological properties of human neurons. I built a computational pipeline for analyzing the data. I also helped translate these tools, together with additional machine learning techniques, into the therapeutic discovery platform at Q-State Biosciences (now Quiver Biosciences).

Shan Lou, Yoav Adam, Eli N Weinstein, Erika Williams, Katherine Williams, Vicente Parot, Nikita Kavokine, Stephen Liberles, Linda Madisen, Hongkui Zeng, Adam E Cohen. Genetically targeted all-optical electrophysiology with a transgenic Cre-dependent optopatch mouse. Journal of Neuroscience. 2016. paper
The team developed a transgenic mouse for optical electrophysiology studies in genetically defined subsets of cells. I built a computational pipeline for analyzing the data.

PhD Thesis

Eli N. Weinstein. Generative Statistical Methods for Biological Sequences. Harvard University. 2022. pdf

Google scholar.