Previous Wellner Lectures
The Fall 2024 Jon A. Wellner Lecture
Data Integration for Heterogeneous Data
Tuesday, October 15, 2024
Annie Qu, Ph.D.
Chancellor's Professor, Department of Statistics, University of California, Irvine
In this presentation, I will showcase advanced statistical machine learning techniques and tools designed for the seamless integration of information from multi-source datasets. These datasets may originate from various sources, encompass distinct studies with different variables, and exhibit distinct dependence structures. One of the greatest challenges in investigating research findings is the systematic heterogeneity across individuals, which can significantly undermine the power of existing machine learning methods to identify the underlying true signals. This talk will examine the advantages and drawbacks of current methods such as multi-task learning, optimal transport, missing data imputation, matrix completion, and transfer learning. Additionally, we will introduce a new latent representation method that maps heterogeneous observed data to a latent space, facilitating the extraction of shared information and the disentanglement of source-specific information. The key idea of the proposal is to project heterogeneous raw observations onto a representation retriever library, and the novelty of our method is that we can retrieve partial representations from the library for a target study. The main advantage of the proposed method is that it can increase statistical power by borrowing common representation retrievers from multiple data sources. This approach ultimately allows one to extract information from heterogeneous data sources, transfer generalizable knowledge beyond the observed data, and enhance the accuracy of prediction and statistical inference.
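The representation-retriever construction above is specific to the talk. As a loose, hypothetical illustration of the general idea of separating shared from source-specific structure across heterogeneous sources, the sketch below pools several simulated sources, extracts shared directions from the pooled data, and then recovers source-specific directions from the residuals. It uses plain PCA/SVD, not the proposed method; the function name split_shared_specific and all parameter values are invented for this sketch.

```python
# Toy sketch (not the method described above): separate shared and
# source-specific structure across sources via PCA on pooled data,
# then PCA on the per-source residuals.
import numpy as np

rng = np.random.default_rng(0)

def split_shared_specific(sources, n_shared=2, n_specific=1):
    """Return shared directions (from pooled data) and per-source specific directions."""
    pooled = np.vstack(sources)                      # stack all sources, same feature space
    pooled = pooled - pooled.mean(axis=0)
    _, _, Vt = np.linalg.svd(pooled, full_matrices=False)
    V_shared = Vt[:n_shared].T                       # (p, n_shared) shared directions
    specific = []
    for X in sources:
        Xc = X - X.mean(axis=0)
        resid = Xc - Xc @ V_shared @ V_shared.T      # remove the shared structure
        _, _, Vt_s = np.linalg.svd(resid, full_matrices=False)
        specific.append(Vt_s[:n_specific].T)         # source-specific directions
    return V_shared, specific

# Simulate three sources that share two latent directions plus idiosyncratic noise.
p = 10
shared_dirs = rng.standard_normal((p, 2))
sources = [rng.standard_normal((100, 2)) @ shared_dirs.T
           + 0.3 * rng.standard_normal((100, p)) for _ in range(3)]
V_shared, V_specific = split_shared_specific(sources)
print(V_shared.shape, [V.shape for V in V_specific])
```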
Annie Qu is Chancellor's Professor in the Department of Statistics at the University of California, Irvine. She received her Ph.D. in Statistics from the Pennsylvania State University in 1998. Qu's research focuses on solving fundamental issues regarding structured and unstructured large-scale data and on developing cutting-edge statistical methods, theory, and algorithms in machine learning for personalized medicine, text mining, recommender systems, medical imaging data, and network data analysis for complex heterogeneous data. The newly developed methods can extract essential and relevant information from large volumes of intensively collected data, such as mobile health data. Her research impacts many fields, including biomedical studies, genomic research, public health research, and the social and political sciences. Before joining UC Irvine, Dr. Qu was Data Science Founder Professor of Statistics and Director of the Illinois Statistics Office at the University of Illinois at Urbana-Champaign. She was named a Brad and Karen Smith Professorial Scholar by the College of LAS at UIUC and received an NSF CAREER award (2004 to 2009). She is a Fellow of the Institute of Mathematical Statistics (IMS), the American Statistical Association, and the American Association for the Advancement of Science, and she delivered an IMS Medallion Lecture in 2024. She serves as Co-Editor (Theory and Methods) of the Journal of the American Statistical Association from 2023 to 2025 and as IMS Program Secretary from 2021 to 2027.
The Spring 2024 Jon A. Wellner Lecture
Data thinning and its applications
Tuesday, January 30, 2024
Daniela Witten, Ph.D.
Professor of Biostatistics and Dorothy Gilford Endowed Chair of Mathematical Statistics, University of Washington
We propose data thinning, a new approach for splitting an observation from a known distributional family with unknown parameter(s) into two or more independent parts that sum to yield the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general and can be applied to a broad class of distributions within the natural exponential family, including the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. Furthermore, we generalize data thinning to enable splitting an observation into two or more parts that can be combined to yield the original observation using an operation other than addition; this enables the application of data thinning far beyond the natural exponential family. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the “usual” approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data, in a setting where sample splitting is not applicable. This is joint work with Anna Neufeld (Fred Hutch), Ameer Dharamshi (University of Washington), Lucy Gao (University of British Columbia), and Jacob Bien (University of Southern California).
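For the Poisson case described above, data thinning takes a particularly simple form: an observation X ~ Poisson(λ) can be split into X1 ~ Binomial(X, ε) and X2 = X − X1, which are independent with X1 ~ Poisson(ελ) and X2 ~ Poisson((1−ε)λ). The sketch below is a minimal illustration of this split (not code from the paper) with an empirical check of the resulting means and correlation.

```python
# Minimal illustration of data thinning in the Poisson case: X ~ Poisson(lam)
# is split into X1 ~ Binomial(X, eps) and X2 = X - X1, which are independent
# with X1 ~ Poisson(eps*lam) and X2 ~ Poisson((1-eps)*lam).
import numpy as np

rng = np.random.default_rng(1)

def poisson_thin(X, eps=0.5, rng=rng):
    """Split a Poisson array X into two independent Poisson parts that sum to X."""
    X1 = rng.binomial(X, eps)
    X2 = X - X1
    return X1, X2

lam = 7.0
X = rng.poisson(lam, size=100_000)
X_train, X_test = poisson_thin(X, eps=0.5)

# Empirical check: the two halves have means eps*lam and (1-eps)*lam and are
# (approximately) uncorrelated, unlike two halves obtained by reusing rows.
print(X_train.mean(), X_test.mean(), np.corrcoef(X_train, X_test)[0, 1])
```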
Daniela Witten is a professor of Statistics and Biostatistics at the University of Washington, and the Dorothy Gilford Endowed Chair in Mathematical Statistics. She develops statistical machine learning methods for high-dimensional data, with a focus on unsupervised learning.
She has received a number of awards for her research in statistical machine learning: most notably the Spiegelman Award from the American Public Health Association for a (bio)statistician under age 40, and the Presidents' Award from the Committee of Presidents of Statistical Societies for a statistician under age 41.
Daniela is a co-author of the textbook "An Introduction to Statistical Learning" and, beginning in 2023, serves as Joint Editor of the Journal of the Royal Statistical Society, Series B.
Recording unavailable due to technical difficulties.
The Fall 2022 Jon A. Wellner Lecture
Inference for Longitudinal Data After Adaptive Sampling
Thursday, September 29, 2022
Susan A. Murphy, Ph.D.
Mallinckrodt Professor of Statistics and of Computer Science,
Radcliffe Alumnae Professor at the Radcliffe Institute,
Harvard University
Adaptive sampling methods, such as reinforcement learning (RL) and bandit algorithms, are increasingly used for the real-time personalization of interventions in digital applications like mobile health and education. As a result, there is a need to be able to use the resulting adaptively collected user data to address a variety of inferential questions, including questions about time-varying causal effects. However, current methods for statistical inference on such data (a) make strong assumptions regarding the environment dynamics, e.g., assume the longitudinal data follows a Markovian process, or (b) require data to be collected with one adaptive sampling algorithm per user, which excludes algorithms that learn to select actions using data collected from multiple users. These are major obstacles preventing the use of adaptive sampling algorithms more widely in practice. In this work, we provide statistical inference for the common Z-estimator based on adaptively sampled data. The inference (a) is valid even when observations are non-stationary and highly dependent over time, and (b) allows the online adaptive sampling algorithm to learn using the data of all users. Furthermore, our inference method is robust to misspecification of the reward models used by the adaptive sampling algorithm. This work is motivated by our work designing the Oralytics oral health clinical trial, in which an RL adaptive sampling algorithm will be used to select treatments, yet valid statistical inference is essential for conducting primary data analyses after the trial is over.
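As a toy, hypothetical illustration of inference from adaptively collected data (not the estimator or the theory from the talk), the sketch below simulates a two-arm study whose randomization probability is updated online from past rewards and then computes a simple inverse-probability-weighted Z-estimator of the treatment effect. The variance corrections needed for valid confidence intervals under this kind of dependence are what the work addresses and are omitted here; all names and parameter values are invented.

```python
# Toy sketch: a Z-estimator (estimating-equation estimator) for a treatment
# effect from adaptively collected binary-action data. The sampling probability
# p_t is updated online from past rewards; inverse probability weights keep the
# estimating function mean-zero at the true effect even though p_t depends on
# the past.
import numpy as np

rng = np.random.default_rng(2)
mu0, mu1 = 0.0, 0.5                      # true mean rewards under actions 0 and 1
T = 5000
actions, rewards, probs = [], [], []
sum_r, n_a = np.zeros(2), np.ones(2)     # running reward totals and counts per arm

for t in range(T):
    est = sum_r / n_a
    p1 = np.clip(0.5 + 0.4 * np.tanh(est[1] - est[0]), 0.1, 0.9)  # adaptive prob of action 1
    a = rng.binomial(1, p1)
    r = (mu1 if a == 1 else mu0) + rng.standard_normal()
    sum_r[a] += r
    n_a[a] += 1
    actions.append(a); rewards.append(r); probs.append(p1)

A, R, P = map(np.asarray, (actions, rewards, probs))
# Z-estimator: solve (1/T) * sum_t psi(delta; A_t, R_t, P_t) = 0 with
# psi = A*R/P - (1-A)*R/(1-P) - delta.
delta_hat = np.mean(A * R / P - (1 - A) * R / (1 - P))
print("estimated effect:", delta_hat, "true effect:", mu1 - mu0)
```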
Susan Murphy’s research focuses on improving sequential, individualized decision making in digital health. She developed the micro-randomized trial for use in constructing digital health interventions; this trial design is in use across a broad range of health-related areas. Her lab works on online learning algorithms for developing personalized digital health interventions. Dr. Murphy is a member of the National Academy of Sciences and of the National Academy of Medicine, both of the US National Academies. In 2013 she was awarded a MacArthur Fellowship for her work on experimental designs to inform sequential decision making. She is a Fellow of the College on Problems of Drug Dependence, Past President of the Institute of Mathematical Statistics, Past President of the Bernoulli Society, and a former editor of the Annals of Statistics.
The Spring 2022 Jon A. Wellner Lecture
Fitting stochastic epidemic models to noisy surveillance data: are we there yet?
Tuesday, March 29, 2022
Vladimir N. Minin, Ph.D.
Professor, Department of Statistics and Associate Director of the Infectious Disease Science Initiative, University of California, Irvine
Stochastic epidemic models describe how infectious diseases spread through a population of interest. These models are constructed by first assigning individuals to compartments (e.g., susceptible, infectious, and recovered) and then defining a stochastic process that governs the evolution of the sizes of these compartments through time. I will review multiple lines of attack on the challenging and not fully solved problem of fitting these models to noisy infectious disease surveillance data. These solutions involve a range of mathematical techniques: particle filter Markov chain Monte Carlo algorithms, approximations of stochastic differential equations, and Poisson random measure-based Bayesian data augmentation. Importantly, many of these computational strategies open the door for integration of multiple infectious disease surveillance data streams, including less conventional ones (e.g., pathogen wastewater monitoring and genomic surveillance). Such data integration is critical for making key parameters of stochastic epidemic models identifiable. I will illustrate the state-of-the-art statistical inference for stochastic epidemic models using influenza, Ebola, and SARS-CoV-2 surveillance data and will conclude with open problems and challenges that remain to be addressed.
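As a minimal sketch of the forward model only (the inference problem, fit via particle MCMC, SDE approximations, or data augmentation, is the hard part discussed in the talk), the code below simulates a discrete-time stochastic SIR ("chain binomial") epidemic and produces under-reported surveillance counts. All parameter values and the reporting mechanism are illustrative assumptions.

```python
# Minimal forward simulation: a discrete-time stochastic SIR model with noisy,
# under-reported surveillance counts. Fitting such a model to observed counts
# is the inference problem discussed in the talk.
import numpy as np

rng = np.random.default_rng(3)

def simulate_sir(N=10_000, I0=10, beta=0.3, gamma=0.1, days=120, report_prob=0.4):
    S, I, R = N - I0, I0, 0
    observed = []
    for _ in range(days):
        p_inf = 1.0 - np.exp(-beta * I / N)            # per-susceptible infection probability
        new_inf = rng.binomial(S, p_inf)               # stochastic new infections
        new_rec = rng.binomial(I, 1.0 - np.exp(-gamma))
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        # Surveillance sees only a noisy, under-reported fraction of new cases.
        observed.append(rng.binomial(new_inf, report_prob))
    return np.array(observed)

cases = simulate_sir()
print("peak reported daily cases:", cases.max(), "on day", int(cases.argmax()))
```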
Minin’s research interests revolve around developing statistically rigorous solutions to problems that arise in biological sciences. These solutions often involve formulating stochastic models that can describe complex dynamics of biological systems and devising computationally efficient algorithms to fit these models to data. Minin is currently most active in infectious disease epidemiology, working on Bayesian estimation of disease transmission model parameters. His other research interests include phylogenetics, population genetics, computational immunology, and systems biology. Minin received a B.S. in Mathematics from Odesa National University, an M.S. in Mathematics from the University of Idaho, and a Ph.D. in Biomathematics from the University of California, Los Angeles.
The 2019 Jon A. Wellner Lecture
Nonparametric Inference Under Shape Constraints: Past, Present and Future
Tuesday, September 24, 2019
Richard J. Samworth, Ph.D.
Professor of Statistical Science and Director of the Statistical Laboratory, University of Cambridge
Traditionally, we think of statistical methods as being divided into parametric approaches, which can be restrictive, but where estimation is typically straightforward (e.g. using maximum likelihood), and nonparametric methods, which are more flexible but often require careful choices of tuning parameters. The area of nonparametric inference under shape constraints sits somewhere in the middle, seeking in some ways the best of both worlds. I will give an introduction to this currently very active area, providing some history, recent developments and a future outlook.
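As a small, hypothetical illustration of this "best of both worlds" point, isotonic regression fits a monotone mean function by least squares with no bandwidth, penalty, or basis to choose. The sketch below uses scikit-learn's IsotonicRegression (pool-adjacent-violators under the hood) on simulated data; the data-generating model is invented for the example.

```python
# Illustration of tuning-parameter-free shape-constrained estimation: isotonic
# regression fits a nondecreasing mean function by least squares with no
# smoothing parameter to select.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(4)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.log1p(5 * x) + 0.3 * rng.standard_normal(n)   # increasing trend plus noise

iso = IsotonicRegression(increasing=True)
y_hat = iso.fit_transform(x, y)                       # monotone least-squares fit

print("fit is nondecreasing:", bool(np.all(np.diff(y_hat) >= -1e-12)))
```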
Professor Richard J. Samworth's main research interests are in nonparametric and high-dimensional statistics. Particular topics include shape-constrained estimation problems; data perturbation methods (e.g. subsampling, bootstrap sampling, random projections, knockoffs); nonparametric classification; (conditional) independence testing; estimation of entropy and other functionals; changepoint detection and estimation; missing data; variable selection; and applications, including genetics, archaeology and oceanography.
Awards
- Fellow of the American Statistical Association
- Fellow of the Institute of Mathematical Statistics
- The Royal Statistical Society's Research Prize (2008)
- The Guy Medal in Bronze, Royal Statistical Society (2012)
- The COPSS Presidents' Award (2018)
- IMS Medallion Lecture (2018)
- The Adams Prize (2017)
- Philip Leverhulme Prize (2014)
The 2018 Jon A. Wellner Lecture
New Multiplier Inequalities and Applications
Thursday, September 6, 2018
Jon A. Wellner, Ph.D.
Professor of Statistics and Biostatistics, University of Washington
Multiplier inequalities have proved to be one of the key tools of modern empirical process theory, with applications to central limit theorems, bootstrap theory, and weighted likelihood methods in statistics. In this talk I will review some classical multiplier inequalities, present a new multiplier inequality, and discuss several statistical applications. The applications include new results concerning convergence rates of least squares estimators (LSE) in regression models with possibly heavy-tailed errors. Particular cases involving sparse linear regression and shape restrictions will be mentioned.
[This talk is based on the University of Washington Ph.D. work of Qiyang (Roy) Han.]
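As a loose illustration of the multiplier idea in a familiar setting (not the new inequality itself), the sketch below uses a Gaussian multiplier bootstrap: the centered empirical average is re-randomized with i.i.d. mean-zero, variance-one multipliers, and the conditional distribution of the resulting draws approximates the sampling distribution of the scaled sample mean. Data, multipliers, and the confidence level are all illustrative choices.

```python
# Multiplier (wild) bootstrap sketch: approximate the distribution of
# sqrt(n) * (Xbar - mu) by sqrt(n)^{-1} * sum_i xi_i * (X_i - Xbar) with
# i.i.d. standard normal multipliers xi_i.
import numpy as np

rng = np.random.default_rng(5)
n, B = 500, 2000
X = rng.exponential(scale=1.0, size=n)               # skewed data with true mean 1

Xc = X - X.mean()
xi = rng.standard_normal((B, n))                     # Gaussian multipliers
boot = (xi @ Xc) / np.sqrt(n)                        # multiplier-bootstrap draws

# Symmetric 95% confidence interval for the mean from the bootstrap quantiles.
q = np.quantile(np.abs(boot), 0.95)
ci = (X.mean() - q / np.sqrt(n), X.mean() + q / np.sqrt(n))
print("95% CI for the mean:", ci)
```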