
Presenter: Qi Xu, Postdoctoral Researcher, Department of Statistics & Data Science, Carnegie Mellon University
Description: Multi-modality data, such as imaging, text, sensor, and genomic data, are increasingly common across science, medicine, and technology. These modalities are often high-dimensional or unstructured and naturally exhibit blockwise (nonmonotone) missingness, where different samples observe different subsets of modalities. Such missingness creates a major obstacle for statistical analysis, since classical methods either discard large portions of the data or rely on strong modeling assumptions. Recent advances in AI make it possible to generate or predict unobserved modalities from observed ones, opening new opportunities for data integration. In this talk, I will focus on statistical inference for blockwise-missing multi-modality data while rigorously incorporating modern AI tools. A long-standing open problem in semiparametric theory is that the theoretically optimal estimating function under nonmonotone missingness is computationally intractable, even under the missing-completely-at-random mechanism. I introduce a tractable approximation to the optimal estimating equation through a novel Restricted ANOVA hierarchY (RAY) decomposition and its almost-eigen-operator property. This leads to a new class of estimators that leverage predictive or generative AI models to borrow information across datasets while remaining unbiased and asymptotically normal. Motivated by the properties of the RAY estimator, we extend it to a class of unbiased, consistent, and computationally tractable estimators. We then derive the most efficient estimator in this class, the Adaptive RAY estimator, which optimally integrates all available data and AI predictions. Simulation studies and a single-cell multi-omics application demonstrate that the proposed framework enables stable and efficient inference for complex multi-modality data in the AI era.
This is joint work with Lorenzo Testa, Jing Lei, and Kathryn Roeder; the paper is available on arXiv: https://arxiv.org/abs/
Bio: Qi Xu is a postdoctoral researcher in the Department of Statistics & Data Science at Carnegie Mellon University. His research interests lie broadly in statistics and machine learning, especially data integration and AI for statistics, with applications in genomics and mobile health. He received his Ph.D. from the Department of Statistics at the University of California, Irvine, his master's degree from the University of Illinois Urbana-Champaign, and his bachelor's degree (with honors) from Tongji University.
Hosted by: Statistics Department
Zoom link: https://ucsc.zoom.us/j/91740050783?pwd=joK9hfwvM7FZ48acaiow8OY4ZlBDXA.1