Zheng, Z. (STATS) – Semi-Supervised Statistical Learning for Oceanographic Data

Oceanographic data, generated by modern technologies that measure biological systems across time, space, and cell populations, are often rich, high-dimensional, and highly heterogeneous. Such data provide valuable opportunities to study subcellular organization, cellular heterogeneity, and dynamic biological processes in marine environments. However, because marine plankton systems remain relatively understudied and less well characterized than many model biological systems, both data generation and labeling are particularly challenging. Limited domain knowledge and less mature laboratory protocols often produce noisy observations, while reliable annotation requires substantial expert effort and is therefore difficult to obtain at scale.
This proposal develops statistical methodology for oceanographic data settings in which a small amount of expert-labeled data must be combined with a much larger collection of unlabeled or imperfectly processed data. A central goal is to incorporate limited scientific knowledge into statistical learning procedures to improve interpretability, component identifiability, and inferential reliability. In particular, I develop semi-supervised statistical methods that explicitly quantify the information contributed by expert annotation.
To pursue this goal, I study three related problems: semi-supervised functional clustering for subcellular spatial proteomics, anchored semi-supervised mixture-of-experts models for flow cytometry, and temporally structured latent-variable models that separate smooth trend and seasonal variation from scientific signals of interest. Together, these projects aim to develop principled and interpretable methodology for partially labeled, structured, and high-dimensional oceanographic data, with an emphasis on valid uncertainty quantification.
Event Host: Ziyue Zheng, Ph.D. Student, Statistical Science
Advisor: Sangwon Hyun
Zoom: https://ucsc.zoom.us/j/93229540289?pwd=8bsBOSBFmISlexmS4OWTmTZKp420u2.1