Statistics Seminar: Hierarchical Clustering with Confidence

Presenter: Snigdha Panigrahi, Associate Professor, Department of Statistics, University of Michigan
Description: Agglomerative hierarchical clustering is one of the most widely used approaches for exploring how observations in a dataset relate to each other. However, its greedy nature makes it highly sensitive to small perturbations in the data, often producing different clustering results and making it difficult to separate genuine structure from spurious patterns. In this talk, I will show how randomizing hierarchical clustering can be useful not just for measuring stability but also for designing valid hypothesis testing procedures based on the clustering results. We propose a simple randomization scheme to construct valid p-values at each node of a hierarchical clustering dendrogram, quantifying evidence against greedy merges while controlling the Type I error rate. Our method applies to any linkage without case-specific derivations, is substantially more powerful than existing selective inference approaches, and provides an estimate of the number of clusters with a probabilistic guarantee on overestimation.
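The instability that motivates the talk is easy to observe directly. The sketch below (illustrative only; it is not the presenter's method, and the data, linkage choice, and noise scale are arbitrary assumptions) re-runs greedy agglomerative clustering on slightly perturbed copies of a dataset and counts how often the resulting partition changes:

```python
# Illustrative sketch: sensitivity of greedy agglomerative clustering to
# small perturbations of the data. Not the randomization scheme from the
# talk; simulated data, linkage method, and noise scale are assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Three simulated groups of 30 two-dimensional observations.
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(30, 2)) for c in (0, 3, 6)])

def cluster_labels(data, k=3, method="average"):
    """Build a dendrogram greedily and cut it into k clusters."""
    Z = linkage(data, method=method)
    return fcluster(Z, t=k, criterion="maxclust")

def canon(labels):
    """Relabel clusters by order of first appearance, so two partitions
    compare equal regardless of arbitrary label numbering."""
    seen = {}
    return tuple(seen.setdefault(v, len(seen)) for v in labels)

base = canon(cluster_labels(X))
# Re-cluster under tiny perturbations; count how often the partition changes.
changed = 0
for _ in range(20):
    Xp = X + rng.normal(scale=0.05, size=X.shape)
    if canon(cluster_labels(Xp)) != base:
        changed += 1
print(f"partition changed in {changed}/20 perturbed replicates")
```

A nonzero count on noise this small is exactly the behavior the abstract describes: greedy merges can flip under perturbations that leave the underlying structure unchanged, which is what the proposed p-values are designed to quantify.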
Bio: Snigdha Panigrahi is an Associate Professor of Statistics at the University of Michigan, where she also holds a courtesy appointment in the Department of Biostatistics. She received her PhD in Statistics from Stanford University in 2018 and has been a faculty member at Michigan since then. Her research focuses on converting purely predictive machine learning algorithms into principled inferential methods. She is an elected member of the International Statistical Institute, and her work has been recognized with an NSF CAREER Award and the Bernoulli New Researcher’s Award. Her editorial service, past and present, includes Journal of Computational and Graphical Statistics, Bernoulli, and Journal of the Royal Statistical Society: Series B.
Hosted by: Statistics Department