Tu, H. (CSE) – From Evaluation to Adaptation: Building Reliable Multimodal Intelligence

Name: Tu, H. (CSE) – From Evaluation to Adaptation: Building Reliable Multimodal Intelligence
Start: 2026-05-27T09:00:00-07:00
End: 2026-05-27T11:00:00-07:00

May 27 @ 9:00 am – 11:00 am

Virtual Event

Multimodal large language models (MLLMs) are rapidly becoming general-purpose AI systems, yet their capabilities are advancing faster than our ability to evaluate, improve, and validate their reliability in realistic use. Standard benchmarks mainly measure in-distribution final-answer accuracy, leaving critical gaps in safety, robustness, fine-grained reasoning evaluation, and reliability in real-world agentic settings. My research proposes an evaluation-to-adaptation framework for building reliable multimodal intelligence: developing rigorous evaluations that expose failures beyond conventional benchmarks, learning feedback models that guide inference-time reasoning, and studying how multimodal systems can adapt through experience. I instantiate this agenda through two completed works and two proposed directions. Unicorn evaluates safety and robustness under out-of-distribution and adversarial conditions, revealing substantial vulnerabilities across 22 vision-language models. ViLBench studies vision-language process reward modeling as both an evaluation challenge and a mechanism for inference-time improvement, showing that process-guided reasoning selection can improve reliability. Building on these foundations, I further study test-time experience accumulation and explore reliable multimodal agents for GUI and computer-use tasks. Together, this research aims to move beyond capability-driven progress alone, toward multimodal AI systems whose reliability can be evaluated, improved, and tested in realistic deployment settings.

Event Host: Haoqin Tu, Ph.D. Student, Computer Science & Engineering

Advisor: Cihang Xie

Zoom: 964 1355 0550

Passcode: zWxU8A

Details

Date: May 27
Time: 9:00 am – 11:00 am
Event Category: Ph.D. Presentations
