Tu, H. (CSE) – From Evaluation to Adaptation: Building Reliable Multimodal Intelligence
Multimodal large language models (MLLMs) are rapidly becoming general-purpose AI systems, yet their capabilities are advancing faster than our ability to evaluate, improve, and validate their reliability in realistic use. Standard benchmarks mainly measure in-distribution final-answer accuracy, leaving critical gaps in the assessment of safety, robustness, fine-grained reasoning, and reliability in real-world agentic settings. My research proposes […]