Zhou, K. (CSE) – Toward Safer Frontier AI: From Evaluation and Red-Teaming to Alignment and Oversight

Start: 2026-05-29T11:00:00-07:00
End: 2026-05-29T12:30:00-07:00

May 29, 2026 @ 11:00 am – 12:30 pm

Virtual Event


This dissertation investigates how to make modern AI systems safer as they grow more capable. It addresses two central sources of risk. The first is malicious misuse, in which adversarial users coerce models into harmful behavior. The second is internal misalignment, in which models themselves pursue goals that diverge from human intent through deception, sandbagging, or other covert behaviors. To develop practical approaches for evaluating, red-teaming, aligning, and overseeing frontier language models and agents, the dissertation identifies novel safety risks in frontier multimodal large language models and AI agents, introduces a black-box red-teaming framework for AI agents, proposes new safety alignment algorithms, and builds the first probe-based misalignment monitoring system. The central conclusion is that responsible AI cannot rest on any single guardrail: capability-scaled evaluation, active red-teaming, training-time alignment, and scalable monitoring together form a coordinated stack for frontier AI safety.

Event Host: Kaiwen Zhou, Ph.D. Candidate, Computer Science & Engineering

Advisor: Xin Wang

Zoom: https://ucsc.zoom.us/j/94196702062?pwd=b9LJMfL232ixG2THMab8XuJ32a4FVD.1

Passcode: 584794

Details

Date: May 29
Time: 11:00 am – 12:30 pm
Event Category: Ph.D. Presentations
