Zhou, K. (CSE) – Toward Safer Frontier AI: From Evaluation and Red-Teaming to Alignment and Oversight
This dissertation investigates how to make modern AI systems safer as they grow more capable. It addresses two central sources of risk: malicious misuse, in which adversarial users coerce models […]