CSE Colloquium – Safety Alignment of LMs via Non-cooperative Games
Presenter: Arman Zharmagambetov, Meta
Abstract: Ensuring the safety of language models (LMs) while maintaining their usefulness remains a critical challenge in AI alignment. Current approaches rely on sequential adversarial training: […]