Zheng, K. (CSE) – Towards Generalist Embodied World Models: From Neuro-Symbolic Interaction to Self-Evolving 3D World Generation

Artificial intelligence is moving beyond passive perception toward systems that can understand, interact with, and generate the world. This dissertation studies generalist embodied world models that connect language, vision, action, and 3D scene representations. It explores how multimodal systems can ground human instructions in physical environments, reason over long-horizon tasks, generate coherent text and visual content, and construct spatially consistent 3D worlds from limited observations. Across embodied reasoning, multimodal generation, and 3D world construction, the dissertation develops methods that combine pretrained models with structured interfaces such as symbolic reasoning, generative visual tokens, spatial priors, and iterative self-refinement. These approaches aim to improve generalization, data efficiency, interpretability, and geometric consistency without relying solely on monolithic end-to-end training. Together, the work argues for a broader view of embodied AI: intelligent systems should not only recognize or describe the world, but also act within it, imagine it, and build reusable representations of it.
Event Host: Kaizhi Zheng, Ph.D. Candidate, Computer Science & Engineering
Advisor: Xin Eric Wang
Zoom: https://ucsc.zoom.us/j/91912825272?pwd=aps1YHcJKMaqmhtgl72f51K9EbxrHt.1
Passcode: 991132