
Diffusion-based text-to-image models can generate impressive images, but they largely treat an image as a single, flat output, which makes precise editing of individual elements difficult. This proposal studies layered generative representations that align with professional editing workflows, enabling users to manipulate foreground objects while preserving the rest of the scene. A central focus is visual effects such as shadows and reflections, which are essential for realistic composition yet are often missing or inconsistent in current generative pipelines. The proposed research program targets controllable, compositional image generation that supports practical, edit-ready content creation.
Event Host: Jinrui Yang, Ph.D. Student, Computer Science and Engineering
Advisor: Yuyin Zhou
Zoom: https://ucsc.zoom.us/j/91510964517?pwd=NG5Urv2li9HxlcUKrybg6Z5ZtYj9e6.1
Passcode: 544143