Li, X. (CSE) – Compute-Efficient Scaling of Fully-Open Visual Encoders
Vision encoders have demonstrated significant performance gains in visual generation and multimodal reasoning. These improvements are attributed primarily to scaling data, model capacity, and compute. However, this progress […]