Li, X. (CSE) – Compute-Efficient Scaling of Fully-Open Visual Encoders
Vision encoders have demonstrated significant performance gains in visual generation and multimodal reasoning. These improvements are attributed primarily to scaling data, model capacity, and compute. However, this progress is becoming less accessible due to a lack of transparency in data curation and training recipes. In combination with the high compute requirements of foundation-scale […]