
Modern AI systems demand low-latency, high-quality retrieval and serving over billion-scale keys and vectors. This proposal studies learned hashing and overlay networks that co-locate semantically related items and steer queries with minimal coordination. We first present LEAD, which is, to our knowledge, the first distributed key-value overlay to use order-preserving learned hash functions, enabling efficient range queries and cutting hops and messages by 80–90% in prototypes while retaining load balance and churn resilience. Second, Vortex applies learned hashing to approximate nearest-neighbor retrieval: a self-organizing overlay binds learned keys to distributed HNSW indexes, achieving high recall at low fan-out. Third, PlanetServe introduces onion-style path setup with multi-path dispersal and cache-aware forwarding for open LLM serving, reducing time-to-first-token (TTFT) and end-to-end latency while preserving privacy. Planned work generalizes learned hashing to embedding partitions, token/KV caches, programmable switches, and storage tiers, and provides formal convergence, load-balancing, and monotonic-progress guarantees under skew and churn. We are also designing the first knowledge delivery network for LLM serving: an overlay that unifies data placement, retrieval, and policy-aware routing across clusters and providers with tunable cost, privacy, and quality. Evaluation on real workloads at scale will measure recall, tail latency, cost, and robustness, targeting a predictable, elastic, and scalable AI-native retrieval and serving stack.
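To illustrate the core idea behind order-preserving learned hashing (not LEAD's actual implementation, which is not shown in this abstract), the sketch below approximates the key distribution's CDF from a sorted sample and uses that monotone map to assign keys to ring slots; because order is preserved, a range query touches a contiguous run of slots. All names here (`OrderPreservingLearnedHash`, `num_slots`) are illustrative assumptions.

```python
import bisect


class OrderPreservingLearnedHash:
    """Hypothetical sketch: map keys to ring slots via an empirical CDF.

    The CDF estimate is monotone, so key order is preserved and the
    slots covering a key range [lo, hi] form one contiguous interval.
    """

    def __init__(self, sample_keys, num_slots):
        self.sample = sorted(sample_keys)  # "learned" model: a sorted sample
        self.num_slots = num_slots

    def _cdf(self, key):
        # Empirical CDF: fraction of sampled keys <= key.
        return bisect.bisect_right(self.sample, key) / len(self.sample)

    def slot(self, key):
        # Monotone key -> slot mapping; min() guards the top edge (CDF = 1.0).
        return min(int(self._cdf(key) * self.num_slots), self.num_slots - 1)

    def range_slots(self, lo, hi):
        # Order preservation means a range maps to contiguous slots.
        return range(self.slot(lo), self.slot(hi) + 1)
```

A uniform sample yields roughly balanced slots; skewed key distributions are absorbed by the CDF, which is one intuition for why learned hashing can keep load balance while supporting range queries.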
Event Host: Shengze Wang, Ph.D. Student, Computer Science & Engineering
Advisor: Chen Qian
Zoom: https://ucsc.zoom.us/j/5455463199?pwd=bHRVM01Vd20rcVpkc0FQY01kZG1UUT09&omn=98106984546
Passcode: 2121