Retrieval Augmented Generation (RAG), Retrieval Augmented Language Models (RALM), and vector stores are a thing of the past. A new AI breakthrough unfolds in this explanatory video on the latest insights in AI.
The next evolutionary step promises to be amazing: a beautiful solution to the shortcomings of our current approaches (RAG, vector stores) and to other AI problems.
It is supported by compute optimizations from the new S-LoRA system (by Stanford Univ, UC Berkeley, ..): custom CUDA kernels, tensor parallelism, and Unified Paging (a unified memory pool for LoRA adapter weight tensors and the KV cache), which together minimize latency when batching requests that use different LoRA adapters.
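To make the Unified Paging idea more concrete, here is a minimal toy sketch in plain Python. It is not the S-LoRA implementation: the class `UnifiedPagePool`, its method names, and the sizing rules (one page per unit of adapter rank, one page per cached token) are assumptions chosen for illustration. It only shows the core idea that adapter weights and KV cache entries draw from one shared pool of fixed-size pages, so memory freed by a finished request can be reused by a newly loaded adapter.

```python
class UnifiedPagePool:
    """Toy unified pool: one set of fixed-size pages shared by
    KV cache entries and LoRA adapter weights (illustrative only)."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))  # ids of unused pages
        self.owned = {}                           # (kind, name) -> page ids

    def _allocate(self, kind, name, count):
        key = (kind, name)
        if key in self.owned:
            raise KeyError(f"{key} already holds pages")
        if count > len(self.free_pages):
            raise MemoryError(
                f"need {count} pages, only {len(self.free_pages)} free"
            )
        # Pages need not be contiguous; any free page will do.
        pages = [self.free_pages.pop() for _ in range(count)]
        self.owned[key] = pages
        return pages

    def load_adapter(self, name, rank):
        # Assumption: an adapter of rank r occupies r pages.
        return self._allocate("adapter", name, rank)

    def reserve_kv(self, request_id, seq_len):
        # Assumption: the KV cache uses one page per token.
        return self._allocate("kv", request_id, seq_len)

    def release(self, kind, name):
        # Return the pages to the shared pool for any later use.
        self.free_pages.extend(self.owned.pop((kind, name)))


pool = UnifiedPagePool(num_pages=8)
pool.load_adapter("adapter-a", rank=4)
pool.reserve_kv("request-1", seq_len=3)
pool.release("kv", "request-1")         # request finished, pages freed
pool.load_adapter("adapter-b", rank=2)  # reuses pages the KV cache held
print(len(pool.free_pages))
```

Because both kinds of tensors share one pool, there is no fixed split between "adapter memory" and "cache memory"; in this sketch the second adapter simply takes pages that a completed request gave back.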
Literature:
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
https://arxiv.org/abs/2311.03285
#future
#challenge
#ai