News

  • 5/1/2024 🎉 APISERVE was accepted to ICML 2024!
  • 2/4/2024 arXiv release of APISERVE: Efficient API Support for Large-Language Model Inferencing
Today, large language models (LLMs) are mostly used in augmented settings. First, LLMs are increasingly integrated with external tools and agents, such as ChatGPT plugins and image-generation models, to extend their capabilities beyond language-centric tasks. Second, an LLM is often called multiple times in sequence to carry out a conversation or to decompose a complex task into sub-tasks. However, today’s LLM serving systems are designed for standalone LLMs. They treat each interception of an LLM as the start of a new request, which causes unnecessary recomputation of already computed contexts.
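
To make the recomputation cost concrete, the following is a minimal back-of-the-envelope sketch, not APISERVE's implementation. It counts how many context tokens a serving system has to run through prefill when every interception restarts the request, compared with a hypothetical system that keeps the already computed context. The names `Segment` and `prefill_tokens` and all token counts are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of generation that ends in an interception (hypothetical model)."""
    decoded_tokens: int   # tokens the LLM generates before it is intercepted
    returned_tokens: int  # tokens appended by the tool, agent, or user on resume


def prefill_tokens(prompt_tokens: int, segments: list[Segment], reuse_context: bool) -> int:
    """Count tokens processed in prefill over the whole augmented request.

    reuse_context=False models a system that treats each resumption as a
    new request and recomputes the entire context. reuse_context=True is an
    assumed alternative that only computes the newly returned tokens.
    """
    context = prompt_tokens
    computed = prompt_tokens  # initial prefill of the prompt
    for seg in segments:
        context += seg.decoded_tokens  # decoding happens once in both cases
        if reuse_context:
            computed += seg.returned_tokens
        else:
            # The whole prior context is recomputed along with the new tokens.
            computed += context + seg.returned_tokens
        context += seg.returned_tokens
    return computed


if __name__ == "__main__":
    segments = [Segment(decoded_tokens=50, returned_tokens=20),
                Segment(decoded_tokens=30, returned_tokens=10)]
    print(prefill_tokens(100, segments, reuse_context=False))  # 480
    print(prefill_tokens(100, segments, reuse_context=True))   # 130
```

In this toy setting, two interceptions already more than triple the prefill work, and the gap widens as the context grows or as interceptions become more frequent.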