Today’s large language models (LLMs) are largely used in augmented settings. First, LLMs are increasingly integrated with external tools and agents, such as ChatGPT plugins and image-generation models, to extend their capabilities beyond language-centric tasks. Second, an LLM is often called multiple times in sequence to carry out a conversation or to decompose a complex task into sub-tasks. However, today’s LLM serving systems are designed for standalone LLMs. They treat each interception of an LLM as the start of a new request, causing unnecessary recomputation of already-computed contexts.
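The recomputation overhead can be illustrated with a toy cost model (the functions, token counts, and the simple cached-prefix assumption below are illustrative, not any real serving system's behavior): if the context is discarded at each interception, every resumption must re-prefill the entire accumulated context, whereas a system that preserves the computed context only pays for newly appended tokens.

```python
# Toy cost model contrasting two serving strategies when an LLM call is
# intercepted (e.g., to invoke an external tool) and later resumed.
# Names and numbers are illustrative assumptions, not a real system's API.

def naive_prefill_cost(context_lens):
    """Each resumption is treated as a brand-new request:
    the full context is re-prefilled from scratch every time."""
    return sum(context_lens)

def cached_prefill_cost(context_lens):
    """The computed context (e.g., its KV cache) is kept across
    interceptions, so only newly appended tokens are prefilled.
    Assumes the context grows monotonically across turns."""
    total, cached = 0, 0
    for n in context_lens:
        total += n - cached  # only the new suffix needs computation
        cached = n
    return total

# A 3-turn interaction: context grows from 1000 to 1200 to 1500 tokens.
turns = [1000, 1200, 1500]
print(naive_prefill_cost(turns))   # 3700 tokens prefilled in total
print(cached_prefill_cost(turns))  # 1500 tokens prefilled in total
```

Under this toy model, discarding context more than doubles the prefill work for even a short three-turn interaction, and the gap widens as contexts grow and interceptions become more frequent.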