Serving
- Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
- Can Scheduling Overhead Dominate LLM Inference Performance? A Study of CPU Scheduling Overhead on Two Popular LLM Inference Systems
- Preble: Efficient Prompt Scheduling for Augmented Large Language Models
- Efficient Augmented LLM Serving With InferCept