Beat the Long Tail: Distribution-Aware Speculative Decoding for RL Training

Sun, 17 May 2026 00:00:00 +0000

Reinforcement learning post-training spends most of its wall-clock time generating answers, and a few very long generations dominate every training step. We designed DAS [MLSys ’26], a distribution-aware speculative decoding framework that speeds up RL rollouts without changing what the model learns. DAS uses a training-free drafter that rebuilds itself from recent rollouts and spends its speculation budget on the long generations that set the pace, cutting rollout time by up to 50% with identical training curves.
