<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reinforcement Learning on</title><link>https://mlsys.wuklab.io/tags/reinforcement-learning/</link><description>Recent content in Reinforcement Learning on</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sun, 17 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://mlsys.wuklab.io/tags/reinforcement-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>Beat the Long Tail: Distribution-Aware Speculative Decoding for RL Training</title><link>https://mlsys.wuklab.io/posts/das/</link><pubDate>Sun, 17 May 2026 00:00:00 +0000</pubDate><guid>https://mlsys.wuklab.io/posts/das/</guid><description>Reinforcement learning post-training spends most of its wall-clock time generating answers, and a few very long generations dominate every training step. We designed &lt;a href="https://arxiv.org/abs/2511.13841" target="_blank">DAS [MLSys &amp;rsquo;26]&lt;/a>, a distribution-aware speculative decoding framework that speeds up RL rollouts without changing what the model learns. DAS uses a training-free drafter that rebuilds itself from recent rollouts and spends its speculation budget on the long generations that set the pace, cutting &lt;strong>rollout time by up to 50%&lt;/strong> with identical training curves. &lt;br/>&lt;br/> &lt;a href="https://mlsys.wuklab.io/posts/das/" target="_blank">Read More&amp;hellip;&lt;/a></description></item></channel></rss>