Building the Ladder as You Climb: How Bootstrapped Policy Learning Solves Hard Dialogue Tasks
Introduction

Imagine you are trying to teach a computer to handle complex customer service calls, for example booking a multi-leg flight while simultaneously reserving a hotel and buying tickets for a local attraction. In Artificial Intelligence, and specifically in Task-Oriented Dialogue (ToD) systems, this is a massive challenge.

The standard approach is Reinforcement Learning (RL). The AI talks to a user simulator, tries to fulfill the request, and gets a “reward” (a positive signal) only if it completes the entire task perfectly. If it fails, it gets nothing or a penalty. This is known as the sparse reward problem. It is akin to trying to learn a piano concerto by hitting random keys and only being told “good job” if you accidentally play the whole piece perfectly on the first try. ...