[rStar2-Agent: Agentic Reasoning Technical Report 🔗](https://arxiv.org/abs/2508.20722)

rStar2-Agent: Teaching AI to Think Smarter, Not Just Longer

In the quest for more intelligent AI, we’ve often equated thinking with generating longer and more detailed chains of thought. The prevailing idea was: if a model “thinks longer,” it will eventually arrive at the right answer. This approach has driven substantial progress — but it has a fundamental ceiling. For truly complex problems — those that require creative leaps, checking intermediate steps, or course-correcting from a flawed path — simply extending a monologue isn’t enough. ...
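The paper's framing points toward agentic reasoning, where the model acts and checks rather than only narrating. As a rough, hypothetical sketch of that loop (not the paper's implementation), the snippet below has a stand-in `propose_step` model emit small Python checks, run them, and fold the feedback into its next attempt; every name here is illustrative.

```python
# A minimal sketch (not rStar2-Agent's system) of the agentic idea:
# instead of only extending a chain of thought, the model emits small
# Python checks, runs them, and uses the feedback to revise its answer.
# `propose_step` stands in for a real model call and is purely hypothetical.

import contextlib
import io

def run_check(code: str) -> str:
    """Execute a short verification snippet and capture its output."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})                      # sandboxing omitted for brevity
        return buf.getvalue().strip() or "ok"
    except Exception as exc:                    # feed errors back as signal
        return f"error: {exc}"

def propose_step(question: str, feedback: str) -> tuple[str, str]:
    """Hypothetical model call: returns (candidate_answer, check_code)."""
    # A real agent would query an LLM here; we hard-code one round for the demo.
    if feedback == "" or "error" in feedback:
        return "x = 3", "assert 2 * 3 + 1 == 7; print('2x+1=7 holds for x=3')"
    return "x = 3", ""                          # satisfied: no further check

question = "Solve 2x + 1 = 7."
feedback, answer = "", None
for _ in range(3):                              # a few tool-interaction turns
    answer, check = propose_step(question, feedback)
    if not check:                               # model is satisfied, stop
        break
    feedback = run_check(check)                 # environment feedback, not more monologue
print(answer, "| last feedback:", feedback)
```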

2025-08
[Visual Story-Writing: Writing by Manipulating Visual Representations of Stories 🔗](https://arxiv.org/abs/2410.07486)

Visual Story-Writing: Editing Narratives by Manipulating Interactive Story Maps

Creative writing is a juggling act. Authors must manage an intricate web of character arcs, plot points, locations, and timelines. Maintaining consistency across all these interconnected elements is a monumental task—especially when experimenting with new ideas. A seemingly small change, like moving a character to a different location, can trigger a cascade of edits, forcing the writer to hunt down every related sentence to preserve narrative coherence. To cope, many writers build external aids—scribbled timelines, relationship charts, or spreadsheets—to track their story worlds. But these tools are disconnected from the actual text. What if you could bridge this gap? What if you could rearrange a timeline or move a character on a map and watch those changes update your manuscript automatically? ...
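One way to picture that bridge: keep a structured story map linked back to the sentences it summarizes, so a visual edit surfaces (or rewrites) the affected prose. The toy sketch below is purely illustrative and not the paper's system; `Event`, `StoryMap`, and `move_character` are made-up names.

```python
# A toy sketch of linking a "story map" to manuscript sentences, so that
# editing the map flags the prose that needs revision. Not the paper's tool.

from dataclasses import dataclass, field

@dataclass
class Event:
    character: str
    location: str
    sentence: str            # the manuscript sentence this event backs

@dataclass
class StoryMap:
    events: list[Event] = field(default_factory=list)

    def move_character(self, character: str, new_location: str) -> list[str]:
        """Visual edit: relocate a character; return sentences needing revision."""
        stale = []
        for ev in self.events:
            if ev.character == character and ev.location != new_location:
                stale.append(ev.sentence)
                ev.location = new_location
        return stale

story = StoryMap([
    Event("Mara", "the lighthouse", "Mara waited alone at the lighthouse."),
    Event("Mara", "the lighthouse", "From the lighthouse, Mara watched the storm."),
    Event("Tomas", "the harbor", "Tomas tied the skiff at the harbor."),
])

# Dragging Mara to the harbor on the map surfaces every sentence to update.
for s in story.move_character("Mara", "the harbor"):
    print("Revise:", s)
```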

2024-10
[Universal Deep Research: Bring Your Own Model and Strategy 🔗](https://arxiv.org/abs/2509.00244)

Take Control: Build Your Own AI Research Assistant

AI-powered research assistants—like Perplexity, Gemini’s Deep Research, and others—are remarkable tools. You type in a question, and they return a polished, source-backed report. In the background, they scour the web, synthesize information, and deliver the findings in a neat, structured format. But have you ever asked yourself: What’s actually going on under the hood? How do these systems decide what queries to run, which sources to trust, and how to structure the report? The answer: in most current tools, you don’t get to know, and you definitely don’t get to change it. ...
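Universal Deep Research's pitch is that the research strategy itself should be something you can read and edit. The sketch below shows what a user-owned strategy loop might look like in plain Python; `web_search` and `summarize` are placeholders standing in for real tools and models, not an API from the paper.

```python
# A hedged sketch of the "bring your own model and strategy" idea: the research
# loop is ordinary, user-editable code, so you decide how queries are formed
# and how the report is assembled. All functions here are placeholders.

def web_search(query: str) -> list[str]:
    """Placeholder: return snippets for a query (a real tool would hit the web)."""
    return [f"[snippet for: {query}]"]

def summarize(snippets: list[str]) -> str:
    """Placeholder: a real strategy would call your chosen model here."""
    return " ".join(snippets)

def my_strategy(topic: str, rounds: int = 2) -> str:
    """A user-owned strategy: you can read, audit, and change every step."""
    notes, query = [], topic
    for i in range(rounds):
        snippets = web_search(query)
        notes.append(summarize(snippets))
        query = f"{topic} details round {i + 2}"   # your own query-refinement rule
    return "# Report\n" + "\n".join(f"- {n}" for n in notes)

print(my_strategy("universal deep research"))
```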

2025-09
[Disentangling the Factors of Convergence between Brains and Computer Vision Models 🔗](https://arxiv.org/abs/2508.18226)

How AI Vision Models Learn to See Like Humans: The Three Keys to Brain-Like Intelligence

Modern AI models for computer vision have become astonishingly good at recognizing objects, segmenting scenes, and even generating photorealistic images. What’s truly fascinating is that their internal workings—the complex patterns of artificial neuron activations—often bear a striking resemblance to neural activity in human brains viewing the same stimuli. This is not just coincidence; it’s a clue about the deep principles of information processing. For years, scientists have observed this brain–AI similarity, but the reason has remained elusive. Is the resemblance driven by the model’s architecture, the sheer amount of training data, or the type of data it sees? Previous studies often examined pre-trained models where all these factors varied together, making it impossible to isolate their effects. ...
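One standard way such brain-model similarity can be quantified (used here only as a stand-in for whatever metric the paper employs) is representational similarity analysis: compare the pairwise dissimilarity structure of model features and brain responses to the same stimuli. The numpy sketch below uses synthetic data throughout.

```python
# A minimal RSA sketch with synthetic stand-ins for model features and brain
# responses; it illustrates the kind of similarity score being disentangled,
# not the paper's actual pipeline or data.

import numpy as np

rng = np.random.default_rng(0)
n_stimuli, d_model, d_brain = 20, 128, 64

model_feats = rng.normal(size=(n_stimuli, d_model))          # stand-in activations
brain_resps = (model_feats @ rng.normal(size=(d_model, d_brain))
               + 0.5 * rng.normal(size=(n_stimuli, d_brain)))  # noisy stand-in "brain"

def rdm(x: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: 1 - correlation between stimuli."""
    return 1.0 - np.corrcoef(x)

def rsa_score(a: np.ndarray, b: np.ndarray) -> float:
    """Correlate the upper triangles of two RDMs."""
    iu = np.triu_indices(a.shape[0], k=1)
    return float(np.corrcoef(rdm(a)[iu], rdm(b)[iu])[0, 1])

print(f"model-brain RSA: {rsa_score(model_feats, brain_resps):.2f}")
```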

2025-08
[Neural Turing Machines 🔗](https://arxiv.org/abs/1410.5401)

Teaching Neural Networks to Think Like Computers: Neural Turing Machines

For decades, neural networks have proven to be extraordinary pattern-recognition machines. They can classify images, translate languages, and even generate creative text. However, they’ve historically struggled with tasks that a first-year computer science student would find trivial—like copying a sequence of data, sorting a list, or performing associative recall. Why? Because traditional neural networks, even powerful ones like LSTMs, lack a fundamental component of classical computers: an external, addressable memory. They have to cram everything they know into their connection weights, which is like trying to do complex calculations using only a mental scratchpad. ...
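The NTM's remedy is an external memory matrix the network reads and writes with soft, differentiable addressing. Below is a compact numpy sketch of the content-based read described in the paper: weight each memory row by its cosine similarity to a query key, sharpen with a key strength, and return the blended read vector. Sizes and values are illustrative only.

```python
# Content-based addressing, the NTM's core read mechanism, in miniature:
# every memory slot contributes to the read, weighted by how well it matches
# the query key, so the whole operation stays differentiable.

import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def content_read(memory: np.ndarray, key: np.ndarray, beta: float = 5.0) -> np.ndarray:
    """Weight each memory row by cosine similarity to the key, then blend."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(beta * sims)           # larger beta -> sharper, more lookup-like
    return weights @ memory                  # read vector: weighted sum of rows

memory = np.eye(4)                           # 4 slots, 4-dimensional contents
key = np.array([0.9, 0.1, 0.0, 0.0])         # query resembling slot 0
print(np.round(content_read(memory, key), 3))  # the read is dominated by slot 0
```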

2014-10
[Drivelology: Challenging LLMs with Interpreting Nonsense with Depth 🔗](https://arxiv.org/abs/2509.03867)

Drivelology: When AI Meets 'Nonsense with Depth'

Large Language Models (LLMs) like GPT-4 and Claude 3 can write essays, translate languages, and generate code with stunning fluency. They seem to understand us perfectly. But do they? When we move beyond straightforward questions and into the messy, creative, and often absurd world of human communication, do these models truly grasp meaning—or are they just masters of statistical pattern matching? A recent research paper, “Drivelology: Challenging LLMs with Interpreting Nonsense with Depth,” dives headfirst into this question. The authors introduce a fascinating linguistic concept they call Drivelology: utterances that are “nonsense with depth.” These are statements that seem absurd on the surface but hide layers of meaning, humor, or social commentary. ...

2025-09
[UI-TARS-2 Technical Report: Advancing GUI Agents with Multi-Turn Reinforcement Learning 🔗](https://arxiv.org/abs/2509.02544)

UI-TARS-2: Teaching AI to Master Your Computer Through Trial and Error

Imagine an AI that can use your computer just like you do—browsing websites, managing files, playing games, and even writing code. This isn’t science fiction; it’s the frontier of AI research, where GUI agents are being developed to autonomously operate graphical user interfaces. But building such an agent is incredibly hard. How do you gather enough training data? How do you teach it to learn from mistakes over long, complex tasks? And how do you create a stable environment for it to practice in without constant crashes? ...
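At its core, training such an agent means collecting long multi-turn trajectories of observing the screen, acting, and receiving feedback. The sketch below shows that loop in schematic form; `FakeDesktop` and `choose_action` are stand-ins, not UI-TARS-2 components.

```python
# A schematic multi-turn loop for a GUI agent: observe, act, get a (sparse)
# reward, repeat. A real setup would serve screenshots from a VM or browser
# and use a trained policy; everything here is a stand-in.

from dataclasses import dataclass

@dataclass
class FakeDesktop:
    """Stand-in environment for a real desktop or browser sandbox."""
    step_count: int = 0

    def observe(self) -> str:
        return f"screenshot_{self.step_count}.png"

    def step(self, action: dict) -> tuple[float, bool]:
        self.step_count += 1
        done = self.step_count >= 3             # pretend the task takes 3 steps
        return (1.0 if done else 0.0), done     # sparse reward at task completion

def choose_action(observation: str) -> dict:
    """Stand-in policy: a trained agent maps the screenshot to a GUI action."""
    return {"type": "click", "x": 100, "y": 200, "note": f"acting on {observation}"}

env, trajectory, done = FakeDesktop(), [], False
while not done:
    obs = env.observe()
    action = choose_action(obs)
    reward, done = env.step(action)
    trajectory.append((obs, action, reward))    # multi-turn data for RL updates

print(f"collected {len(trajectory)} turns, final reward {trajectory[-1][2]}")
```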

2025-09

Why AI Confidently Lies: The Math Behind Language Model Hallucinations

You’ve probably seen it happen: you ask a large language model (LLM) a simple factual question, and it confidently gives you an answer that’s plausible, detailed—and completely wrong. This behavior, known as hallucination, is one of the biggest barriers to trusting and relying on AI systems today. It’s a lot like asking a student a tough exam question: instead of admitting they don’t know, they try to bluff their way to partial credit with a polished but fabricated answer. ...
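The exam analogy can be made concrete with a bit of arithmetic: under plain 0/1 grading, a wrong answer costs no more than saying "I don't know", so any nonzero chance of being right makes guessing the better bet in expectation. The snippet below (an illustration, not code from the paper) works through the numbers, including what changes if wrong answers are penalized.

```python
# Back-of-the-envelope version of the exam analogy: with 0/1 grading,
# abstaining scores 0 while guessing scores p in expectation, so confident
# bluffing is rewarded unless wrong answers carry an explicit penalty.

def expected_score(p_correct: float, penalty_for_wrong: float = 0.0) -> float:
    """Expected points for answering; abstaining always scores 0."""
    return p_correct * 1.0 - (1.0 - p_correct) * penalty_for_wrong

for p in (0.1, 0.3, 0.5):
    plain = expected_score(p)                              # standard 0/1 grading
    penalized = expected_score(p, penalty_for_wrong=0.5)   # wrong answers cost points
    print(f"p={p:.1f}: guess {plain:+.2f} (0/1), {penalized:+.2f} (penalized); abstain 0.00")
```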