[Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models 🔗](https://arxiv.org/abs/2410.13343)

Are Your LLMs Cheating? Understanding Shortcut Learning in Large Language Models

Large Language Models (LLMs) like GPT-4, Gemini, and LLaMA have taken the world by storm. We marvel at their ability to write code, compose poetry, and reason through complex logic. But there is a lingering question in the AI research community: do these models actually understand the content, or are they just really good at guessing based on superficial patterns? Imagine a student who aces a history exam not because they understand the geopolitical causes of a war, but because they memorized that every time the word “treaty” appears in a question, the answer is “C”. This is effective for that specific test, but useless in the real world. In machine learning, this phenomenon is called Shortcut Learning. ...

2024-10 · 10 min · 2037 words
[Do LLMs Know to Respect Copyright Notice? 🔗](https://arxiv.org/abs/2411.01136)

The Copyright Blind Spot: Do LLMs Ignore 'All Rights Reserved' in Your Prompts?

Imagine you are using a powerful Large Language Model (LLM) like GPT-4 or LLaMA-3. You have a PDF of a newly released, copyrighted novel, and you paste a chapter into the chat window. You ask the model to translate it into French or paraphrase it for a blog post. The document clearly states “All Rights Reserved” at the top. Does the model stop and refuse? Or does it proceed, acting as a high-tech tool for copyright infringement? ...

2024-11 · 8 min · 1594 words
[Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets 🔗](https://arxiv.org/abs/2205.11472)

Why You Need More Topics, Not Just More Data: A New Approach to Argument Mining

In the world of Machine Learning, there is a pervasive mantra: “More data is better.” If your model isn’t performing well, the standard advice is often to throw more training samples at it. But in specialized fields like Natural Language Processing (NLP), acquiring high-quality data is neither easy nor cheap. This is particularly true for Topic-Dependent Argument Mining (TDAM). Teaching a machine to recognize whether a specific sentence is an argument for or against a complex topic (like “nuclear energy” or “minimum wage”) requires nuance. You cannot simply scrape the web and hope for the best; you usually need human experts to label the data. This process is expensive and time-consuming. ...

2022-05 · 8 min · 1664 words
[Distributional Properties of Subword Regularization 🔗](https://arxiv.org/abs/2408.11443)

Why Your Tokenizer is Biased (and How Uniform Sampling Fixes It)

If you have ever trained a modern Natural Language Processing (NLP) model, you have likely used a subword tokenizer. Whether it is Byte-Pair Encoding (BPE), WordPiece, or UnigramLM, tokenization is the invisible foundation upon which our massive language models are built. We often treat tokenization as a solved preprocessing step—a static lookup that turns text into IDs. But what if the way we feed tokens to a model during training is limiting its potential? ...
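As a quick illustration of the sampling the excerpt alludes to, here is a minimal sketch using the SentencePiece library. It assumes a trained UnigramLM model at the placeholder path `spm.model`, and it shows stock subword regularization (the `alpha` knob whose distributional effects the paper analyzes), not the paper's proposed fix.

```python
import sentencepiece as spm

# Load a trained UnigramLM SentencePiece model (placeholder path).
sp = spm.SentencePieceProcessor(model_file="spm.model")

text = "tokenization is the invisible foundation"

# Deterministic (1-best) segmentation: the same pieces every time.
print(sp.encode(text, out_type=str))

# Subword regularization: sample a different segmentation each call.
# nbest_size=-1 samples over all hypotheses; alpha controls how peaked the
# sampling distribution is (smaller alpha = flatter, closer to uniform).
for _ in range(3):
    print(sp.encode(text, out_type=str, enable_sampling=True,
                    alpha=0.1, nbest_size=-1))
```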

2024-08 · 8 min · 1514 words
[Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation 🔗](https://arxiv.org/abs/2402.01512)

The Art of Being Wrong: How AI Generates Distractors for Multiple-Choice Questions

If you have ever taken a multiple-choice exam, you know the drill: you read a question (the stem), identify the correct answer, and ignore the other options. Those incorrect options have a specific name: distractors. For a student, distractors are merely hurdles. For an educator, however, creating them is a massive design challenge. A good distractor must be plausible enough to test the student’s understanding but clearly incorrect to avoid ambiguity. If the distractors are too easy, the test is useless; if they are confusingly similar to the answer, the test is unfair. ...

2024-02 · 7 min · 1457 words
[Distract Large Language Models for Automatic Jailbreak Attack 🔗](https://arxiv.org/abs/2403.08424)

The Trojan Horse of AI: How Distraction Can Jailbreak Large Language Models

Large Language Models (LLMs) like ChatGPT, Claude, and LLaMA have become incredibly powerful tools for writing, coding, and analysis. To ensure these tools are safe, developers spend vast resources “aligning” them—training them to refuse harmful requests, such as instructions for illegal acts or hate speech. But what if the very mechanism that allows LLMs to process complex information—their attention span—is also their Achilles’ heel? In a fascinating paper titled “Distract Large Language Models for Automatic Jailbreak Attack,” researchers from the Shanghai University of Finance and Economics and the Southern University of Science and Technology propose a novel method to bypass these safety guardrails. Their framework, called DAP (Distraction-based Adversarial Prompts), essentially performs a magic trick on the AI: it distracts the model with a complex, harmless story while sneaking a malicious request in through the back door. ...

2024-03 · 7 min · 1413 words
[Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP 🔗](https://arxiv.org/abs/2307.09233)

Teaching CLIP to See—How Stable Diffusion Can Fix Vision-Language Reasoning

If you have dabbled in computer vision or multimodal AI recently, you have undoubtedly encountered CLIP (Contrastive Language-Image Pre-training). Since its release by OpenAI, CLIP has become the backbone of modern AI image systems. It powers zero-shot classification, image retrieval, and serves as the “eyes” for many generative pipelines. But CLIP has a secret weakness. While it is excellent at recognizing objects (knowing that an image contains a “dog” and a “couch”), it is surprisingly bad at understanding the relationship between them. If you show CLIP an image of a dog sitting on a couch and ask it to distinguish between “a dog on a couch” and “a couch on a dog,” it often guesses randomly. In effect, CLIP acts as a “bag-of-words” model—it checks for the presence of words but ignores the syntax and spatial structure that give a sentence its specific meaning. ...
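To see the “bag-of-words” symptom concretely, here is a minimal sketch using a public CLIP checkpoint via Hugging Face `transformers`. The image path `dog_on_couch.jpg` is a placeholder, and this only reproduces the failure the post describes, not the paper's distillation method.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Score two captions that differ only in word order against one image.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog_on_couch.jpg")  # placeholder image path
captions = ["a dog on a couch", "a couch on a dog"]

inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
# A bag-of-words-like model assigns near-identical scores to both captions.
```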

2023-07 · 8 min · 1564 words
[Discovering Knowledge-Critical Subnetworks in Pretrained Language Models 🔗](https://arxiv.org/abs/2310.03084)

Brain Surgery for LLMs: How to Find and Remove Specific Knowledge

Large Language Models (LLMs) like GPT-4 or Llama are often described as “black boxes.” We know they work—they can write poetry, debug code, and tell you the capital of France—but we don’t fully understand how they store that information. When an LLM “knows” that a cafe is a type of restaurant, is that fact stored in a specific cluster of neurons? Or is it smeared across the entire network like jam on toast? ...

2023-10 · 9 min · 1863 words
[Direct Multi-Turn Preference Optimization for Language Agents 🔗](https://arxiv.org/abs/2406.14868)

Beyond Single Turns - How DMPO Adapts Direct Preference Optimization for Long-Horizon Agents

We are currently witnessing a paradigm shift in Large Language Models (LLMs). We are moving from “chatbots”—models that answer a single query—to Language Agents. These are systems capable of browsing the web to buy products, conducting scientific experiments in simulation, or managing complex workflows. However, training these agents is significantly harder than training a standard chatbot. While a chatbot only needs to get the next token right, an agent must take a sequence of correct actions to reach a goal. If an agent makes a mistake in step 3 of a 20-step process, the entire trajectory might fail. This introduces the problem of compounding errors, where small deviations from an optimal path spiral into complete failure. ...
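For context, the single-turn preference loss that DMPO generalizes to multi-turn trajectories is the standard DPO objective (Rafailov et al., 2023); the multi-turn extension itself is what the post covers:

\[
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
\]

where \(y_w\) and \(y_l\) are the preferred and dispreferred responses, \(\pi_{\text{ref}}\) is the frozen reference policy, and \(\beta\) controls how far the tuned policy may drift from it.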

2024-06 · 11 min · 2321 words
[Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction 🔗](https://arxiv.org/abs/2410.18481)

Reverse-Engineering the Chatbot: How Dialog2Flow Extracts Logic from Conversation Logs

In the rapidly evolving landscape of Artificial Intelligence, conversational agents—from customer service chatbots to sophisticated virtual assistants—have become ubiquitous. We interact with them daily to book flights, check bank balances, or troubleshoot technical issues. These are known as Task-Oriented Dialogs (TOD). Behind every effective task-oriented bot lies a structured workflow: a flowchart designed by human experts that dictates: “If the user asks for X, check for Y. If Y is missing, ask for Z.” This structure ensures the agent actually helps the user achieve their goal. However, designing these workflows is a tedious, manual process. Furthermore, with the advent of Large Language Models (LLMs), the “logic” is often buried within billions of parameters, turning the bot into a black box that is hard to control or debug. ...

2024-10 · 11 min · 2276 words
[DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions 🔗](https://arxiv.org/abs/2406.19356)

Why Students Fail: How AI is Learning to Diagnose and Generate Math Mistakes

If you have ever taken a multiple-choice math test, you know the feeling: you work through a problem, arrive at an answer, and look at the options. If your answer is there, you circle it. But what if your answer was wrong, yet it was still listed as an option? These incorrect options are called distractors. In high-quality education, distractors aren’t just random numbers; they are carefully crafted traps designed to catch specific misunderstandings. For example, if the question asks for \(2^3\), a good distractor is \(6\) (catching students who multiplied \(2 \times 3\)) rather than \(5\) (which is just a random error). ...

2024-06 · 7 min · 1468 words
[DetoxLLM: A Framework for Detoxification with Explanations 🔗](https://arxiv.org/abs/2402.15951)

Beyond Deletion: How DetoxLLM Rewrites Toxic Language While Preserving Meaning

The comment section of the internet is notorious. From social media feeds to news article discussions, toxic language—hate speech, harassment, and offensive microaggressions—is a pervasive problem. The traditional solution has been simple: moderation. If a comment is toxic, an automated system flags it, and it gets deleted or hidden. But is deletion always the best answer? Sometimes, a user might have a valid point buried under a layer of aggression. Simply removing the text stifles communication. A more sophisticated approach is text detoxification: rewriting the text to remove the toxicity while keeping the original semantic meaning intact. ...

2024-02 · 8 min · 1661 words
[Detection and Measurement of Syntactic Templates in Generated Text 🔗](https://arxiv.org/abs/2407.00211)

Beyond Words: Uncovering the Syntactic Templates Hidden in LLM Outputs

If you have spent enough time interacting with Large Language Models (LLMs) like GPT-4 or Llama, you might have noticed a specific “vibe” to the text they produce. Even when the content is factually new, or the specific vocabulary is varied, there is often a sense of familiarity—a robotic cadence or a structural repetitiveness that distinguishes model output from human writing. ...
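One simple way to make “structural repetitiveness” measurable, as an illustration of the idea rather than the paper's exact metric, is to count part-of-speech n-grams that recur across outputs. A minimal sketch with spaCy:

```python
from collections import Counter
import spacy

# Treat frequent part-of-speech n-grams as candidate "syntactic templates"
# (an illustration; the paper defines and measures templates more carefully).
nlp = spacy.load("en_core_web_sm")

texts = [
    "The study highlights the importance of careful evaluation.",
    "The paper highlights the need for robust benchmarks.",
    "This work highlights the value of diverse data.",
]

def pos_ngrams(doc, n=4):
    tags = [tok.pos_ for tok in doc]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

counts = Counter()
for doc in nlp.pipe(texts):
    counts.update(pos_ngrams(doc))

# POS patterns that recur across different outputs hint at structural reuse.
for template, freq in counts.most_common(3):
    print(freq, " ".join(template))
```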

2024-07 · 7 min · 1417 words
[Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood 🔗](https://arxiv.org/abs/2406.19874)

FourierGPT: Detecting AI Text by Listening to the Rhythm of Probability

In the rapidly evolving world of Large Language Models (LLMs), we are playing a high-stakes game of “cat and mouse.” As models like GPT-4 become increasingly sophisticated, their ability to mimic human writing has reached a point where distinguishing between a human author and an AI is incredibly difficult. Traditionally, we catch these models by looking at likelihood—essentially asking, “How predictable is this text?” But as models get better, they stop making the kind of statistical errors that old detection methods relied on. They are starting to “sound” just like us. ...
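A rough sketch of the intuition, using GPT-2 as the scoring model and a plain FFT (the paper's exact normalization and spectrum estimator may differ): compute each token's log-likelihood under a language model, then inspect the spectrum of that sequence rather than its average value.

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The quick brown fox jumps over the lazy dog near the river bank."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits

# Log-likelihood of each token given its preceding context.
logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
token_ll = logprobs[torch.arange(ids.size(1) - 1), ids[0, 1:]].numpy()

# Normalize the sequence, then look at its magnitude spectrum:
# the "rhythm" of predictability rather than its overall level.
z = (token_ll - token_ll.mean()) / (token_ll.std() + 1e-8)
spectrum = np.abs(np.fft.rfft(z))
print(spectrum.round(2))
```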

2024-06 · 7 min · 1340 words
[Detecting Online Community Practices with Large Language Models: A Case Study of Pro-Ukrainian Publics on Twitter 🔗](https://aclanthology.org/2024.emnlp-main.1122.pdf)

Decoding Internet Culture: Can LLMs Understand the Nuance of Online Communities?

If you spend any time on social media, you know that a picture of a Shiba Inu dog isn’t always just a cute pet photo. In specific corners of the internet, it might be a political statement, a form of activism, or a “bonk” against propaganda. Similarly, a tweet predicting the winner of a song contest might actually be a coded expression of national solidarity during wartime. These are examples of practices—distinct patterns of behavior and linguistic expression that define online communities. They are rich in “social meaning,” relying on inside jokes, shared values, and specific vernacular that outsiders might miss completely. For computer scientists and sociologists, detecting these practices at scale is a massive challenge. Traditional text classification models often fail to grasp the sarcasm, the context, or the intent behind a post. ...

9 min · 1858 words
[Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors 🔗](https://arxiv.org/abs/2406.13009)

DEEP: A New Framework for Catching Hallucinations in LLM Summaries

Large Language Models (LLMs) like GPT-4, Claude 3, and LLaMA-2 have revolutionized text generation. They can draft emails, write code, and—crucially—summarize vast amounts of information. However, despite their linguistic fluency, these models suffer from a persistent and dangerous flaw: hallucinations. They frequently generate plausible-sounding but entirely fabricated information. In the context of text summarization, a hallucination is a “factual inconsistency”—a moment where the summary contradicts the source document or invents facts not present in it. While a typo in an email is embarrassing, a factual error in a medical or legal summary can be catastrophic. ...

2024-06 · 8 min · 1574 words
[Dependency Graph Parsing as Sequence Labeling 🔗](https://arxiv.org/abs/2410.17972)

Taming the Graph: How to Turn Complex Dependency Parsing into Simple Sequence Labeling

In the world of Natural Language Processing (NLP), simplicity is often a virtue, but reality is rarely simple. For years, researchers have strived to make syntactic parsing—the task of mapping the grammatical structure of a sentence—as efficient as possible. One of the most successful approaches has been sequence labeling (or tagging). If you can reduce a complex tree structure to a simple sequence of tags (one per word), you can use standard, highly optimized hardware and algorithms to parse sentences at lightning speeds. ...
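To make the reduction concrete, here is a minimal sketch of one simple tagging scheme (signed distance to the head plus the dependency label). It illustrates the general tree-as-tags idea; the paper itself targets richer encodings for full dependency graphs.

```python
# Encode a tiny dependency tree as one tag per word and decode it back.
words  = ["She", "reads", "books"]
heads  = [2, 0, 2]          # 1-indexed head of each word; 0 = root
labels = ["nsubj", "root", "obj"]

def encode(heads, labels):
    """Turn (head, label) pairs into per-word tags like '+1:nsubj'."""
    tags = []
    for i, (h, lab) in enumerate(zip(heads, labels), start=1):
        offset = 0 if h == 0 else h - i      # signed distance to the head
        tags.append(f"{offset:+d}:{lab}")
    return tags

def decode(tags):
    """Recover (head, label) pairs from the per-word tags."""
    heads, labels = [], []
    for i, tag in enumerate(tags, start=1):
        offset, lab = tag.split(":")
        offset = int(offset)
        heads.append(0 if offset == 0 else i + offset)
        labels.append(lab)
    return heads, labels

tags = encode(heads, labels)
print(tags)                      # ['+1:nsubj', '+0:root', '-1:obj']
assert decode(tags) == (heads, labels)
```

Because each word gets exactly one tag, any off-the-shelf sequence labeler can now "parse" by predicting tags, which is where the speed advantage comes from.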

2024-10 · 9 min · 1763 words
[Dense X Retrieval: What Retrieval Granularity Should We Use? 🔗](https://arxiv.org/abs/2312.06648)

Beyond the Passage: Why 'Propositions' Are the Future of RAG and Dense Retrieval

If you have been following the explosion of Large Language Models (LLMs), you are likely familiar with Retrieval-Augmented Generation (RAG). It is the standard architecture for building AI systems that “know” things outside of their training data. The formula is generally simple: a user asks a question, a retriever hunts down relevant text chunks from a database (like Wikipedia), and an LLM synthesizes an answer based on those chunks. However, there is a hidden variable in this formula that researchers often overlook: Granularity. ...
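A minimal sketch of what granularity changes in practice, using `sentence-transformers` with an example model and toy text of my own (not the paper's retrievers or data):

```python
from sentence_transformers import SentenceTransformer, util

# The same content indexed at two granularities:
# one long passage vs. short, single-fact "propositions".
model = SentenceTransformer("all-MiniLM-L6-v2")

passage = ("The Eiffel Tower, completed in 1889 for the World's Fair, "
           "is 330 metres tall and was the tallest structure in the "
           "world until 1930.")
propositions = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "The Eiffel Tower is 330 metres tall.",
    "The Eiffel Tower was the tallest structure in the world until 1930.",
]

query = "How tall is the Eiffel Tower?"
q = model.encode(query, convert_to_tensor=True)

for unit in [passage] + propositions:
    score = util.cos_sim(q, model.encode(unit, convert_to_tensor=True)).item()
    print(f"{score:.3f}  {unit[:60]}")
# The focused proposition tends to outscore the broader passage,
# which is the granularity effect the paper studies systematically.
```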

2023-12 · 9 min · 1800 words
[Demystifying Verbatim Memorization in Large Language Models 🔗](https://arxiv.org/abs/2407.17817)

Why Your LLM Can't Keep a Secret: The Science of Verbatim Memorization

In the world of Large Language Models (LLMs), there is a ghost in the machine. Sometimes, models like GPT-4 or Claude don’t just generate novel text—they recite specific training data word-for-word. This phenomenon, known as verbatim memorization, ranges from the innocuous (reciting the Gettysburg Address) to the legally hazardous (reproducing copyrighted code or private identifying information). For years, researchers have treated this as a bug to be squashed. The prevailing assumption has been that specific “bad” weights or neurons are hoarding these memories, and if we could just locate and prune them, the problem would vanish. ...

2024-07 · 9 min · 1749 words
[Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning 🔗](https://arxiv.org/abs/2402.04401)

Your Own Private Slice of the Brain: Democratizing LLMs with One PEFT Per User (OPPU)

Imagine you have a personal assistant who has been with you for ten years. When you ask them to “write an email to the boss,” they don’t need a ten-page style guide or a stack of your previous emails to get the tone right. They just know how you sound. They know you prefer “Best regards” over “Sincerely,” and that you tend to be concise on Mondays. Now, compare that to a Large Language Model (LLM) like GPT-4 or Llama-2. These models are incredibly capable, but they are “one-size-fits-all.” To make them sound like you, you usually have to stuff the prompt with examples of your writing or detailed instructions. This is the current state of personalization in AI: it’s mostly done through prompt engineering and context retrieval. ...
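The general mechanism of “one PEFT per user” can be sketched with the `peft` library; the base model, LoRA rank, and target modules below are placeholders rather than the paper's configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Shared frozen base model (placeholder checkpoint).
base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

def make_user_adapter(base_model):
    """Attach a small LoRA adapter that will hold one user's personalization."""
    config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(base_model, config)

# Each user gets their own adapter; only the tiny adapter weights are
# trained and stored per user, so personalization stays cheap.
user_model = make_user_adapter(base)
user_model.print_trainable_parameters()
# ... fine-tune user_model on that user's history, then save just the adapter:
user_model.save_pretrained("adapters/user_123")
```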

2024-02 · 9 min · 1740 words