[Jailbreaking LLMs with Arabic Transliteration and Arabizi 🔗](https://arxiv.org/abs/2406.18725)

Lost in Transliteration — How Arabizi Bypasses LLM Safety Filters

Large Language Models (LLMs) like GPT-4 and Claude 3 are designed to be helpful, but they are also designed to be safe. If you ask these models to write a guide on how to create malware or build a bomb, they are trained to refuse. This safety training, often achieved through Reinforcement Learning from Human Feedback (RLHF), acts as a firewall around the model’s vast knowledge. However, security researchers are constantly searching for cracks in this firewall. While most safety training focuses heavily on English, a new vulnerability has emerged in the linguistic “blind spots” of these models. ...

2024-06 · 8 min · 1509 words
[Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs 🔗](https://arxiv.org/abs/2403.05020)

God Mode vs. Reality: Why AI Social Simulations Are Failing the Turing Test of Social Intelligence

Imagine a virtual town populated entirely by AI agents. They wake up, go to work, gossip at the coffee shop, and negotiate prices at the market. It sounds like science fiction—specifically, like Westworld or The Sims powered by supercomputers—but recent advances in Large Language Models (LLMs) have brought us tantalizingly close to this reality. Researchers and developers are increasingly using LLMs to simulate complex social interactions. These simulations are used for everything from training customer service bots to modeling economic theories and testing social science hypotheses. The assumption is simple: if an LLM can write a convincing dialogue between two people, it can simulate a society. ...

2024-03 · 10 min · 2110 words
[Is This a Bad Table? A Closer Look at the Evaluation of Table Generation from Text 🔗](https://arxiv.org/abs/2406.14829)

Beyond Rows and Columns: A New Way to Judge AI-Generated Tables

Imagine you are asking a Large Language Model (LLM) to summarize a complex financial report into a neat, easy-to-read table. The model churns out a grid of numbers and headers. At a glance, it looks perfect. The columns align, the formatting is crisp, and the headers look professional. But is it actually good? In the world of Natural Language Processing (NLP), we have become very good at generating text. However, generating structured data—like tables—from unstructured text is a different beast. More importantly, evaluating whether that generated table is accurate is a notoriously difficult problem. ...

2024-06 · 8 min · 1636 words
[Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering 🔗](https://arxiv.org/abs/2410.03466)

The Safety Trap: Why Guardrails Might Be Making AI Worse at Fighting Hate Speech

In the rapidly evolving landscape of Large Language Models (LLMs), there is a constant tug-of-war between two primary objectives: making models helpful and making them harmless. We want our AI assistants to answer our questions accurately, but we also want to ensure they don’t spew toxicity, bias, or dangerous instructions. To achieve this, developers implement “safety guardrails”—alignment techniques and system prompts designed to keep the model polite and safe. But what happens when the task requires engaging with toxic content to neutralize it? ...

2024-10 · 8 min · 1646 words
[Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment 🔗](https://arxiv.org/abs/2402.14016)

Hacking the Judge: How Universal Adversarial Attacks Fool LLM Evaluators

In the rapidly evolving world of Artificial Intelligence, Large Language Models (LLMs) have taken on a new role: the judge. We use powerful models like GPT-4 and Llama 2 not just to write code or poetry, but to evaluate the quality of text generated by other models. This paradigm, known as “LLM-as-a-judge,” is becoming a standard for benchmarking and even grading student essays or exams. ...
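
To make the setup concrete, here is a minimal sketch of the judge loop the attack targets: the candidate text is pasted verbatim into the grading prompt, which is exactly the surface a universal adversarial suffix can exploit. The prompt wording, scoring scale, and parser are illustrative, not the paper’s exact rubric.

```python
import re

# Illustrative judge prompt: the candidate text enters the prompt verbatim,
# so anything appended to it (e.g. an adversarial suffix) reaches the judge too.
JUDGE_TEMPLATE = """You are grading a summary for quality.

Summary to grade:
<summary>
{candidate}
</summary>

Rate the summary from 1 (worst) to 10 (best).
Reply with a single line: Score: <number>"""


def build_judge_prompt(candidate: str) -> str:
    """Format the zero-shot grading prompt for an LLM judge."""
    return JUDGE_TEMPLATE.format(candidate=candidate)


def parse_score(judge_reply: str) -> int | None:
    """Pull the numeric score out of the judge's reply, if present."""
    match = re.search(r"Score:\s*(\d+)", judge_reply)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(build_judge_prompt("The report covers Q3 revenue and margins."))
    print(parse_score("Score: 8"))  # -> 8
```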

2024-02 · 10 min · 1921 words
[Is It Really Long Context if All You Need Is Retrieval? 🔗](https://arxiv.org/abs/2407.00402)

The Long-Context Illusion: Why Length Isn't the Only Thing That Matters

In the rapidly evolving world of Large Language Models (LLMs), we are currently witnessing a “context window arms race.” Not long ago, a model that could remember 2,000 words was impressive. Today, we have models boasting context windows of 128k, 200k, or even 1 million tokens. The promise is alluring: you can feed an entire novel, a codebase, or a legal archive into a model and ask questions about it. But this technical leap forces us to ask a critical question: Does a longer input capacity equal better understanding? ...

2024-07 · 6 min · 1253 words
[Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? 🔗](https://arxiv.org/abs/2406.12822)

Lost in Translation: Why Multilingual LLMs Need Native Data, Not Just Translations

If you’ve ever used Google Translate to finish a Spanish assignment or interpret a menu in Tokyo, you know the results are usually functional but often lack “soul.” The grammar might be correct, but the cultural nuance—the idiom, the local context, the vibe—is often lost. In the world of Large Language Models (LLMs), we are facing a similar crisis on a massive scale. We want LLMs to speak every language fluently. However, gathering high-quality training data in languages like Russian, Chinese, or Swahili is much harder than finding it in English. The industry standard solution? Take high-quality English data and machine-translate it into the target language. ...

2024-06 · 8 min · 1521 words
[Is Child-Directed Speech Effective Training Data for Language Models? 🔗](https://arxiv.org/abs/2408.03617)

The Data Gap: Can Language Models Learn Like Children?

If you have ever watched a toddler learn to speak, it feels nothing short of miraculous. By the time a child is 10 years old, they have likely heard somewhere between 10 million and 100 million words. From this relatively small dataset, they achieve fluency, understand complex grammar, and grasp nuance. Contrast this with the Large Language Models (LLMs) we use today, like GPT-4 or Llama. These models are typically trained on hundreds of billions, sometimes trillions, of words. They require a dataset several orders of magnitude larger than what a human child receives to achieve comparable (or sometimes still inferior) linguistic competence. ...

2024-08 · 7 min · 1473 words
[Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning 🔗](https://arxiv.org/abs/2410.07461)

Pruning LLMs: Why Your Calibration Data Matters More Than You Think

In the current era of Artificial Intelligence, Large Language Models (LLMs) like Llama 2 and GPT-4 have transformed how we interact with technology. However, their capabilities come at a steep cost: hardware resources. A 7-billion parameter model can require upwards of 10GB of memory just to load, making it inaccessible for most consumer edge devices or mobile phones. To solve this, researchers turn to network pruning—a compression technique that removes “unimportant” weights from a model to reduce its size and speed up inference. Modern pruning algorithms are surprisingly effective, capable of removing 50% or more of a model’s parameters with minimal loss in intelligence. ...
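
As a concrete illustration of why calibration data enters the picture at all, here is a minimal sketch of a Wanda-style criterion, where a weight’s importance is its magnitude scaled by the norm of activations collected from calibration text; the layer shapes, the 50% ratio, and the per-row pruning choice are illustrative, not the paper’s exact protocol.

```python
import torch


def wanda_style_prune(weight: torch.Tensor, calib_acts: torch.Tensor,
                      sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the least important weights of one linear layer.

    weight:     (out_features, in_features) weight matrix
    calib_acts: (num_calib_tokens, in_features) inputs to this layer, collected
                by running calibration data (e.g. C4 snippets) through the model
    """
    # Importance of W[i, j] = |W[i, j]| * L2 norm of input feature j over the
    # calibration set, so the calibration data directly shapes what survives.
    feature_norms = calib_acts.norm(p=2, dim=0)              # (in_features,)
    importance = weight.abs() * feature_norms.unsqueeze(0)   # (out, in)

    # Prune per output row: zero the lowest-importance fraction of each row.
    num_prune = int(weight.shape[1] * sparsity)
    pruned = weight.clone()
    drop_idx = importance.argsort(dim=1)[:, :num_prune]
    pruned.scatter_(1, drop_idx, 0.0)
    return pruned


# Toy usage with random stand-ins for weights and calibration activations.
W = torch.randn(8, 16)
X = torch.randn(128, 16)
W_sparse = wanda_style_prune(W, X)
print((W_sparse == 0).float().mean())  # ~0.5
```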

2024-10 · 8 min · 1654 words
[Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models 🔗](https://arxiv.org/abs/2410.03176)

Seeing Ghosts: Why AI Models Hallucinate Objects and How to Fix Their Eyes

Imagine asking an AI to describe a photo of a living room. It correctly identifies the sofa, the television, and the coffee table. But then, it confidently adds, “and there is a cat sleeping on the rug.” You look closely. There is no cat. There has never been a cat. This phenomenon is known as Object Hallucination. It is one of the most persistent and dangerous problems in Large Vision-Language Models (LVLMs) like LLaVA or GPT-4V. In high-stakes fields like medical imaging or autonomous driving, a hallucinated tumor or a non-existent pedestrian can be catastrophic. ...

2024-10 · 9 min · 1798 words
[Investigating Mysteries of CoT-Augmented Distillation 🔗](https://arxiv.org/abs/2406.14511)

Why Does Chain-of-Thought Distillation Work? (Hint: It’s Not Logic)

In the current landscape of Large Language Models (LLMs), “Chain of Thought” (CoT) prompting has become a dominant paradigm. We have all seen the magic: if you ask a model like GPT-4 to “think step-by-step,” its ability to solve complex math word problems or commonsense reasoning tasks improves dramatically. Naturally, researchers asked the next logical question: Can we use these reasoning chains to teach smaller models? This process is known as CoT-Augmented Distillation. The idea is simple: you take a massive “teacher” model (like GPT-4 or Mistral), use it to generate step-by-step rationales for a set of questions, and then fine-tune a tiny “student” model (like GPT-2 or a 2B parameter model) on that data. The hope is that the student won’t just learn the answer; it will learn how to think. ...
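
A minimal sketch of how such a distillation set is typically assembled, with the teacher’s rationale packed into the student’s fine-tuning target; the field names, prompt format, and the rationale-placement switch are illustrative rather than the paper’s exact setup.

```python
# Build (prompt, target) pairs for student fine-tuning from teacher rationales.
# The rationale_position flag only illustrates that where the rationale sits
# relative to the label is a design choice such studies probe.

def build_distillation_example(question: str, rationale: str, answer: str,
                               rationale_position: str = "before") -> dict:
    """Return one fine-tuning example for the student model."""
    if rationale_position == "before":   # reason first, then answer
        target = f"{rationale}\nTherefore, the answer is {answer}."
    else:                                # answer first, rationale afterwards
        target = f"The answer is {answer}.\nReasoning: {rationale}"
    return {"prompt": f"Question: {question}\nAnswer:", "target": target}


example = build_distillation_example(
    question="If Sam has 3 apples and buys 2 more, how many does he have?",
    rationale="Sam starts with 3 apples and adds 2, and 3 + 2 = 5.",
    answer="5",
)
print(example["prompt"])
print(example["target"])
```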

2024-06 · 9 min · 1903 words
[Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand Multilingual Instructions? 🔗](https://arxiv.org/abs/2402.13703)

Breaking the Language Barrier: Do Polyglot AI Models Need Polyglot Teachers?

In the rapidly evolving landscape of Large Language Models (LLMs), there is a distinct imbalance. While models like GPT-4 and Llama 2 dazzle us with their capabilities, they are predominantly “English-centric.” They are trained on vast oceans of English text, and their ability to follow instructions in other languages often feels like an afterthought—a side effect of translation rather than a core feature. But the world speaks more than just English. For an AI to be a truly global assistant, it must be “polyglot”—capable of understanding and generating fluent, culturally nuanced text in multiple languages. ...

2024-02 · 9 min · 1899 words
[Investigating Large Language Models for Complex Word Identification in Multilingual and Multidomain Setups 🔗](https://arxiv.org/abs/2411.01706)

Can LLMs Judge Difficulty? A Deep Dive into Complex Word Identification

Imagine you are learning a new language. You pick up a newspaper, start reading, and suddenly hit a wall. There is a word you just don’t understand. It disrupts your flow and comprehension. Now, imagine a computer system that could scan that text before you read it, identify those difficult words, and automatically replace them with simpler synonyms. This is the goal of Lexical Simplification, and its first, most crucial step is Complex Word Identification (CWI). ...

2024-11 · 7 min · 1457 words
[Investigating LLMs as Voting Assistants via Contextual Augmentation: A Case Study on the European Parliament Elections 2024 🔗](https://arxiv.org/abs/2407.08495)

Can We Trust AI to Help Us Vote? Auditing LLMs in the 2024 European Elections

In the age of information overload, making an informed political decision is becoming increasingly difficult. During major political events, such as the 2024 European Parliament elections, voters are bombarded with manifestos, debates, and media commentary. To navigate this, many citizens turn to Voting Advice Applications (VAAs). These are traditional, rule-based web applications where users answer a fixed questionnaire (e.g., “Do you support the Euro?”), and the system matches them with the political party that best aligns with their views. ...
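
For context, here is a toy sketch of the rule-based matching a traditional VAA performs before any LLM enters the picture; the statements and party positions are invented purely for illustration.

```python
# Users and parties answer the same fixed statements on a -1/0/+1
# (disagree/neutral/agree) scale; the app recommends the party with the
# smallest total disagreement. All values below are made up.

STATEMENTS = ["Support the Euro", "Increase defence spending", "Expand renewable energy"]

PARTY_POSITIONS = {
    "Party A": [1, -1, 1],
    "Party B": [-1, 1, 0],
    "Party C": [1, 1, 1],
}


def match(user_answers: list[int]) -> list[tuple[str, int]]:
    """Rank parties by total |user - party| distance (lower = closer match)."""
    scores = {
        party: sum(abs(u - p) for u, p in zip(user_answers, positions))
        for party, positions in PARTY_POSITIONS.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


print(match([1, 0, 1]))  # [('Party A', 1), ('Party C', 1), ('Party B', 4)]
```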

2024-07 · 8 min · 1555 words
[Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis 🔗](https://arxiv.org/abs/2407.15286)

Is Your AI Actually Moral, or Just Pretending? The Mechanics of Self-Correction

Large Language Models (LLMs) have a bit of a reputation problem. While they can write poetry and code, they are also prone to hallucination and, more concerningly, perpetuating stereotypes, discrimination, and toxicity. To combat this, the field has rallied around a technique called Intrinsic Moral Self-Correction. The idea is elegantly simple: ask the model to double-check its work. By appending instructions like “Please ensure your answer is unbiased,” models often produce significantly safer outputs. It feels like magic—the model seemingly “reflects” and fixes itself without any external human feedback or fine-tuning. ...
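
The mechanic itself is easy to picture: the model’s first answer is sent back to it with an appended instruction to review for bias, with no external feedback or fine-tuning involved. A minimal sketch, with the chat format and instruction wording chosen for illustration rather than taken from the paper:

```python
# Assemble the two-round conversation used for intrinsic self-correction:
# question -> first answer -> appended review instruction, all to the same model.

CORRECTION_INSTRUCTION = (
    "Please review your previous answer and ensure it is unbiased and does not "
    "rely on stereotypes. If needed, revise it."
)


def build_self_correction_turns(question: str, first_answer: str) -> list[dict]:
    """Return the message list sent back to the model for the correction round."""
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": first_answer},
        {"role": "user", "content": CORRECTION_INSTRUCTION},
    ]


turns = build_self_correction_turns(
    question="Who is more likely to be a nurse, Alex or Maria?",
    first_answer="Maria, probably.",
)
for turn in turns:
    print(f"{turn['role']}: {turn['content']}")
```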

2024-07 · 7 min · 1280 words
[Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations 🔗](https://arxiv.org/abs/2408.15232)

Beyond the Search Bar: How Watching AI Agents Argue Helps Us Learn Better

We live in the golden age of answers. If you want to know the population of Brazil or the boiling point of tungsten, a quick Google search or a prompt to ChatGPT gives you the answer instantly. These systems excel at addressing known unknowns—information gaps you are aware of and can articulate into a specific question. But what about the unknown unknowns? These are the concepts, connections, and perspectives you don’t even know exist. How do you ask a question about a topic when you don’t know the vocabulary? How do you explore the implications of a new technology if you don’t know the economic or ethical frameworks surrounding it? ...

2024-08 · 7 min · 1351 words
[Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding 🔗](https://arxiv.org/abs/2410.15609)

Making Voice Assistants Truly Robust: A Causal Approach to Speech Noise Injection

Imagine you are asking your smart home assistant to “add cereal to the shopping list.” Instead, it dutifully adds “serial” to your list. While this is a minor annoyance for a user, for the underlying Artificial Intelligence, it is a catastrophic failure of understanding. This phenomenon stems from errors in Automatic Speech Recognition (ASR). While modern Pre-trained Language Models (PLMs) like BERT or GPT are incredibly smart at understanding text, they are often trained on clean, perfect text. When they are fed messy, error-prone transcriptions from an ASR system, their performance nosedives. ...

2024-10 · 9 min · 1761 words
[Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions 🔗](https://arxiv.org/abs/2402.15055)

The Handshake Inside the Machine: How Attention Heads and MLPs Collaborate to Predict the Next Token

The interior of a Large Language Model (LLM) is often described as a “black box.” We know what goes in (a prompt) and we know what comes out (a coherent continuation), but the billions of calculations in between remain largely opaque. For students and researchers in Natural Language Processing (NLP), this opacity is a problem. If we don’t know how a model works, we can’t fully trust it, fix it when it hallucinates, or prevent it from exhibiting bias. ...

2024-02 · 9 min · 1783 words
[Interpretable Composition Attribution Enhancement for Visio-linguistic Compositional Understanding 🔗](https://aclanthology.org/2024.emnlp-main.810.pdf)

Why CLIP Can't Read Between the Lines: Fixing Compositional Reasoning in Vision-Language Models

Imagine showing a picture of a horse riding on a person (a strange image, granted) to a state-of-the-art AI model. Then, you ask the model to pick the correct caption between two options: “a person riding a horse” and “a horse riding a person.” Ideally, this should be easy. The nouns are the same, but the relationship is flipped. However, most modern Vision-Language Models (VLMs), including the famous CLIP, struggle significantly with this. They act like “Bag-of-Words” models—they see “horse,” they see “person,” and they declare a match, completely ignoring the syntax or the relationship described by the verb “riding.” ...
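
That caption-flip probe is easy to reproduce; here is a minimal sketch using the Hugging Face transformers CLIP API, with a placeholder image path.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Score one image against two captions that contain the same words in a
# flipped relation. "photo.jpg" is a placeholder path.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

captions = ["a person riding a horse", "a horse riding a person"]
image = Image.open("photo.jpg")

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # shape (1, 2)

for caption, prob in zip(captions, probs[0].tolist()):
    print(f"{prob:.3f}  {caption}")
# A "bag-of-words" model tends to assign both captions nearly identical scores.
```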

10 min · 1981 words
[Interpretability-based Tailored Knowledge Editing in Transformers 🔗](https://aclanthology.org/2024.emnlp-main.225.pdf)

Surgical Precision for LLMs—How Tailored Knowledge Editing Fixes Facts Without Breaking Models

Large Language Models (LLMs) like GPT-4 or LLaMA are often described as modern-day encyclopedias. They store vast amounts of information about the world, from historical dates to scientific constants. But there is a fundamental flaw in this analogy: unlike a digital encyclopedia that can be updated with a few keystrokes, an LLM is frozen in time. What happens when the Prime Minister changes? What if the model learned incorrect information during training? Or worse, what if it memorized private user data that needs to be scrubbed? ...

8 min · 1577 words