[A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models 🔗](https://arxiv.org/abs/2410.04103)

Escaping the Update Trap: How Learning Rate Path Switching Keeps LLMs Fresh and Efficient

In the fast-moving world of Artificial Intelligence, a Large Language Model (LLM) is often only as good as its most recent data. We all know the frustration of asking a chatbot about a recent event, only to be told, “My knowledge cutoff is…” To keep models relevant, engineers must perform version updates, ingesting new data as it continuously emerges. However, this creates a massive logistical and financial headache. Do you retrain the whole model from scratch every time (insanely expensive)? Or do you just train on the new data (computationally cheap, but often degrades performance)? ...
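Taking the title at face value, the paradigm can be pictured as a schedule with two kinds of paths: a main path that keeps pre-training at the peak learning rate as data accumulates, and a short fast-decaying branch forked off for each released version. Here is a minimal sketch of that shape; the constants and decay curve are illustrative guesses, not the paper's settings.

```python
import math

# Hypothetical constants for illustration only.
PEAK_LR = 3e-4
MIN_LR = 3e-5

def main_path_lr(step: int, warmup_steps: int = 2000) -> float:
    """Main path: linear warmup, then hold at the peak learning rate."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR

def branch_lr(branch_step: int, branch_steps: int = 10_000) -> float:
    """Version-update branch: fast cosine decay from peak to minimum."""
    progress = min(branch_step / branch_steps, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# Each version update would fork from the latest main-path checkpoint and
# run the short decay branch; the main path itself continues at PEAK_LR.
```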

2024-10 · 8 min · 1543 words
[A Generic Method for Fine-grained Category Discovery in Natural Language Texts 🔗](https://arxiv.org/abs/2406.13103)

STAR: Illuminating Hidden Categories in Text with Comprehensive Semantic Similarities

Introduction Imagine you are building a digital life assistant. A user types: “I want to buy a vehicle suited for weekend field adventures.” Your system, trained on broad categories, successfully identifies the intent as “Buy Vehicle.” Based on this, it recommends a sleek, high-speed roadster. The user is frustrated. A roadster is terrible for “field adventures.” The system failed because it only understood the coarse-grained category (Vehicle) but missed the fine-grained nuance (Off-road Vehicle). ...

2024-06 · 9 min · 1786 words
[A Fast and Sound Tagging Method for Discontinuous Named-Entity Recognition 🔗](https://aclanthology.org/2024.emnlp-main.1087.pdf)

Taming the Broken Chain: A Fast, Sound Approach to Discontinuous NER

Introduction In the world of Natural Language Processing (NLP), Named-Entity Recognition (NER) is a cornerstone task. We typically ask models to read a sentence and highlight specific items—persons, organizations, locations, or medical symptoms. For years, the standard approach has been to treat these entities as solid blocks of text. If you see “New York City,” you draw a box around three consecutive words. But human language is rarely so tidy. Especially in specialized fields like biomedicine, entities often break apart, wrap around other words, or share components. Consider the phrase: “The patient suffered pain in the left and right knees.” ...
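To make the problem concrete, one common way to represent such entities is as sets of token spans, since a single "box" cannot capture them. The snippet below is a generic illustration of that encoding for the example sentence, not the tagging scheme the paper proposes.

```python
# Tokenized example sentence from the teaser above.
tokens = ["The", "patient", "suffered", "pain", "in", "the",
          "left", "and", "right", "knees", "."]

# Two discontinuous entities that share tokens and skip over words:
# each entity is a set of (start, end) token spans, end-exclusive.
entities = [
    {"type": "Symptom", "spans": [(3, 7), (9, 10)]},  # pain in the left + knees
    {"type": "Symptom", "spans": [(3, 6), (8, 10)]},  # pain in the + right knees
]

for ent in entities:
    pieces = [" ".join(tokens[s:e]) for s, e in ent["spans"]]
    print(ent["type"], "->", " ".join(pieces))
# Symptom -> pain in the left knees
# Symptom -> pain in the right knees
```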

10 min · 2125 words
[A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery 🔗](https://arxiv.org/abs/2406.10833)

Beyond Chatbots—How LLMs are Re-Engineering the Scientific Method

Introduction In the last few years, the term “Large Language Model” (LLM) has become synonymous with chatbots that can write emails, debug code, or compose poetry. However, a quiet revolution is happening in a sector far more critical to human progress: the natural sciences. Biology, chemistry, physics, and mathematics are drowning in data. The rate of publication has far outpaced any human’s ability to read, let alone synthesize, new information. Furthermore, scientific data is distinct; it isn’t just English text. It involves molecular graphs, protein sequences, mathematical formulas, and complex imagery. ...

2024-06 · 11 min · 2163 words
[A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives 🔗](https://arxiv.org/abs/2407.15489)

Lost in Translation? Why Machine Translation Might Be the Secret Weapon of Multilingual AI

If you have been following the explosion of Natural Language Processing (NLP) over the last few years, you are likely familiar with the heavy hitters: BERT, GPT, and T5. These models have revolutionized how machines understand human language. Recently, the focus has shifted toward multilingual models—systems capable of understanding and generating text in dozens, sometimes hundreds, of languages simultaneously. ...

2024-07 · 9 min · 1739 words
[A Closer Look at Multidimensional Online Political Incivility 🔗](https://aclanthology.org/2024.emnlp-main.827.pdf)

Style vs. Substance: Decoding the Two Faces of Political Toxicity on Social Media

Introduction If you have spent any time on Twitter (now X) during an election season, you know that the discourse can get ugly. But “ugly” is a vague term. Is a tweet containing a swear word directed at a senator the same as a tweet calmly accusing a specific group of people of being “traitors to the country”? For years, content moderation tools and researchers have treated online toxicity as a binary problem: a post is either “safe” or “toxic.” However, a recent research paper titled “A Closer Look at Multidimensional Online Political Incivility” argues that this binary view is insufficient for understanding political communication. ...

8 min · 1492 words
[A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution 🔗](https://arxiv.org/abs/2410.21716)

The Stylistic Fingerprint: Solving Authorship Attribution with Bayesian LLMs

Imagine finding a lost manuscript claiming to be a forgotten work by Jane Austen or identifying the anonymous creator behind a coordinated misinformation campaign on social media. These scenarios rely on authorship attribution—the computational science of determining who wrote a specific text based on linguistic patterns. For decades, this field relied on manually counting words or, more recently, fine-tuning heavy neural networks. But a new paper, A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution, proposes a fascinating shift. Instead of training models to classify authors, the researchers leverage the raw, pre-trained probabilistic nature of Large Language Models (LLMs) like Llama-3. ...
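In Bayesian terms, the shift is to score each candidate by P(author | text) ∝ P(text | author) · P(author), with the pre-trained LLM estimating the likelihood term. Below is a minimal sketch of that scoring loop; `llm_logprob` is a hypothetical stand-in for whatever API returns the log-probability of a continuation given a prompt, and the prompt format is invented for illustration.

```python
import math

def llm_logprob(prompt: str, continuation: str) -> float:
    """Summed log-probability of `continuation` given `prompt` under the LLM."""
    raise NotImplementedError("plug in your model's scoring endpoint here")

def attribute(text: str, exemplars: dict[str, str],
              priors: dict[str, float]) -> str:
    """Return the candidate author with the highest posterior score."""
    scores = {}
    for author, sample in exemplars.items():
        # Condition the LLM on a known writing sample from this candidate.
        prompt = (f"Here is a passage written by {author}:\n{sample}\n\n"
                  "Another passage by the same author:\n")
        log_likelihood = llm_logprob(prompt, text)  # log P(text | author)
        scores[author] = log_likelihood + math.log(priors[author])
    return max(scores, key=scores.get)
```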

2024-10 · 8 min · 1682 words
[1 + 1 > 2 : Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators? 🔗](https://arxiv.org/abs/2406.14721)

Breaking the Language Barrier: How Aggregating Cross-Lingual Knowledge Makes LLMs Smarter

Introduction Imagine asking a highly intelligent professor a question about the history of the Tang Dynasty. If you ask in English, they give you a vague, slightly inaccurate summary. But if you ask the exact same question in Chinese, they provide a rich, detailed, and factually perfect account. This is the current reality of Large Language Models (LLMs). Despite their reputation as universal knowledge bases, models like GPT-4 or Llama-3 suffer from a phenomenon known as multilingual inconsistency. Their “knowledge” is not stored in a language-agnostic database; it is entangled with the language of the training data. Because the internet contains vastly different information in English than it does in Chinese, Spanish, or Japanese, the model’s ability to answer questions fluctuates wildly depending on the language you use. ...
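One simple way to see why aggregating across languages can beat any single one (a toy baseline for illustration, not necessarily the paper's full method): ask the same question in several languages, map the answers back to a common language, and vote. `ask_llm` and `translate` below are hypothetical stand-ins for a chat endpoint and a translation service.

```python
from collections import Counter

def ask_llm(question: str) -> str: ...       # hypothetical chat endpoint
def translate(text: str, target_lang: str) -> str: ...  # hypothetical MT service

def aggregated_answer(question_en: str, languages: list[str]) -> str:
    answers = []
    for lang in languages:
        q = translate(question_en, lang)      # same question, new language
        a = ask_llm(q)                        # answer may differ per language
        answers.append(translate(a, "en"))    # normalize back to English
    # A majority vote exploits the fact that knowledge is unevenly
    # distributed across the training corpora of different languages.
    return Counter(answers).most_common(1)[0][0]
```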

2024-06 · 8 min · 1697 words
[YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models 🔗](https://arxiv.org/abs/2409.13592)

Can AI Understand the Joke? Evaluating Satire Comprehension in Vision-Language Models with the YesBut Dataset

Artificial Intelligence has made massive strides in seeing and describing the world. Modern Vision-Language (VL) models can look at a photo of a kitchen and list the ingredients on the counter, or look at a street scene and describe the traffic. But can they understand humor? Specifically, can they grasp the biting irony of satire? ...

2024-09 · 9 min · 1876 words