Introduction

In the rapidly evolving landscape of Natural Language Processing (NLP), Large Language Models (LLMs) like GPT-4 have demonstrated “emergent abilities”—skills that appear only when models reach a massive scale. One of the most significant of these is reasoning. By asking a model to “think step-by-step” or providing examples of reasoning (Chain-of-Thought prompting), we can drastically improve its accuracy on complex tasks like math, symbolic logic, or sarcasm detection.

However, there is a catch. Generating these high-quality reasoning steps usually requires human annotation (which is expensive) or the use of auxiliary “proxy” models to generate explanations for the main model. Furthermore, as the industry pushes for efficiency, there is a growing interest in Small Language Models (SLMs)—models with fewer than 14 billion parameters that can run on consumer hardware.

Do SLMs have the capacity to improve their own reasoning without relying on humans or massive teacher models?

This is the question addressed in the paper “Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations.” The researchers propose a framework where a Small Language Model looks at its own predictions, explains why the correct answer is correct (using post hoc explanation methods), and uses those self-generated explanations to solve future problems.

Figure 1: Example of four responses to a question from the Snarks dataset using different prompting strategies.

As shown in Figure 1, traditional prompting methods often fail on nuanced tasks like detecting sarcasm. However, by using Self-AMPLIFY, the model identifies key rationale tokens (like “1000 economists”) before answering, leading to the correct prediction.

In this post, we will break down how Self-AMPLIFY works, the technology behind “self-explanation,” and why this matters for the future of efficient AI.

Background: Context and Rationales

To understand Self-AMPLIFY, we first need to understand the concept of In-Context Learning (ICL). ICL is the ability of a language model to learn a task simply by seeing a few examples in the prompt, without any updates to its weights (fine-tuning).

Standard ICL usually follows an Input-Output (IO) format:

  • Input: Question A -> Output: Answer A
  • Input: Question B -> Output: Answer B
  • Input: Target Question -> Output: ?

However, researchers found that including a rationale—the “why”—improves performance significantly. This is often called Chain-of-Thought (CoT).

  • Input: Question A -> Rationale: Explanation A -> Output: Answer A
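To make the difference concrete, here is a minimal sketch of the two prompt formats in Python (the question and answer wording is purely illustrative, not taken from the paper):

    # Standard Input-Output (IO) prompting: examples contain only answers.
    io_prompt = (
        "Q: If a train travels 60 miles in 1.5 hours, what is its average speed?\n"
        "A: 40 mph\n\n"
        "Q: If a cyclist covers 45 miles in 3 hours, what is their average speed?\n"
        "A:"
    )

    # Chain-of-Thought (CoT) prompting: each example also includes a rationale.
    cot_prompt = (
        "Q: If a train travels 60 miles in 1.5 hours, what is its average speed?\n"
        "Rationale: Speed is distance divided by time, and 60 / 1.5 = 40.\n"
        "A: 40 mph\n\n"
        "Q: If a cyclist covers 45 miles in 3 hours, what is their average speed?\n"
        "A:"
    )

Given the CoT prompt, the model is expected to produce its own rationale before the final answer, mimicking the demonstrated pattern.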

The Problem with Current Methods

Existing methods for generating these rationales automatically often rely on external support:

  1. Auto-CoT: Asks the model to “think step by step” to generate rationales. However, if the model is small or the task is hard, it might hallucinate incorrect reasoning.
  2. AMPLIFY: A previous framework that used a separate, fine-tuned “proxy” model (like a BERT model) to generate explanations, which were then fed to the main LLM.
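For reference, Auto-CoT-style rationale generation boils down to appending a trigger phrase and letting the model write its own reasoning. A rough sketch, where generate is a hypothetical text-generation call on the model:

    # Zero-shot "think step by step" rationale generation, as used by
    # Auto-CoT-like methods. `generate` is a hypothetical helper that
    # returns the model's completion for a prompt.
    def auto_cot_rationale(question, generate):
        return generate(f"{question}\nA: Let's think step by step.")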

The dependency on external models or human data limits the autonomy of SLMs. This brings us to Post Hoc Explanations.

Post Hoc Explanations

“Post hoc” means “after the event.” In machine learning, post hoc explanation methods analyze a model after it has made a prediction to determine which parts of the input were most important.

Common methods include:

  • DeepLift and Integrated Gradients: backpropagation-based methods that use gradients (and differences in activations relative to a reference input) to estimate how much each input word contributed to the output.
  • KernelSHAP: a perturbation-based method that masks or removes parts of the input and measures how the output changes, estimating each feature’s importance.
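To make the perturbation idea concrete, below is a minimal sketch of a leave-one-out occlusion score, a deliberately simplified stand-in for KernelSHAP rather than the real algorithm. The helper answer_probability(text, label), which returns the model’s probability of the correct label for a given input, is hypothetical:

    # Leave-one-out occlusion: a simplified stand-in for KernelSHAP.
    # `answer_probability(text, label)` is a hypothetical helper returning the
    # SLM's probability of `label` given `text`.
    def word_importances(question, label, answer_probability):
        """Return (word, importance) pairs for one training sample."""
        words = question.split()
        baseline = answer_probability(question, label)
        scores = []
        for i, word in enumerate(words):
            # Drop one word and measure how much the probability of the correct
            # answer falls; a larger drop means the word mattered more.
            perturbed = " ".join(words[:i] + words[i + 1:])
            scores.append((word, baseline - answer_probability(perturbed, label)))
        return scores

The real KernelSHAP samples many random subsets of the input and fits a weighted linear model over them; the sketch only shows the core perturb-and-measure loop.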

The authors of Self-AMPLIFY realized that SLMs are now capable enough that these heavy mathematical explanation methods could be applied directly to the SLM itself, turning it into its own teacher.

The Self-AMPLIFY Framework

The core innovation of Self-AMPLIFY is a fully automated, 3-step loop that allows an SLM to improve its performance using its own internal signals. It removes the need for any auxiliary model.

Figure 2: Self-AMPLIFY overview. A 3-step approach involving Sample Selection, Rationale Generation, and Prompt Design.

As illustrated in Figure 2, the process flows as follows:

Step 1: \(n\)-shot Sample Selection

The system needs to decide which examples to include in the prompt to teach the model. The researchers propose two strategies based purely on the model’s own predictions:

  1. Success Strategy: Select examples that the model already answers correctly (when given a hint). The logic is that if the model is certain about an answer, its internal explanation for that answer is likely high-quality.
  2. Error Strategy: Select examples where the model initially got the wrong answer. The goal here is to force the model to analyze its mistakes by generating an explanation for the correct answer (ground truth), hoping this correction prevents similar errors in the test phase.
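A minimal sketch of the two strategies is shown below, assuming a hypothetical slm_predict(question) helper that returns the model’s answer for a question:

    # Sketch of n-shot sample selection. `train_set` is a list of
    # (question, gold_answer) pairs; `slm_predict` is a hypothetical helper.
    def select_samples(train_set, n, strategy, slm_predict):
        correct, wrong = [], []
        for question, gold in train_set:
            bucket = correct if slm_predict(question) == gold else wrong
            bucket.append((question, gold))
        # "success": keep samples the model already answers correctly.
        # "error": keep samples it gets wrong; rationales will target the gold label.
        pool = correct if strategy == "success" else wrong
        return pool[:n]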

Step 2: Rationale Generation

Once samples are selected, the system generates rationales for them. This is the most technically rich part of the paper. Since the system knows the ground truth label (\(y\)) for the training samples, it asks: “Which parts of the input explain this label?”

The paper implements three distinct types of self-generated rationales:

A. Attribution-Based Rationales (DeepLift / KernelSHAP)

These methods treat the model as a mathematical function. They compute an importance score for every word in the input sentence relative to the correct answer.

Figure 3: Self-AMPLIFY rationale generation step with a post hoc attribution method.

As seen in Figure 3, if the correct answer is “D” (Hurricane), the attribution method traces the model’s output for “D” back to the input words that contributed most to it: “which,” “type,” “weather,” and “map.”

The system then converts these mathematical scores into a natural language sentence:

“The 4 keywords ‘which’, ‘type’, ‘weather’ and ‘map’ are hints to predict that the answer is D.”
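This conversion is plain string templating. Below is a minimal sketch that mirrors the sentence above; the scores argument is assumed to be the (word, importance) pairs produced by the attribution method:

    # Turn attribution scores into a natural-language rationale using the
    # keyword template quoted above.
    def attribution_rationale(scores, answer, k=4):
        """scores: list of (word, importance) pairs for one input."""
        top = [w for w, _ in sorted(scores, key=lambda p: p[1], reverse=True)[:k]]
        keywords = ", ".join(f"'{w}'" for w in top[:-1]) + f" and '{top[-1]}'"
        return (f"The {len(top)} keywords {keywords} are hints to predict "
                f"that the answer is {answer}.")

For example, attribution_rationale([("which", 0.9), ("type", 0.8), ("weather", 0.7), ("map", 0.6), ("the", 0.1)], "D") reproduces the sentence quoted above.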

B. Self-TopK

This is a simpler approach that asks the model directly via a prompt: “Choose the right answer with the top-k most important keywords used to answer.”
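As a sketch, the Self-TopK variant needs no attribution machinery at all, only a prompt that asks the model for its own keywords (the wording below follows the instruction quoted above):

    # Self-TopK: ask the model to answer and list its own top-k keywords.
    def self_topk_prompt(question, k=4):
        return (f"{question}\n"
                f"Choose the right answer with the top-{k} most important "
                f"keywords used to answer.")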

C. Post Hoc Chain-of-Thought (Ph-CoT)

Here, the model is prompted to generate a free-text explanation given the correct answer. The template looks like: “The answer is (A). Generate a concise 3-step explanation.”

Unlike standard CoT, which generates reasoning before the answer, this generates reasoning after the answer is known, effectively rationalizing the ground truth.
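This rationalization step can be expressed as a single prompt-and-generate call. The wording below paraphrases the template quoted above, and generate is a hypothetical text-generation call on the SLM:

    # Post hoc CoT: reveal the gold answer, then ask the model to rationalize it.
    # `generate` is a hypothetical text-generation function for the SLM.
    def ph_cot_rationale(question, gold_answer, generate):
        prompt = (
            f"{question}\n"
            f"The answer is {gold_answer}. Generate a concise 3-step explanation."
        )
        return generate(prompt)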

Step 3: Prompt Design

Finally, the system constructs the prompt that will be used for inference on new, unseen data. It takes the input text (\(x\)), the generated rationale (\(r\)), and the correct label (\(y\)) to create \((x, r, y)\) triplets.

When the model faces a new test question, it sees these examples of “Question -> Explanation -> Answer” and attempts to mimic the behavior, performing what is essentially In-Context Learning.
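Putting the pieces together, the final prompt is simply the selected triplets rendered one after another, followed by the test question. The exact formatting used in the paper may differ; the sketch below only illustrates the Question -> Explanation -> Answer structure:

    # Assemble the few-shot prompt from (x, r, y) triplets plus the test question.
    def build_prompt(triplets, test_question):
        blocks = [
            f"Question: {x}\nExplanation: {r}\nAnswer: {y}"
            for x, r, y in triplets
        ]
        blocks.append(f"Question: {test_question}\nExplanation:")
        return "\n\n".join(blocks)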

Experiments and Results

The researchers evaluated Self-AMPLIFY on several Small Language Models, specifically Mistral-7B, Zephyr-7B, and Gemma (2B and 7B versions). They tested on datasets requiring strong reasoning, such as sarcasm detection (Snarks), commonsense reasoning (CommonsenseQA), causal deduction (Causal Judgment), and multiple-choice science questions (ARC Challenge).

Does Self-AMPLIFY work?

The results were compelling. Self-AMPLIFY consistently outperformed standard Input-Output (IO) prompting and frequently beat the competitors Auto-CoT and AMPLIFY.

Table 1: Self-AMPLIFY and competitors’ accuracy (%) on five test sets and two 7-billion-parameter models.

Looking at Table 1, we can observe several key trends:

  • Success vs. Competitors: On the ARC Challenge with Mistral-7B, the Ph-CoT version of Self-AMPLIFY reached 75.2% accuracy, compared to 71.8% for Auto-CoT and 72.8% for standard prompting.
  • Versatility: Whether using mathematical attribution (DeepLift) or natural language generation (Ph-CoT), Self-AMPLIFY showed improvements. Ph-CoT generally performed the best, likely because free-text explanations are more informative to a language model than a list of keywords.
  • Independence: Remember, AMPLIFY (the competitor) uses a separate BERT model to help it. Self-AMPLIFY beats it using only the SLM itself.

The Impact of Model Size

An interesting finding emerged when comparing different model sizes. The researchers tested the framework on Gemma-7B and the tiny Gemma-2B.

Figure 4: Self-AMPLIFY accuracy with Gemma-2B vs Gemma-7B.

As shown in Figure 4:

  • Gemma-7B (Right): Shows the expected behavior—Self-AMPLIFY (blue bars) generally beats the competitors (red bars).
  • Gemma-2B (Left): The results are much noisier. In many cases, the method does not significantly outperform the baseline.

This suggests there is a “reasoning threshold.” Tiny models (2B parameters) may struggle to generate high-quality post hoc explanations or fail to benefit from In-Context Learning as effectively as their 7B counterparts.

Success vs. Error Strategy

The researchers also investigated whether it is better to show the model examples of its successes or its failures (corrected).

  • The Success strategy (reinforcing what the model knows) performed consistently well.
  • The Error strategy (correcting mistakes) was highly effective for complex tasks like Snarks and Causal Judgment. This implies that for difficult tasks, showing the model how to solve problems it previously found confusing provides a strong learning signal.

Discussion and Implications

The Self-AMPLIFY paper presents a significant step forward for the autonomy of Small Language Models. By closing the loop—allowing a model to analyze its own outputs and feed that analysis back into its own inputs—we create a self-improving system without the overhead of massive external teacher models.

Key Takeaways

  1. No Proxy Needed: SLMs are now capable enough to run sophisticated post hoc explanation algorithms (like DeepLift) on themselves.
  2. Attribution as Rationale: We can convert mathematical “feature importance” vectors into natural language prompts to improve model reasoning.
  3. Corrective Feedback: Using the “Error Strategy” allows the model to learn specifically from the “hard” examples where its intuition initially failed.

Limitations

While promising, the approach is not without costs. Post hoc methods like KernelSHAP and DeepLift are computationally expensive: KernelSHAP requires many forward passes over perturbed inputs, and DeepLift requires additional backward passes through the network. While feasible for a 7B model, they are significantly slower than simple text generation. Additionally, as seen with Gemma-2B, there is a lower limit on model size; extremely small models may not possess the requisite “self-awareness” to explain their predictions faithfully.

Conclusion

Self-AMPLIFY demonstrates that Small Language Models hold latent reasoning capabilities that can be unlocked through self-explanation. As we move toward more efficient, edge-based AI, techniques that allow these compact models to learn from their own internal dynamics—rather than relying on the cloud-based giants—will be essential. This research confirms that even small models can benefit from “thinking about their thinking.”