Introduction

Large Language Models (LLMs) like GPT-4 and LLaMA are impressive, but they are not perfect. They can hallucinate, rely on outdated information, or simply lack specific context. In recent years, researchers have developed “Knowledge Editing” techniques—surgical methods to update a model’s weights to fix a specific error without retraining the entire network.

Traditionally, this has been applied to factual knowledge. For example, if the Prime Minister of a country changes, we can edit the model to associate the country with the new leader. However, the real world isn’t just made of static facts. It is filled with commonsense knowledge—the intuitive understanding of how people act, how physics works, and what social norms apply.

This brings us to a difficult problem: existing methods are great at swapping “Trump” for “Biden,” but they fail miserably when trying to teach a model that “if you want to open a smart lock, you need to look at the camera.”

In this post, we break down a fascinating paper, “Commonsense Knowledge Editing Based on Free-Text in LLMs,” which introduces a novel method called DEM (Dynamics-aware Editing Method). The researchers reveal that commonsense isn’t stored in the same place as facts, and they propose a new architecture to fix the “broad, long, and abstract” errors that plague current models.

Fig. 1: An example of factual knowledge and commonsense knowledge, and obtaining the correct answer by editing the model.

As shown in Figure 1, editing a fact (left) is straightforward replacement. Editing commonsense (right) involves changing a complex chain of reasoning and free text.

The Problem: Facts vs. Common Sense

To understand why this is hard, we first need to understand how we currently edit models.

Most state-of-the-art editing methods (like ROME or MEMIT) treat knowledge as triples: <Subject, Relation, Object>.

  • Example: <Eiffel Tower, is located in, Paris>

These methods assume that this specific fact is stored in a specific set of neurons (often in the Feed-Forward/MLP layers of the Transformer). To edit the fact, you locate those neurons and mathematically “rewrite” them.
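
As a rough illustration of this locate-then-rewrite idea (a toy sketch, not ROME's or MEMIT's actual implementation), a single fact can be forced into a linear layer with a rank-one update that maps the subject's key vector to a new value vector:

```python
import torch

# Toy sketch of locate-then-rewrite editing. W stands in for one MLP
# down-projection matrix, k is the "key" the subject produces at that layer,
# and v_new is the value we want the layer to emit so the model decodes the
# new object. All names and shapes here are illustrative.
d = 16
W = torch.randn(d, d)        # the layer that (hypothetically) stores the fact
k = torch.randn(d)           # key computed from "Eiffel Tower"
v_new = torch.randn(d)       # value that should decode to the new object

# Rank-one correction: force W_edited @ k == v_new while only perturbing W
# along the direction of k.
residual = v_new - W @ k
W_edited = W + torch.outer(residual, k) / (k @ k)

assert torch.allclose(W_edited @ k, v_new, atol=1e-5)
```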

Commonsense knowledge is different. It is characterized by:

  1. Non-instantiation: It deals with general concepts (e.g., “PersonX”) rather than specific named entities.
  2. Free-text: The “answer” isn’t a single word; it’s a sentence or a scenario.
  3. Broad Scope: It relies on social and physical intuition scattered across the model’s understanding.

When researchers tried to use standard factual editing tools on commonsense data, the models broke. They produced repetitive text or failed to update the reasoning. The authors hypothesized that this is because commonsense isn’t stored in the same place as facts.

Part 1: Where is Common Sense Stored? (The KLFT Method)

Before building a solution, the researchers needed to find the target. They developed a technique called Knowledge Localization for Free-Text (KLFT). This method probes the model to see which layers are active and important when retrieving different types of knowledge.

They ran experiments comparing Factual Knowledge against Commonsense Knowledge using two metrics:

  1. Knowledge Location: Checking the probability values in hidden states to see where information resides (a minimal probing sketch follows this list).
  2. Knowledge Recall: Measuring how much a specific layer contributes to the final answer.
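
A minimal sketch of the first metric, assuming a logit-lens style probe (our simplification; the paper's exact procedure may differ): decode each layer's hidden state through the unembedding matrix and read off the probability assigned to the answer token.

```python
import torch

def layerwise_answer_prob(hidden_states, unembed, answer_id):
    """hidden_states: per-layer [d_model] vectors for the last token;
    unembed: [vocab, d_model] unembedding matrix; answer_id: target token id."""
    probs = []
    for h in hidden_states:
        p = torch.softmax(unembed @ h, dim=-1)[answer_id]
        probs.append(p.item())
    return probs  # a sharp peak suggests the layer where the answer "resides"
```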

The Dispersion of Knowledge

The results were striking. In Factual Knowledge (top row of Figure 2 below), activations are sharp and localized. You can point to a specific spot in the Multi-Layer Perceptron (MLP) layers and say, “The fact is here.”

However, for Commonsense Knowledge (bottom row), the heatmaps are blurry. The activation is spread out.

Fig. 2: Storing Factual and Commonsense Knowledge in LLMs.

The researchers further decoupled the data to remove any specific entities (like names) to look at pure commonsense. The result, shown in Figure 3, confirms that while factual knowledge (blue lines) spikes in the early/middle MLP layers, commonsense knowledge (orange/green lines) is relatively stable and dispersed throughout the network.

Fig. 3: The storage of commonsense knowledge after decoupling factual knowledge. Here, “Single Layers” refers to the transformer block layers, which include the MLP and Attn layers.

The Role of Attention Layers

Perhaps the most critical discovery was the role of the Attention (Attn) layers. Previous editing methods largely ignored Attention layers, focusing almost exclusively on MLP layers.

The heatmap below (Figure 4) shows the storage location for different relationship types (e.g., “xWant”, “xEffect”).

  • Left (MLP): Activity is concentrated in the early-to-mid layers.
  • Right (Attn): Activity is scattered across almost all layers.

Fig. 4: The storage location of samples for each relationship category in the MLP and Attn layers. The horizontal axis represents the parameter layers of the model, and the vertical axis represents the relationship category. The darker the color, the more knowledge is stored in that layer.

The authors confirmed this using a similarity metric called Simpson Similarity to measure how much the information changes as it passes through a layer. Low similarity implies the layer did a lot of work (high contribution).

\[
\mathrm{Simpson\_Similarity} = \frac{\left| p_{in}^{l} \cap p_{out}^{l} \right|}{\min\left( \left| p_{in}^{l} \right|, \left| p_{out}^{l} \right| \right)}
\]
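
In code this is just an overlap ratio between two sets. A minimal sketch, assuming \(p_{in}^{l}\) and \(p_{out}^{l}\) are the top-k candidate tokens decoded from the layer's input and output hidden states:

```python
def simpson_similarity(p_in: set, p_out: set) -> float:
    """Overlap between the layer's input and output candidate-token sets."""
    if not p_in or not p_out:
        return 0.0
    return len(p_in & p_out) / min(len(p_in), len(p_out))

# Near 1: the layer barely changed the candidates. Near 0: the layer did a
# lot of work (high contribution to recall).
print(simpson_similarity({"camera", "face", "door"}, {"key", "door", "lock"}))  # 0.33
```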

As visualized in Figure 5, the Attention layers (orange line) show a vastly different response pattern for commonsense knowledge compared to factual knowledge.

Fig. 5: The comparison of activation responses between factual and commonsense knowledge during the knowledge recall process. The green line represents the MLP layers, and the orange line represents the Attention layers. The horizontal axis represents different layers, and the vertical axis represents the similarity value.

The Takeaway: You cannot edit commonsense by only tweaking a few neurons in the MLP layers. You must edit the Attention layers as well, and you cannot assume a fixed location for every piece of knowledge.

Part 2: The Dynamics-aware Editing Method (DEM)

Armed with the insight that commonsense is “everywhere” and involves Attention layers, the authors proposed the Dynamics-aware Editing Method (DEM).

This method has two main components:

  1. Dynamics-aware Module: To figure out where to edit.
  2. Knowledge Editing Module: To actually perform the update on both MLP and Attention weights.

Fig. 6: The overall architecture of the Dynamics-aware Editing Method.

Step 1: Locating with Dynamic Awareness

Unlike previous methods that hard-code which layers to edit (e.g., “always edit layers 4 through 8”), DEM dynamically selects the layers that are most involved in the specific prompt.

It does this by calculating the Cosine Similarity between the input and output hidden states of the last token at each layer:

\[
\mathrm{Cosine\_Similarity} = \frac{h(T)_{in}^{l} \cdot h(T)_{out}^{l}}{\left| h(T)_{in}^{l} \right| \left| h(T)_{out}^{l} \right|}
\]

If the similarity is close to zero, it means the layer significantly transformed the information—making it a prime candidate for editing. DEM selects the top k layers (usually 3) based on this metric.
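
A simplified sketch of that selection step (variable names and shapes are ours): score every layer by the cosine similarity of the last token's input and output hidden states, then keep the k layers whose similarity is closest to zero.

```python
import torch

def select_edit_layers(h_in, h_out, k=3):
    """h_in, h_out: [num_layers, d_model] hidden states of the last token,
    before and after each layer. Returns the indices of the k layers that
    transformed the representation the most (similarity closest to zero)."""
    cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1)  # [num_layers]
    return torch.topk(cos.abs(), k, largest=False).indices.tolist()

layers = select_edit_layers(torch.randn(32, 4096), torch.randn(32, 4096))
print(layers)  # e.g. [21, 5, 13] -> the layers DEM would edit for this prompt
```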

Step 2: Updating the Weights

Once the layers are identified, DEM updates the weights. This is mathematically complex because we want to force the model to produce a new “Target Answer” while ensuring we don’t break the model’s existing knowledge (Generalization and Specificity).

The objective function seeks to minimize the error for the new knowledge (\(n+1\) to \(n+u\)) while keeping previous knowledge (\(1\) to \(n\)) stable:

\[
W_{\mathrm{MLP}}, W_{\mathrm{Attn}} \triangleq \underset{W}{\operatorname{argmin}} \left( \sum_{i=1}^{n} \left\| W k_{i} - v_{i} \right\|^{2} + \sum_{i=n+1}^{n+u} \left\| W k_{i} - v_{i} \right\|^{2} \right)
\]

To solve this, DEM calculates an incremental weight matrix (\(\Delta\)) for both the MLP and Attention layers. This is a significant departure from standard methods that usually only calculate \(\Delta\) for MLPs.

\[
\begin{aligned}
\Delta^{\mathrm{MLP}} &= R^{\mathrm{MLP}} \left( k_{1}^{\mathrm{MLP}} \right)^{T} \left( C_{0}^{\mathrm{MLP}} + k_{1}^{\mathrm{MLP}} \left( k_{1}^{\mathrm{MLP}} \right)^{T} \right)^{-1} \\
\Delta^{\mathrm{Attn}} &= R^{\mathrm{Attn}} \left( k_{1}^{\mathrm{Attn}} \right)^{T} \left( C_{0}^{\mathrm{Attn}} + k_{1}^{\mathrm{Attn}} \left( k_{1}^{\mathrm{Attn}} \right)^{T} \right)^{-1}
\end{aligned}
\]

In these equations, \(R\) represents the “residual” (the error between what the model currently knows and what we want it to know), and \(C_0\) is the covariance of previously memorized keys (preserving old memories).
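
Under those definitions, the closed-form update is a few lines of linear algebra. A minimal sketch (shapes and variable names are ours), applied identically to the selected MLP and Attention projection weights:

```python
import torch

def incremental_update(W, K1, V1, C0):
    """W: [d_out, d_in] current weights; K1: [d_in, u] keys of the new samples;
    V1: [d_out, u] target values; C0: [d_in, d_in] covariance of previously
    memorized keys (this is what protects old knowledge)."""
    R = V1 - W @ K1                                      # residual on the new keys
    delta = R @ K1.T @ torch.linalg.inv(C0 + K1 @ K1.T)
    return W + delta

# DEM runs this for both weight types on each dynamically selected layer,
# which is the departure from MLP-only editors.
```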

Finally, the model optimizes the hidden states (\(v\)) using a loss function that includes Kullback-Leibler (KL) divergence. This ensures the new distribution of text is smooth and natural, rather than just forcing a hard keyword insertion.

\[
\begin{aligned}
\mathcal{L}(v_{i}^{m}) = \; & \alpha \cdot D_{\mathrm{KL}}\left( \mathbb{P}_{\mathcal{F}_{\theta}^{\dagger}}\left[ y^{m} \mid p^{m} \right] \,\big\|\, \mathbb{P}_{\mathcal{F}_{\theta}}\left[ y^{m} \mid p^{m} \right] \right) \\
& + \beta \cdot \frac{1}{P} \sum_{j=1}^{P} -\log \mathbb{P}_{\mathcal{F}_{\theta}^{\dagger}}\left[ y_{i}^{Z_{t}} \mid \mathrm{pref}_{j} \oplus p(x_{i}^{m}) \right]
\end{aligned}
\]
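
A rough sketch of that objective in code (the \(\alpha\)-weighted KL term keeps the edited distribution close to the original, and the \(\beta\)-weighted term is the negative log-likelihood of the target answer; everything else is our simplification):

```python
import torch.nn.functional as F

def edit_loss(logits_edited, logits_original, target_ids, alpha=0.1, beta=1.0):
    """logits_*: [seq, vocab] next-token logits; target_ids: [seq] target tokens."""
    kl = F.kl_div(
        F.log_softmax(logits_edited, dim=-1),
        F.softmax(logits_original, dim=-1),
        reduction="batchmean",
    )
    nll = F.cross_entropy(logits_edited, target_ids)
    return alpha * kl + beta * nll
```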

The CKEBench Dataset

To test this method, the authors couldn’t rely on existing factual datasets. They created CKEBench (Commonsense Knowledge Editing Benchmark), derived from the famous ATOMIC database.

They converted abstract relationships (like xAttr, xReact) into human-readable templates and questions.

Table 1: An example of converting source data from the ATOMIC database into directly generated (DG), multiple-choice (MQ), and true/false (T/F) questions.

This resulted in over 15,000 samples covering physical entities, social interactions, and events.
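
As an illustration of that conversion (templates and wording below are our own paraphrases, not the benchmark's exact prompts):

```python
# Map ATOMIC relation types to human-readable prompt templates.
RELATION_TEMPLATES = {
    "xWant": "After {event}, PersonX wants to",
    "xEffect": "As a result of {event}, PersonX",
    "xAttr": "Because of {event}, PersonX is seen as",
}

def to_questions(event: str, relation: str, target: str, distractor: str):
    """Turn one ATOMIC-style triple into the three CKEBench question formats."""
    prompt = RELATION_TEMPLATES[relation].format(event=event)
    return {
        "DG": f"{prompt} ___",                              # directly generated
        "MQ": f"{prompt}: (A) {target} (B) {distractor}",   # multiple choice
        "T/F": f"{prompt} {target}. True or False?",        # true/false
    }

print(to_questions("PersonX wants to open the smart lock", "xWant",
                   "aim her face at the camera", "take out a key"))
```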

Experimental Results

So, does editing the Attention layers and dynamically selecting them actually help?

The authors compared DEM against top-tier baselines like MEMIT, MEND, and PMET on models like GPT-J (6B) and LLaMA-2 (7B).

Quantitative Success

The results were decisive. In Table 2 (below), DEM outperforms the previous state-of-the-art (PMET) across almost all metrics. Notably:

  • Score: Improved by 4.5 points on GPT-J.
  • Commonsense: A massive improvement of 13.8%, indicating the model actually grasped the new logic rather than just memorizing a word.
  • Specificity: DEM is better at not breaking other unrelated knowledge.

Table 2: The main results for direct generation on the CKEBench dataset. The performance of our method is followed by the improvement (\(\uparrow\)) over the previous method.

Qualitative Success

Numbers are great, but looking at the actual text output is often more telling. In Figure 7, we see a comparison of editing attempts for a prompt about opening a smart lock.

  • Original: Fails (suggests taking out a key).
  • MEMIT/PMET: Fail. They often produce repetitive loops (“needs to needs to”) or nonsensical sentence fragments.
  • DEM: Generates the coherent target, “aim her face at the camera” (not shown in this specific crop, but implied by the success rates).

Fig. 7: Examples of commonsense knowledge editing using existing methods.

Why it Works: The Ablation Study

To prove that both the “Dynamics-aware” module and the “Attention editing” were necessary, the authors performed an ablation study (removing parts of the system to see what breaks).

Table 3: Ablation study of DEM. We turn off different components of the model one at a time.

As shown in Table 3:

  • w/o DA (No Dynamics-aware): Performance drops significantly. Guessing which layers to edit doesn’t work.
  • w/o EM (No MLP editing): Huge drop. MLP layers are still the foundation.
  • w/o EA (No Attention editing): Notable drop in efficacy. This confirms the paper’s hypothesis: You cannot effectively edit commonsense without touching the Attention layers.

Conclusion

The paper “Commonsense Knowledge Editing Based on Free-Text in LLMs” marks a significant step forward in model maintenance. It moves us away from the simplified view that knowledge editing is just about swapping Entity A for Entity B.

By mapping the dispersed nature of commonsense knowledge—spanning both MLP and Attention layers—and designing the DEM architecture to adaptively target these areas, the researchers have unlocked a way to correct broader, more abstract errors in Large Language Models. As LLMs become more integrated into daily life, the ability to correct “social” and “physical” commonsense errors without expensive retraining will be essential.