Introduction
Large Language Models (LLMs) like GPT-4 and LLaMA are impressive, but they are not perfect. They can hallucinate, rely on outdated information, or simply lack specific context. In recent years, researchers have developed “Knowledge Editing” techniques—surgical methods to update a model’s weights to fix a specific error without retraining the entire network.
Traditionally, this has been applied to factual knowledge. For example, if the Prime Minister of a country changes, we can edit the model to associate the country with the new leader. However, the real world isn’t just made of static facts. It is filled with commonsense knowledge: the intuitive understanding of how people act, how the physical world behaves, and what social norms apply.
This brings us to a difficult problem: existing methods are great at swapping “Trump” for “Biden,” but they fail miserably when trying to teach a model that “if you want to open a smart lock, you need to look at the camera.”
In this post, we break down a fascinating paper, “Commonsense Knowledge Editing Based on Free-Text in LLMs,” which introduces a novel method called DEM (Dynamics-aware Editing Method). The researchers reveal that commonsense isn’t stored in the same place as facts, and they propose a new architecture to fix the “broad, long, and abstract” errors that plague current models.

As shown in Figure 1, editing a fact (left) is straightforward replacement. Editing commonsense (right) involves changing a complex chain of reasoning and free text.
The Problem: Facts vs. Common Sense
To understand why this is hard, we first need to understand how we currently edit models.
Most state-of-the-art editing methods (like ROME or MEMIT) treat knowledge as triples: <Subject, Relation, Object>.
- Example:
<Eiffel Tower, is located in, Paris>
These methods assume that this specific fact is stored in a specific set of neurons (often in the Feed-Forward/MLP layers of the Transformer). To edit the fact, you locate those neurons and mathematically “rewrite” them.
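To make that concrete, here is a minimal sketch of how a single factual edit is typically framed; the class and field names are illustrative, not the actual API of ROME or MEMIT:

```python
from dataclasses import dataclass

@dataclass
class FactEdit:
    """One factual edit expressed as a <Subject, Relation, Object> rewrite."""
    subject: str      # "Eiffel Tower"
    relation: str     # "is located in"
    old_object: str   # "Paris"  (what the model currently says)
    new_object: str   # "Rome"   (the counterfactual target to write in)

    def prompt(self) -> str:
        # The edit is judged successful if the model now completes this
        # prompt with `new_object` instead of `old_object`.
        return f"The {self.subject} {self.relation}"

edit = FactEdit("Eiffel Tower", "is located in", "Paris", "Rome")
print(edit.prompt(), "->", edit.new_object)
```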
Commonsense knowledge is different. It is characterized by:
- Non-instantiation: It deals with general concepts (e.g., “PersonX”) rather than specific named entities.
- Free-text: The “answer” isn’t a single word; it’s a sentence or a scenario.
- Broad Scope: It relies on social and physical intuition scattered across the model’s understanding.
When researchers tried to use standard factual editing tools on commonsense data, the models broke. They produced repetitive text or failed to update the reasoning. The authors hypothesized that this is because commonsense isn’t stored in the same place as facts.
Part 1: Where is Common Sense Stored? (The KLFT Method)
Before building a solution, the researchers needed to find the target. They developed a technique called Knowledge Localization for Free-Text (KLFT). This method probes the model to see which layers are active and important when retrieving different types of knowledge.
They ran experiments comparing Factual Knowledge against Commonsense Knowledge using two metrics:
- Knowledge Location: Checking the probability values in hidden states to see where information resides.
- Knowledge Recall: Measuring how much a specific layer contributes to the final answer.
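As a rough illustration of the “Knowledge Location” idea, the logit-lens-style probe below projects each layer’s hidden state through the output head and reads off the probability of the expected answer token. This is a simplified approximation, not the paper’s exact KLFT procedure, and it uses GPT-2 purely as a small stand-in for the models studied in the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Logit-lens-style probe: project each layer's hidden state through the
# output head and read off the probability assigned to the answer token.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

prompt, answer = "The Eiffel Tower is located in", " Paris"
answer_id = tok(answer).input_ids[0]  # first token of the expected answer

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"))

# hidden_states[0] is the embedding layer; the rest are the block outputs.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    p = torch.softmax(logits, dim=-1)[answer_id].item()
    print(f"layer {layer:2d}: P(answer token) = {p:.4f}")
```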
The Dispersion of Knowledge
The results were striking. In Factual Knowledge (top row of Figure 2 below), activations are sharp and localized. You can point to a specific spot in the Multi-Layer Perceptron (MLP) layers and say, “The fact is here.”
However, for Commonsense Knowledge (bottom row), the heatmaps are blurry. The activation is spread out.

The researchers further decoupled the data to remove any specific entities (like names) to look at pure commonsense. The result, shown in Figure 3, confirms that while factual knowledge (blue lines) spikes in the early/middle MLP layers, commonsense knowledge (orange/green lines) is relatively stable and dispersed throughout the network.

The Role of Attention Layers
Perhaps the most critical discovery was the role of the Attention (Attn) layers. Previous editing methods largely ignored Attention layers, focusing almost exclusively on MLP layers.
The heatmap below (Figure 4) shows the storage location for different relationship types (e.g., “xWant”, “xEffect”).
- Left (MLP): Activity is concentrated in the early-to-mid layers.
- Right (Attn): Activity is scattered across almost all layers.

The authors confirmed this using a similarity metric called Simpson Similarity to measure how much the information changes as it passes through a layer. Low similarity implies the layer did a lot of work (high contribution).

As visualized in Figure 5, the Attention layers (orange line) show a vastly different response pattern for commonsense knowledge compared to factual knowledge.
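Below is a hedged sketch of how such a layer-contribution probe could be implemented: decode the hidden state entering and leaving a layer into its top-k vocabulary tokens and compare the two sets with the Simpson (overlap) coefficient. The decoding step is an assumption on our part; the paper specifies only the similarity metric:

```python
import torch

def simpson_similarity(set_a: set, set_b: set) -> float:
    """Simpson (overlap) coefficient: |A ∩ B| / min(|A|, |B|)."""
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))

def layer_contribution(h_in: torch.Tensor, h_out: torch.Tensor,
                       unembed: torch.Tensor, k: int = 10) -> float:
    """Decode the hidden state before and after a layer into its top-k
    vocabulary tokens and compare the two sets. Low similarity means the
    layer changed the prediction a lot, i.e. it contributed heavily."""
    top_in = set(torch.topk(h_in @ unembed.T, k).indices.tolist())
    top_out = set(torch.topk(h_out @ unembed.T, k).indices.tolist())
    return simpson_similarity(top_in, top_out)

# Toy usage with random vectors and a random unembedding matrix:
d, vocab = 16, 100
print(layer_contribution(torch.randn(d), torch.randn(d), torch.randn(vocab, d)))
```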

The Takeaway: You cannot edit commonsense by only tweaking a few neurons in the MLP layers. You must edit the Attention layers as well, and you cannot assume a fixed location for every piece of knowledge.
Part 2: The Dynamics-aware Editing Method (DEM)
Armed with the insight that commonsense is “everywhere” and involves Attention layers, the authors proposed the Dynamics-aware Editing Method (DEM).
This method has two main components:
- Dynamics-aware Module: To figure out where to edit.
- Knowledge Editing Module: To actually perform the update on both MLP and Attention weights.

Step 1: Locating with Dynamic Awareness
Unlike previous methods that hard-code which layers to edit (e.g., “always edit layers 4 through 8”), DEM dynamically selects the layers that are most involved in the specific prompt.
It does this by calculating, at every layer \(l\), the Cosine Similarity between the input and output hidden states of the last token \(T\):
\[
\mathrm{Cosine\_Similarity} = \frac{h(T)^{l}_{\mathrm{in}} \cdot h(T)^{l}_{\mathrm{out}}}{\left\| h(T)^{l}_{\mathrm{in}} \right\| \, \left\| h(T)^{l}_{\mathrm{out}} \right\|}
\]
If the similarity is close to zero, it means the layer significantly transformed the information—making it a prime candidate for editing. DEM selects the top k layers (usually 3) based on this metric.
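A minimal sketch of this selection step, assuming we already have the per-layer input and output hidden states of the last token, might look like this (names and shapes are illustrative):

```python
import torch

def select_edit_layers(hidden_in: list, hidden_out: list, k: int = 3) -> list:
    """For each layer, compare the last token's hidden state entering and
    leaving the layer; the k layers whose cosine similarity is closest to
    zero (i.e. the most-transformed ones) are chosen for editing."""
    scores = []
    for layer, (h_in, h_out) in enumerate(zip(hidden_in, hidden_out)):
        cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1).item()
        scores.append((abs(cos), layer))
    return [layer for _, layer in sorted(scores)[:k]]

# Toy usage: 12 layers' worth of random last-token hidden states.
h_ins = [torch.randn(768) for _ in range(12)]
h_outs = [torch.randn(768) for _ in range(12)]
print(select_edit_layers(h_ins, h_outs, k=3))
```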
Step 2: Updating the Weights
Once the layers are identified, DEM updates the weights. This is mathematically complex because we want to force the model to produce a new “Target Answer” while ensuring we don’t break the model’s existing knowledge (Generalization and Specificity).
The objective function seeks to minimize the error for the new knowledge (\(n+1\) to \(n+u\)) while keeping previous knowledge (\(1\) to \(n\)) stable:
\[
W_{\mathrm{MLP}},\, W_{\mathrm{Attn}} \triangleq \operatorname*{argmin}_{W} \left( \sum_{i=1}^{n} \left\| W k_i - v_i \right\|^2 + \sum_{i=n+1}^{n+u} \left\| W k_i - v_i \right\|^2 \right)
\]
To solve this, DEM calculates an incremental weight matrix (\(\Delta\)) for both the MLP and Attention layers. This is a significant departure from standard methods that usually only calculate \(\Delta\) for MLPs.
\[
\begin{aligned}
\Delta^{\mathrm{MLP}} &= R^{\mathrm{MLP}} \left(k_1^{\mathrm{MLP}}\right)^{T} \left( C_0^{\mathrm{MLP}} + k_1^{\mathrm{MLP}} \left(k_1^{\mathrm{MLP}}\right)^{T} \right)^{-1} \\
\Delta^{\mathrm{Attn}} &= R^{\mathrm{Attn}} \left(k_1^{\mathrm{Attn}}\right)^{T} \left( C_0^{\mathrm{Attn}} + k_1^{\mathrm{Attn}} \left(k_1^{\mathrm{Attn}}\right)^{T} \right)^{-1}
\end{aligned}
\]
In these equations, \(R\) represents the “residual” (the error between what the model currently knows and what we want it to know), and \(C_0\) is the covariance of previously memorized keys (preserving old memories).
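A compact NumPy sketch of this closed-form update (the same recipe is applied separately to the selected MLP and Attention projection matrices) is shown below; the shapes are assumptions chosen for illustration:

```python
import numpy as np

def incremental_delta(R: np.ndarray, K1: np.ndarray, C0: np.ndarray) -> np.ndarray:
    """Closed-form incremental update: Delta = R @ K1.T @ inv(C0 + K1 @ K1.T).
    R  : residuals, shape (d_out, u), target values minus current outputs
    K1 : keys of the new knowledge, shape (d_in, u)
    C0 : covariance of previously memorized keys, shape (d_in, d_in)."""
    return R @ K1.T @ np.linalg.inv(C0 + K1 @ K1.T)

# Toy usage (shapes only; real keys/values come from the model's activations):
d_in, d_out, u = 64, 32, 5
R = np.random.randn(d_out, u)
K1 = np.random.randn(d_in, u)
C0 = np.eye(d_in)             # stand-in for the accumulated key covariance
W = np.random.randn(d_out, d_in)
W_edited = W + incremental_delta(R, K1, C0)   # same form for MLP and Attn
```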
Finally, the model optimizes the hidden states (\(v\)) using a loss function that includes Kullback-Leibler (KL) divergence. This ensures the new distribution of text is smooth and natural, rather than just forcing a hard keyword insertion.
\[
\mathcal{L}(v_i^m) = \alpha \cdot D_{\mathrm{KL}}\!\left( \mathbb{P}_{\mathcal{F}_{\theta}^{\dagger}}\!\left[ y^m \mid p^m \right] \,\middle\|\, \mathbb{P}_{\mathcal{F}_{\theta}}\!\left[ y^m \mid p^m \right] \right) + \beta \cdot \frac{1}{P} \sum_{j=1}^{P} -\log \mathbb{P}_{\mathcal{F}_{\theta}^{\dagger}}\!\left[ y_i^{Z_t} \mid \mathrm{pref}_j \oplus p\!\left(x_i^m\right) \right]
\]
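In PyTorch terms, a hedged version of this value-optimization loss could look like the following; the weighting and tensor shapes are assumptions, not the authors’ released code:

```python
import torch
import torch.nn.functional as F

def value_loss(logits_edited: torch.Tensor, logits_original: torch.Tensor,
               target_logprobs: torch.Tensor,
               alpha: float = 0.1, beta: float = 1.0) -> torch.Tensor:
    """Sketch of the value-optimization objective: a KL term keeps the edited
    model's output distribution close to the original one (fluency and
    locality), while a negative-log-likelihood term pushes probability mass
    onto the target free-text answer. Logits are (positions, vocab)."""
    # F.kl_div(input, target) computes KL(target || input), so this ordering
    # yields KL(P_edited || P_original) as in the first term above.
    kl = F.kl_div(F.log_softmax(logits_original, dim=-1),
                  F.log_softmax(logits_edited, dim=-1),
                  reduction="batchmean", log_target=True)
    nll = -target_logprobs.mean()  # average NLL of the target answer tokens
    return alpha * kl + beta * nll

# Toy usage: 4 positions over a 50k-token vocabulary, 6 target answer tokens.
logits_new, logits_old = torch.randn(4, 50000), torch.randn(4, 50000)
target_lp = torch.log(torch.rand(6))
print(value_loss(logits_new, logits_old, target_lp))
```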
The CKEBench Dataset
To test this method, the authors couldn’t rely on existing factual datasets. They created CKEBench (Commonsense Knowledge Editing Benchmark), derived from the well-known ATOMIC commonsense knowledge graph.
They converted abstract relationships (like xAttr, xReact) into human-readable templates and questions.

This resulted in over 15,000 samples covering physical entities, social interactions, and events.
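For intuition, here is a hypothetical example of how such ATOMIC-style relations could be turned into free-text editing samples; the template wording and field names are illustrative, not CKEBench’s exact format:

```python
# Hypothetical conversion of ATOMIC-style relations into free-text editing
# samples. The relation names come from ATOMIC; the templates are made up.
RELATION_TEMPLATES = {
    "xWant":   "After {event}, what does PersonX want to do?",
    "xEffect": "What happens to PersonX as a result of {event}?",
    "xAttr":   "How would PersonX be described, given that {event}?",
    "xReact":  "How does PersonX feel after {event}?",
}

def build_sample(relation: str, event: str, target: str) -> dict:
    """Turn an abstract <event, relation, inference> item into a
    question/answer editing sample with a free-text target."""
    return {
        "question": RELATION_TEMPLATES[relation].format(event=event),
        "target": target,
    }

sample = build_sample("xWant", "PersonX wants to open the smart lock",
                      "to aim her face at the camera")
print(sample["question"], "->", sample["target"])
```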
Experimental Results
So, does editing the Attention layers and dynamically selecting them actually help?
The authors compared DEM against top-tier baselines like MEMIT, MEND, and PMET on models like GPT-J (6B) and LLaMA-2 (7B).
Quantitative Success
The results were decisive. In Table 2 (below), DEM outperforms the previous state-of-the-art (PMET) across almost all metrics. Notably:
- Score: Improved by 4.5 points on GPT-J.
- Commonsense: A massive improvement of 13.8%, indicating the model actually grasped the new logic rather than just memorizing a word.
- Specificity: DEM is better at not breaking other unrelated knowledge.

Qualitative Success
Numbers are great, but looking at the actual text output is often more telling. In Figure 7, we see a comparison of editing attempts for a prompt about opening a smart lock.
- Original: Fails (suggests taking out a key).
- MEMIT/PMET: Fail. They often produce repetitive loops (“needs to needs to”) or nonsensical sentence fragments.
- DEM: (Not shown in this specific crop, but implied by success rate) generates the coherent target: “aim her face at the camera.”

Why it Works: The Ablation Study
To prove that both the “Dynamics-aware” module and the “Attention editing” were necessary, the authors performed an ablation study (removing parts of the system to see what breaks).

As shown in Table 3:
- w/o DA (No Dynamics-aware): Performance drops significantly. Guessing which layers to edit doesn’t work.
- w/o EM (No MLP editing): Huge drop. MLP layers are still the foundation.
- w/o EA (No Attention editing): Notable drop in efficacy. This confirms the paper’s hypothesis: You cannot effectively edit commonsense without touching the Attention layers.
Conclusion
The paper “Commonsense Knowledge Editing Based on Free-Text in LLMs” marks a significant step forward in model maintenance. It moves us away from the simplified view that knowledge editing is just about swapping Entity A for Entity B.
By mapping the dispersed nature of commonsense knowledge—spanning both MLP and Attention layers—and designing the DEM architecture to adaptively target these areas, the researchers have unlocked a way to correct broader, more abstract errors in Large Language Models. As LLMs become more integrated into daily life, the ability to correct “social” and “physical” commonsense errors without expensive retraining will be essential.