In the world of Natural Language Processing (NLP), teaching a computer to translate a sentence into a structured logical formula—a task known as semantic parsing—is one of the ultimate goals. It is the bridge between raw text and actual understanding.
While modern Large Language Models (LLMs) like GPT or T5 are incredibly powerful, they often operate like a black box. They are great at guessing the next word, but they don’t inherently “understand” logic in a structured way. This becomes a significant problem when dealing with complex linguistic frameworks like Discourse Representation Theory (DRT), which relies heavily on nested logic and variable scope.
In this post, we will take a deep dive into a fascinating paper titled “Scope-enhanced Compositional Semantic Parsing for DRT”. The researchers propose a novel hybrid system—the AMS Parser—that separates the “skeleton” of a sentence’s meaning from its logical “scope,” solving a major bottleneck in how machines understand complex language.
If you are a student of NLP, this paper offers a masterclass in how to combine symbolic logic with neural networks to solve problems that pure deep learning struggles with.
The Problem: Why DRT is Hard for Machines
To understand the solution, we first need to understand the problem. Most semantic parsing tasks map a sentence to a graph. For example, “The cat sleeps” might become a simple graph where “cat” is the agent of “sleep.”
However, Discourse Representation Theory (DRT) is different. It was designed by linguists to handle complex phenomena like negation, quantifiers (“every”, “some”), and how entities interact across sentences.
In DRT, meaning is represented by Discourse Representation Structures (DRS), which look like nested boxes.
- Boxes represent “scope.”
- Variables (like \(x\) for a child) live inside these boxes.
- The placement of a variable in a specific box determines where it is valid.
Recently, researchers have converted these box structures into Discourse Representation Graphs (DRG) to make them easier for computers to process.
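To make the box metaphor concrete, here is a rough sketch of how the DRS for a sentence like “Every child misbehaves occasionally.” could be encoded as nested boxes. The data structure, variable names, and predicate names are my own illustrative choices, not the PMB’s actual DRS format or anything from the paper.

```python
from dataclasses import dataclass, field

# Illustrative encoding of the nested-box idea (not the PMB's real format):
# "every" introduces two boxes joined by an implication. The restrictor box
# declares x with child(x); the body box states that x misbehaves.

@dataclass
class Box:
    referents: list = field(default_factory=list)   # discourse variables, e.g. "x"
    conditions: list = field(default_factory=list)  # predicates or nested operators

restrictor = Box(referents=["x"], conditions=[("child", "x")])
body = Box(referents=["e"], conditions=[("misbehave", "e"),
                                        ("Agent", "e", "x"),
                                        ("occasionally", "e")])
outer = Box(conditions=[("IMPLICATION", restrictor, body)])

print(outer)  # the nesting mirrors the boxes-within-boxes of Figure 1
```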

Look at Figure 1 above. The sentence is “Every child misbehaves occasionally.”
- Top: The traditional DRS box notation. Notice how the logic for “every” requires two boxes (connected by an arrow). The “child” is in the left box, and the “misbehaving” is in the right box.
- Bottom: The graph version (DRG). The solid lines represent the basic meaning (who did what), but the dashed lines represent scope.
The problem is those dashed lines. Current state-of-the-art models (usually Sequence-to-Sequence models like T5) try to generate this whole structure at once. But because they don’t truly understand the rules of the boxes, they often produce ill-formed graphs—structures that look like logic but break the rules of DRT. Furthermore, as sentences get longer, these models lose track of which variable belongs in which box, leading to a degradation in accuracy.
The Foundation: Compositional AM Parsing
To solve this, the researchers turned to Compositional Semantic Parsing. Unlike a black-box Transformer that guesses the whole graph at once, a compositional parser builds the meaning step-by-step, combining small pieces into a whole.
The specific framework they used is called the AM Parser (Apply-Modify Parser). It treats semantic parsing almost like playing with Lego blocks. It has a library of small graph fragments (lexical graphs) for each word and combines them using specific algebraic operations: Apply (APP) and Modify (MOD).

As shown in Figure 2, the AM parser works by predicting a dependency tree (Figure 2e).
- It assigns a small graph to every word (Figure 2a).
- It uses APP operations to plug arguments into heads (like plugging “sleep” into “wanted” in Figure 2d).
- It uses MOD operations to attach modifiers (like attaching “little” to “cat” in Figure 2b).
This approach is powerful because it is transparent. You can trace exactly how the sentence was built.
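To give a feel for how this composition works, here is a toy sketch of Apply and Modify over simplified graph fragments. The Fragment class and its “sources” (open argument slots) are stand-ins I made up for illustration; the real AM parser uses a formal graph algebra and a neural model to choose the operations.

```python
# Toy sketch of the Apply (APP) / Modify (MOD) idea behind the AM parser.
# Fragments here are simplified stand-ins: a root, labeled edges, and open
# "sources" (argument slots waiting to be filled).

class Fragment:
    def __init__(self, root, edges=None, sources=None):
        self.root = root                    # label of the fragment's root node
        self.edges = list(edges or [])      # (head, relation, dependent) triples
        self.sources = dict(sources or {})  # slot name -> placeholder node

    def apply(self, slot, argument):
        """APP: plug the argument fragment's root into an open slot of this head."""
        placeholder = self.sources[slot]
        remaining = {k: v for k, v in self.sources.items() if k != slot}
        edges = [(h, r, argument.root if d == placeholder else d)
                 for h, r, d in self.edges]
        return Fragment(self.root, edges + argument.edges,
                        {**remaining, **argument.sources})

    def modify(self, relation, modifier):
        """MOD: attach the modifier fragment to this fragment's root."""
        return Fragment(self.root,
                        self.edges + [(self.root, relation, modifier.root)] + modifier.edges,
                        {**self.sources, **modifier.sources})

# "The little cat sleeps": sleep has an open agent slot; "little" modifies "cat".
sleep = Fragment("sleep", edges=[("sleep", "ARG0", "<s>")], sources={"s": "<s>"})
cat = Fragment("cat")
little = Fragment("little")

cat_np = cat.modify("mod", little)   # MOD: attach "little" to "cat"
sentence = sleep.apply("s", cat_np)  # APP: "cat" fills sleep's open slot
print(sentence.edges)
# [('sleep', 'ARG0', 'cat'), ('cat', 'mod', 'little')]
```

The point to notice is that every operation returns a new, complete fragment, so the derivation can be read off step by step; that is the transparency described above.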
The Conflict: Scope vs. Compositionality
However, there is a catch. The AM Parser was designed for simpler graphs (like AMR). It struggles massively with the scope edges (the dashed lines in Figure 1) required by DRT.
Why? Because the AM algebra assumes a neat tree-like structure where pieces fit together locally. Scope edges often cross boundaries in ways that the standard Apply and Modify operations can’t handle.

Figure 3 illustrates this failure. To build the graph for “Every child…”, the parser tries to combine the graph for “child” with the graph for “every.” But “every” creates two boxes. The logic of the AM parser forces it to choose one box to attach “child” to, but the complex scope rules might require connections to both, or connections that violate the parser’s algebraic constraints.
The researchers found that if they tried to train the standard AM parser on standard DRT graphs, it failed to decompose (find a valid tree structure for) 99.3% of the data. It was essentially impossible.
The Solution: The AMS Parser
The researchers’ innovation is the AMS Parser (AM Parser + Scope). Their core insight is simple yet brilliant: Divide and Conquer.
If scope edges are making the graph too hard to parse compositionally, why not remove them, parse the easy part, and then add them back in later?
The AMS Parser operates in a two-stage pipeline, illustrated in Figure 4.

The pipeline works as follows:
- Simplification: Train the AM Parser on simplified graphs without the difficult scope edges.
- Scope Prediction: Train a separate Dependency Parser to predict only the scope edges.
- Recombination: Merge the outputs to get the full, correct DRT structure.
Let’s break down these steps.
Step 1: Simplifying the Graphs
To make the graphs “digestible” for the AM Parser, the researchers developed two simplification strategies:
1. Compact DRG: They remove scope edges only when a node is in the same box as its parent. This removes about 70% of scope edges but keeps the graph connected.

2. Scopeless DRG (The stronger approach): They remove all scope edges, reducing the graph to essentially just its predicate-argument structure (Who did what to whom?). This removes the complexity of nesting entirely during this stage.

In Figure 6, you can see the “Scopeless” version of the graph from the introduction. The dashed scope edges (marked in blue in the figure) are gone. The result is a simple structure that the AM Parser handles easily.
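As a rough sketch of what the Scopeless simplification does to the running example (edge labels and node names are assumptions for illustration, not the actual DRG vocabulary):

```python
# Minimal sketch of the Scopeless simplification: drop every scope edge,
# keeping only the predicate-argument skeleton the AM parser can decompose.

SCOPE_LABEL = "in"  # assumed label for scope (box-membership) edges

drg_edges = [
    ("box2", "in", "child"),               # scope: "child" lives in box 2
    ("box3", "in", "misbehave"),           # scope: the event lives in box 3
    ("misbehave", "Agent", "child"),       # predicate-argument structure
    ("misbehave", "Manner", "occasional"),
]

scopeless = [edge for edge in drg_edges if edge[1] != SCOPE_LABEL]
print(scopeless)
# [('misbehave', 'Agent', 'child'), ('misbehave', 'Manner', 'occasional')]
```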
Table 1 below shows the dramatic difference this makes. The standard AM parser could only handle 0.7% of the data (“NoPrep”). With the Scopeless simplification (“SCPL”), it can handle 94.4% of the data.

Step 2: Predicting Scope with Dependency Parsing
Now that the AM Parser can successfully generate the “skeleton” of the meaning, how do we put the scope (the logic) back in?
The researchers realized that scope can be modeled as a dependency relationship between words. For the sentence “Every child misbehaves occasionally”, the word “every” introduces the boxes. The words “child” and “misbehaves” need to be put into those boxes.
They trained a neural dependency parser (specifically a biaffine parser, often used in syntactic parsing) to predict these links directly between tokens.

In Figure 7, we see the Scope Dependency Graph.
- The arrow from “Every” to “child” is labeled `scope_b2`. This tells the system: “Put the concept for child into the second box created by every.”
- The arrow from “Every” to “misbehaves” is labeled `scope_b3`. This says: “Put the concept for misbehaves into the third box created by every.”
This turns a complex graph problem into a standard dependency parsing problem (predicting a labeled head for each token), which neural networks are very good at.
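For intuition, here is a minimal sketch of the biaffine arc scoring such a parser is built on. The toy dimensions and random vectors are placeholders; the real model scores contextual encoder representations and additionally classifies the scope labels.

```python
import numpy as np

# Minimal biaffine arc-scoring sketch: score[i, j] estimates how plausible
# it is that token i is the scope head of token j. The random vectors stand
# in for encoder states; in a trained parser the weights are learned.

rng = np.random.default_rng(0)
tokens = ["Every", "child", "misbehaves", "occasionally"]
d = 8                                    # toy hidden size
H = rng.normal(size=(len(tokens), d))    # "head" representation per token
D = rng.normal(size=(len(tokens), d))    # "dependent" representation per token

U = rng.normal(size=(d, d))              # biaffine weight matrix
b = rng.normal(size=(d,))                # bias term on the head representation

scores = H @ U @ D.T + (H @ b)[:, None]  # shape: (n_tokens, n_tokens)

# Greedy decoding for the sketch: each token picks its most likely scope head.
heads = scores.argmax(axis=0)
for j, tok in enumerate(tokens):
    print(f"{tok:>12s} <- predicted head: {tokens[heads[j]]}")
```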
Step 3: Recombination
Finally, the system combines the two outputs.
- It takes the graph structure (predicate-argument relations) from the AM Parser.
- It takes the scope assignments from the Dependency Parser.
- It “stitches” the scope edges back into the graph, ensuring every variable ends up in the correct logical box.
Crucially, because the AM parser guarantees a valid graph structure and the scope resolver just adds edges to existing nodes, the final output is always well-formed. The AMS parser never produces “broken” logic.
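A toy illustration of that recombination, with made-up node names for the running example; the point is simply that the scope edges only ever attach to nodes the AM parser has already produced, so the merged graph stays well-formed.

```python
# Toy recombination: merge the predicate-argument skeleton from the AM
# parser with the scope edges from the dependency parser. Node names and
# edge labels are assumptions for illustration.

skeleton_edges = [                      # from the AM parser (scopeless graph)
    ("misbehave", "Agent", "child"),
    ("misbehave", "Manner", "occasional"),
]
scope_edges = [                         # from the scope parser, mapped to graph nodes
    ("box2", "in", "child"),            # corresponds to the scope_b2 arc Every -> child
    ("box3", "in", "misbehave"),        # corresponds to the scope_b3 arc Every -> misbehaves
]

nodes = {n for head, _, dep in skeleton_edges for n in (head, dep)}
assert all(dep in nodes for _, _, dep in scope_edges), "scope edge targets a missing node"

full_graph = skeleton_edges + scope_edges
for edge in full_graph:
    print(edge)
```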
Experiments and Results
The researchers evaluated their model on the Parallel Meaning Bank (PMB), the standard dataset for DRT parsing. They compared the AMS Parser against powerful sequence-to-sequence models like T5 and mBART.
1. Does the Scope Predictor work?
First, they checked if their dedicated dependency parser was actually good at guessing scope.

As Table 2 shows, the scope parser is highly accurate, achieving a labeled attachment score (LAS) of 95.7% on the test set. This validates the hypothesis that scope is distinct enough to be learned as a separate task.
2. Overall Performance
Next, they looked at the full parsing performance (SMATCH scores).

Table 3 highlights several key findings:
- Zero Errors: Look at the “Err” column. The AMS Parser has 0.0% error rate. It never produces broken graphs. In contrast, T5-base failed to produce valid graphs 20% of the time.
- Beat the Baselines: The AMS Parser (specifically the `scpl+d` version, meaning Scopeless simplification + Dependency scope resolution) achieves 87.1%, outperforming standard sequence-to-sequence models trained on the same gold data (like T5-large at 84.2%).
- TestLong Performance: This is the most impressive result. On the `TestLong` split (sentences with ~40 tokens), the AMS Parser scores 48.7%, completely crushing the sequence-to-sequence models (T5-large only got 18.1%).
3. Scaling to Complexity
The main weakness of standard neural models is that they struggle when the logic gets deep. The researchers broke down performance by the number of logical boxes in the target graph.

Table 4 reveals that as complexity increases (from 1 box to 4+ boxes):
- Standard T5 and mBART performance collapses.
- The AMS Parser maintains much higher performance. For graphs with 4 or more boxes, it scores 75.2%, compared to just 45-50% for the T5 models.
4. Length Generalization
Finally, Figure 8 visually demonstrates the “Length Generalization” problem. The x-axis represents the length of the document, and the y-axis is accuracy.

Notice the distinct red line (AMS Parser). While other models (like the green and orange lines) nosedive as sentences get longer, the AMS Parser remains much more stable. This shows that separating structure from scope is key to handling the complexity of real-world language.
Conclusion and Implications
The AMS Parser represents a significant step forward in semantic parsing. By acknowledging that logical scope and predicate-argument structure are different types of information, the researchers were able to design a system that excels at both.
Key Takeaways:
- Divide and Conquer works: Separating the parsing of “who did what” (AM Parser) from “logical scope” (Dependency Parser) yields better results than trying to do it all at once.
- Reliability matters: Unlike pure neural models, the AMS Parser guarantees well-formed output, which is critical if we want to use these systems for actual logical reasoning.
- Complexity is the frontier: The biggest gains were seen in long, complex sentences with nested logic. As we move toward AI that can read legal documents or philosophical texts, this ability to handle nesting will be paramount.
This paper serves as a reminder that while end-to-end Deep Learning is powerful, combining it with linguistic insights and structured, symbolic approaches often leads to the most robust solutions.