Introduction

In the world of Natural Language Processing (NLP), Deep Neural Networks (DNNs) are the reigning champions. They power everything from sentiment analysis on e-commerce sites to toxic comment detection on social media. However, these models have a significant Achilles’ heel: they are brittle. A slight, often imperceptible change to an input sentence—known as an adversarial attack—can cause a state-of-the-art model to completely misclassify the text.

While there has been a massive amount of research into breaking English models, the security of models that process Chinese—the second most used language on the internet, with over a billion users—has received surprisingly little attention.

The challenge with attacking Chinese text is that you cannot simply copy-paste methods designed for English. English relies on alphabetic spelling; Chinese relies on logograms with complex shapes and specific tones. When researchers try to apply English attack methods (like simple synonym swapping) to Chinese, the results are often “awkward” sentences that a native speaker would spot immediately.

In this post, we will dive deep into a paper that proposes a novel solution to this problem: the Immune-based Sound-Shape Code (ISSC) algorithm. This method combines the unique linguistic features of Chinese characters (how they sound and look) with a biologically inspired heuristic (how immune systems adapt) to generate attacks that are both highly effective against AI models and natural-looking to human readers.

Background: The Unique Challenge of Chinese Text

To understand why this new method is necessary, we first need to look at why existing methods fail.

In adversarial text generation, the attacker’s goal is to introduce a perturbation \(\Delta \mathbf{X}\) to an input sentence \(\mathbf{X}\) so that the model makes a mistake, while keeping the sentence readable for humans.
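To make the goal concrete, here is a schematic form (our notation, not necessarily the paper's), assuming \(F\) is the victim classifier and \(\mathrm{sim}\) measures how faithful the perturbed sentence remains to the original:

\[
\text{find } \Delta\mathbf{X} \quad \text{s.t.} \quad F(\mathbf{X} + \Delta\mathbf{X}) \neq F(\mathbf{X}), \qquad \mathrm{sim}(\mathbf{X} + \Delta\mathbf{X},\, \mathbf{X}) \geq \epsilon
\]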

Most existing Chinese attack methods fall into two traps:

  1. Direct Transfer: They simply apply English algorithms (like Genetic Algorithms or Particle Swarm Optimization) using generic Chinese embeddings. This ignores the internal structure of Chinese characters.
  2. Simple Greedy Search: They use Chinese-specific features (like Pinyin) but optimize the attack using simple, greedy methods that result in unnatural sentences.

As shown in the comparison below, transfer attacks often produce substitutions that drift in meaning or register (“paradise” becomes “fairyland,” which is a valid synonym but may not fit the context). In contrast, the Sound-Shape Code (SSC) approach we are discussing generates candidates that sound or look like the original, maintaining the “flow” of the sentence for a native reader.

Comparison of candidates from an English transfer attack and the Chinese Sound-Shape Code (SSC) attack.

The researchers identified that to break Chinese models effectively, we need to manipulate the text the way Chinese is actually constructed: through Sound (pronunciation) and Shape (visual structure).

Core Method: The ISSC Algorithm

The ISSC method is built on two main pillars:

  1. Substitution: Using Sound-Shape Code (SSC) to find the perfect impostor characters.
  2. Optimization: Using an Adaptive Immune Algorithm (IA) to decide which characters to swap and in what order.

Let’s break these down.

Pillar 1: Sound-Shape Code (SSC) Substitutions

English is linear. If you misspell “identity” as “idenity,” it’s a character-level error. Chinese is two-dimensional and tonal. A character has a visual layout and a specific pronunciation (Pinyin).

The Sound Aspect

Chinese has many homophones—characters that sound the same but have different meanings. This is a common source of typos for humans using Pinyin input methods. An adversarial attack that leverages this mimics natural human error.
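To see this concretely (an illustration of ours, not part of the paper's pipeline), the open-source pypinyin package can confirm that two unrelated characters share the exact same pronunciation:

```python
# pip install pypinyin
from pypinyin import lazy_pinyin, Style

# 在 ("at") and 再 ("again") are unrelated in meaning and shape,
# but both are pronounced zai4 -- a classic Pinyin-typo pair.
for ch in ["在", "再"]:
    print(ch, lazy_pinyin(ch, style=Style.TONE3))
# 在 ['zai4']
# 再 ['zai4']
```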

Illustration of same or similar pronunciations in Chinese words.

The Shape Aspect

Chinese characters are also visual. They are built from radicals and strokes. Some characters look incredibly similar, differing only by a single stroke or radical.

Illustration of characters in similar glyph.

The Sound-Shape Code

The researchers utilize a sophisticated encoding system called Sound-Shape Code (SSC). This system maps every Chinese character into a unified code that represents both its auditory and visual properties.

As illustrated in the diagram below, the SSC is an 11-bit code.

  • Bits 1-5 (Sound): Encode the Final, Initial, Tone, Final complement, and Structure.
  • Bits 6-11 (Shape): Encode the Four-corner code (a way of indexing characters by their corners) and the stroke count.

Sound-Shape Code and its components.

To find a substitute for a word, the algorithm calculates the similarity between the target character’s SSC and other characters in the dictionary. If two characters share a high overlap in their SSC (e.g., same tone, similar four-corner code), they are excellent candidates for substitution because the difference is visually and phonetically subtle.
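Here is a minimal sketch of what that comparison could look like, assuming each character's SSC is stored as an 11-character string; the uniform position weights and the 0.7 threshold are illustrative assumptions, not values from the paper:

```python
def ssc_similarity(code_a: str, code_b: str, weights=None) -> float:
    """Position-wise similarity between two Sound-Shape Codes.

    Each code is assumed to be an 11-character string (sound part + shape part).
    The per-position weights here are illustrative; the paper tunes its own scheme.
    """
    assert len(code_a) == len(code_b)
    if weights is None:
        weights = [1.0] * len(code_a)  # uniform weighting by default
    matches = sum(w for a, b, w in zip(code_a, code_b, weights) if a == b)
    return matches / sum(weights)

def candidate_substitutes(target_code: str, dictionary: dict, threshold: float = 0.7):
    """Return characters whose SSC overlaps strongly with the target's SSC."""
    return [ch for ch, code in dictionary.items()
            if ssc_similarity(target_code, code) >= threshold]
```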

The algorithm pays close attention to Structure and Four-Corner Codes. The structure defines how the character is laid out (left-right, up-down, enclosed), which strongly influences visual similarity.

Illustration of Chinese characters with different structures.

Illustration of Chinese characters with four-corner code.

By using this encoding, the algorithm ensures that the “impostor” characters are not just random synonyms, but “look-alikes” or “sound-alikes” that preserve the natural feel of the text.

Pillar 2: Adaptive Immune Optimization

Once we have a list of potential substitute characters, we need to decide which ones to actually use to fool the AI. If a sentence has 20 words and each word has 10 substitutes, there are on the order of \(10^{20}\) possible combinations. We cannot check them all.

This is where the Immune Algorithm (IA) comes in.

Inspired by biological immune systems, IA is an optimization method. In this analogy:

  • Antigen: The objective function (the AI model we want to fool).
  • Antibody: A potential adversarial sentence (a feasible solution).

The goal is to evolve the antibodies so they have the highest “affinity” to the antigen—meaning they successfully fool the model.
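To make the analogy concrete, here is one plausible way to encode an antibody and its affinity in code. The per-position encoding and the use of the drop in true-class probability as affinity are our illustrative assumptions; the paper defines affinity through its own equations:

```python
import random

# An antibody is a concrete choice of substitution for each attackable position:
# antibody[i] = -1 keeps the original word, otherwise it indexes candidates[i].
def build_sentence(original_words, candidates, antibody):
    return "".join(
        original_words[i] if choice == -1 else candidates[i][choice]
        for i, choice in enumerate(antibody)
    )

def affinity(victim_predict_proba, original_words, candidates, antibody, true_label):
    """Affinity = how much the substitution lowers the model's confidence
    in the correct label (an illustrative proxy for 'attack success')."""
    sentence = build_sentence(original_words, candidates, antibody)
    return 1.0 - victim_predict_proba(sentence)[true_label]

def random_antibody(candidates, mutate_prob=0.2):
    """Sample an initial antibody that perturbs only a few positions."""
    return [
        random.randrange(len(c)) if c and random.random() < mutate_prob else -1
        for c in candidates
    ]
```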

Avoiding the Trap of “Sameness”

Standard optimization algorithms (like Genetic Algorithms) often suffer from “premature convergence.” They find one decent solution and then the whole population of solutions becomes identical, getting stuck in a local optimum.

The ISSC algorithm introduces an Adaptive mechanism to prevent this. It evaluates antibodies based on two factors:

  1. Affinity: How well does it fool the model?
  2. Concentration: How similar is this antibody to others in the population?

The objective function looks like this:

Equation for the objective function S considering affinity J and concentration rho.

Here, \(\mathcal{J}\) is the attack success (affinity), and \(\rho\) is the concentration (similarity). By subtracting the concentration, the algorithm penalizes solutions that are too similar to everyone else. This forces the algorithm to maintain diversity and explore different parts of the search space.
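In schematic form, with \(a\) denoting an antibody (the trade-off weight \(\lambda\) is our placeholder; the paper's exact formulation may differ):

\[
\mathcal{S}(a) = \mathcal{J}(a) - \lambda\,\rho(a)
\]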

The similarity is calculated using edit distance:

Equation for calculating similarity between two antibodies.
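A sketch of how this could be computed, using a plain Levenshtein distance normalized by sentence length; the 0.9 concentration threshold is an illustrative assumption:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: identical sentences score 1."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest

def concentration(sentence: str, population: list[str], threshold: float = 0.9) -> float:
    """Fraction of the population that is nearly identical to this antibody."""
    return sum(similarity(sentence, other) >= threshold for other in population) / len(population)
```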

The Vaccination Mechanism

This is the most distinct part of their optimization. In a standard evolutionary algorithm, the worst candidates are usually deleted and replaced by random new ones.

However, in adversarial attacks, “random” new sentences are usually terrible. They take too long to evolve into something useful.

The authors propose a Vaccination operation. Instead of killing off the weak antibodies, the algorithm takes the “best” antibody found so far (the global optimal) and injects some of its segments (genes) into the weaker antibodies.

This “vaccine” gives the weak candidates a boost, transferring known good traits while allowing them to retain their own unique differences. This significantly speeds up the search for a successful attack.
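A minimal sketch of such a vaccination step, reusing the antibody encoding sketched earlier; the fraction of genes transferred (the “dose”) is an illustrative parameter:

```python
import random

def vaccinate(weak_antibody, best_antibody, dose=0.3):
    """Copy a random subset of 'genes' (per-position substitution choices)
    from the best antibody found so far into a weak antibody."""
    vaccinated = list(weak_antibody)
    for i in range(len(vaccinated)):
        if random.random() < dose:
            vaccinated[i] = best_antibody[i]  # inherit a known-good trait
    return vaccinated
```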

Experiments & Results

The researchers tested ISSC on five Chinese datasets, including news classification (Chinanews) and sentiment analysis (ChnSentiCorp, JD.com reviews). They attacked six different deep learning models, including BERT, RoBERTa, and LSTM.

They compared their method against strong baselines, including:

  • English transfers: BEAT, GA, PSO.
  • Chinese-specific: Argot, ES (Expanding Scope).

Attack Success Rate (ASR)

The primary metric is how often the attack successfully fooled the model.

Table showing the attack success rate and modification rate of all attack methods.

As shown in the table above (Table 2), ISSC (far right column) consistently achieves the highest Attack Success Rates. For example, on the Chinanews dataset against a CNN model, it achieved 100% ASR. On average, it outperformed the best baseline methods by over 2%.

Crucially, look at the Modification Rate (MR) columns. A lower MR is better because it means fewer words were changed to fool the model. ISSC generally requires fewer changes than Argot or GA to break the model, indicating the changes it makes are more “potent.”

Text Quality and Fluency

An attack is useless if the resulting text looks like garbage. To measure quality, the authors used BERTScore (semantic similarity) and Perplexity (fluency).
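For readers who want to reproduce this kind of check on their own text pairs, here is a generic usage sketch with the open-source bert-score package (not the authors' evaluation script; the example sentences are ours):

```python
# pip install bert-score
from bert_score import score

originals    = ["这家酒店的服务非常好"]   # original review
adversarials = ["这家酒店的服务非尝好"]   # hypothetical homophone-style perturbation (常 -> 尝)

# F1 close to 1.0 means the perturbed sentence keeps the original's meaning.
P, R, F1 = score(adversarials, originals, lang="zh", verbose=False)
print(F1.mean().item())
```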

Results of BERTScore on all datasets.

In Figure 7, we see the distribution of BERTScore. ISSC (the last violin plot) consistently shows a higher median score and a tighter distribution near 1.0. This indicates that the adversarial sentences retain the meaning of the original text much better than those produced by methods like GA or PSO.

Results of perplexity on all datasets.

Figure 8 shows the change in perplexity (\(\Delta\) PPL). A smaller change means the sentence is more natural. ISSC is competitive here, often producing smoother text than the baselines, thanks to the Sound-Shape Code ensuring linguistic consistency.

Does the “Vaccination” Help?

The authors performed an ablation study to see if their specific algorithmic choices mattered. They compared the full ISSC against a version without the vaccination module (ISSC-vacc).

Ablation results of vaccination on three models with the Chinanews dataset.

The results in Table 6 confirm that without vaccination, the Attack Success Rate drops and the number of queries (how many times they had to ask the model “did this work?”) increases. The vaccination mechanism makes the attack faster and more effective.

Conclusion

The Adaptive Immune-based Sound-Shape Code (ISSC) algorithm represents a significant step forward in understanding the vulnerabilities of Chinese NLP models. By moving away from directly transferring English-centric tactics and embracing the specific visual and auditory nature of the Chinese language, the researchers created a method that is both highly effective and stealthy.

Key takeaways:

  1. Language Specificity Matters: You cannot effectively attack (or defend) Chinese models using only English-centric techniques. The Sound-Shape Code proves that leveraging character structure and pronunciation creates superior adversarial candidates.
  2. Diversity in Optimization: The use of an Immune Algorithm with concentration control prevents the search from getting stuck, a common problem in high-dimensional text attacks.
  3. Vaccination Strategy: Injecting “best-so-far” traits into weaker solutions is a powerful way to accelerate convergence in evolutionary algorithms.

This work serves as a reminder that as we deploy AI systems globally, we must rigorously test them against attacks that are native to the languages they serve. Robustness in English does not guarantee robustness in Chinese.