Introduction
Imagine you have a personal assistant who has read every email you’ve ever written, knows exactly which movies you like, and understands your writing style perfectly. Now, imagine trying to build that assistant using today’s Large Language Models (LLMs). You face a difficult dilemma.
Option one is to use a prompt-based approach (like RAG), where you feed your private history into a centralized model like ChatGPT. This works, but it raises serious privacy concerns—do you really want to send your personal data to a remote server for every query? Option two is to fine-tune your own personal model. This keeps your data safer and provides deeper personalization, but it is computationally expensive. If a service has one million users, maintaining one million separate fine-tuned models (a paradigm known as “One-PEFT-Per-User”) creates a storage and cost nightmare.
But what if there was a third way? What if we could treat personalization like a Lego set?
This is the core idea behind PERSONALIZED PIECES (PER-PCS), a new framework introduced by researchers at the University of Notre Dame. Instead of training a massive new model for every user, PER-PCS allows users to “share” small, modular pieces of their learned parameters. A new user can then assemble a custom model by picking the best pieces from the community pool that match their specific history—without ever exposing their raw data.
In this post, we will dive deep into how PER-PCS works, the mathematics behind its “assembly” process, and why it might represent the future of efficient, collaborative AI.
The Context: The Personalization Bottleneck
To understand why PER-PCS is necessary, we first need to look at the current state of LLM personalization. Off-the-shelf models are “one-size-fits-all.” They are trained on general internet data and don’t know that you prefer sci-fi movies over rom-coms or that you write formal academic emails rather than casual texts.
The Two Main Approaches
- Prompt-Based Personalization: This involves retrieving relevant snippets from your history and stuffing them into the prompt (context window). While effective, this is limited by the context window size and privacy risks.
- Parameter-Efficient Fine-Tuning (PEFT): This is the more robust approach. Instead of updating all billions of parameters in an LLM, we freeze the main model and train small adapter modules—most commonly using LoRA (Low-Rank Adaptation).
The state-of-the-art in PEFT personalization has been OPPU (One-PEFT-Per-User). In this setup, if you are User A, you train your own LoRA module. User B trains their own. This provides excellent ownership and performance. However, it scales linearly. 100 users? 100 modules. 10 million users? You effectively need a data center just to store everyone’s personal parameters. Furthermore, OPPU is isolated; User A’s model learns nothing from User B, even if they have very similar tastes.
The Solution: Personalized Pieces (PER-PCS)
PER-PCS changes the paradigm from “owning a whole model” to “assembling a model from shared parts.”

As illustrated in Figure 1, the framework operates on a community principle. We have Sharers (users willing to contribute parameters) and Target Users (users who need a personalized model).
The genius of PER-PCS lies in its modularity. It recognizes that a user’s preference isn’t a single monolithic block. It’s a combination of many small traits. By breaking down fine-tuned parameters into “pieces,” the system creates a pool of building blocks. A target user can then scan their own history, look at the pool, and say, “I’ll take a piece of the blue puzzle, a piece of the red, and two of the green,” creating a bespoke model on the fly.
The Core Method
Let’s break down the architecture into three distinct stages: Sharer Preparation, Gate Training, and Target Assembly.
1. Sharer Selection and Decomposition
First, the system needs a pool of parameters. It identifies “Sharer” users—typically those with rich history data who represent different clusters of behavior.
The researchers use LoRA for the fine-tuning. Recall that LoRA modifies a linear layer in a neural network (\(W_o\)) by adding a low-rank update (\(BA\)). Usually, we think of the “LoRA module” as the collection of all these updates across the whole network.
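As a quick refresher in standard LoRA notation (here \(x\) is the layer input, \(r\) the rank, and \(d\) the hidden size):

\[ h = W_o x + \Delta W x = W_o x + B A x, \qquad A \in \mathbb{R}^{r \times d},\; B \in \mathbb{R}^{d \times r},\; r \ll d \]

Only \(A\) and \(B\) are trained; \(W_o\) stays frozen.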
PER-PCS takes a more granular view. It defines a “Piece” as the parameter update at a single layer. If LoRA is applied to one weight matrix in each of an LLM’s 32 layers, a single Sharer contributes 32 distinct pieces to the pool.
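A minimal sketch of this decomposition, assuming a sharer’s adapter is simply a dictionary of per-layer LoRA factors (the function and variable names are illustrative, not the paper’s code):

```python
import torch

def decompose_into_pieces(sharer_id, lora_state):
    """Split one sharer's LoRA adapter into per-layer 'pieces'.

    lora_state maps layer index -> (A, B) low-rank factors.
    Each piece is the update for exactly one layer, contributed
    to a shared pool keyed by (sharer_id, layer).
    """
    pool = {}
    for layer_idx, (A, B) in lora_state.items():
        pool[(sharer_id, layer_idx)] = {"A": A, "B": B}
    return pool

# Example: a 32-layer model yields 32 pieces per sharer.
fake_lora = {l: (torch.randn(8, 4096), torch.randn(4096, 8)) for l in range(32)}
pieces = decompose_into_pieces("sharer_0", fake_lora)
print(len(pieces))  # 32
```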
2. The Routing Gate: Labeling the Pieces
If we toss thousands of parameter pieces into a pool, how does a target user know which ones to pick? We cannot simply test them all; that would be too slow.
The solution is a Gate. For every piece contributed by a sharer, the system trains a tiny vector—a “gate vector”—that acts as a signature or a key.
For a specific layer \(l\) and a sharer \(s_i\), the gate vector \(g_{s_i}^l\) is trained to predict whether this specific piece is useful for the sharer’s own history. The training is lightweight and “post-hoc,” meaning it happens after the LoRA training. The modified forward pass during this training phase looks like this:
\[ h_t^l = W_o^l v_t^l + \sigma\!\left(g_{s_i}^{l\top} v_t^l\right) B_{s_i}^l A_{s_i}^l\, v_t^l \]

where \(B_{s_i}^l A_{s_i}^l\) is sharer \(s_i\)’s piece at layer \(l\) and \(v_t^l\) is the input activation.
Here, \(\sigma\) is the sigmoid function. The gate vector \(g\) learns to activate (output close to 1) when the input activation \(v_t^l\) matches the context where this piece performs well.
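To make this concrete, here is a minimal PyTorch-style sketch of the gated forward pass for a single piece (the tensor shapes and variable names are illustrative assumptions, not the paper’s code). In keeping with the “post-hoc” setup described above, only the gate vector would be trained at this stage; the backbone and the LoRA factors stay frozen.

```python
import torch

def gated_piece_forward(v, W_o, A, B, gate_vec):
    """One layer's forward pass with a single gated LoRA piece.

    v:        input activation at this layer, shape (d,)
    W_o:      frozen base weight, shape (d_out, d)
    A, B:     the piece's low-rank factors, shapes (r, d) and (d_out, r)
    gate_vec: the piece's gate vector g, shape (d,)
    """
    base_out = W_o @ v                     # frozen backbone output
    gate = torch.sigmoid(gate_vec @ v)     # scalar gate in (0, 1)
    piece_out = B @ (A @ v)                # low-rank update from the piece
    return base_out + gate * piece_out     # gate scales the piece's contribution
```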
3. Assembling the Target User’s Model
This is where the magic happens. A new Target User arrives. They have no personal model trained. They only have their history of text (emails, reviews, tweets).
Instead of training, PER-PCS performs an Auto-Regressive Assembly.

As shown in Figure 2, the process works layer by layer, input by input:
- Input Processing: The target user’s history is fed into the base LLM.
- Scoring: At every layer, the system compares the current hidden states (activations) of the target user against the Gate Vectors of all available pieces in the pool. It uses a similarity score to see which pieces “resonate” with the user’s current context.
The scoring equation calculates the compatibility (\(\alpha\)) between the sharer’s piece (\(s_i\)) and the target user’s history:
\[ \alpha_{s_i}^l = \frac{1}{T} \sum_{t=1}^{T} \sigma\!\left(g_{s_i}^{l\top} v_t^l\right) \]

where \(T\) is the number of tokens in the target user’s history and \(v_t^l\) are the corresponding activations at layer \(l\).
- Selection (Top-k): The system selects the top-\(k\) pieces with the highest scores.
- Weighting: It doesn’t just average them; it calculates weights using a softmax function so that the most relevant pieces have the strongest influence.
\[ w_{s_i}^l = \frac{\exp\!\left(\alpha_{s_i}^l\right)}{\sum_{s_j \in \text{Top-}k} \exp\!\left(\alpha_{s_j}^l\right)} \]
- Aggregation: Finally, the selected pieces are mathematically combined to form a temporary, customized weight matrix for that layer for the target user.
\[ \Delta W_{\hat{u}}^l = \sum_{s_i \in \text{Top-}k} w_{s_i}^l\, B_{s_i}^l A_{s_i}^l \]
The result? A Personalized PEFT (\(\Delta W_{\hat{u}}^l\)) that is constructed entirely from other people’s knowledge but tailored specifically to the target user.
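A simplified sketch of this per-layer assembly is shown below (the pool format and function name are hypothetical; in the actual framework the process runs auto-regressively, so pieces chosen at earlier layers shape the activations used to score later ones):

```python
import torch

def assemble_layer(history_acts, layer_pieces, k=3):
    """Assemble one layer's personalized update from shared pieces.

    history_acts: (T, d) hidden states of the target user's history at this layer
    layer_pieces: list of dicts {"A": (r, d), "B": (d_out, r), "gate": (d,)}
    Returns the combined update Delta_W plus the chosen indices and weights
    (the indices and weights are all the target user needs to store).
    """
    # 1. Scoring: average gate activation over the user's history tokens
    scores = torch.stack([
        torch.sigmoid(history_acts @ piece["gate"]).mean()
        for piece in layer_pieces
    ])
    # 2. Selection: keep the top-k most compatible pieces
    top_scores, top_idx = scores.topk(k)
    # 3. Weighting: softmax over the selected scores
    weights = torch.softmax(top_scores, dim=0)
    # 4. Aggregation: weighted sum of the selected pieces' low-rank updates
    delta_W = sum(
        w * (layer_pieces[i]["B"] @ layer_pieces[i]["A"])
        for w, i in zip(weights, top_idx.tolist())
    )
    return delta_W, top_idx.tolist(), weights
```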
Why is this efficient?
This approach is “training-free” for the target user. No backpropagation is required. Furthermore, regarding storage, the target user doesn’t need to save the heavy matrices (\(A\) and \(B\)). They only need to save the indices (which pieces did I pick?) and the weights (how much do I use them?). This reduces storage requirements by orders of magnitude.
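A rough back-of-the-envelope comparison makes the gap concrete (the rank, hidden size, layer count, and top-\(k\) here are illustrative assumptions, not the paper’s exact configuration):

```python
# Per-user storage, assuming fp16 LoRA factors and one adapted matrix per layer
hidden, rank, layers, k = 4096, 8, 32, 3

# OPPU-style: store full A (rank x hidden) and B (hidden x rank) for every layer
oppu_bytes = layers * 2 * (hidden * rank) * 2        # ~4.2 MB
# PER-PCS-style: store only k piece indices (int32) and k weights (fp32) per layer
perpcs_bytes = layers * k * (4 + 4)                  # ~0.8 KB

print(f"OPPU:    {oppu_bytes / 1e6:.1f} MB per user")
print(f"PER-PCS: {perpcs_bytes / 1e3:.1f} KB per user")
```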
Experiments and Results
The researchers tested PER-PCS on the LaMP benchmark, a suite of personalization tasks ranging from predicting product ratings to generating personalized news headlines. They compared their method against non-personalized baselines, Retrieval-Augmented Generation (RAG), and the resource-heavy OPPU method.
Performance Comparison
The results, summarized in Table 1 below, are compelling.

Key Takeaways from the Data:
- Beating the Baseline: PER-PCS (Ours) consistently outperforms the non-personalized base model and standard RAG (retrieval) approaches.
- Rivaling the Upper Bound: OPPU (the right-most columns) represents the “gold standard” where every user gets a dedicated trained model. PER-PCS achieves performance very close to OPPU—and in some cases, statistically ties with it—despite not training on the target user.
- Better than Retrieval PEFT: The “PEFT Retrieval” baseline (simply grabbing a whole model from a similar user) performs significantly worse than PER-PCS. This proves that piece-level composition (mixing and matching layers) is superior to model-level selection.
The Efficiency Gap
The true power of PER-PCS becomes visible when we look at resource consumption.

Figure 7 illustrates the massive scalability advantage.
- Storage (Top Graph): As the number of users grows (x-axis), the storage required for OPPU (orange line) skyrockets linearly. PER-PCS (green line) remains effectively flat. This is because adding a new user in PER-PCS only costs a few kilobytes of index data, rather than megabytes of parameter data.
- Time (Bottom Graph): The assembly time for PER-PCS is nearly instantaneous compared to the training time required for OPPU.
Robustness: Do we need many sharers?
A common critique of collaborative systems is the “cold start” problem. What if only a few people share their parameters?

Figure 3 shows the system’s performance as the number of sharers increases from 10 to 100. Surprisingly, PER-PCS (orange line) is incredibly stable. Even with a small pool of sharers, the system can find relevant pieces to construct a good model. It consistently beats the PEFT Retrieval baseline (blue line), which fluctuates unpredictably.
Furthermore, the researchers investigated whether sharers need to share all their parameters.

As shown in Figure 5, even if sharers only consent to share 20% or 40% of their pieces (to preserve privacy or reduce bandwidth), PER-PCS maintains high performance. It doesn’t collapse, showing that the system is highly redundant and robust.
Discussion and Implications
Privacy through Modularity
One of the most interesting aspects of PER-PCS is its approach to privacy. In a standard RAG system, your raw data (text) is retrieved and sent to the model. In PER-PCS, what gets shared are the parameter pieces (the \(A\) and \(B\) matrices). While these parameters capture behavior, they are abstract mathematical representations, not raw text.
Moreover, because the target user constructs their model by combining pieces from dozens of different sharers, it becomes difficult to reverse-engineer the specific identity or data of any single source.
Fine-Grained vs. Coarse-Grained
The “Case Study” in the paper provides a visual intuition of why this works better than simply retrieving a neighbor’s model.

In Figure 6, the black dots represent the pieces selected by PER-PCS. Notice how scattered they are? For a single target user, the model picks layer 1 from Sharer A, layer 2 from Sharer B, and layer 3 from Sharer C. The “PEFT Retrieval” baseline (dotted line) forces the user to take all layers from one sharer. The scatter plot proves that optimal personalization requires mixing and matching—a fine-grained approach.
Conclusion
PER-PCS represents a significant shift in how we think about personalized AI. It moves us away from the computationally wasteful “one-model-per-person” approach and towards a collaborative “sharing economy” of model parameters.
By breaking LLM adaptation into modular pieces and using intelligent gating to route them, PER-PCS achieves the holy grail of personalization:
- High Performance: Comparable to fully trained personal models.
- High Efficiency: Drastically lower storage and compute costs.
- Privacy Preservation: Users share parameters, not raw history.
For students and researchers entering the field, this paper highlights the importance of modularity in deep learning. As models get larger, adapting them shouldn’t necessarily require more brute-force training. Sometimes, it just requires putting the right pieces together.