In the world of algorithm design, there is a constant arms race between the data structure and the “adversary”—the entity generating the inputs. Traditional randomized algorithms work wonders against an oblivious adversary, one who generates a sequence of queries beforehand, unaware of the algorithm’s internal coin flips.
But what happens when the adversary is adaptive? Imagine a scenario where a user (or an attacker) queries your database, sees the result, and uses that information to craft a specifically difficult next query. This creates a feedback loop. The output reveals information about the algorithm’s internal randomness, allowing the adversary to “correlate” their inputs with your private random seed, eventually breaking the correctness guarantees of the system.
This is a massive problem for high-dimensional search and optimization. To handle \(T\) adaptive queries, the naive solution is to run \(T\) independent copies of the data structure, switching to a fresh one for every query. This destroys efficiency: space and preprocessing blow up by a factor of \(T\).
In the paper “On Differential Privacy for Adaptively Solving Search Problems via Sketching,” researchers propose a sophisticated framework using Differential Privacy (DP) not to protect user data, but to protect the algorithm’s internal randomness. By making the algorithm’s outputs “private” with respect to its own random seed, they prevent the adversary from learning enough to break the system.
This post breaks down how they achieve this for two major problems: Approximate Near Neighbor (ANN) search and Adaptive Linear Regression.
The Core Concept: Privacy as Robustness
The central insight of this research builds on a growing trend: using Differential Privacy (DP) as a tool for algorithmic stability.
Classically, an algorithm \(\mathcal{A}\) is \((\varepsilon, \delta)\)-differentially private if its output distribution doesn’t change much when we change a single element of the input database.
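Formally, for any two neighboring databases \(D\) and \(D'\) that differ in a single element, and for every set of outcomes \(S\),

\[
\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon} \cdot \Pr[\mathcal{A}(D') \in S] + \delta.
\]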

In this context, the “database” isn’t user records; it is the collection of random strings used by a set of data structures. If the output of our search problem is differentially private with respect to these random strings, then the output doesn’t reveal much about any specific random string. Consequently, an adaptive adversary cannot easily exploit specific weaknesses in the randomization.
Previous works proved that for numerical problems (like estimating the cost of a regression), you only need \(\tilde{O}(\sqrt{T})\) copies of a data structure rather than \(T\). This paper asks the harder question: Can we do this for search problems where the output is a high-dimensional vector?
Part 1: Adaptive Approximate Near Neighbor (ANN)
The first challenge is the Approximate Near Neighbor problem. Given a dataset \(U\) and a query \(v\), we want to find a point in \(U\) close to \(v\).
The Setup
We want to support \(T\) adaptive queries. We maintain \(k \approx \sqrt{T}\) copies of an oblivious ANN data structure, such as a Locality-Sensitive Hashing (LSH) table. For a specific query \(v_t\), we don't query just one copy; we sample a small batch of them, as in the sketch below.
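A minimal sketch of the batching step, assuming each LSH copy exposes a hypothetical `query(v)` method that returns a candidate point or `None`:

```python
import random
from collections import Counter

def adaptive_ann_query(structures, v, batch_size):
    # Sample a small batch of the k oblivious ANN copies and tally their
    # answers as "votes"; the noisy selection step below consumes this Counter.
    batch = random.sample(structures, batch_size)
    votes = Counter()
    for ds in batch:
        candidate = ds.query(v)   # assumed interface of a single LSH copy
        if candidate is not None:
            votes[candidate] += 1
    return votes
```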
The Problem: Leaking Information
If we simply output the result of one LSH, we reveal exactly which “bucket” the query fell into for that specific random seed. An adversary can use this to map out the geometry of the hash functions.
The Solution: Differentially Private Selection
The researchers treat the outputs of the sampled LSH structures as “votes.” If multiple data structures return the same point \(u \in U\) as a neighbor, that point gets more votes.
To protect the privacy of the random seeds, we can’t just pick the winner. We must use a noisy voting mechanism. Specifically, they employ a variation of the Exponential Mechanism:
- Count how many data structures return each candidate point.
- Add random noise to these counts.
- Return the candidate with the maximum noisy count.
The noise distribution used is the Exponential Distribution:
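As a reminder, an Exponential random variable \(Z\) with rate \(\lambda\) has density

\[
f(z) = \lambda e^{-\lambda z}, \qquad z \ge 0,
\]

so the noise is one-sided and heavier for smaller \(\lambda\). The rate is calibrated to the privacy budget \(\varepsilon\); setting \(\lambda = \varepsilon\), as in the code sketches below, is a simplification rather than the paper's exact constant.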

The “Sparse Argmax” Trick
There is a computational hurdle here. The universe of points \(U\) might be massive (\(n\) points). We cannot generate a noisy vote count for every single point in the database for every query—that would take \(O(n)\) time, defeating the purpose of sublinear search.
The authors introduce the Sparse Argmax Mechanism. They observe that for any query, only a small number of points (\(s\)) are actually returned by the LSH structures. The “vote” vector is incredibly sparse; most points have zero votes.
The algorithm generates noise only for the sparse non-zero entries. For the millions of zero-vote points, they effectively sample the maximum of \(n\) independent noise variables using order statistics logic (computing the “max noise” of the empty buckets) in \(O(1)\) time. This keeps the query time efficient:

Here, \(s\) represents the sparsity of the neighborhood—roughly how many candidate neighbors we expect to find. As long as the neighborhood isn’t too dense (a property formalized as Assumption 1.3), this method provides adaptive security with vastly improved space complexity over the naive method.
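A minimal Python sketch of the sparse argmax idea, under the simplifying assumptions that the noise is Exponential with rate \(\varepsilon\) and that a zero-vote point winning the noisy comparison is reported as "no candidate" (the paper's exact calibration and failure handling may differ):

```python
import math
import random
from collections import Counter

def sparse_noisy_argmax(votes: Counter, universe_size: int, eps: float):
    # Add one-sided exponential noise only to the s non-zero vote counts, and
    # simulate the noisiest of the remaining zero-vote points in O(1) time via
    # the inverse CDF of the maximum of m i.i.d. Exp(eps) variables.
    if not votes:
        return None
    noisy = {u: c + random.expovariate(eps) for u, c in votes.items()}
    m = universe_size - len(votes)          # number of zero-vote points
    if m > 0:
        # If U ~ Uniform(0,1), then -ln(1 - U**(1/m)) / eps has the law of the
        # max of m independent Exp(eps) draws (invert the CDF (1 - e^{-eps z})^m).
        zero_max = -math.log(1.0 - random.random() ** (1.0 / m)) / eps
    else:
        zero_max = float("-inf")
    best_u, best_score = max(noisy.items(), key=lambda kv: kv[1])
    # If a zero-vote point "wins" the noisy comparison, report failure.
    return best_u if best_score > zero_max else None
```

Combined with the hypothetical `adaptive_ann_query` wrapper above, a query is answered by `sparse_noisy_argmax(adaptive_ann_query(structures, v, batch_size), n, eps)`.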
Part 2: Adaptive Linear Regression
The second major contribution deals with solving linear regression \(\min_x \|U_t x - b_t\|_2\) where the matrix \(U_t\) and the target vector \(b_t\) are constantly being updated by an adversary.
The Sketching Approach
Standard practice for huge matrices is Sketching. We multiply the input by a random matrix \(S\) (such as a CountSketch or a Subsampled Randomized Hadamard Transform (SRHT)) to shrink the height of the matrix from \(n\) to \(r\), where \(r \ll n\). We then solve the small regression problem.
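In generic sketch-and-solve form (the paper uses specific sketch families, but the shape of the reduced problem is the same), we return

\[
\tilde{x} = \arg\min_{x} \|S U x - S b\|_2, \qquad S \in \mathbb{R}^{r \times n},
\]

which costs roughly what an \(r \times d\) regression costs instead of an \(n \times d\) one.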

The Aggregation Challenge
To handle adaptivity, we again maintain \(k \approx \sqrt{T}\) sketches. When a query comes in, we sample a few sketches and solve them, getting a set of candidate solution vectors \(x_{(1)}, \dots, x_{(s)}\).
How do we combine these vectors privately? We can't simply average them, since the average is sensitive to a single outlying candidate. The solution proposed is the Coordinate-wise Private Median, sketched in code after the list below.
For each coordinate \(j \in [d]\):
- Collect the \(j\)-th coordinate from all candidate vectors.
- Compute the differentially private median of these scalar values.
- Assemble the result vector \(g\).
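A minimal sketch of this aggregation, assuming each coordinate lies in a known range \([\text{lo}, \text{hi}]\) and using the exponential mechanism over a grid as the scalar private-median primitive (the paper's exact median mechanism and privacy calibration may differ):

```python
import math
import random

def private_median(values, lo, hi, eps, grid_size=256):
    # Exponential mechanism over a grid of candidate medians: the quality of a
    # candidate c is minus the distance of its rank from n/2, which changes by
    # at most 1 when one value changes, so weights exp(eps * q / 2) give eps-DP.
    n = len(values)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    weights = [math.exp(-eps * abs(sum(v <= c for v in values) - n / 2) / 2.0)
               for c in grid]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for c, w in zip(grid, weights):
        acc += w
        if acc >= r:
            return c
    return grid[-1]

def coordinatewise_private_median(candidates, lo, hi, eps):
    # candidates: the s solution vectors x_(1), ..., x_(s). Each coordinate is
    # aggregated independently; note that the privacy cost composes across the
    # d coordinates, so eps here is a per-coordinate budget.
    d = len(candidates[0])
    return [private_median([x[j] for x in candidates], lo, hi, eps)
            for j in range(d)]
```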
The \(\ell_\infty\) Guarantee
This is where the math gets tricky. Standard sketching guarantees that the cost of the regression is preserved (the \(\ell_2\) norm of the residual). It does not usually guarantee that the solution vector \(x\) is close to the optimal \(x^*\) in every single coordinate.
If the error were concentrated in just a few coordinates, the coordinate-wise median could fail badly. The authors therefore prove that for specific sketches (such as an SRHT composed with CountSketch), the sketched solution satisfies a stronger \(\ell_\infty\) guarantee:
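Schematically, the guarantee has the following shape (this is an illustrative form in the spirit of earlier \(\ell_\infty\)-guarantee results for sketched regression; see the paper for the precise statement and constants):

\[
\|\tilde{x} - x^*\|_\infty \lesssim \frac{\alpha}{\sqrt{d}} \cdot \frac{\|U x^* - b\|_2}{\sigma_{\min}(U)},
\]

where \(\alpha\) is the sketch's accuracy parameter and \(\sigma_{\min}(U)\) is the smallest singular value of \(U\).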

This bound ensures that the error is spread out across coordinates. This allows the coordinate-wise median to reconstruct a solution vector that is close to the optimal solution \(x^*\) in terms of prediction cost.
By combining this guarantee with the largest and smallest singular values of the matrix, they derive the final utility bound:

This method allows for a highly efficient data structure. As shown in the table below, the space complexity scales with \(\sqrt{T}\) rather than \(T\) or \(n\), offering a massive improvement when \(T\) is large.

Part 3: Handling Ill-Conditioned Matrices
The regression strategy above relies on the condition number \(\kappa\) (the ratio of the largest to smallest singular values) being reasonable. If \(\kappa\) is huge, the \(\ell_\infty\) guarantee becomes too loose, and the error blows up.
For these “hard” cases, the authors pivot to a different technique: Bounded Computation Paths.
The Insight
Even if the adversary is adaptive, the interaction between the algorithm and the adversary can be viewed as a path on a decision tree. If we round the algorithm’s output to a discrete grid (finite precision), the total number of possible output sequences (paths) the algorithm can generate is finite, denoted as \(|\mathcal{P}|\).
Instead of trying to be robust against infinite possibilities, we just need to survive the union bound over all possible computation paths in \(\mathcal{P}\).
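As a rough count (assuming each of the \(T\) answers is a \(d\)-dimensional vector whose coordinates are rounded to one of \(G\) grid values), \(|\mathcal{P}| \le (G^{d})^{T} = G^{dT}\), so \(\log |\mathcal{P}| \le dT \log G\). The number of paths is astronomical, but its logarithm, which is what enters the sketch size, stays polynomial in \(d\), \(T\), and \(\log G\).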
The Algorithm
- Use a single, very large sketch (larger than usual, logarithmic in \(|\mathcal{P}|\)).
- Solve the regression.
- Round the solution to the nearest point on a pre-defined grid.
This rounding effectively limits the adversary’s ability to fine-tune inputs based on microscopic variations in the output. The cost is a dependence on \(\log |\mathcal{P}|\), but the dependence on the condition number \(\kappa\) drops from polynomial to logarithmic.
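A minimal illustration of the rounding step (the grid resolution \(\eta\) is a free parameter here; the paper tunes it to trade accuracy against \(\log |\mathcal{P}|\)):

```python
import numpy as np

def round_to_grid(x, eta):
    # Snap each coordinate of the solution vector to the nearest multiple of eta.
    # If coordinates are bounded by R, each one now takes one of about 2R/eta + 1
    # values, which is what makes the set of computation paths finite.
    return eta * np.round(np.asarray(x, dtype=float) / eta)
```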
Faster Updates via Batching
Finally, the paper addresses the practical cost of updates. In high-frequency settings (like online weighted matching), updating \(\sqrt{T}\) data structures for every single input is slow.
The authors utilize Fast Rectangular Matrix Multiplication. Instead of updating eagerly, they buffer updates and process them in blocks. By carefully tuning the block size \(\theta\), they amortize the cost of applying the Johnson-Lindenstrauss transform.
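An illustrative buffering scheme (not the paper's exact routine, which relies on fast rectangular matrix multiplication), assuming a dense \(r \times n\) sketch \(S\) and additive row updates to \(U\):

```python
import numpy as np

class BatchedSketch:
    def __init__(self, S, d, theta):
        self.S = S                              # r x n sketching matrix
        self.SU = np.zeros((S.shape[0], d))     # maintained sketch of U
        self.theta = theta                      # block size
        self.buffer = []                        # pending (row index, delta row)

    def update(self, i, delta_row):
        # Row i of U receives the additive update delta_row; defer the work.
        self.buffer.append((i, delta_row))
        if len(self.buffer) >= self.theta:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        rows = [i for i, _ in self.buffer]
        Delta = np.stack([dr for _, dr in self.buffer])   # theta x d
        # One rectangular multiply (r x theta) @ (theta x d) per block instead
        # of theta separate rank-1 updates; this is where the amortization comes from.
        self.SU += self.S[:, rows] @ Delta
        self.buffer = []
```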

This batching makes the “Privacy as Robustness” framework competitive not just in theoretical query complexity, but in actual wall-clock update time.
Conclusion
This research bridges a gap between two distinct fields: Differential Privacy and Streaming Algorithms. It demonstrates that privacy tools are not just for protecting sensitive data; they are powerful mathematical instruments for stabilizing algorithms against worst-case adaptive inputs.
By combining Sparse Argmax selection for search and Coordinate-wise Medians for regression, the authors provide a general recipe for converting oblivious static data structures into robust, adaptive dynamic ones with only a square-root overhead penalty.
Note: The images displayed in this post are extracted directly from the source paper “On Differential Privacy for Adaptively Solving Search Problems via Sketching” to illustrate the core mathematical concepts.