Machine learning models are often criticized for being black boxes. We feed them data, they produce an output, yet the reasoning behind their decisions is hidden. This lack of transparency can be unacceptable in high-stakes areas like medical diagnosis or loan applications, where understanding the why matters just as much as the what. Clustering—the task of grouping similar data points—is no exception. How can we trust a model’s clusters if we cannot understand the logic that creates them?
Enter explainable AI. The goal: design models that are both accurate and interpretable to humans. For clustering, Dasgupta et al. (2020) proposed a simple but powerful idea—define clusters using threshold decision trees. Instead of complex, irregular boundaries, clusters are defined by straightforward axis-aligned cuts like “age > 30” or “income ≤ $50,000.” These boundaries make the logic behind clustering crystal clear.
Of course, this simplicity comes at a cost: forcing clusters to live inside neat rectangular boxes can distort the data’s natural grouping and increase clustering cost. This trade-off is called the price of explainability. The critical question is: How high is this price? And can we find an algorithm that achieves the lowest possible price?
A recent paper, Random Cuts Are Optimal for Explainable k-Medians, gives a definitive answer. It analyzes a strikingly simple algorithm called RANDOMCOORDINATECUT and proves that it achieves the optimal competitive ratio. This work doesn’t just present a great algorithm—it closes a theoretical gap and reframes a complex geometric problem as a clean probabilistic game.
What Is Explainable Clustering?
Before diving into the paper’s core contribution, let’s build some intuition.
Standard k-medians clustering partitions a dataset \(X\) into \(k\) clusters by selecting \(k\) center points that minimize the sum of distances from each data point to its assigned center. Here we focus on the \(\ell_1\)-norm, or Manhattan distance, \( \|x - c\|_1 = \sum_j |x_j - c_j| \).
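To make the objective concrete, here is a minimal sketch (function name and NumPy usage are mine, not from the paper) of the unconstrained k-medians cost of a set of centers under the \(\ell_1\) distance:

```python
import numpy as np

def l1_kmedians_cost(X, centers):
    """Unconstrained k-medians cost under the Manhattan (l1) distance.

    X:       (n, d) array of data points.
    centers: (k, d) array of center points.
    Each point is charged its l1 distance to the closest center.
    """
    # Pairwise l1 distances between every point and every center: shape (n, k).
    dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    return dists.min(axis=1).sum()
```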
In standard k-medians, clusters are defined by Voronoi partitions: each point belongs to its closest center. As shown in Figure 1 (left), this can create complex, non-rectangular boundaries—especially under \(\ell_1\) distance—making it difficult to explain why a point belongs to one cluster versus another.
Figure 1: Left: Voronoi partition for unconstrained k-medians under \(\ell_1\) distance. Middle: explainable partition with rectangular regions. Right: the corresponding decision tree.
In explainable k-medians, cluster boundaries must be axis-aligned. You build a decision tree where each split is a simple threshold cut \(x_j \le \theta\). This creates \(k\) rectangular clusters \(P_1,\dots,P_k\). The k-medians cost is computed using the optimal medians for these rectangles:
Figure 2: The explainable k-medians objective sums \(\ell_1\) distances from each point to the median of its rectangular cluster.
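Since the \(\ell_1\)-optimal center of any set of points is its coordinate-wise median, the explainable objective for a given rectangular partition can be computed with a short sketch like the following (names are illustrative):

```python
import numpy as np

def explainable_cost(X, labels):
    """k-medians cost of a rectangular partition produced by a threshold tree.

    X:      (n, d) array of data points.
    labels: length-n array of leaf/cluster ids, one per point.
    Under l1, the optimal center of each cluster is its coordinate-wise median.
    """
    cost = 0.0
    for cluster_id in np.unique(labels):
        points = X[labels == cluster_id]
        median = np.median(points, axis=0)   # optimal l1 center of this rectangle
        cost += np.abs(points - median).sum()
    return cost
```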
The main metric is the competitive ratio: the cost of the explainable clustering divided by the optimal unconstrained cost. Dasgupta et al. showed that no algorithm can achieve a competitive ratio better than \(\Omega(\log k)\), while existing analyses of RANDOMCOORDINATECUT only gave an \(\mathcal{O}(\log k \log \log k)\) upper bound. Was the algorithm suboptimal, or was the analysis loose?
The Algorithm: RANDOMCOORDINATECUT
RANDOMCOORDINATECUT takes \(k\) reference centers (computed by a standard k-medians algorithm) and builds a threshold decision tree isolating them.
Steps:
- Initialize: Start with a single root node representing all of space; assign all \(k\) reference centers to it.
- Recursive splitting: While any leaf node contains more than one center:
  - Pick a random coordinate \(j\) (e.g., the \(x\)-axis or \(y\)-axis).
  - Pick a random threshold \(\theta\) along that coordinate.
  - For each leaf \(u\), if the cut \((j,\theta)\) divides its centers into two non-empty groups, split \(u\) into:
    - Left child: all points with \(x_j \le \theta\).
    - Right child: all points with \(x_j > \theta\).
- Stop: When every leaf contains exactly one center. The resulting tree defines \(k\) rectangular clusters.
Figure 3: A random coordinate cut (j, θ) splits a leaf’s centers into left/right children if both sides are non-empty.
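A minimal Python sketch of these steps, assuming the reference centers are given as lists of coordinates and the data lies in a known bounding box (the function name and representation are mine, not the paper's):

```python
import random

def random_coordinate_cut(centers, lo, hi, rng=random):
    """Build a threshold tree that isolates k reference centers (illustrative sketch).

    centers: list of d-dimensional points (lists or tuples of floats).
    lo, hi:  per-coordinate lower/upper bounds of the data's bounding box.
    Returns the leaves as (box, center) pairs, where box is a list of
    [low, high] intervals describing the rectangular cluster.
    """
    d = len(centers[0])
    # Each leaf stores its bounding box and the centers it still contains.
    leaves = [([[lo[j], hi[j]] for j in range(d)], list(centers))]

    while any(len(cs) > 1 for _, cs in leaves):
        j = rng.randrange(d)                      # random coordinate
        theta = rng.uniform(lo[j], hi[j])         # random threshold along it
        new_leaves = []
        for box, cs in leaves:
            left = [c for c in cs if c[j] <= theta]
            right = [c for c in cs if c[j] > theta]
            if left and right:                    # the cut separates this leaf's centers
                lbox = [iv[:] for iv in box]; lbox[j][1] = theta
                rbox = [iv[:] for iv in box]; rbox[j][0] = theta
                new_leaves += [(lbox, left), (rbox, right)]
            else:
                new_leaves.append((box, cs))      # leaf is unaffected by this cut
        leaves = new_leaves

    return [(box, cs[0]) for box, cs in leaves]
```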
Earlier analyses showed its expected cost is \(\mathcal{O}(\log k \log \log k)\) competitive.
Figure 4: Previous analysis bounded the competitive ratio by \(\mathcal{O}(\log k \log \log k)\).
The Analysis: Turning Clustering into a Game
This paper’s key insight is to transform the algorithm’s behavior into a Set Elimination Game, stripping away geometry and focusing on a clean random process.
Idea: Consider a single data point \(x\) that ends up assigned to center \(c^i\). The cost is \(\|x - c^i\|_1\). We compute its expected cost via the game.
- Cuts as elements: Let \(\Omega\) be all possible cuts \((j,\theta)\) in the bounded data space.
- Distances as sets: For each center \(c^i\), define \(S_i \subset \Omega\) as the set of cuts that separate \(x\) from \(c^i\).
Figure 5: \(S_i = \{(j,\theta) \in \Omega : \mathrm{sign}(x_j - \theta) \neq \mathrm{sign}(c_j^i - \theta)\}\).
Crucial connection: The measure \(\mu(S_i)\) equals \(\|x - c^i\|_1\) because each coordinate’s contribution is the length of the separating interval \(|x_j - c_j^i|\), summed over \(j\).
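Spelled out coordinate by coordinate, with \(|\cdot|\) denoting interval length:

\[
\mu(S_i) \;=\; \sum_{j=1}^{d} \big|\{\theta : (j,\theta) \in S_i\}\big|
\;=\; \sum_{j=1}^{d} |x_j - c_j^i|
\;=\; \|x - c^i\|_1 .
\]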
Set Elimination Game rules:
- Players: \(S_1,\dots,S_k\).
- Rounds: Draw random \(\omega \in \Omega\).
- Elimination: Remove every \(S_i\) containing \(\omega\), unless all remaining sets contain \(\omega\), in which case no one is eliminated.
Figure 6: A cut eliminates any sets it separates unless it separates all remaining sets.
The game ends with one winner, and the cost is \(\mu(\mathrm{winner})\). The final center assigned to \(x\) corresponds to the winning set; expected clustering cost equals expected game cost.
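The following toy simulation of the game, assuming a finite cut space \(\Omega\) with the uniform measure (the representation is mine), makes the rules concrete:

```python
import random

def set_elimination_game(sets, omega, rng=random):
    """One run of the Set Elimination Game on a finite cut space (toy sketch).

    sets:  dict mapping a center index i to the set S_i of cuts separating x from c^i.
    omega: list of all cuts, each drawn with equal probability (uniform measure mu).
    Returns the index of the winning set.
    """
    alive = dict(sets)
    while len(alive) > 1:
        w = rng.choice(omega)                     # draw a random cut
        hit = [i for i, s in alive.items() if w in s]
        if hit and len(hit) < len(alive):         # skip the "everyone is hit" case
            for i in hit:
                del alive[i]                      # eliminate every set containing w
    return next(iter(alive))

# Averaging |S_winner| over many runs estimates the expected game cost E[mu(winner)].
```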
The authors prove:
Figure 7: The expected cost \(\mathbb{E}[\mu(\mathrm{winner})]\) \(\le (2\ln k + 2) \cdot \min_i \mu(S_i)\).
In clustering terms, \(\min_i \mu(S_i)\) is \(x\)’s distance to its closest center. Summing over \(x\) shows RANDOMCOORDINATECUT’s competitive ratio is at most \(2\ln k + 2\):
Figure 8: Tight bound matches the \(\Omega(\log k)\) lower limit—optimal.
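Written out, with \(\mathbf{c}(x)\) denoting the center of the rectangle that contains \(x\) (notation mine), the per-point guarantee sums to

\[
\mathbb{E}\Big[\sum_{x \in X} \|x - \mathbf{c}(x)\|_1\Big]
\;\le\; (2\ln k + 2) \sum_{x \in X} \min_{i} \|x - c^i\|_1 ,
\]

so the explainable clustering costs at most \(2\ln k + 2\) times the cost of the reference centers.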
A Glimpse into the Proof
The proof introduces exponential clocks: imagine each cut \(\omega\) has an independent exponential clock with rate \(\mu(\omega)\); the hitting time \(h(\omega)\) is when its clock rings.
For a set \(S\), \(h(S) = \min_{\omega\in S} h(\omega)\) is exponential with rate \(\mu(S)\).
Figure 9: Hitting time \(h(X)\) is the first time a cut in \(X\) appears.
Disjoint sets have independent hitting times—a powerful property.
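A quick sanity check of the clock picture, treating \(\Omega\) as finite with per-cut rates (a simulation sketch, not the paper's construction): the first ring among the clocks in \(S\) is exponential with rate \(\mu(S)\), so its mean is \(1/\mu(S)\).

```python
import random

def mean_hitting_time(rates, trials=100_000, rng=random):
    """Empirical mean of h(S), the first ring among independent exponential clocks.

    rates: per-cut rates mu(omega) for the cuts in a set S.
    The minimum of independent Exp(r) clocks is Exp(sum of rates),
    so the returned value should be close to 1 / sum(rates) = 1 / mu(S).
    """
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(r) for r in rates)
    return total / trials

# Example: rates 0.5, 1.0, 1.5 give mu(S) = 3.0, so the mean is roughly 1/3.
print(mean_hitting_time([0.5, 1.0, 1.5]))
```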
The authors assume, without loss of generality, that \(S_1\) has the smallest measure (\(\mu(S_1) \le \mu(S_i)\) for all \(i\)) and split the expected cost into three terms, according to which set wins:
1. Winner is \(S_1\).
2. Winner is a surprise set (large but survives unusually long: \(h(S_i) \ge (\ln k)/\mu(S_i)\)).
3. Winner is a non-surprise set other than \(S_1\).
Figure 10: Surprise sets are rare; probability \(\le \frac{1}{k} \cdot \frac{\mu(S_1)}{\mu(S_i)}\).
Term (1) is \(\le \mu(S_1)\), since the winner is \(S_1\) itself. Term (2) is also \(\le \mu(S_1)\): each surprise set \(S_i\) wins with probability at most \(\frac{1}{k} \cdot \frac{\mu(S_1)}{\mu(S_i)}\), so summing \(\mu(S_i)\) times this probability over all \(k\) sets contributes at most \(\mu(S_1)\).
Term (3) requires more care: a non-surprise winner’s hitting time is small, letting the authors bound its contribution by \(2L\mu(S_1)\) with \(L = \ln k\).
Figure 11: Cost decomposition by type of winning set.
Summing the bounds yields:
Figure 12: \(\mathbb{E}[\mu(\mathrm{winner})] \le (2\ln k + 2)\mu(S_1)\).
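Spelling out the arithmetic behind Figure 12:

\[
\mathbb{E}[\mu(\mathrm{winner})]
\;\le\; \underbrace{\mu(S_1)}_{\text{term (1)}}
+ \underbrace{\mu(S_1)}_{\text{term (2)}}
+ \underbrace{2\ln k \cdot \mu(S_1)}_{\text{term (3)}}
\;=\; (2\ln k + 2)\,\mu(S_1).
\]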
Conclusion and Takeaways
This paper exemplifies theory solving practical problems: it answers how well we can cluster and explain at the same time.
Key takeaways:
- Optimality is achievable: RANDOMCOORDINATECUT is provably optimal for explainable k-medians; the price of explainability is \(\Theta(\log k)\).
- Gap closed: The competitive ratio improves from \(\mathcal{O}(\log k \log \log k)\) to a tight \(\mathcal{O}(\log k)\), matching the lower bound.
- Elegant tools: The Set Elimination Game abstraction and exponential clocks technique offer reusable methods for analyzing randomized algorithms.
In the broader mission to make AI transparent, this is encouraging: sometimes we can achieve full interpretability—clusters defined by simple rules—without undue cost. With nothing more than random coordinate cuts, we can construct explanations that are, provably, the best possible.