Discrimination Backtest — Does Watchlist Rank Predict Designation?
Generated 2026-07-02T23:41:38+00:00
Result. Among entities our data links to a sanctioned party, watchlist rank separates the companies designated in the following year from those never designated: AUC-within-visible rises from 0.6623 at the 2022-01-01 landmark (the start of the invasion wave) to 0.7864 at 2025-01-01, improving at every landmark. A company on the qualified watchlist is designated the next year at 117–238× the rate of a random Russian (EGRUL) company. This is the ranking counterpart to the recall backtest: that one asks whether we flag designations at all; this one asks whether we rank them near the top.
The recall backtest asks a yes/no question (did any pre-designation link exist?) and never uses the score that orders the watchlist. This backtest asks the ranking question directly: are the entities we rank highest the ones actually designated next? The design is a point-in-time landmark case-control: at each Jan-1 landmark T, every entity is scored only from links dated strictly before T, and a link counts only if its counterparty was already sanctioned when the link formed.
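The point-in-time gate can be illustrated with a minimal sketch. It assumes links arrive as records carrying a formation date and that each counterparty has a single designation date. The `Link` record, its field names, the `score_at` helper, and counting "sanctioned ties" as distinct sanctioned counterparties are all hypothetical, not the pipeline's actual schema. Only the `qualified` rule comes from this report (≥2 link types or ≥3 sanctioned ties).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Link:
    # Hypothetical record layout; field names are illustrative only.
    entity: str        # company being scored
    counterparty: str  # party on the other end of the link
    link_type: str     # e.g. "director", "shareholder"
    formed: date       # date the link formed


def score_at(links, sanctioned_on, landmark):
    """Score entities at landmark T using only gated links dated strictly before T.

    sanctioned_on maps counterparty -> designation date (missing = never sanctioned).
    Returns entity -> (distinct link types, distinct sanctioned counterparties).
    """
    types = defaultdict(set)
    ties = defaultdict(set)
    for ln in links:
        if ln.formed >= landmark:
            continue  # point-in-time: ignore links dated on or after T
        designated = sanctioned_on.get(ln.counterparty)
        if designated is None or designated > ln.formed:
            continue  # gate: counterparty must already be sanctioned when the link formed
        types[ln.entity].add(ln.link_type)
        ties[ln.entity].add(ln.counterparty)
    return {e: (len(types[e]), len(ties[e])) for e in types}


def qualified(n_link_types, n_ties):
    # Published watchlist definition: >=2 link types or >=3 sanctioned ties.
    return n_link_types >= 2 or n_ties >= 3


links = [
    Link("A", "X", "director", date(2021, 6, 1)),
    Link("A", "Y", "shareholder", date(2021, 7, 1)),
    Link("B", "X", "director", date(2020, 1, 1)),  # X not yet sanctioned: gated out
    Link("C", "X", "director", date(2022, 3, 1)),  # formed after T: excluded
]
sanctioned_on = {"X": date(2021, 1, 1), "Y": date(2021, 5, 1)}

scores = score_at(links, sanctioned_on, landmark=date(2022, 1, 1))
print(scores)                   # {'A': (2, 2)}
print(qualified(*scores["A"]))  # True
```

Entity B drops out because its link predates X's designation, and C drops out because its link forms after the landmark. Neither may contribute to a score at T, even though both are "linked to a sanctioned party" by the landmark or later.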
Two AUCs, two questions. AUC-population compares cases against random EGRUL companies (symmetric zeros), so it folds selection and ranking together: being on the list at all, plus where on it. AUC-within-visible compares only entities already linked to sanctioned parties, isolating pure ranking quality. An AUC of 0.50 is no better than a phone book; 1.0 is perfect. Recall-visible is the share of window designations our data could see at all before T (the recall ceiling). A never-designated control is not a false positive, because designation is a throttled sample of the sanctionable set; this therefore scores ranking, not calibrated probability.
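Both AUCs are the same statistic computed against different control sets. The following is a minimal sketch under stated assumptions. It assumes each entity's score is a single number (for example, a tie count) and that an entity is "visible" exactly when its score is above zero. `auc` is a hypothetical helper. Counting ties, including the mass of zero-score entities, as half a win is one reading of "symmetric zeros", not something the report spells out. All scores are illustrative, not report data.

```python
from itertools import product


def auc(case_scores, control_scores):
    """P(random case scores above random control), ties counted as 0.5.

    Pairwise form of the Mann-Whitney U statistic; O(n*m), fine for a sketch.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Illustrative scores only, not data from this report.
cases = [0, 0, 2, 5]               # two cases had no qualifying link before T
random_controls = [0, 0, 0, 0, 1]  # random EGRUL companies: mostly unlinked
linked_controls = [1, 2, 3]        # linked to sanctioned parties, never designated

auc_population = auc(cases, random_controls)
visible_cases = [s for s in cases if s > 0]
auc_within_visible = auc(visible_cases, linked_controls)
recall_visible = len(visible_cases) / len(cases)

print(auc_population, auc_within_visible, recall_visible)  # 0.7 0.75 0.5
```

The two zero-score cases tie with most random controls, so selection misses pull AUC-population down. AUC-within-visible drops them and measures ranking among linked entities alone. Those dropped cases reappear as the recall-visible shortfall.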
Per-landmark discrimination
| Landmark T | Cases (visible) | Recall-vis | AUC-population | AUC-within-visible |
|---|---|---|---|---|
| 2022-01-01 | 1,589 (513) | 32.3% | 0.6605 | 0.6623 |
| 2023-01-01 | 3,917 (968) | 24.7% | 0.6220 | 0.7331 |
| 2024-01-01 | 1,578 (419) | 26.6% | 0.6302 | 0.7546 |
| 2025-01-01 | 1,031 (299) | 29.0% | 0.6414 | 0.7864 |
Designation enrichment by score bucket (sampling-corrected, next-1yr window)
Lift = how many times more likely a company in the bucket is to be designated in the next year than a random EGRUL company. The qualified bucket is the published watchlist definition (≥2 link types or ≥3 sanctioned ties).
| Landmark T | Base rate | flagged lift | qualified lift | multi-channel lift | ≥3-channel lift |
|---|---|---|---|---|---|
| 2022-01-01 | 0.0129% | 157.4× | 237.7× | 990.7× | 829.9× |
| 2023-01-01 | 0.0318% | 73.1× | 133.8× | 441.1× | 761.6× |
| 2024-01-01 | 0.0128% | 46.7× | 121.1× | 312.0× | 299.9× |
| 2025-01-01 | 0.0084% | 34.7× | 117.3× | 309.9× | 1645.6× |
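Lift is a ratio of rates, and multiplying it back by the base rate gives the absolute next-year designation rate inside a bucket. The sketch below assumes controls are a simple random sample of EGRUL companies. Under that assumption, Bayes' rule lets the sampling fraction cancel. Whether the report's "sampling correction" works exactly this way is an assumption. The function names and the first set of counts are hypothetical. The second call uses the 2022-01-01 qualified row, and because the base rate in the table is rounded, the implied rate is approximate.

```python
def bucket_lift(cases_in_bucket, n_cases, controls_in_bucket, n_controls):
    """Lift = P(designated | bucket) / P(designated).

    By Bayes' rule this equals P(bucket | designated) / P(bucket). With a simple
    random sample of population controls, the control share estimates P(bucket),
    so the sampling fraction cancels and no population count is needed.
    """
    if controls_in_bucket == 0:
        raise ValueError("no sampled controls in bucket: lift is not estimable")
    case_share = cases_in_bucket / n_cases
    control_share = controls_in_bucket / n_controls
    return case_share / control_share


def implied_bucket_rate(lift, base_rate):
    # Next-year designation rate inside the bucket implied by lift x base rate.
    return lift * base_rate


# Hypothetical counts: 250 of 1,000 cases and 50 of 100,000 random controls in bucket.
print(bucket_lift(250, 1_000, 50, 100_000))  # about 500

# From the 2022-01-01 row: qualified lift 237.7x on a 0.0129% base rate.
print(f"{implied_bucket_rate(237.7, 0.000129):.2%}")  # 3.07%
```

The denominator is the weak point of this estimator. A bucket that catches only a handful of sampled controls yields a lift that swings heavily with each one.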
Pooled (indicative — controls recur across years)
- Cases 8,115 (2,199 visible pre-window)
- AUC-population 0.6333 · AUC-within-visible 0.7666
- Base designation rate 0.0165%; being on the qualified watchlist raises it 137.6×, ≥3-channel 726.1×.
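The pooled case counts are plain sums over the four landmarks, so pooled figures are case-weighted, and the 2023-01-01 landmark (3,917 cases) dominates. A quick arithmetic check using only the rows of the per-landmark discrimination table reproduces the pooled counts and gives the matching pooled recall-visible. The pooled AUCs cannot be rebuilt this way, because controls recur across years.

```python
# Per-landmark (landmark, cases, cases visible before T) from the discrimination table.
rows = [
    ("2022-01-01", 1_589, 513),
    ("2023-01-01", 3_917, 968),
    ("2024-01-01", 1_578, 419),
    ("2025-01-01", 1_031, 299),
]

for landmark, cases, visible in rows:
    print(f"{landmark}: recall-visible {visible / cases:.1%}")

pooled_cases = sum(cases for _, cases, _ in rows)
pooled_visible = sum(visible for _, _, visible in rows)
pooled_recall = pooled_visible / pooled_cases
print(f"pooled: {pooled_cases:,} cases, {pooled_visible:,} visible, "
      f"recall-visible {pooled_recall:.1%}")  # 8,115 cases, 2,199 visible, 27.1%
```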
Reading it. AUC-within-visible isolates ranking given that a link exists: do breadth and tie count order the linked crowd correctly? AUC-population adds selection (being flagged at all vs a random company). Recall-visible is the orthogonal recall ceiling: the share of designations our data sees before they happen. The enrichment table is the buyer-facing number: work the qualified list and you hit next-year designations at many times the base rate (117–238× per landmark). Because a never-designated control is not a false positive, this measures enrichment of real designations, not precision.