Methodology
How the rankings are built
This page documents how the Top 100 list is constructed, what is in the data, and what is deliberately left out. The abc conjecture ranking uses the same three-source composite design as the sister Goldbach and Riemann hypothesis sites: arXiv preprint output, OpenAlex topical citations, and zbMATH MSC classifications. The abc field has a substantial arXiv presence, so the current ranking is built mainly from the arXiv signal, with OpenAlex contributing where it overlaps; the zbMATH MSC layer is being integrated.
Data sources
| Source | What it gives | Limitations |
|---|---|---|
| arXiv (math.NT, math.AG) | Preprint-level: titles, abstracts, authors, dates, co-author graph | Biased toward researchers who post preprints. Senior figures who publish mainly in journals are undercounted. |
| OpenAlex | Author-level: paper count, citations, affiliations, country | Concept tagging is noisy in math; surname-only matching can misidentify authors. |
| zbMATH Open | Curated math review database; canonical author codes; editor-assigned MSC classification (MSC classes 11J25, 11D75, 11J87, 11G05) | Coverage of older non-Western mathematicians is the best of the three sources; the REST API is gated behind a one-time Terms-of-Use acceptance. This layer is being folded in. |
Search terms
The arXiv and OpenAlex pulls use the following 14 terms, restricted to the math.NT and math.AG categories on arXiv:
- abc conjecture
- Szpiro conjecture
- Vojta conjecture
- S-unit equation
- height inequality
- Belyi map
- Wieferich prime
- Thue equation
- inter-universal Teichmuller
- Nevanlinna theory arithmetic
- integral points on curves
- effective Mordell
- radical of integers
- Mason-Stothers theorem
These terms were selected by a term-analysis pass on a first-draft pull. Broad general-math terms (such as "Diophantine approximation") were excluded because they pulled in hundreds of papers on metric Diophantine approximation (Schmidt, Jarník, homogeneous dynamics) that have no connection to the abc conjecture. The terms above keep the corpus on the abc, Szpiro, Vojta, S-unit, and height themes.
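As an illustration of a category-restricted phrase query, here is a minimal sketch. It assumes the pull goes through the public arXiv API query interface, which this page does not state; the function names `build_query` and `build_url` are hypothetical, and the sketch builds the request URLs without sending them.

```python
from urllib.parse import urlencode

# The 14 search terms listed above.
SEARCH_TERMS = [
    "abc conjecture", "Szpiro conjecture", "Vojta conjecture",
    "S-unit equation", "height inequality", "Belyi map",
    "Wieferich prime", "Thue equation", "inter-universal Teichmuller",
    "Nevanlinna theory arithmetic", "integral points on curves",
    "effective Mordell", "radical of integers", "Mason-Stothers theorem",
]
CATEGORIES = ["math.NT", "math.AG"]

# Assumption: the public arXiv API endpoint; the real pipeline may differ.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query(term, categories=CATEGORIES):
    """Phrase-match a term in any field, restricted to the given categories."""
    cats = " OR ".join(f"cat:{c}" for c in categories)
    return f'all:"{term}" AND ({cats})'


def build_url(term, start=0, max_results=100):
    """Return the request URL for one page of results for one term."""
    params = {
        "search_query": build_query(term),
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


urls = [build_url(term) for term in SEARCH_TERMS]
```

Keeping the category restriction inside the query string, rather than filtering afterwards, means off-category matches are never downloaded.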
Pipeline
Title-weighting. A paper can mention the abc conjecture without being about it, for example by citing it as a motivation or famous open problem. To reduce this noise, the arXiv and OpenAlex pipelines weight a keyword match by where it appears: a match in the paper title counts at full weight, and a match only in the abstract counts at half (a factor of 0.5). zbMATH is not title-weighted, because its documents are classified by human editors.
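The title-weighting rule can be sketched as follows. The weights 1.0 and 0.5 come from the paragraph above; everything else is an assumption made for illustration: case-insensitive substring matching, every listed author receiving the paper's full weight, and the hypothetical names `match_weight` and `weighted_paper_count`.

```python
TITLE_WEIGHT = 1.0     # keyword match in the title
ABSTRACT_WEIGHT = 0.5  # keyword match only in the abstract


def match_weight(title, abstract, terms):
    """Return the weight of a paper given where a search term appears."""
    title_l, abstract_l = title.lower(), abstract.lower()
    if any(t.lower() in title_l for t in terms):
        return TITLE_WEIGHT
    if any(t.lower() in abstract_l for t in terms):
        return ABSTRACT_WEIGHT
    return 0.0


def weighted_paper_count(papers, terms):
    """Sum paper weights per author (assumption: no fractional credit)."""
    totals = {}
    for paper in papers:
        weight = match_weight(paper["title"], paper["abstract"], terms)
        if weight == 0.0:
            continue  # no match anywhere: the paper is off-topic
        for author in paper["authors"]:
            totals[author] = totals.get(author, 0.0) + weight
    return totals


# Made-up sample records, not real data.
sample_papers = [
    {"title": "On the abc conjecture", "abstract": "We study radicals.",
     "authors": ["A", "B"]},
    {"title": "Heights on curves", "abstract": "Motivated by the abc conjecture.",
     "authors": ["A"]},
    {"title": "Unrelated topic", "abstract": "Nothing relevant.",
     "authors": ["C"]},
]
totals = weighted_paper_count(sample_papers, ["abc conjecture"])
```

In this sample, author A collects one title match and one abstract-only match, B collects a single title match, and C does not appear in the totals.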
- arXiv pull: 14 search terms restricted to the math.NT and math.AG categories. Each paper's contribution to an author is title-weighted. A co-authorship graph is built and eigenvector centrality is the second factor in an arXiv composite of 0.60 * pr(weighted papers) + 0.40 * pr(eigen). Authors with at least 3 topical papers qualify. The pull yielded 412 unique papers and 68 qualifying authors.
- OpenAlex pull: the same 14 phrase queries, with an author cap of 10 per work to remove megapapers. Works and their citations are title-weighted. Composite: 0.60 * pr(weighted works) + 0.40 * pr(weighted citations). Result: 187 authors found, 18 qualifying (works at least 3).
- zbMATH pull: documents tagged with the MSC classes 11J25 (Diophantine approximation, abc conjecture), 11D75 (Diophantine inequalities), 11J87 (Schmidt Subspace Theorem and generalizations), and 11G05 (elliptic curves over global fields). This pull is in progress and is being folded into the merged ranking.
- Merge and scoring: the rankings are surname-deduplicated and joined. The available ranks are combined with a weighted order statistic: each researcher's ranks are sorted and weighted 0.70 on the best, 0.20 on the middle, and 0.10 on the worst. Lower combined score ranks higher. An earlier design simply summed the ranks, which punished anyone outstanding in one source but weak in another; the weighted order statistic fixes that.
- Estimating a missing rank: a researcher ranked by only one pipeline is not given a flat penalty. To estimate a missing rank, we order the whole pool by a pipeline the researcher does appear in, then walk outward to the two nearest researchers above and the two nearest below who carry a real rank in the missing pipeline, and average those values. The 0.70 top weight may only land on a measured rank, so an estimate can support a score but can never be its headline signal. Estimated ranks show in [square brackets] on the Top 100 table; measured ranks show plain.
- Hand-curated edits: an exclusions file removes researchers the automated pipeline surfaced in error. The merge does not hand-place any researcher; everyone earns their rank from the pipeline scores.
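The 0.60/0.40 composites in the arXiv and OpenAlex steps can be sketched as below. This page does not define pr(...); the sketch assumes it means a percentile rank, taken here as the share of the pool scoring at or below an author. The function names and the sample values are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Map each author to the share of the pool scoring at or below them.

    Assumption: this is one plausible reading of pr(...), not a confirmed one.
    """
    ordered = sorted(values.values())
    n = len(ordered)
    return {author: bisect_right(ordered, v) / n for author, v in values.items()}


def composite(first, second, w_first=0.60, w_second=0.40):
    """Blend two per-author metrics after converting each to percentile ranks."""
    pr_first = percentile_ranks(first)
    pr_second = percentile_ranks(second)
    return {
        author: w_first * pr_first[author] + w_second * pr_second[author]
        for author in first
        if author in second
    }


# Made-up inputs: title-weighted paper totals and eigenvector centralities.
weighted_papers = {"A": 4.5, "B": 3.0, "C": 3.0}
eigen = {"A": 0.2, "B": 0.7, "C": 0.1}

scores = composite(weighted_papers, eigen)
ranking = sorted(scores, key=scores.get, reverse=True)  # higher composite first
```

Converting both factors to percentile ranks before blending puts paper totals and centralities on the same 0 to 1 scale, so neither dominates because of its units.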
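The merge step (missing-rank estimation followed by the weighted order statistic) can be sketched as below. Two details are assumptions, since this page does not specify them: the best measured rank always takes the 0.70 slot, even when an estimate is numerically better, and when fewer than three ranks are available the weights in use are renormalised. The function names and sample pool are hypothetical.

```python
WEIGHTS = (0.70, 0.20, 0.10)  # best, middle, worst


def estimate_missing_rank(ordered_pool, missing_ranks, name, k=2):
    """Average the measured ranks of the k nearest neighbours on each side.

    ordered_pool: names ordered by a pipeline in which `name` is ranked.
    missing_ranks: measured ranks in the pipeline where `name` is absent.
    """
    i = ordered_pool.index(name)
    above = [missing_ranks[n] for n in reversed(ordered_pool[:i])
             if n in missing_ranks][:k]
    below = [missing_ranks[n] for n in ordered_pool[i + 1:]
             if n in missing_ranks][:k]
    neighbours = above + below
    if not neighbours:
        return None  # nobody nearby carries a real rank
    return sum(neighbours) / len(neighbours)


def combined_score(measured, estimated=()):
    """Weighted order statistic; lower is better.

    Assumption: the top weight goes to the best measured rank; remaining
    ranks (measured or estimated) fill the other slots in ascending order.
    """
    if not measured:
        raise ValueError("at least one measured rank is required")
    best = min(measured)
    rest = sorted(list(measured) + list(estimated))
    rest.remove(best)  # the best measured rank is placed first, not re-sorted
    ordered = ([best] + rest)[:len(WEIGHTS)]
    weights = WEIGHTS[:len(ordered)]
    # Assumption: renormalise when fewer than three ranks are available.
    return sum(w * r for w, r in zip(weights, ordered)) / sum(weights)


# Made-up pool ordered by arXiv rank; X has no measured OpenAlex rank.
arxiv_order = ["P", "Q", "X", "R", "S", "T"]
openalex_ranks = {"P": 2, "Q": 5, "R": 9, "S": 12, "T": 20}

estimate = estimate_missing_rank(arxiv_order, openalex_ranks, "X")
score_x = combined_score(measured=[3], estimated=[estimate])
```

Here X's estimate averages the two neighbours above (Q, P) and the two below (R, S). Because the estimate can only fill the 0.20 or 0.10 slot, it supports the score without becoming its main signal.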
Audit decisions
Excluded
A small number of authors surfaced by the automated pipeline are removed by hand when they are clearly off-topic (for example, researchers whose surname coincides with a term in the query set but who work in an unrelated field). The specific names are kept internal.
The Mochizuki IUT literature
Shinichi Mochizuki announced a claimed proof of the abc conjecture via inter-universal Teichmüller theory (IUT) in 2012. As of 2025, the proof has not been verified or accepted by the broader mathematical community. The pipeline treats Mochizuki as one researcher among many and ranks by publication activity, not by the proof claim. Mochizuki and his direct collaborators may rank lower than their publication output warrants because IUT-specific terminology does not overlap with the standard abc/Szpiro/Vojta search terms used here.
What is not in this list
- Researchers without a digital footprint. The pipeline indexes arXiv well and OpenAlex moderately. Figures who publish mainly in journals are undercounted until the zbMATH MSC layer is fully integrated.
- Subjective importance. A theorist whose entire body of work is one influential paper may rank lower than a productive researcher with many adjacent papers. The ranking measures output, not depth.
- The full IUT community. The claimed proof literature uses specialized terminology not covered by the current search terms. This is a known gap in the current v1 build.