How We Calculate Name Frequency
Data Sources
We use two authoritative U.S. government datasets:
Social Security Administration (SSA)
National baby name statistics from 1880 to the present, plus state-level data. Counts are based on Social Security card applications for births in the United States. Names with fewer than 5 occurrences in a year are excluded for privacy.
U.S. Census Bureau
Surname frequency data from the decennial census, including per-100,000 frequency rates and race and Hispanic-origin breakdowns. We use the most recent available year (typically 2010 or 2020).
Estimating Living Americans with a First Name
The SSA provides birth counts, not living population counts. To estimate the number of living Americans with a given first name, we apply approximate survival rates to each birth cohort:
estimated_living = sum(births_in_year x survival_rate(year)) over all birth years
Survival rates are derived from simplified U.S. life tables. For example, someone born in 1960 (age ~65) has an estimated ~88% chance of being alive today, while someone born in 1940 (age ~85) has roughly a 50% chance.
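As a rough illustration of the sum above, here is a minimal Python sketch. The survival table is a placeholder: only the age 65 and age 85 values come from the examples in the text, while the endpoints at ages 0 and 110, the linear interpolation between points, and the reference year 2025 are assumptions made for the example, not the life tables actually used.

```python
from bisect import bisect_right
from typing import Mapping

CURRENT_YEAR = 2025  # assumption: implied by "born in 1960 (age ~65)"

# (age, probability of still being alive). The 65 and 85 rows echo the text;
# the 0 and 110 endpoints are illustrative placeholders, not life-table values.
SURVIVAL_POINTS = [(0, 1.00), (65, 0.88), (85, 0.50), (110, 0.00)]


def survival_rate(birth_year: int, current_year: int = CURRENT_YEAR) -> float:
    """Linearly interpolate a survival probability for one birth cohort."""
    age = current_year - birth_year
    if age <= SURVIVAL_POINTS[0][0]:
        return SURVIVAL_POINTS[0][1]
    if age >= SURVIVAL_POINTS[-1][0]:
        return SURVIVAL_POINTS[-1][1]
    ages = [a for a, _ in SURVIVAL_POINTS]
    i = bisect_right(ages, age)  # first anchor age strictly greater than `age`
    (a0, s0), (a1, s1) = SURVIVAL_POINTS[i - 1], SURVIVAL_POINTS[i]
    return s0 + (s1 - s0) * (age - a0) / (a1 - a0)


def estimated_living(births_by_year: Mapping[int, int]) -> float:
    """Sum births in each year weighted by that cohort's survival rate."""
    return sum(births * survival_rate(year)
               for year, births in births_by_year.items())
```

With made-up counts of 1,000 births in 1960 and 1,000 in 1940, `estimated_living({1960: 1000, 1940: 1000})` gives about 1,380 living bearers (880 plus 500).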
First name frequency is the ratio of births since 1940 for that name to total births since 1940 across all names.
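A minimal sketch of that ratio in Python, assuming both inputs are dictionaries mapping birth year to a birth count. The function name, the input shape, and the sample numbers are hypothetical.

```python
from typing import Mapping


def first_name_frequency(name_births: Mapping[int, int],
                         all_births: Mapping[int, int],
                         since: int = 1940) -> float:
    """Share of births from `since` onward that carry the given name."""
    name_total = sum(n for year, n in name_births.items() if year >= since)
    all_total = sum(n for year, n in all_births.items() if year >= since)
    if all_total == 0:
        raise ValueError("no births recorded from the cutoff year onward")
    return name_total / all_total


# Made-up counts: the 1935 rows fall before the cutoff and are ignored.
name_births = {1935: 100, 1950: 200, 1990: 300}
all_births = {1935: 10_000, 1950: 20_000, 1990: 30_000}
frequency = first_name_frequency(name_births, all_births)  # 500 / 50,000
```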
Estimating Full Name Combinations
To estimate how many people share a specific first + last name combination, we multiply the two name probabilities, treating first and last names as statistically independent:
P(full name) = P(first name) x P(last name)
estimate = P(full name) x U.S. population
Here P(first name) is the first name frequency described above, used as an estimate of the name's share among living Americans, and P(last name) is the surname frequency from Census data (prop100k / 100,000). The U.S. population figure used is 335,893,238.
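The calculation can be sketched in a few lines of Python. The function name and the sample inputs (a first name frequency of 1% and a surname rate of 100 per 100,000) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
US_POPULATION = 335_893_238  # population figure quoted in the text


def full_name_estimate(first_name_freq: float,
                       surname_prop100k: float,
                       population: int = US_POPULATION) -> float:
    """Estimate bearers of a first + last name, assuming independence."""
    p_last = surname_prop100k / 100_000   # convert per-100,000 rate to a probability
    p_full = first_name_freq * p_last     # P(first name) x P(last name)
    return p_full * population


# Hypothetical inputs: 1% of people share the first name, and
# 100 per 100,000 share the surname.
estimate = full_name_estimate(first_name_freq=0.01, surname_prop100k=100)
```

With these inputs the combined probability is 0.00001, so the estimate comes to about 3,359 people.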
The Independence Assumption
This method assumes that first names and last names are statistically independent. In reality, there can be correlations:
- Cultural naming patterns may link certain first names with certain ethnic surnames
- Regional naming trends may correlate with regional surname distributions
- Generational trends in first names may not perfectly align with surname distributions
For most common name combinations, the independence assumption produces reasonable estimates. For names strongly associated with specific ethnic or cultural groups, the estimate may be less accurate.
Average Age Estimation
Average age is calculated by weighting each birth year by both the number of births and the survival probability:
avg_age = sum(age x births x survival) / sum(births x survival)
This accounts for the fact that names popular in earlier decades have older average bearers, and mortality reduces the weight of very old cohorts.
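A short Python sketch of the weighted average follows. The survival function is passed in as a parameter; the lookup used in the example reuses the two survival figures quoted earlier (88% for 1960, 50% for 1940), and the birth counts and the reference year 2025 are assumptions for illustration.

```python
from typing import Callable, Mapping


def average_age(births_by_year: Mapping[int, int],
                survival: Callable[[int], float],
                current_year: int = 2025) -> float:
    """Mean age of living bearers, weighting each cohort by births x survival."""
    weighted_age = 0.0
    total_weight = 0.0
    for year, births in births_by_year.items():
        weight = births * survival(year)      # estimated living bearers in this cohort
        weighted_age += (current_year - year) * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no estimated living bearers")
    return weighted_age / total_weight


# Made-up example: equal births in 1960 and 1940.
rates = {1960: 0.88, 1940: 0.50}
avg = average_age({1960: 1000, 1940: 1000}, survival=lambda year: rates[year])
```

Because fewer of the 1940 cohort survive, the result (about 72.2) sits below the unweighted midpoint of 75.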
Rarity Classification
| Classification | Estimated Living Bearers |
|---|---|
| Very Common | 500,000+ |
| Common | 100,000 to 499,999 |
| Uncommon | 10,000 to 99,999 |
| Rare | 1,000 to 9,999 |
| Very Rare | Under 1,000 |
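The thresholds in the table translate directly into a lookup. This is a minimal sketch; the function name is hypothetical.

```python
def classify_rarity(estimated_bearers: float) -> str:
    """Map an estimated count of living bearers to a rarity label."""
    if estimated_bearers >= 500_000:
        return "Very Common"
    if estimated_bearers >= 100_000:
        return "Common"
    if estimated_bearers >= 10_000:
        return "Uncommon"
    if estimated_bearers >= 1_000:
        return "Rare"
    return "Very Rare"
```

Checking from the highest threshold downward means each estimate receives the first label whose lower bound it meets, so fractional estimates such as 99,999.5 fall into "Uncommon" rather than between classes.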
Limitations
- All figures are statistical estimates, not official counts
- SSA data covers only U.S. births, so immigrants who were born and named abroad are not counted
- Names with fewer than 5 births per year are excluded from SSA data
- Census surname data is released on a decennial basis and may lag current demographics
- Name changes (marriage, legal changes) are not reflected in birth records
- The survival model uses simplified life tables keyed to birth year, not sex- or race-specific rates
Privacy
All data used is publicly available from U.S. government sources. No individual-level data is used or stored. Search queries are processed in real time and are not logged.