How We Calculate Name Frequency

Data Sources

We use two authoritative U.S. government datasets:

Social Security Administration (SSA)

National baby name statistics from 1880 to the present, plus state-level data. Counts are based on Social Security card applications for births in the United States. Names with fewer than 5 occurrences in a year are excluded for privacy.

U.S. Census Bureau

Surname frequency data from the decennial census, including per-100,000 frequency rates and ancestry/ethnicity breakdowns. We use the most recent available year (typically 2010 or 2020).

Estimating Living Americans with a First Name

The SSA provides birth counts, not living population counts. To estimate the number of living Americans with a given first name, we apply approximate survival rates to each birth cohort:

estimated_living = sum over all birth years of (births_in_year x survival_rate(year))

Survival rates are derived from simplified U.S. life tables. For example, someone born in 1960 (age ~65) has an estimated ~88% chance of being alive today, while someone born in 1940 (age ~85) has roughly a 50% chance.
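The survival-weighted sum can be sketched in Python as follows. This is a minimal illustration, not the production model: the anchor table, the linear interpolation between anchors, and the 2025 reference year are assumptions. Only the age-65 and age-85 values come from the figures quoted above; the two endpoints are placeholders rather than real life-table values.

```python
from bisect import bisect_right

REFERENCE_YEAR = 2025  # assumption: the "today" implied by 1960 -> age ~65

# (age, survival probability) anchors. Only the age-65 and age-85 values come
# from the text; the endpoints are placeholders, not real life-table figures.
SURVIVAL_ANCHORS = [(0, 1.00), (65, 0.88), (85, 0.50), (110, 0.00)]


def survival_rate(birth_year, reference_year=REFERENCE_YEAR):
    """Approximate probability that someone born in birth_year is alive."""
    age = reference_year - birth_year
    if age <= 0:
        return 1.0
    if age >= SURVIVAL_ANCHORS[-1][0]:
        return 0.0
    ages = [a for a, _ in SURVIVAL_ANCHORS]
    i = bisect_right(ages, age)  # index of the first anchor older than `age`
    (a0, s0), (a1, s1) = SURVIVAL_ANCHORS[i - 1], SURVIVAL_ANCHORS[i]
    # Linear interpolation between the two surrounding anchors.
    return s0 + (s1 - s0) * (age - a0) / (a1 - a0)


def estimated_living(births_by_year, reference_year=REFERENCE_YEAR):
    """Sum births x survival_rate over every birth year for one name."""
    return sum(
        births * survival_rate(year, reference_year)
        for year, births in births_by_year.items()
    )


# Hypothetical birth counts for a single name.
print(estimated_living({1940: 1000, 1960: 1000}))
```

With these made-up counts, the 1940 cohort contributes about 500 people and the 1960 cohort about 880.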

The first name frequency is calculated as the ratio of births since 1940 for that name to total births since 1940 across all names.
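A short sketch of this ratio, assuming birth counts are held in dictionaries that map birth year to number of births. The function name, the input layout, and the sample numbers are hypothetical.

```python
def first_name_frequency(name_births, all_births, since=1940):
    """Share of all births from `since` onward that carry the given name."""
    name_total = sum(n for year, n in name_births.items() if year >= since)
    overall_total = sum(n for year, n in all_births.items() if year >= since)
    if overall_total == 0:
        return 0.0
    return name_total / overall_total


# Hypothetical counts: the 1930 cohort falls before the cutoff and is ignored.
name_births = {1930: 500, 1950: 100, 2000: 300}
all_births = {1930: 10_000, 1950: 20_000, 2000: 20_000}
print(first_name_frequency(name_births, all_births))
```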

Estimating Full Name Combinations

To estimate how many people share a specific first and last name combination, we multiply the two name probabilities, which treats first and last names as statistically independent:

P(full name) = P(first name) x P(last name)

estimate = P(full name) x U.S. population

Here, P(first name) is the first name frequency among living Americans, and P(last name) is the surname frequency from Census data (prop100k / 100,000, where prop100k is the number of bearers per 100,000 people). The U.S. population figure used is approximately 335,893,238.
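The two formulas combine into a few lines of Python. This is a sketch under the independence assumption; the function name and the sample inputs are hypothetical, and only the population figure and the prop100k conversion come from the text.

```python
US_POPULATION = 335_893_238  # population figure quoted in the text


def estimate_full_name(p_first, prop100k, population=US_POPULATION):
    """Estimate bearers of a first + last name, assuming independence."""
    p_last = prop100k / 100_000  # Census rate per 100,000 -> probability
    p_full = p_first * p_last    # P(full name) = P(first name) x P(last name)
    return p_full * population


# Hypothetical inputs: a first name held by 1% of people and a surname
# held by 500 per 100,000.
print(round(estimate_full_name(p_first=0.01, prop100k=500)))
```

With these made-up inputs the estimate comes to roughly 16,800 people.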

The Independence Assumption

This method assumes that first names and last names are statistically independent. In reality, there can be correlations:

  • Cultural naming patterns may link certain first names with certain ethnic surnames
  • Regional naming trends may correlate with regional surname distributions
  • Generational trends in first names may not perfectly align with surname distributions

For most common name combinations, the independence assumption produces reasonable estimates. For names strongly associated with specific ethnic or cultural groups, the estimate may be less accurate.

Average Age Estimation

Average age is calculated by weighting each birth year by both the number of births and the survival probability:

avg_age = sum(age x births x survival) / sum(births x survival)

This accounts for the fact that names popular in earlier decades have older average bearers, and mortality reduces the weight of very old cohorts.
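The weighted average can be sketched as below. The function name, the 2025 reference year, and the toy survival lookup are assumptions for illustration. The two survival values reuse the 1960 and 1940 figures quoted earlier on this page.

```python
def average_age(births_by_year, survival_rate, reference_year=2025):
    """Survival-weighted mean age of the living bearers of a name.

    survival_rate is any callable mapping a birth year to a probability.
    Returns None when no living bearers are expected.
    """
    weighted_age = 0.0
    total_weight = 0.0
    for year, births in births_by_year.items():
        weight = births * survival_rate(year)  # expected living from cohort
        weighted_age += (reference_year - year) * weight
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_age / total_weight


# Toy survival lookup using the two example rates from this page.
toy_survival = {1960: 0.88, 1940: 0.50}.get
print(average_age({1960: 1000, 1940: 1000}, toy_survival))
```

Equal birth counts in 1940 and 1960 give an average of about 72 rather than the unweighted 75, because fewer members of the older cohort are expected to be alive.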

Rarity Classification

Classification    Estimated Living Bearers
Very Common       500,000+
Common            100,000 to 499,999
Uncommon          10,000 to 99,999
Rare              1,000 to 9,999
Very Rare         Under 1,000
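The classification amounts to a threshold lookup, sketched here in Python. The function and constant names are hypothetical. Treating each lower bound as inclusive is an assumption that matters only for fractional estimates falling between the listed ranges.

```python
# (minimum estimated living bearers, label), checked from largest to smallest.
RARITY_THRESHOLDS = [
    (500_000, "Very Common"),
    (100_000, "Common"),
    (10_000, "Uncommon"),
    (1_000, "Rare"),
]


def classify_rarity(estimated_living):
    """Map an estimated count of living bearers to a rarity label."""
    for minimum, label in RARITY_THRESHOLDS:
        if estimated_living >= minimum:
            return label
    return "Very Rare"


print(classify_rarity(42_000))
```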

Limitations

  • All figures are statistical estimates, not official counts
  • SSA data covers only U.S. births, so immigrants who received their names abroad are not counted
  • Names with fewer than 5 births per year are excluded from SSA data
  • Census surname data is released on a decennial basis and may lag current demographics
  • Name changes (marriage, legal changes) are not reflected in birth records
  • The survival model uses simplified life tables, not age/sex/race-specific rates

Privacy

All data used is publicly available from U.S. government sources. No individual-level data is used or stored. Search queries are processed in real time and are not logged.