What to know
This page explains what cancer prevalence is and how it is calculated in U.S. Cancer Statistics.
Definition and calculation of cancer prevalence
Prevalence is the number of people with a specific disease or condition in a given population at a specific time. This measure includes both newly diagnosed and pre-existing cases of the disease. It differs from incidence, which counts only the new cases diagnosed in a given population during a specified period.
There are different types of prevalence. For example:
- Annual prevalence is the number of people with the disease at any time during a year.
- Period prevalence is the number of people with the disease at any time during a specified number of years, such as the last 10 years.
- Limited-duration prevalence is the number of people alive on a certain day who were diagnosed with the disease during a specified number of years (such as the last 5 or 20 years).
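To make these definitions concrete, here is a minimal Python sketch that applies them to a handful of made-up patients. As is common for cancer, it assumes a person counts as having the disease from diagnosis until death. The `Person` record and the function names are hypothetical illustrations, not part of SEER*Stat.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    diagnosed: date              # date of diagnosis
    died: Optional[date] = None  # None means no recorded death


def alive_on(p: Person, day: date) -> bool:
    # Assumption: someone who dies on `day` is not counted as alive on it.
    return p.died is None or p.died > day


def incidence(people: List[Person], start: date, end: date) -> int:
    """New diagnoses during [start, end]."""
    return sum(start <= p.diagnosed <= end for p in people)


def point_prevalence(people: List[Person], day: date) -> int:
    """People diagnosed on or before `day` and alive on it."""
    return sum(p.diagnosed <= day and alive_on(p, day) for p in people)


def period_prevalence(people: List[Person], start: date, end: date) -> int:
    """People with the disease on at least one day of [start, end].
    Annual prevalence is this with a one-year period."""
    return sum(p.diagnosed <= end and (p.died is None or p.died >= start) for p in people)


def limited_duration_prevalence(people: List[Person], day: date, years: int) -> int:
    """People alive on `day` who were diagnosed in the `years` years before it.
    Assumes `day` is not February 29."""
    start = day.replace(year=day.year - years)
    return sum(start <= p.diagnosed < day and alive_on(p, day) for p in people)


people = [
    Person(date(2010, 3, 1)),                          # long-term survivor
    Person(date(2018, 6, 15)),
    Person(date(2017, 2, 1), died=date(2020, 5, 1)),   # died before 2021
    Person(date(2020, 9, 30), died=date(2021, 3, 1)),  # alive on Jan 1, 2021
    Person(date(2021, 7, 4)),                          # new case in 2021
]
jan1 = date(2021, 1, 1)
print(incidence(people, jan1, date(2021, 12, 31)))          # 1
print(point_prevalence(people, jan1))                       # 3
print(period_prevalence(people, jan1, date(2021, 12, 31)))  # 4 (annual prevalence, 2021)
print(limited_duration_prevalence(people, jan1, 5))         # 2
print(limited_duration_prevalence(people, jan1, 20))        # 3
```

The same five people give an incidence of 1 but an annual prevalence of 4 for 2021, which is the distinction drawn above: prevalence also counts people diagnosed in earlier years who are still alive.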
How cancer prevalence is calculated
Cancer incidence data submitted to CDC's National Program of Cancer Registries (NPCR) in the 2023 data submission period were used to create a data set in SEER*Stat for this analysis [1,2]. The data set included data from 43 NPCR central cancer registries that:
- Met the United States Cancer Statistics (USCS) publication criteria for all years 2001 through 2020.
- Conducted linkage with the National Death Index, active patient follow-up, or both, for all years 2001 through 2020.
These registries include Alabama, Alaska, Arizona, Arkansas, California, Colorado, Delaware, Florida, Georgia, Idaho, Illinois, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Utah, Vermont, Washington, West Virginia, Wisconsin, and Wyoming. These data cover 92% of the U.S. population.
Cases from these registries were included in the analysis if:
- The case was an invasive cancer diagnosed from 2001 through 2020.
- The patient's age was known and was 0 through 99 years.
- The patient's sex was known.
- The case was not identified solely on the basis of a death certificate or autopsy.
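The four inclusion criteria amount to a simple record filter. The sketch below assumes a hypothetical, simplified `Case` record; its field names are illustrative and are not the registry data items actually used in the analysis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    invasive: bool             # invasive (not in situ) tumor
    diagnosis_year: int
    age: Optional[int]         # None = age unknown
    sex: Optional[str]         # None = sex unknown
    dco_or_autopsy_only: bool  # identified solely by death certificate or autopsy


def is_eligible(case: Case) -> bool:
    """Apply the inclusion criteria for the prevalence analysis."""
    return (
        case.invasive
        and 2001 <= case.diagnosis_year <= 2020
        and case.age is not None
        and 0 <= case.age <= 99
        and case.sex is not None
        and not case.dco_or_autopsy_only
    )


cases: List[Case] = [
    Case(True, 2015, 62, "Female", False),   # included
    Case(False, 2015, 62, "Female", False),  # excluded: not invasive
    Case(True, 2000, 45, "Male", False),     # excluded: diagnosed before 2001
    Case(True, 2019, None, "Male", False),   # excluded: age unknown
    Case(True, 2019, 101, "Female", False),  # excluded: age over 99
    Case(True, 2018, 70, None, False),       # excluded: sex unknown
    Case(True, 2018, 80, "Male", True),      # excluded: death certificate only
]
eligible = [c for c in cases if is_eligible(c)]
print(len(eligible))  # 1
```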
Because NPCR data are available for diagnoses starting in 2001, the 2001–2020 data support 20-year limited-duration prevalence estimates in addition to 5-year estimates.
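The arithmetic behind this: with a prevalence date of January 1, 2021, an N-year window covers diagnosis years 2021 − N through 2020, so 20 years is the longest window that data starting in 2001 can fill. A minimal sketch (the function name is hypothetical):

```python
from datetime import date

PREVALENCE_DATE = date(2021, 1, 1)
FIRST_DATA_YEAR = 2001  # earliest diagnosis year in the analysis


def window_years(n_years: int, prevalence_date: date = PREVALENCE_DATE) -> range:
    """Diagnosis years in an n-year limited-duration window.

    Assumes the prevalence date is January 1, so the window is whole calendar years.
    """
    last = prevalence_date.year - 1
    first = last - n_years + 1
    if first < FIRST_DATA_YEAR:
        raise ValueError(
            f"a {n_years}-year window needs diagnoses from {first}, "
            f"but data start in {FIRST_DATA_YEAR}"
        )
    return range(first, last + 1)


print(list(window_years(5)))  # [2016, 2017, 2018, 2019, 2020]
print(window_years(20))       # range(2001, 2021)
```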
Calculation of limited-duration prevalence
Limited-duration prevalence is the number of people alive on a certain day who were diagnosed with the disease during a specified number of years (such as the last 5 or 20 years).
In this report, limited-duration prevalence was calculated using SEER*Stat software [2]. The software estimates how many of the people diagnosed with cancer in the last 5 or 20 years (that is, in 2016–2020 or 2001–2020) were still alive on January 1, 2021, the prevalence date [1].
- The date of the start of follow-up (month, day, and year) was set to the date of diagnosis.
- The date of the last follow-up (month, day, and year) was set to:
  - The date of the last contact if the case was actively followed.
  - The date of death if the case was matched to the state death files or the National Death Index.
- Cases not matched to the state death files or the National Death Index were presumed to be alive on the prevalence date.
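The follow-up rules can be sketched as a small function that returns the follow-up interval and vital status on the prevalence date. This is a sketch under stated assumptions: the `FollowUp` record is hypothetical, a matched death date is assumed to take precedence over a last-contact date, and a case that was neither matched nor actively followed is assumed to be followed up to the prevalence date (the text only says such cases are presumed alive).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

PREVALENCE_DATE = date(2021, 1, 1)


@dataclass
class FollowUp:
    diagnosis_date: date
    death_date: Optional[date] = None         # from state death files or the National Death Index
    last_contact_date: Optional[date] = None  # from active patient follow-up


def follow_up(f: FollowUp, prevalence_date: date = PREVALENCE_DATE) -> Tuple[date, date, bool]:
    """Return (start of follow-up, last follow-up, alive on the prevalence date)."""
    start = f.diagnosis_date  # follow-up starts at diagnosis
    if f.death_date is not None:
        # Matched to a death record: follow-up ends at death (assumed to take precedence).
        return start, f.death_date, f.death_date > prevalence_date
    if f.last_contact_date is not None:
        # Actively followed and not matched to a death record: presumed alive.
        return start, f.last_contact_date, True
    # Neither matched nor actively followed: presumed alive; this end date is an assumption.
    return start, prevalence_date, True


print(follow_up(FollowUp(date(2015, 5, 1), death_date=date(2019, 2, 1))))          # died before 2021
print(follow_up(FollowUp(date(2018, 1, 10), last_contact_date=date(2020, 6, 30))))  # actively followed
print(follow_up(FollowUp(date(2012, 3, 3))))                                         # presumed alive
```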
Multiple primaries
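For patients diagnosed with multiple tumors, prevalence calculations include the first tumor of each cancer type diagnosed in the previous 5 or 20 years. For all cancer sites combined, each person is counted once, on the basis of the first tumor of any type diagnosed in that period.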
For example, consider a woman who was diagnosed with thyroid cancer 9 years before the prevalence date and with breast cancer 3 years before it:
- The thyroid cancer would contribute to the 20-year limited-duration prevalence estimates for all cancer sites and for thyroid cancer. It would not contribute to any 5-year estimate, because it was diagnosed more than 5 years earlier.
- The breast cancer would contribute to the 5-year limited-duration prevalence estimate for all cancer sites and to both the 5-year and 20-year estimates for breast cancer.
- The breast cancer would not contribute to the 20-year limited-duration prevalence estimate for all cancer sites, because the woman is already counted in that estimate through her earlier thyroid cancer.
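The multiple-primary rule can be sketched in Python: for each window, keep only each person's first tumor per cancer type and first tumor overall. The `Tumor` record, site labels, and `first_tumors` function are hypothetical; vital status is ignored here (the woman is assumed to be alive), and the window is taken as the 5 or 20 years before the prevalence date.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple

ALL_SITES = "All sites"


class Tumor(NamedTuple):
    person_id: int
    site: str
    diagnosed: date


def first_tumors(tumors: List[Tumor], prevalence_date: date, years: int) -> Dict[str, Dict[int, Tumor]]:
    """For 'All sites' and for each site, map person_id to the first tumor
    diagnosed in the `years` years before the prevalence date."""
    # Window start; assumes prevalence_date is not February 29.
    start = prevalence_date.replace(year=prevalence_date.year - years)
    in_window = sorted(
        (t for t in tumors if start <= t.diagnosed < prevalence_date),
        key=lambda t: t.diagnosed,
    )
    counted: Dict[str, Dict[int, Tumor]] = defaultdict(dict)
    for t in in_window:
        # Tumors are visited in date order, so setdefault keeps each person's earliest one.
        counted[t.site].setdefault(t.person_id, t)
        counted[ALL_SITES].setdefault(t.person_id, t)
    return dict(counted)


prevalence_date = date(2021, 1, 1)
tumors = [
    Tumor(1, "Thyroid", date(2012, 1, 1)),  # 9 years before the prevalence date
    Tumor(1, "Breast", date(2018, 1, 1)),   # 3 years before the prevalence date
]
for years in (5, 20):
    counted = first_tumors(tumors, prevalence_date, years)
    print(years, {group: [t.site for t in by_person.values()] for group, by_person in counted.items()})
# 5 {'Breast': ['Breast'], 'All sites': ['Breast']}
# 20 {'Thyroid': ['Thyroid'], 'All sites': ['Thyroid'], 'Breast': ['Breast']}
```

In the 20-year output, "All sites" holds the thyroid tumor, so the later breast cancer adds nothing to the all-sites count, matching the example above.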
NPCR prevalence proportions
NPCR prevalence proportions were calculated for each combination of age, sex, and race and ethnicity group. For this section of the report, race and ethnicity were categorized as:
- Non-Hispanic White. Cases with unknown race were combined with White race.
- Non-Hispanic Black.
- Non-Hispanic, Indian Health Service-linked American Indian and Alaska Native.
- Non-Hispanic Asian and Pacific Islander.
- Hispanic.
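As a sketch, this grouping can be written as a function over hypothetical, simplified fields (`hispanic`, `race`, `ihs_linked`). Two assumptions go beyond the text: Hispanic ethnicity is checked first, regardless of race, and combinations the text does not describe (for example, American Indian and Alaska Native records not linked to the Indian Health Service) return `None` rather than being guessed.

```python
from typing import Optional

# Hypothetical race codes used only for this sketch
WHITE, BLACK, AIAN, API = "White", "Black", "AIAN", "API"


def prevalence_race_group(hispanic: bool, race: Optional[str], ihs_linked: bool) -> Optional[str]:
    """Map simplified race and ethnicity fields to the five groups listed above."""
    if hispanic:
        return "Hispanic"
    if race is None or race == WHITE:
        # Unknown race is combined with non-Hispanic White.
        return "Non-Hispanic White"
    if race == BLACK:
        return "Non-Hispanic Black"
    if race == AIAN and ihs_linked:
        return "Non-Hispanic, Indian Health Service-linked American Indian and Alaska Native"
    if race == API:
        return "Non-Hispanic Asian and Pacific Islander"
    return None  # not covered by the five groups described in the text


print(prevalence_race_group(False, None, False))  # Non-Hispanic White
print(prevalence_race_group(True, BLACK, False))  # Hispanic
print(prevalence_race_group(False, AIAN, False))  # None
```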
Cancer prevalence counts for the U.S. population
Cancer prevalence counts for the U.S. population as of January 1, 2021, were estimated by multiplying the age-, sex-, and race and ethnicity-specific NPCR prevalence proportions by the corresponding U.S. population estimates. These population estimates are the average of the 2020 and 2021 population estimates from the U.S. Census Bureau. The counts for each race and ethnicity group were summed to estimate U.S. cancer prevalence counts for all races combined [3]. Cancer prevalence counts and percentages for each of the 43 states, by sex and by race and ethnicity, were estimated directly in SEER*Stat.
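A minimal sketch of this calculation with made-up numbers: each stratum's prevalence proportion is multiplied by the average of its 2020 and 2021 population estimates, the products are summed within each race and ethnicity group, and the group totals are summed for all races combined. The stratum labels, data values, and `prevalence_counts` function are illustrative, not actual NPCR or Census figures.

```python
from collections import defaultdict
from typing import Dict, Tuple

# (age group, sex, race and ethnicity); labels are illustrative
Stratum = Tuple[str, str, str]


def prevalence_counts(
    proportions: Dict[Stratum, float],
    pop_2020: Dict[Stratum, float],
    pop_2021: Dict[Stratum, float],
) -> Tuple[Dict[str, float], float]:
    """Return (prevalence counts by race and ethnicity, count for all races combined)."""
    by_group: Dict[str, float] = defaultdict(float)
    for stratum, proportion in proportions.items():
        # Population for the stratum: average of the 2020 and 2021 estimates
        population = (pop_2020[stratum] + pop_2021[stratum]) / 2
        by_group[stratum[2]] += proportion * population
    # All races combined = sum of the race- and ethnicity-specific counts
    return dict(by_group), sum(by_group.values())


# Made-up inputs for illustration only
hisp_50 = ("50-54", "Female", "Hispanic")
hisp_55 = ("55-59", "Female", "Hispanic")
black_50 = ("50-54", "Female", "Non-Hispanic Black")
proportions = {hisp_50: 0.010, hisp_55: 0.015, black_50: 0.020}
pop_2020 = {hisp_50: 1_000_000, hisp_55: 800_000, black_50: 500_000}
pop_2021 = {hisp_50: 1_020_000, hisp_55: 820_000, black_50: 500_000}

by_group, total = prevalence_counts(proportions, pop_2020, pop_2021)
print({g: round(c) for g, c in by_group.items()})  # {'Hispanic': 22250, 'Non-Hispanic Black': 10000}
print(round(total))  # 32250
```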
Prevalence percentage
Prevalence percentage is the percentage of the population alive with cancer. The U.S. prevalence percentage estimates are based on the states included in the analysis.
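As arithmetic, the prevalence percentage is 100 × (prevalence count ÷ population). A one-function sketch with made-up numbers (the function name is hypothetical):

```python
def prevalence_percentage(prevalent_count: float, population: float) -> float:
    """Percentage of the population alive with cancer."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * prevalent_count / population


# Illustrative numbers only: 1,500 prevalent cases in a population of 100,000
print(f"{prevalence_percentage(1_500, 100_000):.2f}%")  # 1.50%
```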
References
1. National Program of Cancer Registries SEER*Stat Database: NPCR Prevalence Analytic File 2001–2020 (43 NPCR central cancer registries). United States Department of Health and Human Services, Centers for Disease Control and Prevention. Released June 2024, based on the 2023 submission.
2. Surveillance Research Program, National Cancer Institute. SEER*Stat software, version 8.4.
3. Gail MH, Kessler L, Midthune D, Scoppa S. Two approaches for estimating disease prevalence from population-based registries of incidence and total mortality. Biometrics. 1999;55(4):1137–1144.