What to know
Confidence intervals reflect the range of variation in estimating cancer rates. The Data Visualizations tool uses confidence intervals that are expected to include the true underlying rate 95% of the time.
Width of confidence intervals
The width of a confidence interval depends on the amount of variability in the data. A narrow confidence interval suggests greater certainty in the estimate, while a wide confidence interval suggests more variability in the data and may indicate less certainty.
Sources of variability include the underlying occurrence of cancer as well as uncertainty about when the cancer is diagnosed, when a death from cancer occurs, and when the data about the cancer are sent to the registry or state health department.
In any year when large numbers of a particular cancer are diagnosed or large numbers of cancer patients die, the effects of random variability are small and the confidence interval would likely be narrow. With rare cancers, however, the rates are small and the chance occurrence of more or fewer cases or deaths in a year can affect those rates markedly. Under these circumstances, the confidence interval will be wide to indicate uncertainty or instability in the cancer rate.
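As a rough illustration of the paragraph above, the following Python sketch simulates yearly case counts for a rare and a common cancer and compares how much each fluctuates relative to its average. It assumes that counts follow a Poisson distribution (the model described under "The Poisson process"). The expected counts of 5 and 500 cases per year, the 2,000 simulated years, and the function names are made up for the example and are not taken from the tool.

```python
import math
import random
import statistics


def simulate_poisson_count(expected, rng):
    """Draw one Poisson count by Knuth's multiplication method (suitable for expected < ~700)."""
    threshold = math.exp(-expected)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def relative_spread(expected, years, rng):
    """Standard deviation of the simulated yearly counts divided by their mean."""
    counts = [simulate_poisson_count(expected, rng) for _ in range(years)]
    return statistics.pstdev(counts) / statistics.mean(counts)


rng = random.Random(2024)  # fixed seed so the run is repeatable
rare = relative_spread(expected=5, years=2000, rng=rng)      # hypothetical rare cancer
common = relative_spread(expected=500, years=2000, rng=rng)  # hypothetical common cancer

print(f"rare cancer:   yearly counts vary by about {rare:.0%} of their mean")
print(f"common cancer: yearly counts vary by about {common:.0%} of their mean")
```

Under the Poisson assumption the theoretical values are about 45% for the rare cancer and about 4.5% for the common one, which is why a rate based on few cases needs a much wider interval relative to its size.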
The Poisson process
To estimate the extent of this uncertainty, a statistical framework is applied [1]. The standard model used for vital statistics rates is the Poisson process [2], which assigns more uncertainty to rare events, relative to the size of the rate, than it does to common events.
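The last point can be made concrete with a short derivation. This is a sketch under one simplifying assumption: the number of cases D observed over N person-years at risk is Poisson with mean λN. The symbols D, N, and λ are introduced here for illustration only.

```latex
\[
D \sim \mathrm{Poisson}(\lambda N), \qquad
\hat{\lambda} = \frac{D}{N}, \qquad
\operatorname{Var}(\hat{\lambda}) = \frac{\lambda N}{N^{2}} = \frac{\lambda}{N}
\]
\[
\frac{\operatorname{SE}(\hat{\lambda})}{\lambda}
  = \frac{\sqrt{\lambda / N}}{\lambda}
  = \frac{1}{\sqrt{\lambda N}}
  = \frac{1}{\sqrt{\operatorname{E}[D]}}
\]
```

The standard error relative to the rate therefore shrinks as the expected number of cases grows: it is roughly 32% with 10 expected cases and 1% with 10,000.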
Parameters are estimated for the underlying disease process. For this report, we estimated a single parameter to represent the incidence rate and its variability. Of note, the Poisson model can estimate separate parameters that represent contributions to the rate from various risk factors, the effects of cancer control interventions, and other attributes of the population risk profile in any year.
Modified gamma intervals
The Data Visualizations tool uses confidence intervals that are expected to include the true underlying rate 95% of the time. The confidence intervals are modified gamma intervals [3] computed using SEER*Stat. The modified gamma intervals are more efficient than the gamma intervals of Fay and Feuer [4] in that they are less conservative while still retaining the nominal coverage level.
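For readers who want to see the mechanics, here is a minimal Python sketch of a gamma interval for a directly age-standardized rate. It is not SEER*Stat's code. It assumes that the Fay and Feuer interval takes its lower limit from a gamma distribution matched to the rate and its variance, and its upper limit from the same construction after adding the largest stratum weight. It further assumes that the modification of Tiwari, Clegg, and Zou replaces that largest weight with the average weight, and its square with the average squared weight. Both assumptions should be checked against the cited papers before the sketch is relied on. The function names, the example data, and the hand-rolled gamma quantile (used only to avoid external libraries) are illustrative.

```python
import math


def gamma_cdf(shape, x):
    """Regularized lower incomplete gamma function P(shape, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 1
    while term > total * 1e-16:
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))


def gamma_quantile(p, shape, scale):
    """Quantile of a gamma distribution, found by bisection on gamma_cdf."""
    lo, hi = 0.0, shape + 10.0 * math.sqrt(shape) + 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(shape, mid) < p:
            lo = mid
        else:
            hi = mid
    return scale * 0.5 * (lo + hi)


def gamma_intervals(cases, person_years, std_shares, level=0.95):
    """Return (rate, lower, Fay-Feuer upper, modified upper) for a standardized rate."""
    alpha = 1.0 - level
    # Stratum weight = share of the standard population / person-years at risk.
    weights = [s / n for s, n in zip(std_shares, person_years)]
    rate = sum(w * d for w, d in zip(weights, cases))
    var = sum(w * w * d for w, d in zip(weights, cases))

    def upper(extra_rate, extra_var):
        # Gamma distribution matched to the inflated mean y and variance v.
        y, v = rate + extra_rate, var + extra_var
        return gamma_quantile(1.0 - alpha / 2.0, y * y / v, v / y)

    # With zero cases the lower limit is taken to be 0.
    lower = gamma_quantile(alpha / 2.0, rate * rate / var, var / rate) if rate > 0 else 0.0
    w_max = max(weights)
    w_mean = sum(weights) / len(weights)
    w2_mean = sum(w * w for w in weights) / len(weights)
    # Assumed forms: Fay-Feuer adds the largest weight, the modification adds the average.
    return rate, lower, upper(w_max, w_max ** 2), upper(w_mean, w2_mean)


# Hypothetical data for three age groups; the standard population shares sum to 1.
rate, lower, ff_upper, mod_upper = gamma_intervals(
    cases=[2, 15, 60],
    person_years=[200_000, 150_000, 20_000],
    std_shares=[0.5, 0.3, 0.2],
)
for label, value in [("rate", rate), ("lower", lower),
                     ("upper (Fay-Feuer)", ff_upper), ("upper (modified)", mod_upper)]:
    print(f"{label}: {value * 100_000:.1f} per 100,000")
```

With a single stratum the two upper limits coincide and the interval reduces to the exact Poisson interval; for 10 observed cases that is roughly 4.8 to 18.4 cases. With several strata of unequal weight, the modified upper limit in this sketch is the smaller of the two, which is the sense in which the interval is less conservative.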
Various factors such as population heterogeneity can sometimes lead to "extra-Poisson" variation in which the rates are more variable than would be predicted by a Poisson model. No attempt was made to correct for this. In addition, the confidence intervals do not account for systematic (in other words, nonrandom) biases in the incidence rates.
Considerations when comparing rates
Using overlapping confidence intervals to determine whether two rates presented in the Data Visualizations tool differ significantly is discouraged. This practice fails to detect significant differences more often than standard hypothesis testing does [5].
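A small numerical example shows how the overlap check can miss a difference. The sketch below uses a plain normal approximation with made-up rates and standard errors, not the modified gamma intervals the tool reports, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def interval(rate, se):
    """Normal-approximation 95% confidence interval for one rate."""
    return rate - Z95 * se, rate + Z95 * se


def two_sided_p(rate_a, se_a, rate_b, se_b):
    """p-value of a z-test for the difference between two independent rates."""
    z = (rate_a - rate_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up rates per 100,000, each with a made-up standard error of 5.
low_a, high_a = interval(100.0, 5.0)
low_b, high_b = interval(116.0, 5.0)
overlap = low_b <= high_a and low_a <= high_b
p_value = two_sided_p(100.0, 5.0, 116.0, 5.0)

print(f"A: {low_a:.1f} to {high_a:.1f}")
print(f"B: {low_b:.1f} to {high_b:.1f}")
print(f"intervals overlap: {overlap}")
print(f"p-value for the difference: {p_value:.3f}")
```

Here the two intervals overlap, yet the test of the difference gives a p-value of about 0.02. The overlap check in effect adds the two standard errors, whereas the test combines them as the square root of the sum of their squares, which is smaller.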
Another consideration when comparing differences between rates is their public health importance. For some rates presented in the Data Visualizations tool, numerators and denominators are large and standard errors are therefore small. This results in statistically significant differences that may be too small to be important for decisions related to population-based public health programs.
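To illustrate with made-up numbers, the sketch below compares two crude rates built from very large counts. It assumes Poisson counts, so that the standard error of a crude rate is the rate divided by the square root of the number of cases, and it uses a normal-approximation z-test. None of the figures come from the tool, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def crude_rate_and_se(cases, population, per=100_000):
    """Crude rate and its Poisson standard error, both per `per` people."""
    rate = cases / population * per
    return rate, rate / math.sqrt(cases)


# Two hypothetical populations of 200 million people each.
rate_a, se_a = crude_rate_and_se(1_000_000, 200_000_000)
rate_b, se_b = crude_rate_and_se(1_010_000, 200_000_000)

z = (rate_b - rate_a) / math.sqrt(se_a ** 2 + se_b ** 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
relative_difference = (rate_b - rate_a) / rate_a

print(f"rates: {rate_a:.1f} vs {rate_b:.1f} per 100,000")
print(f"relative difference: {relative_difference:.1%}")
print(f"z = {z:.1f}, p = {p_value:.2g}")
```

The rates differ by only 1%, yet the z statistic is about 7. Whether a 1% difference matters for a program is a public health judgment that the significance test cannot make.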
1. Särndal C-E, Swensson B, Wretman J. Model-Assisted Survey Sampling. New York (NY): Springer-Verlag; 1992.
2. Brillinger DR. The natural variability of vital rates and associated statistics. Biometrics. 1986;42(4):693–734.
3. Tiwari RC, Clegg LX, Zou Z. Efficient interval estimation for age-adjusted cancer rates. Stat Methods Med Res. 2006;15(6):547–569.
4. Fay MP, Feuer EJ. Confidence intervals for directly standardized rates: a method based on the gamma distribution. Stat Med. 1997;16(7):791–801.
5. Schenker N, Gentleman JF. On judging the significance of differences by examining the overlap between confidence intervals. Am Stat. 2001;55(3):182–186.