Detecting Elongated Disease Clusters

Daniel B. Neill, A. Moore, M. Sabhnani

Corresponding author: Daniel B. Neill, Carnegie Mellon University, Department of Computer Science, 5000 Forbes Avenue, Pittsburgh, PA 15213. Telephone: 412-621-2650; Fax: 412-268-5576; E-mail: neill@cs.cmu.edu.

Disclosure of relationship: The contributors of this report have disclosed that they have no financial interest, relationship, affiliation, or other association with any organization that might represent a conflict of interest. In addition, this report does not contain any discussion of unlabeled use of commercial products or products for investigational use.

Abstract

Introduction: When pathogens are dispersed by wind or water, the resulting disease clusters can be highly elongated in shape, and tests for circular or square regions will have lower power to detect these high-aspect-ratio clusters. One possible solution is to search for rectangular clusters by using a variant of Kulldorff's (1997) spatial scan statistic to find the most significant rectangular region and by computing the region's statistical significance (p value) by randomization. However, when data are aggregated to an N x N grid, an exhaustive search would require searching over all O(N^4) gridded rectangular regions (both for the original grid and for each Monte Carlo replication). Such a search is computationally infeasible for certain large, real-world data sets.

Objectives: This study attempted to accelerate the spatial scan statistic, enabling rapid detection of the most significant rectangular cluster (and its p value) without a loss of accuracy.
Methods: A fast spatial scan algorithm is presented that computes the same region and p value as the exhaustive search approach, but hundreds or thousands of times faster. The algorithm divides the grid into overlapping regions (using a novel overlap-kd tree data structure), bounds the maximum likelihood ratio of the subregions contained in each region, and prunes regions that cannot contain the most significant region. In effect, the algorithm searches over all rectangular regions while examining only a fraction of them. The fast spatial scan was also extended to multidimensional data sets, enabling the application of spatial scan statistics to other domains with more than two spatial dimensions; in addition, these extra search dimensions allow incorporation of temporal information (for fast spatio-temporal cluster detection) and demographic information (e.g., patients' age and sex).

Results: The fast spatial scan achieved speedups of 20 to 2,000 times compared with the exhaustive search approach on real and simulated data sets, including data from emergency department records and over-the-counter (OTC) drug sales. For example, elongated clusters were detected in national OTC data in 47 minutes, compared with 2 weeks for an exhaustive search. Theoretical and empirical results, including preliminary comparisons to Kulldorff's SaTScan software, indicate that the fast spatial scan makes the detection of elongated clusters computationally feasible.

Conclusion: In collaboration with the RODS Laboratory at the University of Pittsburgh, the fast spatial scan is being applied to prospective disease surveillance nationwide, using daily OTC drug sale data from the National Retail Data Monitor.
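For orientation, the exhaustive rectangular scan that serves as the comparison baseline in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it does not reproduce the overlap-kd tree or the pruning step, the function names (log_likelihood_ratio, exhaustive_scan, monte_carlo_p_value) are hypothetical, the score is assumed to be the Poisson form of Kulldorff's likelihood ratio, and the null model for randomization is assumed to redistribute the observed cases across cells in proportion to their baselines.

```python
import math
import random


def log_likelihood_ratio(c, b, c_tot, b_tot):
    """Poisson log-likelihood ratio for a region with count c and baseline b.

    Returns 0 unless the rate inside the region exceeds the rate outside it.
    """
    if b <= 0 or b >= b_tot or c / b <= (c_tot - c) / (b_tot - b):
        return 0.0
    inside = c * math.log(c / b)
    outside = 0.0
    if c < c_tot:  # treat 0 * log(0) as 0
        outside = (c_tot - c) * math.log((c_tot - c) / (b_tot - b))
    return inside + outside - c_tot * math.log(c_tot / b_tot)


def prefix_sums(grid):
    """2-D cumulative sums so that any rectangle total costs O(1)."""
    n_rows, n_cols = len(grid), len(grid[0])
    p = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(n_rows):
        for j in range(n_cols):
            p[i + 1][j + 1] = grid[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def rect_sum(p, r0, r1, c0, c1):
    """Total over rows r0..r1-1 and columns c0..c1-1."""
    return p[r1][c1] - p[r0][c1] - p[r1][c0] + p[r0][c0]


def exhaustive_scan(counts, baselines):
    """Score every axis-aligned rectangle: O(N^4) regions on an N x N grid."""
    n_rows, n_cols = len(counts), len(counts[0])
    pc, pb = prefix_sums(counts), prefix_sums(baselines)
    c_tot, b_tot = pc[n_rows][n_cols], pb[n_rows][n_cols]
    best_score, best_rect = 0.0, None
    for r0 in range(n_rows):
        for r1 in range(r0 + 1, n_rows + 1):
            for c0 in range(n_cols):
                for c1 in range(c0 + 1, n_cols + 1):
                    score = log_likelihood_ratio(
                        rect_sum(pc, r0, r1, c0, c1),
                        rect_sum(pb, r0, r1, c0, c1),
                        c_tot, b_tot)
                    if score > best_score:
                        best_score, best_rect = score, (r0, r1, c0, c1)
    return best_score, best_rect


def monte_carlo_p_value(counts, baselines, replications=99, seed=0):
    """Randomization test: repeat the full scan on each simulated grid."""
    rng = random.Random(seed)
    observed, _ = exhaustive_scan(counts, baselines)
    n_rows, n_cols = len(counts), len(counts[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [baselines[i][j] for i, j in cells]
    c_tot = sum(sum(row) for row in counts)
    at_least_as_extreme = 0
    for _ in range(replications):
        # Assumed null model: cases land in cells in proportion to baseline.
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in rng.choices(cells, weights=weights, k=c_tot):
            sim[i][j] += 1
        if exhaustive_scan(sim, baselines)[0] >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (replications + 1)
```

Calling exhaustive_scan(counts, baselines) on two equally sized lists of lists returns the best score and the half-open rectangle (r0, r1, c0, c1) that attains it. Because monte_carlo_p_value repeats the entire O(N^4) search once per replication, the cost grows quickly with grid size, which is the bottleneck the fast spatial scan is designed to remove.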
Disclaimer

All MMWR HTML versions of articles are electronic conversions from ASCII text into HTML. This conversion may have resulted in character translation or format errors in the HTML version. Users should not rely on this HTML document, but are referred to the electronic PDF version and/or the original MMWR paper copy for the official text, figures, and tables. An original paper copy of this issue can be obtained from the Superintendent of Documents, U.S. Government Printing Office (GPO), Washington, DC 20402-9371; telephone: (202) 512-1800. Contact GPO for current prices.

Questions or messages regarding errors in formatting should be addressed to mmwrq@cdc.gov.

Date last reviewed: 8/5/2005