Evaluating Change Detection in Data Streams

Saturday, October 29, 2011
Hall 1-2 (San Jose Convention Center)
Gina-Maria Pomann, BA, Statistics, North Carolina State University, Raleigh, NC
Tamraparni Dasu, AT&T Labs, Florham Park
Shankar Krishnan, AT&T Labs, Florham Park
Data streams are rapidly accumulating data sets that are used for real-time decision making and thus pose computational challenges. Examples include telecommunications data, financial ticker streams, and network polling data. To detect that an important event has occurred, it is often of interest to determine whether the generating distribution of a data stream has changed. For example, a shift in the distribution of medical data accumulated from patients over time can serve as an indicator of a disease outbreak. Change detection algorithms for data streams typically return a binary decision of “Change” or “No Change”. However, a binary response provides no additional information about the properties of an algorithm, such as its sensitivity to different types of changes or its stability with respect to small perturbations in the distribution. We therefore propose a rigorous, objective performance measure, streaming power, to evaluate and identify the properties that an effective change detection algorithm should have. In doing so, we provide users with a framework for comparing different algorithms and choosing the one that best meets their needs. The change of distribution in a data stream is modeled using a simple mixture model, which provides a direct methodology for computing the streaming power of a change detection algorithm. We state the theoretical properties of streaming power and use them to define a sensitivity measure. Using our methodology and simulated data examples, we compare three well-known change detection algorithms.
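As a rough illustration of the mixture-model idea, the sketch below estimates by simulation how often a detector reports "Change" as the mixing weight grows. It is not the authors' method: the abstract does not give the formal definition of streaming power, so the code assumes it can be approximated by the detection rate at each mixing weight. The Gaussian components, the function names (sample_mixture, mean_shift_detector, estimate_power), the window size, and the threshold are all hypothetical choices made for this example, and the detector is a simple stand-in rather than one of the three algorithms compared.

```python
import random
import statistics


def sample_mixture(rng, alpha, n, base=(0.0, 1.0), alt=(2.0, 1.0)):
    """Draw n points from (1 - alpha) * F + alpha * G.

    F and G are assumed Gaussian here, given as (mean, std) pairs.
    alpha = 0 reproduces the original distribution F.
    """
    out = []
    for _ in range(n):
        mu, sigma = alt if rng.random() < alpha else base
        out.append(rng.gauss(mu, sigma))
    return out


def mean_shift_detector(reference, window, threshold=3.0):
    """Hypothetical stand-in detector returning True for "Change".

    Flags a change when the two window means differ by more than
    `threshold` standard errors.
    """
    se = (statistics.pvariance(reference) / len(reference)
          + statistics.pvariance(window) / len(window)) ** 0.5
    z = abs(statistics.fmean(window) - statistics.fmean(reference)) / se
    return z > threshold


def estimate_power(detector, alpha, trials=200, n=200, seed=0):
    """Fraction of simulated trials in which the detector reports a change."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        reference = sample_mixture(rng, 0.0, n)   # pre-change data from F
        window = sample_mixture(rng, alpha, n)    # post-change mixture
        hits += detector(reference, window)
    return hits / trials


# Detection rate as a function of the mixing weight alpha.
power_curve = {
    alpha: estimate_power(mean_shift_detector, alpha)
    for alpha in (0.0, 0.1, 0.25, 0.5, 1.0)
}

if __name__ == "__main__":
    for alpha, power in power_curve.items():
        print(f"alpha={alpha:.2f}  estimated detection rate={power:.3f}")
```

Under these assumptions, the value at alpha = 0 plays the role of a false-alarm rate, and how quickly the curve rises with alpha gives an informal picture of sensitivity. Swapping in a different detector function with the same signature allows a side-by-side comparison on identical simulated data.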