Friday, October 12, 2012: 8:00 PM
6C/6E (WSCC)
In recent years, technological advances have led to a sharp increase in data production and to new methods that facilitate data collection. Data that arrive continuously, in massive volume, and without a foreseeable end are known as data streams. Such data come from sensors, personal bank transactions, and automated measuring tools, among other sources. Algorithms for processing data streams must respond rapidly and in real time, which implies that they must maintain a decision model at all times. Clustering data streams by variables finds groups of streams that behave similarly over time. A handful of algorithms perform this kind of clustering, but comparisons among them are scarce. In this work, we compared two different approaches to clustering data streams by variables: ODAC, a divisive hierarchical algorithm, and CORREL, which operates over the sliding windows model and clusters by partitioning. Based on our experiments on simulated and real-world datasets, we concluded that ODAC outperforms CORREL in speed, independence from the probability distribution of the data streams, and precision in recovering the clusters. However, ODAC requires a large number of data points to discover the inherent clustering structure.
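
To make "streams with similar behavior over time" concrete, the following is a minimal sketch of how pairwise similarity between streams could be tracked in a single pass, without storing past observations. It is an illustration under stated assumptions, not the ODAC or CORREL implementation: the class name StreamingCorrelation, the use of Pearson correlation, and the dissimilarity sqrt((1 - r) / 2) are all choices made for this example.

```python
import math
from itertools import combinations


class StreamingCorrelation:
    """Hypothetical helper: running sums for pairwise Pearson correlation.

    Each update costs O(k^2) for k streams and memory does not grow with
    the number of observations, which is the property a stream setting needs.
    """

    def __init__(self, n_streams):
        self.n = 0
        self.sums = [0.0] * n_streams
        self.sq_sums = [0.0] * n_streams
        # One running cross-product per unordered pair of streams.
        self.cross = {pair: 0.0 for pair in combinations(range(n_streams), 2)}

    def update(self, values):
        """Consume one observation: one value per stream at the same time step."""
        self.n += 1
        for i, v in enumerate(values):
            self.sums[i] += v
            self.sq_sums[i] += v * v
        for i, j in self.cross:
            self.cross[(i, j)] += values[i] * values[j]

    def correlation(self, i, j):
        """Pearson correlation between streams i and j seen so far."""
        if i == j:
            return 1.0
        if i > j:
            i, j = j, i
        if self.n < 2:
            return 0.0
        n = self.n
        cov = self.cross[(i, j)] - self.sums[i] * self.sums[j] / n
        var_i = self.sq_sums[i] - self.sums[i] ** 2 / n
        var_j = self.sq_sums[j] - self.sums[j] ** 2 / n
        if var_i <= 0.0 or var_j <= 0.0:
            return 0.0  # a constant stream has no defined correlation
        return cov / math.sqrt(var_i * var_j)

    def dissimilarity(self, i, j):
        """Assumed measure: 0 for perfectly correlated streams, 1 for anti-correlated."""
        r = max(-1.0, min(1.0, self.correlation(i, j)))
        return math.sqrt((1.0 - r) / 2.0)


# Three toy streams: the second follows the first, the third mirrors it.
tracker = StreamingCorrelation(3)
for t in range(10):
    tracker.update([float(t), 2.0 * t + 1.0, -float(t)])
```

A clustering procedure could then group streams whose pairwise dissimilarity is small. How the groups are formed (by divisive splitting or by partitioning over a window) is where the two compared approaches differ, and that step is not shown here.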