Saturday, October 29, 2011
Hall 1-2 (San Jose Convention Center)
Timeseries can be similar in shape but differ in length. For example, the sound waves produced by the same word spoken twice have roughly the same shape, but one may be shorter in duration. Stream data mining, approximate querying of image and video databases, data compression, and near-duplicate detection are applications that need to classify or cluster such timeseries, and to search for and rank timeseries that are similar to a chosen timeseries. We demonstrate our approach to clustering and similarity search in databases of timeseries data, where the timeseries have high and variable dimensionality. The demonstration uses our Timeseries Sensitive Hashing (TSH) to index the timeseries. TSH adapts Locality Sensitive Hashing (LSH), an approximate algorithm for indexing data points in a d-dimensional space under some distance function (e.g., Euclidean). Unlike LSH, TSH can index points that do not have the same dimensionality, via a generalization of the Dot Product operator. Our experiments show that large multimedia databases containing human activities can be indexed and queried about 23% faster than with traditional techniques, with comparable precision. As examples of the potential of TSH, the demonstration will index and classify timeseries from an image database, as well as timeseries describing human motion extracted from a video stream and from a motion capture system. Our main conclusion is that a fair comparison of variable-length timeseries with a Dot Product operator leads to faster and more reliable results for video databases containing human activity.
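To make the LSH baseline concrete, the following is a minimal sketch of standard LSH for Euclidean distance on fixed-length vectors, using the common random-projection scheme h(v) = floor((a . v + b) / w). It is not TSH: the abstract does not define the generalized Dot Product, so the sketch only illustrates the fixed-dimensionality restriction that TSH is said to lift. The class name EuclideanLSH and the parameters num_hashes, num_tables, and bucket_width are illustrative assumptions, not names from the authors' system.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Standard random-projection LSH for fixed-length vectors (not TSH)."""

    def __init__(self, dim, num_hashes=4, num_tables=8, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.w = bucket_width
        # Each table owns num_hashes (a, b) pairs: a ~ N(0, 1)^dim, b ~ U[0, w).
        self.tables = []
        for _ in range(num_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, self.w))
                for _ in range(num_hashes)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _check(self, v):
        # The dot product a . v is only defined when v has exactly dim entries.
        if len(v) != self.dim:
            raise ValueError("standard LSH needs vectors of one fixed length")

    def _key(self, funcs, v):
        # Bucket key: h(v) = floor((a . v + b) / w) for every hash in the table.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, label, v):
        self._check(v)
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append((label, list(v)))

    def query(self, q):
        # Collect candidates from matching buckets, then rank by exact distance.
        self._check(q)
        candidates = {}
        for funcs, buckets in self.tables:
            for label, v in buckets.get(self._key(funcs, q), []):
                candidates[label] = v
        return sorted(candidates, key=lambda lab: math.dist(q, candidates[lab]))


index = EuclideanLSH(dim=4, seed=1)
index.add("walk", [0.0, 1.0, 0.0, -1.0])
index.add("jump", [50.0, -40.0, 30.0, 20.0])
ranked = index.query([0.0, 1.0, 0.0, -1.0])
```

A query identical to a stored vector always lands in the same buckets, so that vector is returned first. Passing a vector of a different length raises ValueError, which is the limitation that motivates a hashing scheme for variable-length timeseries.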