Objectives: The objective of this project was to develop a scalable spatio-temporal data mining framework for fraud detection. To accomplish this, we (a) developed new algorithms that detect anomalies (outliers) based on spatio-temporal context, (b) adapted sophisticated partitioning methods for parallelization to achieve scalability, and (c) applied a pruning strategy to improve efficiency.
Methods: Our spatio-temporal data mining approach to fraud detection combined three elements: new algorithms designed specifically for spatial/temporal data (Spatio-Temporal Local Outlier Factor (ST-LOF) and Spatio-Temporal Locality Density Based Clustering for Applications with Noise (ST-LDBSCAN)), sophisticated partitioning methods (Randomized Partitioning and Overlapped Time Frame Partitioning), and a pruning strategy to improve performance. We also developed a simple visualization of the data point clusters and anomalies, and we tested the system on different data sets in different use cases.
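The idea behind overlapped time-frame partitioning can be illustrated with a minimal Python sketch. The function name, its parameters, and the padding rule are assumptions made for illustration only; the abstract does not specify the scheme actually used in the project. Each frame is widened by a margin on both sides, so points near a frame boundary are also visible to the neighbouring frame, and each frame can then be processed independently (for example, by a separate worker).

```python
from typing import List, Tuple

# A data point as (x, y, t): two spatial coordinates and a timestamp.
Point = Tuple[float, float, float]


def overlapped_time_frames(
    points: List[Point], frame_len: float, overlap: float
) -> List[List[Point]]:
    """Split points into consecutive time frames padded by `overlap`.

    Hypothetical illustration: frame i covers
    [t_min + i*frame_len - overlap, t_min + (i+1)*frame_len + overlap),
    so a point close to a boundary is assigned to both adjacent frames.
    """
    if frame_len <= 0 or overlap < 0 or overlap >= frame_len:
        raise ValueError("require frame_len > 0 and 0 <= overlap < frame_len")
    if not points:
        return []

    t_min = min(p[2] for p in points)
    t_max = max(p[2] for p in points)
    n_frames = int((t_max - t_min) // frame_len) + 1

    frames: List[List[Point]] = [[] for _ in range(n_frames)]
    for p in points:
        for i in range(n_frames):
            start = t_min + i * frame_len - overlap
            end = t_min + (i + 1) * frame_len + overlap
            if start <= p[2] < end:
                frames[i].append(p)
    return frames


# Usage: five points, frames of length 5 with an overlap margin of 1.
sample = [(0, 0, 0.0), (1, 1, 4.0), (2, 2, 5.0), (3, 3, 9.0), (4, 4, 10.0)]
frames = overlapped_time_frames(sample, frame_len=5.0, overlap=1.0)
```

In this sketch the points at t = 4.0 and t = 9.0 land in two frames each, which is the price paid for letting every frame see the temporal neighbourhood of its boundary points.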
Results: The proposed approach detected anomalies according to spatio-temporal context and locality considerations with excellent recall and satisfactory precision. Experiments on both synthetic and real (Buoy and Medicare) data sets confirmed that the approach scales to big data sets. The keys to this scalability were the parallelization and pruning strategies.
Conclusions: The proposed approach is promising for unsupervised fraud (anomaly) detection in spatio-temporal applications. Its notion of an anomaly (outlier) is intuitive: outliers are evaluated in terms of both spatio-temporal context and locality. The approach scaled well to larger data sets. How best to handle categorical attributes is not yet addressed, and applicability to map-reduce frameworks is also left as future work.