This paper deals with finding outliers (exceptions)
in large, multidimensional datasets.
The identification of outliers can lead to the
discovery of truly unexpected knowledge in areas
such as electronic commerce, credit card
fraud, and even the analysis of performance
statistics of professional athletes. Existing
methods that we have seen for finding outliers
in large datasets can only deal efficiently
with two dimensions/attributes of a dataset.
Here, we study the notion of DB- (Distance-Based)
outliers. While we provide formal and
empirical evidence showing the usefulness of
DB-outliers, we focus on the development of
algorithms for computing such outliers.
First, we present two simple algorithms, both
having a complexity of O(k N^2), k being the
dimensionality and N being the number of objects
in the dataset. These algorithms readily
support datasets with many more than
two attributes. Second, we present an optimized
cell-based algorithm that has a complexity
that is linear w...
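
To make the O(k N^2) bound concrete, the following is a minimal sketch of a naive pairwise scan. It is not the paper's own algorithm; it only illustrates the cost structure. It assumes the usual definition of a DB(p, D)-outlier: an object is an outlier if at least a fraction p of the objects in the dataset lie at a distance greater than D from it. The function name db_outliers, the parameter names, and the choice of Euclidean distance are illustrative assumptions.

```python
import math


def db_outliers(points, p, d):
    """Return the indices of DB(p, d)-outliers found by a naive pairwise scan.

    Assumed definition: a point is an outlier if at least a fraction p of
    all points lie farther than distance d from it (Euclidean distance is
    an assumption made for this sketch).
    """
    n = len(points)
    threshold = p * n  # minimum number of "far" points required
    outliers = []
    for i, a in enumerate(points):
        far = 0
        for b in points:
            # Each distance costs O(k) for k-dimensional points, and there
            # are N^2 pairs, giving O(k N^2) overall.
            if math.dist(a, b) > d:
                far += 1
        if far >= threshold:
            outliers.append(i)
    return outliers


# Four points clustered near the origin and one distant point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
result = db_outliers(data, p=0.7, d=2.0)
```

Nothing in the scan depends on the number of attributes beyond the cost of one distance computation, which is why an approach of this kind extends to datasets with more than two dimensions.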

