
Algorithms for Mining Distance-Based Outliers in Large Datasets

Author: unknown   Date: 2004-12-04

This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. The existing methods we have seen for finding outliers in large datasets can deal efficiently with only two dimensions (attributes) of a dataset. Here, we study the notion of DB- (Distance-Based) outliers. While we provide formal and empirical evidence showing the usefulness of DB-outliers, we focus on the development of algorithms for computing such outliers.
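To make the DB-outlier notion concrete, here is a minimal sketch that assumes the DB(p, D) formulation from the full paper: an object O is an outlier if at least a fraction p of the N objects lie farther than distance D from O. Equivalently, at most M = N(1 - p) objects (O itself included) lie within D of O. The function name `db_outliers_nested_loop` and the use of Euclidean distance are illustrative assumptions. This is a plain nested loop with early termination, not the paper's own algorithm.

```python
import math
from typing import List, Sequence


def db_outliers_nested_loop(
    points: Sequence[Sequence[float]], p: float, d: float
) -> List[int]:
    """Return the indices of DB(p, d)-outliers by direct comparison.

    Hypothetical helper: Euclidean distance is assumed. A point is reported
    when at most M = N * (1 - p) points, itself included, lie within d of it.
    """
    n = len(points)
    max_neighbors = n * (1.0 - p)  # M: largest d-neighbourhood an outlier may have
    outliers: List[int] = []
    for i, o in enumerate(points):
        count = 0
        for t in points:  # compares o with every point, including itself
            if math.dist(o, t) <= d:
                count += 1
                if count > max_neighbors:
                    break  # enough close neighbours: o cannot be an outlier
        if count <= max_neighbors:
            outliers.append(i)
    return outliers


# Usage: four clustered points plus one far-away point (index 4).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(db_outliers_nested_loop(data, p=0.5, d=2.0))  # prints [4]
```

Each distance computation costs O(k) and the loops visit up to N^2 pairs, so the worst case is O(kN^2). The early exit only shortens the inner loop for objects that turn out to have many close neighbours.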

First, we present two simple algorithms, both having a complexity of $O(kN^2)$, where $k$ is the dimensionality and $N$ is the number of objects in the dataset. These algorithms readily support datasets with many more than two attributes. Second, we present an optimized cell-based algorithm whose complexity is linear w...
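The cell-based idea can be illustrated with a simplified grid sketch. It is not the paper's exact algorithm, and its details are assumptions rather than content from the abstract. The sketch uses Euclidean distance and cells of side D/(2√k), so any two points in one cell are within D/2 and points in touching cells are within D. It adds a second layer of cells reaching out to ⌈2√k⌉ cells away, beyond which every point is farther than D. Cells whose first layer already holds more than M points are skipped. Cells whose two layers together hold at most M points are reported wholesale. Only the remaining cells fall back to point-by-point distance checks. The function name `db_outliers_cell_based` is hypothetical, and the sketch ignores floating-point edge cases at cell boundaries.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def db_outliers_cell_based(
    points: Sequence[Sequence[float]], p: float, d: float
) -> List[int]:
    """Hypothetical grid-based DB(p, d)-outlier sketch (Euclidean, d > 0)."""
    n = len(points)
    if n == 0:
        return []
    k = len(points[0])
    max_neighbors = n * (1.0 - p)           # M: largest d-neighbourhood an outlier may have
    side = d / (2.0 * math.sqrt(k))         # cell side, so a cell's diagonal is d / 2
    reach = math.ceil(2.0 * math.sqrt(k))   # cells more than `reach` steps away are > d away

    # Hash every point into the grid cell that contains it.
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for i, pt in enumerate(points):
        grid[tuple(math.floor(x / side) for x in pt)].append(i)

    # Layer 1: the cell and its touching cells (all points there are within d).
    # Layer 2: cells 2..reach steps away (points there may or may not be within d).
    l1 = list(product(range(-1, 2), repeat=k))
    l2 = [off for off in product(range(-reach, reach + 1), repeat=k)
          if max(abs(c) for c in off) >= 2]

    def members(cell: Cell, offsets: List[Cell]) -> List[int]:
        found: List[int] = []
        for off in offsets:
            found.extend(grid.get(tuple(a + b for a, b in zip(cell, off)), []))
        return found

    outliers: List[int] = []
    for cell, idx in grid.items():
        count_l1 = len(members(cell, l1))   # includes the cell's own points
        if count_l1 > max_neighbors:
            continue                         # every point here has more than M points within d
        l2_points = members(cell, l2)
        if count_l1 + len(l2_points) <= max_neighbors:
            outliers.extend(idx)             # even counting all of layer 2, too few neighbours
            continue
        for i in idx:                        # undecided: check layer-2 points one by one
            count = count_l1
            for j in l2_points:
                if math.dist(points[i], points[j]) <= d:
                    count += 1
                    if count > max_neighbors:
                        break
            if count <= max_neighbors:
                outliers.append(i)
    return sorted(outliers)


# Usage: four clustered points plus one far-away point (index 4).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(db_outliers_cell_based(data, p=0.5, d=2.0))  # prints [4]
```

The offset lists hold up to (2⌈2√k⌉ + 1)^k entries, so a grid like this pays off mainly when k is small. As k grows, the nested-loop approach remains the simpler choice.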
