
BIRCH: A New Data Clustering Algorithm and Its Applications

Author: unknown | Date: 2004-11-30

Abstract: Data clustering is an important technique for exploratory data analysis and has been studied for several years. It has been shown to be useful in many practical domains, such as data classification and image processing. Recently, there has been a growing emphasis on exploratory analysis of very large datasets to discover useful patterns and/or correlations among attributes. This is called data mining, and data clustering is regarded as a particular branch of it. However, existing data clustering methods do not adequately address the problem of processing large datasets with a limited amount of resources (e.g., memory and CPU cycles). As the dataset size increases, they do not scale well in terms of memory requirements, running time, and result quality.

In this paper, an efficient and scalable data clustering method is proposed, based on a new in-memory data structure called the CF-tree, which serves as an in-memory summary of the data distribution. We have implemented it in a system called BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) and studied its performance extensively in terms of memory requirements, running time, clustering quality, stability, and scalability; we also compare it with other available methods. Finally, BIRCH is applied to solve two real-life problems: one is building an iterative and interactive pixel classification tool, and the other is generating the initial codebook for image compression.
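The abstract does not spell out what a CF-tree node stores. As a minimal sketch, assuming the commonly cited clustering-feature summary (a point count N, a per-dimension linear sum LS, and a sum of squared norms SS), the example below shows why such a summary can stand in for raw points: two summaries merge by simple addition, and the centroid and radius can be derived from them alone. The class and method names are hypothetical and are not taken from the BIRCH system itself.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Hypothetical summary of a group of points: (N, LS, SS)."""
    n: int                    # number of points summarized
    linear_sum: List[float]   # per-dimension sum of the points
    square_sum: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        # A single point is its own summary.
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # Summaries are additive, so merging needs no raw data.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root mean squared distance to the centroid:
        # sqrt(SS / N - ||LS / N||^2), clamped at zero against rounding error.
        centroid_sq = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.square_sum / self.n - centroid_sq, 0.0))


# Usage: summarize two 2-D points and merge the summaries.
a = ClusteringFeature.from_point([0.0, 0.0])
b = ClusteringFeature.from_point([2.0, 0.0])
merged = a.merge(b)
print(merged.n, merged.centroid(), merged.radius())
```

Because each summary has a fixed size regardless of how many points it covers, memory use depends on the number of summaries kept rather than on the dataset size, which is the property the abstract relies on when it describes the CF-tree as an in-memory summary.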



Keywords: Very Large Databases, Data Clustering, Incremental Algorithm, Data Classification and Compression
