Database Primitives for Spatial Data Mining

Author: unknown | Date: 2004-12-11

The computerization of many business and government transactions and the advances in scientific data collection tools provide us with a huge and continuously increasing amount of data. This explosive growth of databases has far outpaced the human ability to interpret the data, creating an urgent need for new techniques and tools that support people in transforming the data into useful information and knowledge. Knowledge discovery in databases (KDD) has been defined as the non-trivial process of discovering valid, novel, potentially useful, and ultimately understandable patterns from data [FPS 96]. The process of KDD is interactive and iterative, involving several steps. In particular, data mining is the step of applying appropriate algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.

Spatial Database Systems (SDBS) (see [Gue 94] for an overview) are database systems for the management of spatial data. Spatial data mining algorithms are needed to find implicit regularities, rules or patterns hidden in large spatial databases, e.g. for geo-marketing, traffic control or environmental studies (see [KHA 96] for an overview of spatial data mining).

In [LHO 93], attribute-oriented induction is performed by using (spatial) concept hierarchies to discover relationships between spatial and non-spatial attributes. A spatial concept hierarchy represents a successive merging of neighboring regions into larger regions. In [NH 94], the clustering algorithm CLARANS, which groups neighboring objects automatically without a spatial concept hierarchy, is combined with attribute-oriented induction on non-spatial attributes. [KH 95] introduces spatial association rules, which describe associations between objects based on different spatial neighborhood relations. [KN 96] presents algorithms to detect properties of clusters using reference maps and thematic maps. For instance, a cluster may be explained by the existence of certain neighboring objects which may “cause” the existence of the cluster. New algorithms for spatial classification and spatial trend analysis are sketched in [EKS 97] and elaborated in [EFKS 98]. For spatial classification it is important that the class membership of a database object is determined not only by its non-spatial attributes but also by the attributes of objects in its neighborhood. In spatial trend analysis, patterns of change of some non-spatial attribute(s) in the neighborhood of some database object are determined.
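The idea behind spatial trend analysis can be made concrete with a small sketch. It is not the algorithm of [EKS 97] or [EFKS 98]; it assumes that the neighborhood is a simple distance threshold and that the "pattern of change" is summarized by a least-squares slope of the attribute against distance from the start object. The names `spatial_trend`, `cities`, `pos` and `income` are hypothetical.

```python
import math


def spatial_trend(objects, start, attr, max_dist):
    """Least-squares slope of `attr` against distance from `start`.

    Assumption: the neighborhood of `start` is every other object within
    `max_dist` (Euclidean distance). Returns None if the slope is undefined.
    """
    sx, sy = objects[start]["pos"]
    points = []  # (distance from start, attribute value)
    for name, obj in objects.items():
        if name == start:
            continue
        x, y = obj["pos"]
        d = math.hypot(x - sx, y - sy)
        if d <= max_dist:
            points.append((d, obj[attr]))

    n = len(points)
    if n < 2:
        return None
    mean_d = sum(d for d, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_d = sum((d - mean_d) ** 2 for d, _ in points)
    if var_d == 0:
        return None  # all neighbors at the same distance
    cov = sum((d - mean_d) * (v - mean_v) for d, v in points)
    return cov / var_d


# Hypothetical toy data: income falls with distance from the centre.
cities = {
    "centre": {"pos": (0.0, 0.0), "income": 50.0},
    "a": {"pos": (1.0, 0.0), "income": 45.0},
    "b": {"pos": (0.0, 2.0), "income": 40.0},
    "c": {"pos": (3.0, 0.0), "income": 35.0},
    "far": {"pos": (10.0, 0.0), "income": 90.0},  # outside the neighborhood
}

slope = spatial_trend(cities, "centre", "income", max_dist=5.0)
print(slope)
```

A negative slope indicates that the attribute decreases when moving away from the start object; the object `far` lies outside the distance threshold and does not influence the result.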

We argue that data mining algorithms should be integrated with existing DBMSs, i.e. they should run directly on a database rather than on separate files. Thus, redundant storage and potential inconsistencies can be avoided. Furthermore, the query operations provided by a DBMS may be used, for example, to select subsets relevant for data mining or to support the user in evaluating the discovered patterns.

In this paper, we introduce a set of database primitives for mining in spatial databases (see [EKS 97] for a first outline). These primitives are sufficient to express most of the algorithms for spatial data mining from the literature; in particular, they can express the algorithms reviewed above. We present techniques for efficiently supporting these primitives by a DBMS. [AIS 93] follows a similar approach for mining in relational databases. The use of these database primitives will enable the integration of spatial data mining with existing DBMSs and will speed up the development of new spatial KDD algorithms.

The rest of the paper is organized as follows. In section 2, several types of neighborhood relations are introduced. Based on neighborhood relations, neighborhood graphs and their operations as well as some important filters are defined in section 3. We present our approach to efficiently support the database primitives by a spatial DBMS in section 4. An extensive performance evaluation is presented in section 5. In section 6, a short summary and some directions for future research are given.
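To give a rough idea of what a neighborhood graph with operations and filters might look like, here is a minimal in-memory sketch. It is not the paper's definition or its DBMS-based implementation: it assumes a symmetric distance relation, stores the graph as adjacency sets, and uses hypothetical names (`build_neighborhood_graph`, `within`, `neighbors`, `extend_paths`) chosen only for illustration.

```python
import math
from itertools import combinations


def within(max_dist):
    """Distance-based neighborhood relation (assumed symmetric)."""
    def relation(p, q):
        return math.dist(p, q) <= max_dist
    return relation


def build_neighborhood_graph(positions, relation):
    """One node per object, one edge per pair of objects satisfying `relation`."""
    graph = {name: set() for name in positions}
    for a, b in combinations(positions, 2):
        if relation(positions[a], positions[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighbors(graph, node, keep=lambda name: True):
    """Neighbors of `node` that pass the filter predicate `keep`."""
    return sorted(n for n in graph[node] if keep(n))


def extend_paths(graph, paths, keep=lambda path: True):
    """Extend every path by one edge, skipping cycles and filtered paths."""
    extended = []
    for path in paths:
        for nxt in sorted(graph[path[-1]]):
            if nxt in path:
                continue  # do not revisit a node already on the path
            candidate = path + [nxt]
            if keep(candidate):
                extended.append(candidate)
    return extended


# Hypothetical toy data: three objects on a line and one isolated object.
positions = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (5, 5)}
graph = build_neighborhood_graph(positions, within(1.5))

print(neighbors(graph, "b"))
one_step = extend_paths(graph, [["a"]])
two_steps = extend_paths(graph, one_step)
print(two_steps)
```

The quadratic pairwise loop is only acceptable for toy data; a spatial DBMS would typically answer such neighborhood queries through a spatial index instead.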
