The algorithms developed in the context of the FIRST project will then be applied and extended to detect asteroids in the MACHO data set and to uncover previously unknown correlations between spatial and temporal phenomena. The data from the MACHO project exceeds eight terabytes; this sheer size makes it an effective test-bed for our research in large-scale data mining. We are also working with climate data, both observational and simulated, to study how sub-sampling affects the conclusions drawn from the data.

We expect that our work in these applications will help answer several open research questions in data mining and pattern recognition for large, complex, multi-dimensional data. As we extend this work to other applications, we hope to address the problem of data overload and help scientists explore and understand their data effectively and efficiently.

Data mining is an interactive and iterative process. It involves data pre-processing, the search for patterns, knowledge evaluation, and possible refinement of the process based on input from domain experts or feedback from one of the steps. Pre-processing the data is a time-consuming but critical first step. It is often domain- and application-dependent; however, several techniques developed for one application or domain can be applied to other applications and domains as well. The pattern recognition step is usually independent of the domain or application.
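
The loop described above can be sketched as follows. This is a minimal illustration, not the project's code. Every choice in it is an assumption made for brevity, and every function name is hypothetical:

* Pre-processing is z-score normalization.
* The "pattern search" flags values above a threshold.
* Evaluation measures recall against items an expert has labeled.
* Refinement lowers the threshold until the expert's items are all found.

```python
import statistics
from typing import List, Tuple


def preprocess(values: List[float]) -> List[float]:
    """Pre-processing step (assumed): z-score normalization."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant input
    return [(v - mean) / std for v in values]


def find_patterns(scores: List[float], threshold: float) -> List[int]:
    """Pattern search (assumed): flag indices whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def evaluate(flagged: List[int], expert_labels: List[int]) -> float:
    """Knowledge evaluation: fraction of expert-labeled items that were found (recall)."""
    if not expert_labels:
        return 1.0
    return len(set(flagged) & set(expert_labels)) / len(expert_labels)


def mine(values: List[float], expert_labels: List[int], threshold: float = 3.0,
         target_recall: float = 1.0, step: float = 0.5,
         max_iter: int = 20) -> Tuple[float, List[int]]:
    """Iterate pattern search and evaluation, refining the threshold from feedback."""
    scores = preprocess(values)
    flagged = find_patterns(scores, threshold)
    for _ in range(max_iter):
        if evaluate(flagged, expert_labels) >= target_recall:
            break
        threshold -= step  # refinement driven by the evaluation step
        flagged = find_patterns(scores, threshold)
    return threshold, flagged
```

For example, suppose the input is `[0.1, 0.2, 0.0, 5.0, 0.1, 2.5, 0.2, 0.1]` and an expert marks items 3 and 5 as interesting. The loop then lowers the threshold from 3.0 until both items are flagged. In a real workflow, the expert's input would arrive interactively rather than as a fixed list.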

Large-scale scientific data mining is a field still in its infancy, which makes it a rich source of open research problems. Several barriers must be overcome before data mining techniques can be extended to large-scale data. The first is extracting key features from large, complex, multi-dimensional data; this must be done before any pattern recognition algorithm can be applied. The extracted features must be relevant to the problem, insensitive to small changes in the data, and invariant to scaling, rotation, and translation. In addition, we need to select discriminating features through appropriate dimension reduction techniques.

The pattern recognition step poses several challenges as well:

* Is it possible to modify existing algorithms, or design new ones, that are scalable, robust, accurate, and interpretable?
* Can these algorithms be applied effectively and efficiently to complex, multi-dimensional data?
* Can they be implemented efficiently on large-scale multiprocessor systems, so that a scientist can explore and analyze the data interactively?
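
The feature requirements above can be illustrated with the first of Hu's moment invariants, eta20 + eta02. This quantity does not change under translation or rotation, and it is normalized for scale. The sketch also shows principal component analysis (PCA) as one common dimension reduction technique. Neither technique is claimed to be the one used in this project, and the function names are hypothetical. Scale invariance holds exactly only for continuous images; on a pixel grid it is approximate.

```python
import numpy as np


def central_moment(img: np.ndarray, p: int, q: int) -> float:
    """Central moment mu_pq of a 2-D intensity image; invariant to translation."""
    rows, cols = np.indices(img.shape)
    m00 = img.sum()
    r_bar = (rows * img).sum() / m00
    c_bar = (cols * img).sum() / m00
    return float(((rows - r_bar) ** p * (cols - c_bar) ** q * img).sum())


def hu_phi1(img: np.ndarray) -> float:
    """First Hu moment invariant, eta20 + eta02.

    For p + q = 2, eta_pq = mu_pq / mu00**2, which normalizes for scale.
    The sum of the two second-order terms is unchanged by rotation."""
    mu00 = float(img.sum())
    return (central_moment(img, 2, 0) + central_moment(img, 0, 2)) / mu00 ** 2


def pca_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Project an (n_samples, n_features) matrix onto its top-k principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T
```

For an L-shaped blob in a 32 x 32 image, the following all give the same `hu_phi1` value up to floating-point error:

* the blob as drawn;
* the blob rotated by 90 degrees with `np.rot90`;
* the blob shifted with `np.roll`.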

These problems must be overcome before large-scale data mining can be applied in any domain, but scientific data raise additional concerns:

* Data from science applications are often available as images, a format known to pose serious challenges for feature extraction.
* In knowledge discovery problems, the class of interest may occur with low probability. This makes random sampling inapplicable and traditional clustering techniques ineffective.
* In many classification problems, labeled data may be scarce, and several iterations of the data mining process may be required to obtain a reasonably sized training set.
* Some applications, such as remote sensing, may need data fusion techniques to mine data collected by several different sensors at different resolutions.
* Unlike its commercial counterpart, scientific data mining requires high accuracy and precision in prediction and description, because the results are used to test or refute competing theories.

These problems, specific to scientific data sets, preclude the direct application of software and techniques developed for commercial applications.
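
A short calculation shows why random sampling fails for a rare class. If a fraction p of the objects belong to the class, a random sample of n objects misses the class entirely with probability (1 - p)^n. The rate used below, one object in a thousand, is an assumed value for illustration only, not a figure from any survey.

```python
import math


def prob_sample_misses_class(p: float, n: int) -> float:
    """Probability that n independent random draws contain no item of a class with frequency p."""
    return (1.0 - p) ** n


def draws_needed(p: float, confidence: float) -> int:
    """Smallest n for which a random sample contains at least one rare item with the given confidence."""
    # Solve (1 - p)**n <= 1 - confidence for n; log1p keeps precision for small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))
```

With p = 0.001, a random sample of 1,000 objects misses the class about 37% of the time. Roughly 3,000 draws are needed to be 95% sure of seeing even one example, and a usable training set needs many examples. This is why targeted sampling or iterative labeling is preferred in such cases.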

Our approach to scaling data mining and pattern recognition algorithms to large, complex, multi-dimensional data addresses each step of the data mining process. Specifically, our research focuses on:

  • Image processing techniques, including wavelets, for feature extraction
  • Dimension reduction techniques to handle multi-dimensional data
  • Scalable algorithms for classification and clustering
  • Parallel implementations for interactive exploration of data
  • Applied statistics to ensure that the conclusions drawn from the data are statistically sound
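
As an illustration of the first item in the list, wavelet features can be computed with the 2-D Haar transform. At each level, the transform splits an image into an approximation band and three detail bands. The energies of the detail bands make a compact texture descriptor. The Haar wavelet is used here only because it is the simplest; the source does not say which wavelet family the project uses. The function names are hypothetical.

```python
import numpy as np


def haar2d(img: np.ndarray):
    """One level of the orthonormal 2-D Haar wavelet transform.

    Returns the approximation band and three detail bands, each half the input size
    in both dimensions. Because the transform is orthonormal, total energy is preserved."""
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("image dimensions must be even")
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    col_detail = (a - b + c - d) / 2.0   # left minus right: responds to vertical edges
    row_detail = (a + b - c - d) / 2.0   # top minus bottom: responds to horizontal edges
    diag_detail = (a - b - c + d) / 2.0  # diagonal structure
    return approx, col_detail, row_detail, diag_detail


def wavelet_energy_features(img: np.ndarray, levels: int = 2) -> np.ndarray:
    """Feature vector holding the energy of each detail band at each decomposition level."""
    feats = []
    current = img.astype(float)
    for _ in range(levels):
        current, *details = haar2d(current)
        feats.extend(float((band ** 2).sum()) for band in details)
    return np.array(feats)
```

For an 8 x 8 image and two levels, this yields six features. A featureless (constant) image yields all zeros, because every detail band vanishes.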
We are designing and implementing a flexible object-oriented software infrastructure for our algorithms. This will enable scientists in a variety of disciplines to experiment with different algorithms, tune an algorithm to a particular problem, and handle growing data sets.
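
The source does not describe the interfaces of this infrastructure. As a sketch of the general object-oriented idea, each step of the workflow can implement a common abstract interface, so that one algorithm can be swapped for another without touching the rest of the pipeline. All class names below (`Stage`, `Pipeline`, `Scale`, `Threshold`) are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class Stage(ABC):
    """One step of a data mining workflow (hypothetical interface)."""

    @abstractmethod
    def run(self, data: Any) -> Any:
        """Transform the input data and return the result."""


class Pipeline:
    """Chains interchangeable stages; each stage's output feeds the next."""

    def __init__(self, stages: List[Stage]):
        self.stages = list(stages)

    def replace(self, index: int, stage: Stage) -> "Pipeline":
        """Return a new pipeline with one stage swapped out, leaving this one unchanged."""
        stages = list(self.stages)
        stages[index] = stage
        return Pipeline(stages)

    def run(self, data: Any) -> Any:
        for stage in self.stages:
            data = stage.run(data)
        return data


class Scale(Stage):
    """Toy pre-processing stage: multiply every value by a constant."""

    def __init__(self, factor: float):
        self.factor = factor

    def run(self, data: List[float]) -> List[float]:
        return [x * self.factor for x in data]


class Threshold(Stage):
    """Toy pattern-detection stage: keep values above a cutoff."""

    def __init__(self, cutoff: float):
        self.cutoff = cutoff

    def run(self, data: List[float]) -> List[float]:
        return [x for x in data if x > self.cutoff]
```

For example, `Pipeline([Scale(2.0), Threshold(5.0)]).run([1, 2, 3, 4])` keeps only the scaled values above 5. Calling `.replace(1, Threshold(3.0))` produces a variant for comparison without modifying the original pipeline.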

Our work on data mining and pattern recognition algorithms can be applied to many domains. As an initial test-bed, we have selected data from the MACHO and FIRST projects. Working with scientists from the FIRST project, we are developing algorithms to automatically detect radio-emitting galaxies with a bent-double morphology. Our research in this domain addresses the important question of feature extraction from complex image data.
 

数据挖掘交友

Mission:

The Center for Applied Scientific Computing at the Lawrence Livermore National Laboratory is developing scalable algorithms for the interactive exploration of large, complex, multi-dimensional scientific data. By applying and extending ideas from data mining and pattern recognition, we are developing a new generation of computational tools and techniques that are being used to improve the way in which scientists extract useful information from data.

Background:

Our ability to generate data far outstrips our ability to explore, analyze, and understand it. Advances in technology have enabled scientists to gather data from experiments, simulations, and observations at an ever-increasing pace. Data that until recently was measured in gigabytes is now measured in terabytes, and will soon approach the petabyte range. Often the data is complex, available either as time series or as images. To achieve our scientific goals, we need to exploit this data fully by extracting all the useful information from it. Unfortunately, in many scientific domains the data is so large and complex that analyzing, exploring, and understanding it manually is impractical. As a result, useful information is often overlooked, and the potential benefits of increased computational and data-gathering capabilities are only partially realized.

To solve this problem, the Center for Applied Scientific Computing (CASC) at the Lawrence Livermore National Laboratory (LLNL) is developing a new generation of computational tools and techniques to help automate the exploration and analysis of large scientific data sets. By applying and extending ideas from data mining, we hope to improve the way in which scientists interact with large, multi-dimensional, time-varying data. These techniques will help us identify patterns in the data automatically, making it possible for scientists to explore the areas of interest interactively.

Applications:

Data mining techniques can be applied to data gathered from simulations, experiments, or observations in various scientific domains. The tera-scale computing environment at LLNL has enabled the simulation of increasingly complex phenomena, generating vast quantities of data. These simulations play a key role in two kinds of area: those such as nuclear weapons stockpile stewardship, where computer simulations have replaced experiments, and those such as climate modeling, where experiments are impractical or unwise.

To help scientists understand the output from such simulations, visualization techniques are frequently used to display the data. Often, however, the data is so large that visualization by itself is not sufficient. Coupling visualization with data mining techniques would make it possible to display interactively only the areas of interest to the scientist, enabling faster exploration of the output data. This would help not only in understanding the output from a single simulation, but also in three further tasks:

* comparing the output from ensembles of simulations;
* comparing simulations with experiments;
* controlling simulations interactively.
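
One way to picture the coupling of data mining with visualization is the following. The field is split into tiles, a simple "interest" measure is computed for each tile, and only the tiles that pass a threshold are handed to the visualization tool. In this sketch the interest measure is the peak gradient magnitude, and the tile size and threshold are arbitrary. Both the measure and the function name `tiles_of_interest` are assumptions for illustration, not the project's criteria.

```python
import numpy as np


def tiles_of_interest(field: np.ndarray, tile: int, threshold: float):
    """Return (row, col) origins of tiles whose peak gradient magnitude exceeds a threshold.

    Only these tiles would be passed to a visualization tool, instead of the full field."""
    grad_rows, grad_cols = np.gradient(field.astype(float))  # derivatives along axis 0 and axis 1
    magnitude = np.hypot(grad_rows, grad_cols)
    selected = []
    for r in range(0, field.shape[0], tile):
        for c in range(0, field.shape[1], tile):
            if magnitude[r:r + tile, c:c + tile].max() > threshold:
                selected.append((r, c))
    return selected
```

For a field that is zero everywhere except a step in its right-hand columns, only the tiles containing the step are selected. The smooth regions, which would dominate a full rendering, are skipped.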

In addition to data from computer simulations, data mining techniques can be very useful in domains such as astrophysics, where vast quantities of data are gathered during sky surveys. This data is frequently analyzed manually, which makes the results of a survey highly subjective. Automated techniques can help bring objectivity to such analysis. In addition, data originally obtained for one purpose can be analyzed with pattern recognition techniques to detect previously unknown patterns. This application of knowledge discovery would help address a concern voiced by astrophysicists. They worry that the sheer size of their data, and the consequent difficulty of analyzing it, has led to the loss of the serendipitous discoveries that were once so vital to progress in the field.

The tools and techniques developed in data mining and pattern recognition are applicable to many scientific domains, including verification and validation, visualization, computational steering, remote sensing, medical imaging, genomics, climate modeling, and astrophysics.

Research Approach:

Data mining is a process concerned with uncovering patterns, associations, anomalies, and statistically significant structures and events in data. It helps not only in knowledge discovery, that is, the identification of new phenomena, but also in enhancing our understanding of known phenomena. One of the key steps in data mining is pattern recognition: the discovery and characterization of patterns in images and other high-dimensional data. A pattern is an arrangement or ordering in which some organization of underlying structure can be said to exist. Patterns in data are identified using measurable features, or attributes, extracted from the data.
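
To make the last point concrete, here is a nearest-centroid classifier, one of the simplest ways to recognize patterns from measured features. It summarizes each class by the mean of its feature vectors, then assigns a new object to the class with the closest mean. The classifier is chosen here only for its brevity; it is not presented as the project's method. The function names and example feature values are hypothetical.

```python
import numpy as np


def fit_centroids(features: np.ndarray, labels: np.ndarray) -> dict:
    """Mean feature vector of each class, from an (n_samples, n_features) matrix."""
    return {lab: features[labels == lab].mean(axis=0) for lab in np.unique(labels)}


def predict(centroids: dict, features: np.ndarray) -> list:
    """Assign each feature vector to the class with the nearest centroid (Euclidean distance)."""
    names = list(centroids)
    stacked = np.stack([centroids[n] for n in names])  # (n_classes, n_features)
    # Pairwise distances between every sample and every centroid: (n_samples, n_classes).
    dists = np.linalg.norm(features[:, None, :] - stacked[None, :, :], axis=2)
    return [names[i] for i in dists.argmin(axis=1)]
```

In practice, the feature vectors would come from a step like the moment or wavelet computations used for feature extraction. More robust classifiers, such as decision trees or neural networks, would replace the centroid rule when classes are not compact.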
 
数据挖掘工具

Sapphire: Large Scale Data Mining and Pattern Recognition

Figure: Data mining: an iterative and interactive process

Figure: Images of radio-emitting galaxies with bent-double morphology.