Expectation Maximization Algorithm

EM Algorithm
The Expectation Maximization (EM) algorithm is used to estimate the probability density of a given data set. To model the probability density of the data, a Gaussian mixture model is used: the density is modeled as a weighted sum of a number of Gaussian distributions.
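
As a rough illustration of the weighted sum described above, the following Python sketch evaluates a one-dimensional mixture density. The function names gaussian_pdf and mixture_pdf, and the example weights, means and variances, are assumptions made for illustration; they are not taken from the applet's source.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, means, variances):
    """Weighted sum of Gaussian densities: p(x) = sum_j P(j) * p(x|j)."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component mixture; the weights P(j) must sum to 1.
weights = [0.3, 0.7]
means = [0.0, 4.0]
variances = [1.0, 2.0]
density_at_one = mixture_pdf(1.0, weights, means, variances)
```

Because the weights sum to one and each component integrates to one, the mixture is itself a valid probability density.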

During initialization, the applet reads the data file from the host from which it was downloaded. The URL of the data file is defined in the HTML source code, as is the default number of Gaussians. The number of Gaussians can be selected before the first execution of the algorithm and cannot be changed thereafter.

The initial estimates of the parameters of each Gaussian (the mean and the variance) are selected randomly. However, the range and the variance of the data are used to guide this random guess.
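
The text does not say exactly how the range and the variance enter the random guess, so the Python sketch below shows only one plausible scheme. It assumes spherical components, draws each mean uniformly inside the per-dimension data range, and starts every component with the overall data variance and equal mixing weights. The name init_parameters and its seed argument are hypothetical.

```python
import random

def init_parameters(data, k, seed=None):
    """Random initial guess for k spherical Gaussian components.

    data is a list of N points, each a list of d floats.
    Assumption: means are drawn uniformly inside the per-dimension data
    range, and every component starts with the overall data variance.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    lows = [min(p[i] for p in data) for i in range(d)]
    highs = [max(p[i] for p in data) for i in range(d)]
    centre = [sum(p[i] for p in data) / n for i in range(d)]
    # Average squared deviation per dimension, used as the shared start variance.
    data_var = sum((p[i] - centre[i]) ** 2 for p in data for i in range(d)) / (n * d)
    means = [[rng.uniform(lows[i], highs[i]) for i in range(d)] for _ in range(k)]
    variances = [data_var] * k
    priors = [1.0 / k] * k  # equal mixing weights P(j) to begin with
    return means, variances, priors
```

Keeping the initial means inside the data range makes it likely that every component receives some responsibility in the first iteration.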

Implementation
The EM algorithm is implemented as a class named em, which can handle data of any dimensionality. The em class has two constructors. One constructor accepts a pointer to the data, which is stored in an N x d double array, where N denotes the number of data points and d the dimensionality of the data. The other constructor also accepts the number of Gaussian components to be used as a parameter. If the former constructor is used, the number of components must be set using the setParameter method of the class.

Once all the parameters are set, the em object randomly sets the initial parameters of the Gaussian components. The iterate method of em uses various methods of the class to find a new set of estimates for the parameters. The iteration process continues until the parameters converge. The methods used during iteration compute the following functions: P(j|xn), P(xn|j), P(j), P(xn), etc.

The details of the calculation of these functions can be found in the source code. The details of the EM algorithm can be found in "Neural Networks for Pattern Recognition" by Christopher M. Bishop.
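
To make the roles of P(xn|j), P(xn), P(j|xn) and P(j) concrete, here is a minimal Python sketch of one EM iteration for a mixture of spherical Gaussians, following the standard update equations. It is not the em class itself: the function names, the variance floor, and the use of the log-likelihood as the convergence test are assumptions, and the sketch does not guard against numerical underflow on widely spread data.

```python
import math

def component_density(x, mean, var):
    """P(x|j) for a spherical Gaussian in d dimensions."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-sq_dist / (2.0 * var)) / (2.0 * math.pi * var) ** (d / 2.0)

def em_step(data, means, variances, priors):
    """One EM iteration; returns updated (means, variances, priors, log_likelihood)."""
    n, d, k = len(data), len(data[0]), len(means)
    # E-step: posteriors P(j|x_n) = P(x_n|j) P(j) / P(x_n).
    post = []
    log_lik = 0.0
    for x in data:
        joint = [priors[j] * component_density(x, means[j], variances[j]) for j in range(k)]
        p_x = sum(joint)  # P(x_n)
        log_lik += math.log(p_x)
        post.append([q / p_x for q in joint])
    # M-step: re-estimate each component from its weighted share of the data.
    new_means, new_vars, new_priors = [], [], []
    for j in range(k):
        resp = sum(post[i][j] for i in range(n))
        mean = [sum(post[i][j] * data[i][c] for i in range(n)) / resp for c in range(d)]
        var = sum(post[i][j] * sum((data[i][c] - mean[c]) ** 2 for c in range(d))
                  for i in range(n)) / (d * resp)
        new_means.append(mean)
        new_vars.append(max(var, 1e-6))  # small floor guards against a collapsed component
        new_priors.append(resp / n)
    return new_means, new_vars, new_priors, log_lik

def fit(data, means, variances, priors, tol=1e-6, max_iter=200):
    """Repeat em_step until the log-likelihood stops changing."""
    prev = -math.inf
    for _ in range(max_iter):
        means, variances, priors, log_lik = em_step(data, means, variances, priors)
        if abs(log_lik - prev) < tol:
            break
        prev = log_lik
    return means, variances, priors
```

The log-likelihood returned by em_step is evaluated with the parameters passed in, before the update, so successive calls should produce a non-decreasing sequence.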

Although the class em is enough to determine the parameters of the Gaussian mixture model, I decided to implement the output routines in a class called em_graph, derived from class em. The derived class em_graph uses the range information of the data to scale and plot the data and a representation of the resulting Gaussian parameters. Even though I have not tried the em_graph class with data of higher dimensionality, I am fairly confident that the class can display the first two dimensions of the data without any problems.
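
The range-based scaling can be sketched roughly as follows in Python. This is not the em_graph code: make_scaler, to_pixel and the margin parameter are hypothetical names, and the sketch only assumes that the first two coordinates of each point are mapped linearly onto a drawing area of a given width and height.

```python
def make_scaler(data, width, height, margin=10):
    """Return a function mapping the first two coordinates of a point to pixels."""
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    # Fall back to a span of 1.0 when all values in a dimension are identical.
    x_min, x_span = min(xs), (max(xs) - min(xs)) or 1.0
    y_min, y_span = min(ys), (max(ys) - min(ys)) or 1.0

    def to_pixel(point):
        px = margin + (point[0] - x_min) / x_span * (width - 2 * margin)
        # Screen y grows downwards, so flip the vertical axis.
        py = height - margin - (point[1] - y_min) / y_span * (height - 2 * margin)
        return round(px), round(py)

    return to_pixel
```

The same mapping can be applied to the component means, so that the fitted Gaussians are drawn in the same coordinate frame as the data.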

Finally, the applet em_app reads the data from the file specified in the HTML source and creates an em_graph object. The details of this class are not relevant to the EM algorithm, so they are not explained here.
