
KL divergence

Source:  Author: Internet work  Published: 2007-05-20

The Kullback-Leibler (KL) divergence is a common measure of the “distance” between two probability distributions. It is central to probability-based machine learning algorithms. (This article is reposted from 数据挖掘研究院.)

For instance, when trying to approximate an intractable distribution p(x), we can try to minimize KL(p,q) (or KL(q,p)) with q belonging to a particular class of distributions (e.g. the exponential family).

(KL is used in variational methods and approximate inference message passing algorithms.)
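
Since KL(p,q) and KL(q,p) are both mentioned above, a small numerical sketch may help show that they are different quantities. It assumes discrete distributions given as lists of probabilities and the usual definition KL(p,q) = sum_i p_i log(p_i / q_i); the function name kl_divergence and the example values are illustrative, not from the original post.

```python
import math

def kl_divergence(p, q):
    """KL(p, q) = sum_i p_i * log(p_i / q_i), in nats, for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

# Two illustrative distributions over three outcomes (assumed values).
p = [0.5, 0.4, 0.1]
q = [0.8, 0.1, 0.1]

print("KL(p, q) =", kl_divergence(p, q))
print("KL(q, p) =", kl_divergence(q, p))
print("KL(p, p) =", kl_divergence(p, p))
```

The two directions give different values, which is one reason KL is called a divergence rather than a distance.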

Another scenario is when we assume that a “true” distribution p generated some data, and we infer q from our prior and the data without knowing p. In this case KL(p,q) measures how close we are to the true p. This can help to derive some learning bounds and estimator properties.

But why use this divergence (which isn’t a real distance)? Why not a more classical L^2 distance? Or Chi-square?

There are several leads:

  • Information geometry: KL is a special case of delta-divergences (coming from the delta-connection). These divergences have the great advantage of being invariant under reparametrisation.
  • Information theory: KL(p,q) can be seen as the amount of information (in bits, when logarithms are taken base 2) that q is missing in order to specify p (relative entropy). It is the average extra “surprise” of an incoming message drawn from p when you expected it to come from q.
  • Bayesian theory: KL minimisation can be “derived” from log-likelihood maximisation.
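
The information-theoretic reading in the list above can be checked on a toy example. The sketch below assumes discrete distributions with q > 0 wherever p > 0, and uses the standard identity KL(p,q) = H(p,q) - H(p), where H(p,q) is the cross-entropy; the helper names and the example values are illustrative.

```python
import math

def entropy_bits(p):
    # Average surprise (in bits) of messages drawn from p when you expect p.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    # Average surprise (in bits) of messages drawn from p when you expect q.
    # Assumes q_i > 0 wherever p_i > 0.
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_bits(p, q):
    # Extra bits paid on average for expecting q instead of p.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions with power-of-two probabilities (assumed values).
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

h_p = entropy_bits(p)            # 1.5 bits
h_pq = cross_entropy_bits(p, q)  # 1.75 bits
print(h_p, h_pq, kl_bits(p, q), h_pq - h_p)
```

Here the extra cost is a quarter of a bit per message, and it equals the gap between the cross-entropy and the entropy of p.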

Invariance seems to be the most general requirement. Closeness between distributions should not depend on the parametrisation or the base measure.
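
One way to see this invariance is to rescale the variable, y = a·x, and compare KL with the squared L^2 distance between densities. The sketch below uses two one-dimensional Gaussians and their closed forms; the choice of Gaussians, the scale factor and all function names are assumptions made for illustration only.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2), N(m2, s2^2)) in nats, closed form."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def gauss_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def l2_sq_gauss(m1, s1, m2, s2):
    """Squared L^2 distance between the two Gaussian densities, closed form.

    Uses: integral of N(x; ma, sa^2) * N(x; mb, sb^2) dx = N(ma - mb; 0, sa^2 + sb^2).
    """
    def overlap(ma, sa, mb, sb):
        return gauss_pdf(ma - mb, 0.0, math.sqrt(sa**2 + sb**2))
    return overlap(m1, s1, m1, s1) - 2 * overlap(m1, s1, m2, s2) + overlap(m2, s2, m2, s2)

def rescale(m, s, a):
    # The change of variable y = a * x maps N(m, s^2) to N(a*m, (a*s)^2).
    return a * m, abs(a) * s

p, q = (0.0, 1.0), (1.0, 2.0)  # (mean, standard deviation), illustrative values
for a in (1.0, 10.0):
    pa, qa = rescale(*p, a), rescale(*q, a)
    print("a =", a, "KL =", kl_gauss(*pa, *qa), "L2^2 =", l2_sq_gauss(*pa, *qa))
```

Under this rescaling the KL value stays the same while the squared L^2 distance shrinks by a factor of a, so the L^2 “closeness” depends on the units chosen for x.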

Moreover, the delta-divergence point of view gives us a better understanding of different approximate inference algorithms. Belief propagation, expectation propagation, variational Bayes, mean field, tree-reweighted belief propagation, power expectation propagation and generalised belief propagation are unified with delta-divergences (this paper).

The information theory justification seems weaker to me because it takes place in a theory of communication, requiring an emitter, a channel and a receiver. However, it’s intuitive, and expressing the divergence in bits shows the parametrisation independence.
Finally, I’m still not sure of my derivation from log-likelihood maximisation, especially in the continuous case.
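
For the discrete case, the standard link between likelihood and KL can be written out as follows. This is a sketch under stated assumptions, not necessarily the derivation the author had in mind: x_1, ..., x_n are i.i.d. samples from p, q_\theta is a parametric model, and \hat p is the empirical distribution of the samples (all notation introduced here for illustration).

```latex
\frac{1}{n}\sum_{i=1}^{n}\log q_\theta(x_i)
  = \sum_{x}\hat p(x)\log q_\theta(x)
  = -H(\hat p) - \mathrm{KL}(\hat p, q_\theta),
\qquad
\hat p(x) = \frac{1}{n}\,\#\{i : x_i = x\},
\quad
H(\hat p) = -\sum_{x}\hat p(x)\log \hat p(x).
```

Since H(\hat p) does not depend on \theta, maximising the average log-likelihood over \theta is the same as minimising KL(\hat p, q_\theta). In the continuous case the empirical distribution is a sum of point masses and has no density, so KL(\hat p, q_\theta) is not finite; the analogous statement there holds only for the expectation, E_p[log q_\theta(x)] = -h(p) - KL(p, q_\theta), with h(p) the differential entropy of p, assumed finite.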
