
Machine learning and generalisation

Date: 2007-11-19
Machine learning is a research domain that has given the world cell phones with voice recognition, recommender systems on websites selling books or DVDs, fingerprint identification systems, spam detection, and many other applications. It sits at the intersection of computer science, applied mathematics and statistics, and shares concepts with artificial intelligence and information theory.

When are machines (read: computers) said to learn?


Machine learning is concerned with computer programs whose performance improves as they gain experience. Hence, launching the same program twice might give different results if a learning process has taken place in between. Such software provides better answers as more information is fed into it. Like humans, computers can either learn 'by heart', like the short-term cramming people tend to resort to before exams, or they can learn 'in the long term', becoming able to infer new knowledge from known facts. The latter is called generalisation.

By-heart learning


Computers learn 'by heart' when they need new information to address new situations. The typical example is anti-virus software. Anti-virus software gets better as more information, in this case virus signatures, is provided. When the software periodically downloads new virus signatures, or definitions, it is able to spot and eliminate more and more viruses. It is therefore learning in the sense we just defined.

It is, however, learning 'by heart', as it can only detect viruses for which a signature has been provided. It is not able to detect and remove a new virus without downloading the corresponding signature file. If the signature is known, the virus will be detected 100% of the time.
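The behaviour described above can be illustrated with a minimal Python sketch. It is not how any real anti-virus product works: the class `SignatureScanner` and its methods are hypothetical names, and a "signature" is assumed here to be simply the SHA-256 digest of the whole payload.

```python
import hashlib


class SignatureScanner:
    """Toy by-heart learner: it only recognises payloads whose
    signature it has explicitly been given."""

    def __init__(self):
        self.known_signatures = set()

    @staticmethod
    def signature(payload: bytes) -> str:
        # Assumption: the signature is the SHA-256 digest of the payload.
        return hashlib.sha256(payload).hexdigest()

    def learn(self, signatures):
        # "Downloading new definitions": memorise the given signatures.
        self.known_signatures.update(signatures)

    def is_virus(self, payload: bytes) -> bool:
        # Pure lookup: no inference about payloads never seen before.
        return self.signature(payload) in self.known_signatures


scanner = SignatureScanner()
old_virus = b"malicious payload A"   # made-up example data
new_virus = b"malicious payload B"   # made-up example data

scanner.learn([SignatureScanner.signature(old_virus)])
print(scanner.is_virus(old_virus))   # True: signature is known
print(scanner.is_virus(new_virus))   # False: never seen, so not detected

scanner.learn([SignatureScanner.signature(new_virus)])
print(scanner.is_virus(new_virus))   # True: detected once the signature is provided
```

The scanner improves with every update, yet it never says anything useful about a payload it has not been told about, which is exactly the limitation of by-heart learning.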

Other examples of by-heart learning are the auto-completion capabilities of most web browsers, and the Windows Start menu, which proposes a list of frequently used programs that depends on the user.


Generalisation


Computer programs are said to generalise when they are able to deal with new situations without the need for new information. The typical example is anti-spam software. Spam filters get better and better as more examples of both spam and legitimate emails are provided to them, increasing their percentage of correct classifications.

Spam filters are indeed able to generalise the concept of spam. When a new email arrives, most probably different from any email already received, the spam filter estimates the probability that this particular email is spam, without needing a specific signature file describing all possible spam messages, as is the case with viruses. It therefore sometimes makes mistakes, marking a legitimate email as junk or, conversely, marking a spam email as legitimate. But mistakes become fewer as more and more examples are provided.


Other examples include recommendation systems for online shops, and optical character recognition software.

Reasoning by analogy and inductive reasoning

Generalisation thus corresponds to reasoning by analogy and to inductive reasoning. New elements can be processed even though the software was not told explicitly how to process them. Spam filters compare new emails with past emails that the user has confirmed as spam and decide, based on the similarities, whether to classify the new email as junk. Spam filters furthermore build dictionaries of words that often appear in spam messages and use these dictionaries, which can differ from one user to another, to estimate the degree of 'spamness' of an email.
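As a rough illustration of the word-dictionary idea, here is a small Python sketch of a word-count filter in the style of naive Bayes with add-one smoothing. The text does not say which method real spam filters use, so this choice is an assumption; the class `WordSpamFilter`, its methods, and the training messages are all hypothetical.

```python
import math
from collections import Counter


class WordSpamFilter:
    """Toy word-dictionary filter (naive Bayes style, add-one smoothing)."""

    def __init__(self):
        # One word dictionary per class, built from user-confirmed examples.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lower-cased whitespace splitting is good enough here.
        return text.lower().split()

    def learn(self, text, label):
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this class has been so far.
            log_score[label] = math.log(
                (self.message_counts[label] + 1) / (total_messages + 2)
            )
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                log_score[label] += math.log(
                    (count + 1) / (total_words + len(vocabulary) + 1)
                )
        # Turn the two log scores into a probability of spam.
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = WordSpamFilter()
spam_filter.learn("cheap pills buy now", "spam")
spam_filter.learn("win money now", "spam")
spam_filter.learn("meeting agenda for monday", "ham")
spam_filter.learn("lunch on monday", "ham")

# Neither message below was seen during training.
print(spam_filter.spam_probability("buy cheap money"))
print(spam_filter.spam_probability("agenda for lunch"))
```

Unlike a signature lookup, the filter returns a score for messages it has never seen. The score can be wrong, and it changes as more confirmed examples are added to the dictionaries.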

Conclusions

By-heart learning is concerned only with reasoning by analogy, and only with situations which have already been encountered or which have been explicitly described.

Generalisation is thus a very interesting ability to achieve but also, of course, a very challenging mathematical and computational problem.

Interested readers can refer to the following books:

machine learning in general: Machine Learning, Tom Mitchell, McGraw Hill (1997)
generalisation: Statistical Learning Theory, Vladimir Vapnik, Wiley-Interscience (1998)
