The Krovetz Stemmer

Author: unknown. Date: 2004-11-29
The Krovetz Stemmer was developed by Bob Krovetz at the University of Massachusetts in 1993. It is quite a ‘light’ stemmer, because it is based on inflectional linguistic morphology.

Morphology (the internal structure of words) can be divided into two subclasses, inflectional and derivational. Inflectional morphology describes the predictable changes a word undergoes as a result of syntax; in English the most common are the plural and possessive forms of nouns and the past tense and progressive forms of verbs. These changes have no effect on a word’s part of speech (a noun remains a noun after pluralization). In contrast, derivational changes (e.g. ‘-ise’, ‘-ship’) may or may not affect a word’s meaning. Although English is morphologically relatively weak, languages such as Hungarian and Hebrew have stronger morphology, in which thousands of variants may exist for a given word. In such cases the retrieval performance of an IR system would be severely impacted by a failure to deal with these variations.

The Krovetz Stemmer removes inflectional suffixes accurately in three steps: the conversion of a plural to its singular form (e.g. ‘-ies’, ‘-es’, ‘-s’), the conversion of past to present tense (e.g. ‘-ed’), and the removal of ‘-ing’. Each conversion first removes the suffix and then, by checking a dictionary for any recoding (while remaining aware of exceptions to the normal recoding rules), restores the stem to a word. Because the stemmer is so light, its stemming strength on English is low, which causes problems for its use in IR, where greater strength and index compression may be sought. For this reason it is frequently used in conjunction with another stemmer, such as the Paice/Husk Stemmer or the Porter Stemmer: the Krovetz Stemmer contributes accurate suffix removal, and the second stemmer adds compression.
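The three steps can be illustrated with a minimal Python sketch. This is not Krovetz's implementation: the lexicon `LEXICON`, the helper `_recode`, the function `stem`, and the specific recoding rules (restoring a final ‘e’, undoubling a final consonant) are all assumptions made for illustration, and a real dictionary-based stemmer would consult a full machine-readable dictionary and a much richer exception list.

```python
# Hypothetical toy lexicon, standing in for the full dictionary
# that a dictionary-based stemmer would consult.
LEXICON = frozenset(
    {"pony", "box", "cat", "class", "hope", "hop", "walk", "make", "run", "king"}
)


def _recode(word, suffix, endings, lexicon, undouble=False):
    """Strip `suffix`, then return the first recoding found in the lexicon.

    Words already in the lexicon are treated as exceptions and left alone.
    If no recoding is a dictionary word, the input is returned unchanged.
    """
    if word in lexicon or not word.endswith(suffix):
        return word
    base = word[: -len(suffix)]
    for ending in endings:
        if base + ending in lexicon:
            return base + ending
    # Illustrative recoding rule: undo a doubled final consonant (hopp -> hop).
    if undouble and len(base) > 2 and base[-1] == base[-2] and base[:-1] in lexicon:
        return base[:-1]
    return word


def stem(word, lexicon=LEXICON):
    """Apply the three inflectional steps in order (simplified sketch)."""
    word = word.lower()
    # Step 1: plural -> singular ('-ies', '-es', '-s'); stop at the first rule that fires.
    for suffix, endings in (("ies", ("y",)), ("es", ("e", "")), ("s", ("",))):
        stemmed = _recode(word, suffix, endings, lexicon)
        if stemmed != word:
            word = stemmed
            break
    # Step 2: past -> present tense ('-ed').
    word = _recode(word, "ed", ("e", ""), lexicon, undouble=True)
    # Step 3: removal of '-ing'.
    word = _recode(word, "ing", ("e", ""), lexicon, undouble=True)
    return word


if __name__ == "__main__":
    for w in ("ponies", "boxes", "hoped", "hopped", "making", "kings", "glasses"):
        print(w, "->", stem(w))
```

In this sketch a word with no dictionary-supported recoding (here ‘glasses’, since ‘glass’ is absent from the toy lexicon) is returned unchanged, which mirrors the conservative, accuracy-first behaviour described above at the cost of conflating fewer forms.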

R. Krovetz, 1993: "Viewing morphology as an inference process," in R. Korfhage et al., Proc. 16th ACM SIGIR Conference, Pittsburgh, June 27-July 1, 1993; pp. 191-202.
