MUC Evaluations and Datasets

Author: unknown. Published: 2004-12-10

Since early 1990, the MUC evaluations have funded the development of metrics and statistical algorithms to support government evaluations of emerging information extraction technologies. In the mid-nineties, the MUC evaluations began to provide prepared data and task definitions, along with fully automated scoring software to measure machine and human performance. The tasks grew from producing a single database of events found in newswire articles from one source to producing multiple databases of increasingly complex information extracted from multiple news sources in multiple languages. The databases now include named entities, multilingual named entities, attributes of those entities, facts about relationships between entities, and events in which the entities participated.

The results of these evaluations were reported at conferences during the 1990s, where developers and evaluators shared their findings and government specialists described their needs. These conferences were called "Message Understanding Conferences (MUC)" as a result of the use of such technology to process military messages. The multilingual portion was known as the "Multilingual Entity Task (MET)". The proceedings of these conferences have all been published, the last of which appears on this website. All previous proceedings were published in bound form by Morgan Kaufmann Publishers.

MUC Data Sets

For each evaluation, ground truth had to be established to determine the reliability of the participating systems. Datasets were typically prepared by human annotators for training, dry run test, and formal run test usage. These datasets are now being made available wherever possible on this website.
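As a rough illustration of what scoring a system against ground truth involves, the sketch below compares a set of extracted items with a set of human annotations and reports precision, recall, and F-measure. It is a simplified exact-match comparison, not the MUC scoring software itself, which is considerably more elaborate. The function name `score_extraction` and the `(type, text)` tuple representation are assumptions made for this example.

```python
def score_extraction(gold: set, predicted: set) -> dict:
    """Score predicted items against gold annotations by exact match.

    Both arguments are sets of hashable items, e.g. (entity_type, text)
    tuples. This representation is hypothetical, chosen for illustration.
    """
    correct = len(gold & predicted)  # items the system got exactly right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical annotations for a single article.
gold = {("PERSON", "John Smith"), ("ORG", "Acme"), ("LOC", "Paris"), ("ORG", "IBM")}
predicted = {("PERSON", "John Smith"), ("ORG", "Acme"), ("LOC", "London")}
result = score_extraction(gold, predicted)
```

Here two of the three predicted items match the four gold items, so precision is 2/3 and recall is 1/2.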

The texts used for MUC 6 and MUC 7 are copyrighted materials and are only available through the Linguistic Data Consortium (LDC) for a small fee. The texts are available as: newswire articles for MUC-6 (MUC-VI Text Collection), and newswire articles for MUC-7 (North American News Text Corpora).

Contact the LDC for licensing of the texts and request the public domain prepared datasets used in MUC and the MUC scoring software. The MUC 3 and MUC 4 Data Sets are provided completely free of charge courtesy of FBIS (Foreign Broadcast Information Service). The MET 2 Data Sets are provided completely free of charge courtesy of the US Government. They are available here in compressed and tar'ed format.

MUC 3 and MUC 4 Data Sets

MET 2 Data Sets

Note: If your browser displays the data rather than a download dialog box, save the file to disk before uncompressing and untarring it.
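Once an archive has been saved, unpacking it can be sketched in Python with the standard `tarfile` module. This is a minimal sketch that assumes the archive is gzip-, bzip2-, or xz-compressed (or uncompressed). An archive produced with the Unix `compress` utility (a `.Z` file) would first need to be uncompressed with a separate tool, since `tarfile` does not read that format. The function name `extract_dataset` and the file names in the usage comment are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path: str, dest_dir: str) -> list[str]:
    """Unpack a TAR archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(archive_path, mode="r:*") as archive:
        names = archive.getnames()
        archive.extractall(path=dest)
    return names


# Example usage (hypothetical file and directory names):
#   members = extract_dataset("muc_data.tar.gz", "muc_data")
#   print(len(members), "files extracted")
```

Only unpack archives from a source you trust, since a TAR file can contain paths that point outside the destination directory.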
