TIPSTER Text Program

Author: unknown. Published: 2004-11-24.

The TIPSTER Text Program was a government effort, led by the Defense Advanced Research Projects Agency (DARPA), to advance the state of the art in text processing technologies through the cooperation of researchers and developers in government, industry and academia. The resulting capabilities were deployed within the intelligence community to provide analysts with improved operational tools. Due to lack of funding, the program formally ended in the fall of 1998.

DARPA, the Department of Defense (DoD) and the Central Intelligence Agency (CIA) jointly funded and managed the program, in close collaboration with the National Institute of Standards and Technology (NIST) and the Space and Naval Warfare Systems Center (SPAWAR, or SSC), formerly NCCOSC/NRaD. A TIPSTER Advisory Board was formed in 1998 with members representing users from other Government agencies interested in automated text processing, such as the Department of Energy (DOE), the Federal Bureau of Investigation (FBI), the Internal Revenue Service (IRS), the National Science Foundation (NSF) and the Treasury Department.

In its efforts to improve document processing efficiency and cost effectiveness, TIPSTER focused on three underlying technologies:

  • Document Detection: the capability to locate documents containing the type of information the user wants from either a text stream or a store of documents.
  • Information Extraction: the capability to locate specified information within a text.
  • Summarization: the capability to condense a document or collection while retaining the key ideas in the material.

These three capabilities formed the basis for nearly all other information handling tasks.

TIPSTER Phase I

During the first phase of TIPSTER research (1991-1994), participants made major advances in the algorithms for document detection and information extraction, and in the techniques for measuring those advances, through activities such as the Message Understanding Conferences (MUC) and the Text Retrieval Conferences (TREC). Document Detection technologies improved Recall from roughly 30% to as high as 75%, and the processing of natural language queries also improved significantly. Improvements in Information Extraction raised Recall from roughly 49% to 65% and Precision from 55% to 59%, and dramatic gains were made in the ability to automatically identify a wide range of items such as names (both personal and organizational), dates, locations, times and phone numbers.
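As a rough illustration of the Recall and Precision figures quoted above, the sketch below computes both measures for a toy extraction result. It assumes a simple set-based definition (an item is either correct or not); the actual MUC scoring procedures were more elaborate, and the function name and sample data here are hypothetical.

```python
def precision_recall(extracted: set, reference: set) -> tuple:
    """Return (precision, recall) for system output against an answer key.

    Simplified set-based definition, used here only for illustration:
      precision = correct items / items the system returned
      recall    = correct items / items in the answer key
    """
    correct = len(extracted & reference)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data: organization names in an answer key vs. system output.
reference = {"DARPA", "NIST", "CIA", "SPAWAR"}
extracted = {"DARPA", "NIST", "FBI"}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three returned items are correct, and two of the four items in the answer key were found, so the system trades some Recall for Precision.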

TIPSTER Phase II

During the second phase (April 1994-September 1996), the TIPSTER research and development community turned its attention to creating a software architecture in order to standardize the technology components, enable "plug and play" capabilities among the various tools being developed, and permit the sharing of software among the participants. The architecture continued to evolve, funding permitting, based on feedback from the researchers, developers, and users of the existing prototype and implementation systems.
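The "plug and play" goal can be pictured with a small sketch in which every tool implements the same interface and exchanges a shared document object. This is not the TIPSTER architecture itself: the `Document`, `Annotation`, `Component` and `YearTagger` names are all hypothetical, chosen only to show how a common interface lets independently written components be chained.

```python
import re
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Annotation:
    kind: str   # label for the marked span, e.g. "year"
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


class Component(Protocol):
    """Hypothetical common interface every tool agrees to implement."""

    def process(self, doc: Document) -> Document: ...


class YearTagger:
    """Toy component: annotates four-digit tokens as years."""

    def process(self, doc: Document) -> Document:
        for match in re.finditer(r"\b\d{4}\b", doc.text):
            doc.annotations.append(Annotation("year", match.start(), match.end()))
        return doc


def run_pipeline(doc: Document, components: List[Component]) -> Document:
    # Any object with a compatible process() method can be plugged in here.
    for component in components:
        doc = component.process(doc)
    return doc


doc = run_pipeline(Document("Phase III started in October 1996."), [YearTagger()])
for ann in doc.annotations:
    print(ann.kind, doc.text[ann.start:ann.end])
```

Because the pipeline depends only on the shared interface, a tool from one participant could in principle be swapped for another's without changing the surrounding code.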

The Multilingual Entity Task (MET) developed Chinese and Japanese training collections with over 300 documents in each language. The task was initially confined to Named Entity extraction and the development of a variety of tools such as word boundary finders, part-of-speech tagged Chinese lexicons and dictionaries.

Various research projects and demonstration systems in support of Document Detection and Information Extraction were also completed.

TIPSTER Phase III

Phase III started in October 1996 and continued to build on Phase I and II achievements with new projects in supporting research, development and evaluation areas. Summarization was also added as a fundamental task area. See Phase III Overview.
