|
Since early 1990, the MUC evaluations have funded the development of metrics and statistical algorithms to support government evaluations of emerging information extraction technologies. In the mid-nineties, the MUC evaluations began to provide prepared data and task definitions, in addition to fully automated scoring software for measuring machine and human performance. The tasks grew from producing a single database of events found in newswire articles from one source to producing multiple databases of increasingly complex information extracted from multiple news sources in multiple languages. The databases now include named entities, multilingual named entities, attributes of those entities, facts about relationships between entities, and events in which the entities participated.
The results of these evaluations were reported at conferences during the 1990s, where developers and evaluators shared their findings and government specialists described their needs. These conferences were called "Message Understanding Conferences (MUC)" as a result of the use of such technology to process military messages. The multilingual portion was known as the "Multilingual Entity Task (MET)". The proceedings of all of these conferences have been published; the last appears on this website. All previous proceedings were published in bound form by Morgan Kaufmann Publishers.
MUC Data Sets
For each evaluation, ground truth had to be established to determine the reliability of the participating systems. Datasets were typically prepared by human annotators for training, dry run test, and formal run test usage. These datasets are now being made available wherever possible on this website.
The texts used for MUC 6 and MUC 7 are copyrighted materials and are available only through the Linguistic Data Consortium (LDC) for a small fee. The texts are available as newswire articles for MUC-6 (MUC-VI Text Collection) and newswire articles for MUC-7 (North American News Text Corpora).
Contact the LDC to license the texts, and request the public-domain prepared datasets used in MUC and the MUC scoring software. The MUC 3 and MUC 4 Data Sets are provided completely free of charge courtesy of FBIS (Foreign Broadcast Information Service). The MET 2 Data Sets are provided completely free of charge courtesy of the US Government. They are available here in compressed and tarred format.
MUC 3 and MUC 4 Data Sets
MET 2 Data Sets
Note: If you see the data rather than a dialog box, download the file and save it before uncompressing and untarring it.
|