Content(2)

Author: unknown   Date: 2004-12-01

4.2.3  Specialized registries

4.3  Full-Text

4.3.1  Periodicals

(4/17/05)  Since the book was published, PubMed Central (PMC) has acquired a more direct URL: http://pubmedcentral.gov/.  PMC continues to grow, albeit slowly.  As of April 2005, it contains articles from nearly 200 journals.  The content can be accessed by searching or browsing from the PMC site, or via links from MEDLINE records displayed in PubMed.  The rules for journals joining PMC continue to evolve as well, with the latest instructions at:
http://pubmedcentral.gov/about/pubinfo.html

NLM and PMC have attempted to bring more standardization to electronic journal publishing with a new Archiving and Interchange Document Type Definition (DTD) (http://dtd.nlm.nih.gov/).  This provides a standard way to format content for NLM databases in XML.  A related Journal Publishing DTD is optimized for authoring and initial XML tagging of journal material.  Likewise, a PubMed Journal Article DTD has been created for the submission of citations and abstracts for MEDLINE/PubMed and a Book DTD has been developed for the NCBI Bookshelf.
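
To give a feel for content tagged under the Archiving and Interchange DTD, here is a minimal sketch that pulls a few citation fields out of an article.  The XML fragment is hand-written and borrows element names from NLM's tag set (article, front, article-meta, title-group, article-title, abstract, body, sec), but it has not been validated against the DTD; the journal name, title, and PMC ID are made up, and the helper extract_metadata is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment using element names from the NLM Archiving and
# Interchange tag set. Illustrative only: not validated against the DTD,
# and the journal, title, and PMC ID below are made up.
SAMPLE_ARTICLE = """<article article-type="research-article">
  <front>
    <journal-meta>
      <journal-title>Example Journal of Informatics</journal-title>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="pmc">0000000</article-id>
      <title-group>
        <article-title>A sample article</article-title>
      </title-group>
      <pub-date pub-type="epub"><year>2005</year></pub-date>
      <abstract><p>Short abstract text.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>Body text.</p></sec>
  </body>
</article>"""


def extract_metadata(xml_text: str) -> dict:
    """Hypothetical helper: collect a few citation fields from an article."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    return {
        "journal": root.findtext("front/journal-meta/journal-title"),
        "pmcid": meta.findtext("article-id[@pub-id-type='pmc']"),
        "title": meta.findtext("title-group/article-title"),
        "year": meta.findtext("pub-date/year"),
        # Join the text of every abstract paragraph into one string.
        "abstract": " ".join(p.text or "" for p in meta.findall("abstract/p")),
        # Titles of the top-level sections in the article body.
        "sections": [sec.findtext("title") for sec in root.findall("body/sec")],
    }


if __name__ == "__main__":
    print(extract_metadata(SAMPLE_ARTICLE))
```

Because every conforming article uses the same element names, one extraction routine like this can, in principle, serve content from any participating journal; that uniformity is the point of a shared DTD.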

PMC is also scanning back issues of the included journals (http://www.pubmedcentral.gov/about/scanning.html).  The scanned pages for each article are combined into a single PDF file.  Optical character recognition (OCR) is applied to the text to make it searchable, although OCR errors are not corrected.

PMC can now be searched directly not only through its own interface, but also in PubMed by selecting PMC (as opposed to PubMed) from the topmost drop-down menu of PubMed.
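
The same choice of database can also be made programmatically.  A minimal sketch, assuming NCBI's E-utilities esearch service (not discussed in the book): the db parameter selects PMC or PubMed, mirroring the drop-down menu.  The helper name entrez_search_url is hypothetical, and the function only builds the query URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities esearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_search_url(term: str, db: str = "pmc", retmax: int = 20) -> str:
    """Hypothetical helper: build an esearch URL for the chosen Entrez database.

    db="pmc" searches PubMed Central; db="pubmed" searches PubMed instead.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes the query string, turning spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(entrez_search_url("medical informatics"))
    print(entrez_search_url("medical informatics", db="pubmed"))
```

Fetching such a URL (for example with urllib.request.urlopen) returns an XML list of matching record IDs; NCBI's usage policies apply to automated queries.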

(4/17/05)  The full text of the biomedical informatics literature is increasingly available for free.  AMIA has made its proceedings from 1997 to 2003 available online (there was no AMIA conference in 2004 because AMIA hosted MEDINFO 2004) at:
http://www.amia.org/pubs/proceedings/symposia/start.html
In addition, the journals JAMIA, JMLA, and BMC Medical Informatics and Decision Making are available in PMC.  As noted in the book, an important source of information about bioinformatics databases and systems is the annual database issue of Nucleic Acids Research.  The publisher of this journal, Oxford Journals, has made this issue freely available under an open access model.  In fact, in 2005, the entire journal adopted an open access model.  The most recent database issues can be accessed at the following URLs:
(4/18/04)  The coalescence of the Elsevier publishing empire has allowed the company to merge the content from the 1,800+ scientific journals it publishes into a single database (and search system) called Science Direct (http://www.sciencedirect.com).

4.3.2  Textbooks

(4/17/05)  Another large collection of on-line textbooks is the NCBI Bookshelf (http://www.ncbi.nih.gov/entrez/query.fcgi?db=Books).  Part of the NCBI Entrez system, this resource provides access to the full text of over a dozen commercially published textbooks.  Most of the books cover topics in cellular and molecular biology, such as:
  • Bast, R., Kufe, D., et al., eds. (2003). Cancer Medicine (Sixth Edition). Hamilton, ON, Canada. BC Decker.
  • Cooper, G. (2000). The Cell - A Molecular Approach (Second Edition). Sunderland, MA. Sinauer Associates, Inc.
  • Griffiths, A., Miller, J., et al. (1999). Introduction to Genetic Analysis (Seventh Edition). New York. W. H. Freeman & Co.
One book within this collection is particularly relevant to health and biomedical IR:  The NCBI Handbook.  This book has 23 chapters on a variety of topics relevant to NCBI databases:
  • Part 1. The Databases
    • GenBank: The Nucleotide Sequence Database
    • PubMed: The Bibliographic Database
    • Macromolecular Structure Databases
    • The Taxonomy Project
    • The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation
    • The Gene Expression Omnibus (GEO): A Gene Expression and Hybridization Repository
    • Online Mendelian Inheritance in Man (OMIM): A Directory of Human Genes and Genetic Disorders
    • The NCBI BookShelf: Searchable Biomedical Books
    • PubMed Central (PMC): An Archive for Literature from Life Sciences Journals
    • The SKY/CGH Database for Spectral Karyotyping and Comparative Genomic Hybridization Data
    • The Major Histocompatibility Complex Database, dbMHC
  • Part 2. Data Flow and Processing
    • Sequin: A Sequence Submission and Editing Tool
    • The Processing of Biological Sequence Data at NCBI
    • Genome Assembly and Annotation Process
  • Part 3. Querying and Linking the Data
    • The Entrez Search and Retrieval System
    • The BLAST Sequence Analysis Tool
    • LinkOut: Linking to External Resources from Entrez Databases
    • The Reference Sequence (RefSeq) Project
    • Entrez Gene: A Directory of Genes
    • Using the Map Viewer to Explore Genomes
    • UniGene: A Unified View of the Transcriptome
    • The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes
  • Part 4. User Support
    • User Services: Helping You Find Your Way
    • Exercises: Using Map Viewer
Another book produced by NCBI and available in this collection is Genes and Disease, which describes the roles genes play in a variety of human diseases.  Also fashioned into a book for this collection is the Health Services/Technology Assessment Text (HSTAT) database, which contains all of the evidence reports, technology assessments, and practice guidelines of the Agency for Healthcare Research & Quality (AHRQ, http://www.ahrq.gov).

A growing number of textbooks are being converted to PDA format.  The largest vendors of PDA-based medical textbooks include Skyscape (www.skyscape.com) and Unbound Medicine (www.unboundmedicine.com).

4.3.3  Web sites

(4/17/05)  Another type of textual Web resource that has grown substantially since publication of the second edition of the book is the wiki, best known through the free encyclopedias built with it.  Wikis allow any individual in a community to write or edit an entry, which makes massive distributed, collaborative work possible.  For example, the best-known wiki, Wikipedia, has over one million entries (over 500,000 in English).  However, the distributed approach is a double-edged sword: there is no guarantee of authority or accuracy for any topic (Terdiman, 2005), which led one author to describe Wikipedia as a "faith-based encyclopedia" (McHenry, 2004).  As with all information on the Web, reader discretion is advised!

McHenry, R. (2004). The Faith-Based Encyclopedia. Tech Central Station. November 15, 2004. http://www.techcentralstation.com/111504A.html.
Terdiman, D. (2005). Wikipedia Faces Growing Pains. Wired News. January 10, 2005. http://www.wired.com/news/print/0,1294,66210,00.html.

(4/19/04)  Almost all of the URLs listed for clinical practice guidelines in the book have changed since publication:
Other collections of clinical practice guidelines include:

Other URLs in this section have changed as well:
(4/17/05)  A growing type of Web content is the weblog or blog (Johnson, 2002).  A blog is essentially a running commentary on a topic maintained by a person or community.  While probably less widespread for biomedical topics, blogs are extremely popular in the political realm.  They are also popular in virtual communities built around a wide range of topics.

One of the interesting effects of blogs is their impact on Google's PageRank-based search ranking.  When many pages link to a specific Web site using the same words as the link text, those links can push that site toward the top of Google's results for a search on those words.  A well-known example is the search "miserable failure," which opponents of George W. Bush's policies were able to associate with his biography by repeatedly linking those words to it.  (Bush's biography ranks at the top of Google output for the search miserable failure.)  Some call this activity "hacktivism" (Denning, 1998).  This aspect of Google's behavior has not been without controversy, e.g., Google's placing of an anti-Semitic Web site at the top of its rankings when the word "jew" is entered (see http://www.google.com/explanation.html).  The role of blogs in the political process has received a great deal of attention (e.g., http://www.washingtonpost.com/wp-adv/marketing/blog/).
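
To make the link-text effect concrete, here is a toy sketch; it is not Google's actual algorithm.  It computes PageRank by power iteration over a small link graph (damping factor 0.85, a common textbook choice), then scores each target page for a query by summing the PageRank of the pages whose link (anchor) text contains the query.  The target page's own text is never consulted, which is what lets coordinated links attach a phrase to a page.  All page names and anchor texts are invented, and the helpers pagerank and anchor_text_ranking are hypothetical.

```python
# Toy link graph: source page -> list of (target page, anchor text).
# Every page name and anchor text here is hypothetical.
LINKS = {
    "blog_a": [("biography", "miserable failure")],
    "blog_b": [("biography", "miserable failure")],
    "blog_c": [("biography", "miserable failure")],
    "portal": [("review", "reviews")],
    "review": [("essay", "a miserable failure of planning")],
    "biography": [],
    "essay": [],
}


def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration; dangling pages spread rank evenly."""
    pages = set(links) | {t for outs in links.values() for t, _ in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(outs)
                for target, _ in outs:
                    new[target] += share
            else:
                # Dangling page: spread its rank over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


def anchor_text_ranking(links, query, rank):
    """Score targets by the summed PageRank of pages linking with the query as anchor text."""
    q = query.lower()
    scores = {}
    for source, outs in links.items():
        for target, anchor in outs:
            if q in anchor.lower():
                scores[target] = scores.get(target, 0.0) + rank[source]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    ranks = pagerank(LINKS)
    for page, score in anchor_text_ranking(LINKS, "miserable failure", ranks):
        print(f"{page}: {score:.4f}")
```

In this graph the review page individually outranks each blog, yet the three blog links together outweigh its single link, so the biography is ranked first for "miserable failure" even though nothing in the biography itself contains the phrase.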

Denning, D. (1998). Activism, hacktivism, and cyberterrorism: the Internet as a tool for influencing foreign policy. The Internet and International Systems: Information Technology and American Foreign Policy Decisionmaking, San Francisco, CA. http://www.nautilus.org/info-policy/workshop/papers/denning.html.
Johnson, S. (2002). Use the blog, Luke. Salon Magazine. May 10, 2002. http://www.salon.com/tech/feature/2002/05/10/blogbrain/.

(4/18/04)  A number of commercial collections of patient-based information, available to health care organizations by license for use on their internal Web sites, have become available:
There are also a growing number of free consumer-oriented resources, including:

4.4  Databases/Collections

4.4.1  Images

(4/18/03)  As with other content, the number of image databases continues to grow.  Another collection of pathology images is the Pathology Education Instructional Resource (PEIR, http://www.peir.net/).

The Health Education Assets Library (HEAL, http://www.healcentral.org/) is a project aiming to create a national repository of free, Web-based multimedia teaching materials in the health sciences.  Associated with each image is a standard metadata record based on the Dublin Core Metadata Initiative (DCMI, http://www.dublincore.org/), which is described in Chapter 5.
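
A Dublin Core record is a flat set of named elements drawn from the 15-element DCMI set (title, creator, subject, description, and so on), any of which may repeat.  The sketch below serializes such a record as XML in the DCMI element namespace.  The wrapper element metadata, the helper dublin_core_record, and every field value are hypothetical; HEAL's actual record format is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the DCMI element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Hypothetical helper: serialize (element, value) pairs as Dublin Core XML.

    A list of pairs is used rather than a dict because elements may repeat
    (e.g. several subjects for one image).
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Made-up record for a teaching image.
IMAGE_RECORD = [
    ("title", "Chest radiograph, posteroanterior view"),
    ("creator", "Example School of Medicine"),
    ("subject", "Radiography, Thoracic"),
    ("subject", "Pneumonia"),
    ("type", "Image"),
    ("format", "image/jpeg"),
    ("rights", "Free for noncommercial educational use"),
]

if __name__ == "__main__":
    print(dublin_core_record(IMAGE_RECORD))
```

Restricting names to the fixed element set is what makes records from different contributors comparable, which is the main benefit of a shared metadata standard for a repository like HEAL.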

(5/6/03)  The Digital Anatomist Project (http://sig.biostr.washington.edu/projects/da/) models anatomical structures and the knowledge associated with them (Brinkley and Rosse, 1997).  Its indexing approach is briefly described in the update for Chapter 5.

Brinkley, J. and Rosse, C. (1997). The Digital Anatomist distributed framework and its applications to knowledge-based medical imaging. Journal of the American Medical Informatics Association, 4: 165-183.

(4/18/04)  Current Medicine has published a commercial image encyclopedia, Images.MD (http://www.images.md/).

Another image collection has been assembled for image retrieval research.  The CasImage collection (http://www.casimage.com/) was developed at the University Hospitals of Geneva and consists of anonymized textual case reports, each linked to one or more anonymized images associated with the case (Rosset et al., 2004).  A large majority of the case reports are in French, but about 20% are in English.  The operational system that collected the images is described in a separate paper (Rosset et al., 2002).

Rosset, A., Muller, H., et al. (2004). Casimage project: a digital teaching files authoring environment. Journal of Thoracic Imaging, 19: 103-108.
Rosset, A., Ratib, O., et al. (2002). Integration of a multimedia teaching and reference database in a PACS environment. Radiographics, 22: 1567-1577.