
UMLS: The Graph Behind the Forest - Medical Ontology Research


Institute for Discrete Sciences

Workshop on Associating Semantics with Graphs

Rutgers University

April 16, 2007

Unified Medical Language System

The graph behind the forest

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications

Bethesda, Maryland - USA


Biomedical trees


http://www.tolweb.org/tree/


http://www.ncbi.nlm.nih.gov/Taxonomy/<br />

Lister Hill National Center for Biomedical Communications<br />



Medical Subject Headings

http://www.nlm.nih.gov/mesh/2007/MBrowser.html<br />



Gene Ontology

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi<br />



SNOMED Clinical Terms<br />

http://www.clininfo.co.uk/clue5/clue.htm<br />



Biomedical trees revisited


Medical Subject Headings

Amino Acids, Peptides, and Proteins

Proteins

Cytoskeletal Proteins

Contractile Proteins

Muscle Proteins

Membrane Proteins

Dystrophin

http://www.nlm.nih.gov/mesh/2007/MBrowser.html<br />



Gene Ontology

biological process

biological regulation

metabolic process

regulation of biological process

primary metabolic process

regulation of metabolic process

lipid metabolic process

regulation of lipid metabolic process

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi<br />



SNOMED Clinical Terms<br />

disorder of trunk<br />

disorder of thorax<br />

neoplasm of trunk<br />

disorder of breast<br />

neoplasm of thorax<br />

neoplasm of breast<br />

http://www.clininfo.co.uk/clue5/clue.htm<br />



Terminology integration<br />

Unified Medical Language System


Addison’s disease in medical vocabularies

Synonyms

• Addisonian syndrome

• Bronzed disease

• Addison melanoderma

• Asthenia pigmentosa

• Primary adrenal deficiency

• Primary adrenal insufficiency

• Primary adrenocortical insufficiency

• Chronic adrenocortical insufficiency

eponym

symptoms

clinical variants



Organize terms<br />

Synonymous terms clustered into a concept<br />

Preferred term<br />

Unique identifier (CUI)<br />

Term                                   Source      Code
Addison Disease                        MeSH        D000224
Primary hypoadrenalism                 MedDRA      10036696
Primary adrenocortical insufficiency   ICD-10      E27.1
Addison's disease (disorder)           SNOMED CT   363732003

C0001403

Addison's disease
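The clustering described on this slide can be sketched in a few lines of Python. This is only an illustration built from the four rows shown above. The tuple layout and the helper names `cluster_by_concept` and `lookup_cui` are assumptions made for the example, not part of any UMLS distribution or API.

```python
from collections import defaultdict

# (term, source vocabulary, source code, CUI) rows copied from the slide.
ATOMS = [
    ("Addison Disease", "MeSH", "D000224", "C0001403"),
    ("Primary hypoadrenalism", "MedDRA", "10036696", "C0001403"),
    ("Primary adrenocortical insufficiency", "ICD-10", "E27.1", "C0001403"),
    ("Addison's disease (disorder)", "SNOMED CT", "363732003", "C0001403"),
]

# Preferred term per concept, as shown on the slide.
PREFERRED = {"C0001403": "Addison's disease"}


def cluster_by_concept(atoms):
    """Group source terms under their concept unique identifier (CUI)."""
    concepts = defaultdict(list)
    for term, source, code, cui in atoms:
        concepts[cui].append((term, source, code))
    return dict(concepts)


def lookup_cui(atoms, source, code):
    """Hypothetical helper: map a source code to its CUI, or None if unknown."""
    for _term, src, src_code, cui in atoms:
        if src == source and src_code == code:
            return cui
    return None


concepts = cluster_by_concept(ATOMS)
# Two codes from different vocabularies resolve to the same concept.
same_concept = lookup_cui(ATOMS, "MeSH", "D000224") == lookup_cui(ATOMS, "ICD-10", "E27.1")
```

Because every source code resolves to one CUI, terms from different vocabularies can be compared at the concept level instead of by string.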



SNOMED International<br />

Diseases/Diagnoses<br />

Diseases of the endocrine system

Diseases of <strong>the</strong> Adrenal Glands<br />

Addison’s Disease


MeSH<br />

Diseases<br />

Endocrine Diseases<br />

Adrenal Gland Diseases<br />

Adrenal Gland Hypofunction<br />

Addison’s Disease


AOD<br />

Endocrine disorder<br />

Adrenal disorder<br />

Adrenal cortical disorder<br />

Adrenal cortical hypofunction<br />

Addison’s Disease


Read Codes<br />

Endocrine disorder<br />

Disorder of adrenal gland<br />

Hypoadrenalism<br />

Adrenal Hypofunction<br />

Corticoadrenal insufficiency<br />

Addison’s Disease


ICD-10

Disorders of other endocrine gland

Other disorders of adrenal gland

Primary adrenocortical insufficiency


Organize concepts

Inter-concept relationships: hierarchies from the source vocabularies

Redundancy: multiple paths

One graph instead of multiple trees (multiple inheritance)

[Figure: nodes A to H drawn as a tree with repeated nodes, then as a single graph]
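A minimal sketch of the "one graph instead of multiple trees" idea, using the Addison's disease hierarchies listed on the earlier slides. It assumes, for illustration only, that nodes are merged when their names match exactly. The UMLS merges at the concept level, so more nodes would coincide there. The function names are hypothetical.

```python
from collections import defaultdict

# Root-to-leaf paths as listed on the earlier slides, one per source vocabulary.
PATHS = {
    "SNOMED International": [
        "Diseases/Diagnoses", "Diseases of the endocrine system",
        "Diseases of the Adrenal Glands", "Addison's Disease",
    ],
    "MeSH": [
        "Diseases", "Endocrine Diseases", "Adrenal Gland Diseases",
        "Adrenal Gland Hypofunction", "Addison's Disease",
    ],
    "AOD": [
        "Endocrine disorder", "Adrenal disorder", "Adrenal cortical disorder",
        "Adrenal cortical hypofunction", "Addison's Disease",
    ],
    "Read Codes": [
        "Endocrine disorder", "Disorder of adrenal gland", "Hypoadrenalism",
        "Adrenal Hypofunction", "Corticoadrenal insufficiency", "Addison's Disease",
    ],
}


def merge_paths(paths):
    """Merge per-source trees into one child -> {parents} graph."""
    parents = defaultdict(set)
    for path in paths.values():
        parents[path[0]]  # register the root even though it has no parent
        for parent, child in zip(path, path[1:]):
            parents[child].add(parent)
    return dict(parents)


def paths_to_roots(parents, node):
    """Enumerate every upward path from node to a root (graph assumed acyclic)."""
    if not parents.get(node):
        return [[node]]
    return [
        [node] + rest
        for parent in sorted(parents[node])
        for rest in paths_to_roots(parents, parent)
    ]


graph = merge_paths(PATHS)
addison_parents = graph["Addison's Disease"]          # multiple inheritance
addison_paths = paths_to_roots(graph, "Addison's Disease")
```

In the merged graph a single node for Addison's disease has one parent per source, which is the redundancy (multiple paths) the slide refers to.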



Organize concepts

Endocrine Diseases<br />

Adrenal Cortex Diseases<br />

Adrenal Gland Diseases<br />

SNOMED<br />

MeSH<br />

AOD<br />

Read Codes<br />

Hypoadrenalism<br />

Adrenal Gland Hypofunction<br />

UMLS

Adrenal cortical hypofunction<br />

Addison’s Disease


Endocrine System<br />

Endocrine Glands<br />

Abdominal organ<br />

Diseases<br />

Endocrine Diseases<br />

Adrenal Glands<br />

Adrenal Dysfunction<br />

Adrenal Gland Diseases<br />

Adrenal Cortex Diseases<br />

Disorders of other endocrine gland

Adrenal Cortex<br />

Adrenal Cortex Dysfunction<br />

Hypoadrenalism<br />

Adrenal Gland Hypofunction<br />

Other disorders of adrenal gland

Adrenal cortical hypofunction<br />

Secondary hypocortisolism<br />

Addison’s Disease<br />

Addison’s disease due to autoimmunity


Source Vocabularies<br />

(2007AA)<br />

139 source vocabularies<br />

• 17 languages<br />

Broad coverage of biomedicine<br />

• 5.5M names<br />

• 1.4M concepts<br />

• 16M relations<br />

Common presentation<br />



Semantic Types

Anatomical Structure

Fully Formed Anatomical Structure

Body Part, Organ or Organ Component

Embryonic Structure

Pharmacologic Substance

Disease or Syndrome

Population Group

Semantic Network

Concepts

Esophagus

Left Phrenic Nerve

Mediastinum

Heart

Heart Valves

Fetal Heart

Saccular Viscus

Angina Pectoris

Cardiotonic Agents

Tissue Donors

Numeric labels in the figure: 12, 4, 9, 31, 22, 97, 225

Metathesaurus
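The two layers named on this slide (semantic types in the Semantic Network, concepts in the Metathesaurus) can be illustrated with a small Python sketch. The slide text does not spell out which concept is linked to which type, or how the types are linked to each other. The `isa` links and the concept-to-type assignments below are therefore assumptions chosen for illustration, and the function names are hypothetical.

```python
# Assumed isa links between semantic types (child -> parent).
ISA = {
    "Body Part, Organ or Organ Component": "Fully Formed Anatomical Structure",
    "Fully Formed Anatomical Structure": "Anatomical Structure",
    "Embryonic Structure": "Anatomical Structure",
}

# Assumed categorization of Metathesaurus concepts by semantic type.
CATEGORY = {
    "Heart": "Body Part, Organ or Organ Component",
    "Fetal Heart": "Embryonic Structure",
    "Angina Pectoris": "Disease or Syndrome",
    "Cardiotonic Agents": "Pharmacologic Substance",
    "Tissue Donors": "Population Group",
}


def type_chain(sem_type):
    """Return the semantic type followed by all of its isa ancestors."""
    chain = [sem_type]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


def concepts_of_type(sem_type):
    """Concepts whose semantic type is sem_type or one of its descendants."""
    return sorted(
        concept
        for concept, assigned in CATEGORY.items()
        if sem_type in type_chain(assigned)
    )


anatomical = concepts_of_type("Anatomical Structure")
```

Querying by a high-level type such as "Anatomical Structure" returns concepts assigned to any of its more specific types, which is what makes a small type network useful as an index over a large concept graph.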


Biomedical forest vs. graph


UMLS Knowledge Source Server

http://umlsks.nlm.nih.gov/<br />



Addison’s disease in UMLSKS (1)


Addison’s disease in UMLSKS (2)


Addison’s disease in UMLSKS (3)


Addison’s disease in UMLSKS (4)


Addison’s disease in UMLSKS (5)


UMLS Semantic Navigator


http://mor.nlm.nih.gov/perl/semnav.pl


AmiGO<br />

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi<br />



GenNav<br />

http://mor.nlm.nih.gov/perl/gennav.pl<br />



Semantics of the UMLS graph

Issues and challenges


Visualization of large graphs<br />



Visualization of large graphs<br />



Acyclicity

“Back edge” from a child concept to a parent concept

[Figure: example graphs over the nodes A, B, D, E, G, H]

Reflexive: 13,000

Direct: 1,800

Indirect: 120
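One way to look for such back edges is to enumerate the simple cycles of the child-to-parent graph and sort them by length. The sketch below assumes that "reflexive" means a concept listed as its own parent, "direct" means a two-concept cycle, and "indirect" means a longer cycle. The slide text does not define the terms, so this reading is an assumption. The example graph is hypothetical, and the exhaustive search is only suitable for small graphs.

```python
def find_cycles(parents):
    """Enumerate simple cycles in a child -> {parents} graph.

    Each cycle is reported once, starting at its smallest node.
    """
    cycles = []

    def walk(start, node, path):
        for parent in sorted(parents.get(node, ())):
            if parent == start:
                cycles.append(path[:])
            elif parent > start and parent not in path:
                walk(start, parent, path + [parent])

    for start in sorted(parents):
        walk(start, start, [start])
    return cycles


def classify(cycles):
    """Split cycles by length: 1 = reflexive, 2 = direct, 3 or more = indirect."""
    result = {"reflexive": [], "direct": [], "indirect": []}
    for cycle in cycles:
        if len(cycle) == 1:
            result["reflexive"].append(cycle)
        elif len(cycle) == 2:
            result["direct"].append(cycle)
        else:
            result["indirect"].append(cycle)
    return result


# Hypothetical example graph: A is its own parent, B and D point at each
# other, and E -> G -> H -> E closes a longer loop.
EXAMPLE = {
    "A": {"A"},
    "B": {"D"},
    "D": {"B"},
    "E": {"G"},
    "G": {"H"},
    "H": {"E"},
}

found = classify(find_cycles(EXAMPLE))
```

On a graph the size of the Metathesaurus, a linear-time method such as a depth-first search for back edges or a strongly connected components algorithm would be a more realistic choice than exhaustive enumeration.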



Underspecification of relationships

Relationship “attribute” not always present

Relations used to create hierarchies vs. hierarchical relations



Which tasks?

Information integration

Mapping

Depending on the degree of human involvement

• Hypothesis generation / validation

• Knowledge discovery

• Automated reasoning

Knowledge standardization

• Common format

• Common semantics



Which formalisms?

SKOS – Thesaurus

• Simple Knowledge Organization System

RDF – Concept-Relationship-Concept triples

• Resource Description Framework

Description Logics / Frames

• OWL Web Ontology Language

• Protégé (frames / OWL)

• OBO Open Biomedical Ontologies

Rule languages

Formal logic



Which identifiers?

For concepts

• Namespaces, ontologies, knowledge bases

• OBO – Open Biomedical Ontologies

• UMLS – Unified Medical Language System

• NCBI Entrez (Entrez Gene, GenBank, UniGene, …)

• Mappings across information sources

For relationships



Conclusions


Integrating subdomains

Clinical repositories

Genetic knowledge bases

Other subdomains

SNOMED

OMIM

…

NCBI Taxonomy

UMLS

MeSH

Biomedical literature

Model organisms

UWDA

GO

Genome annotations

Anatomy



Integrating subdomains

Other subdomains

Clinical repositories

Genetic knowledge bases

Biomedical literature

Model organisms

Genome annotations

Anatomy



From glycosyltransferase to congenital muscular dystrophy

glycosyltransferase

GO:0016757

GO:0008194

isa

GO:0016758

GO:0008375

acetylglucosaminyltransferase

LARGE

EG:9215

has_molecular_function

has_associated_phenotype

GO:0008375

MIM:608840

acetylglucosaminyltransferase

Muscular dystrophy, congenital, type 1D
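The chain on this slide can be written as subject-predicate-object triples and traversed with a few set operations. This is a plain-Python sketch, not the authors' RDF implementation. The identifiers and predicate names come from the slide, but the exact `isa` edges between the four GO terms are an assumption, because the flattened figure does not show which term points to which. The function names are hypothetical.

```python
# Triples as (subject, predicate, object). The isa edges are assumed.
TRIPLES = [
    ("GO:0008194", "isa", "GO:0016757"),
    ("GO:0016758", "isa", "GO:0016757"),
    ("GO:0008375", "isa", "GO:0008194"),
    ("GO:0008375", "isa", "GO:0016758"),
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
]


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the triple set."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple set."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def descendants(triples, term):
    """The term itself plus every term reaching it through isa triples."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child in subjects(triples, "isa", current):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


def phenotypes_for_function(triples, go_term):
    """Phenotypes of genes annotated with go_term or any of its descendants."""
    result = set()
    for term in descendants(triples, go_term):
        for gene in subjects(triples, "has_molecular_function", term):
            result |= objects(triples, gene, "has_associated_phenotype")
    return result


linked = phenotypes_for_function(TRIPLES, "GO:0016757")
```

Starting from the general glycosyltransferase term GO:0016757, the traversal reaches the gene LARGE (EG:9215) through its more specific annotation GO:0008375, and from there the phenotype MIM:608840.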



Medical Ontology Research

Contact: olivier@nlm.nih.gov

Web: mor.nlm.nih.gov

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications

Bethesda, Maryland - USA


UMLS References

UMLS

umlsinfo.nlm.nih.gov

UMLS browsers (free, but UMLS license required)

• Knowledge Source Server: umlsks.nlm.nih.gov

• Semantic Navigator: http://mor.nlm.nih.gov/perl/semnav.pl

• RRF browser (standalone application distributed with the UMLS)



UMLS References

Gentle introduction

• Bodenreider O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research; D267-D270. http://mor.nlm.nih.gov/pubs/pdf/2004-nar-ob.pdf

Seminal paper

• Lindberg, D. A., Humphreys, B. L., & McCray, A. T. (1993). The Unified Medical Language System. Methods Inf Med, 32(4), 281-91.



Biomedical information integration through RDF

Biomedical perspective

• Sahoo S, Zeng K, Bodenreider O, Sheth AP. (2007). From “glycosyltransferase” to “congenital muscular dystrophy”: Integrating knowledge from NCBI Entrez Gene and the Gene Ontology. Proceedings of Medinfo (in press). http://mor.nlm.nih.gov/pubs/pdf/2007-medinfo-ss.pdf

Semantic Web perspective

• Sahoo S, Zeng K, Bodenreider O, Sheth AP. (2007). An experiment in integrating large biomedical knowledge resources with RDF: Application to associating genotype and phenotype information. Proceedings of the workshop on Health Care and Life Sciences Data Integration for the Semantic Web at the 16th International World Wide Web Conference (WWW2007) (in press). http://mor.nlm.nih.gov/pubs/pdf/2007-www_hcls-ss.pdf

