all-SSC outperforms participants on GSC


Challenge I vs. Challenge II

D. Rebholz-Schuhmann

16th March 2011, CALBC Workshop II


CALBC workshop II

Mar 16, 2011, EBI

Stakes

• Corpus alignment is doable: different semantic types, single document
• Different similarity measures for harmonisation: exact, nested, cosine
• SSC generation scales and generalises: iterative generation
• The Partner-SSC outperforms the partners' systems on the GSC
• The all-SSC outperforms the participants' systems on the GSC
• The SSC is the best option when no GSC is available
• Incremental improvement of SSC performance: a larger number of contributions leads to higher performance
• Performance of the SSC on the other semantic types
• Normalisation of the SSC: concept ID annotations for mentions
• Use cases of the SSC
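The three similarity measures named above (exact, nested, cosine) can be sketched as span-matching predicates. This is a minimal sketch, not the CALBC implementation. It assumes mentions are (start, end) character offsets with an exclusive end. The hypothetical `cosine_match` compares lower-cased bag-of-token vectors of two overlapping mentions against a threshold. Its default of 0.98 only echoes the "Cos-98" label used later in the slides and is an assumption.

```python
from collections import Counter
from math import sqrt

# Assumption: a mention is a (start, end) character span, end exclusive.
Span = tuple[int, int]


def exact_match(a: Span, b: Span) -> bool:
    """Exact: both boundaries are identical."""
    return a == b


def nested_match(a: Span, b: Span) -> bool:
    """Nested: one span lies completely inside the other (exact matches included)."""
    return (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])


def cosine_match(text: str, a: Span, b: Span, threshold: float = 0.98) -> bool:
    """Cosine: overlapping mentions whose bag-of-token vectors are similar enough.

    Hypothetical helper; whitespace tokenisation and the 0.98 default are assumptions.
    """
    if not (a[0] < b[1] and b[0] < a[1]):  # the two spans must overlap
        return False
    va = Counter(text[a[0]:a[1]].lower().split())
    vb = Counter(text[b[0]:b[1]].lower().split())
    dot = sum(count * vb[token] for token, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return norm > 0 and dot / norm >= threshold
```

In "the IL-2 receptor binds", the spans for "the IL-2 receptor" and "IL-2 receptor" form a nested match but not an exact one. Their token cosine is about 0.82, so they fail the 0.98 default and pass only with a looser threshold such as 0.8.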


CALBC corpus development

• June 2009: 1.5K
• December 2009: 50K
• June 2010: 100K
• December 2010: 850K


1st Challenge Rollout (Overview)


2nd Challenge Rollout (Overview)

• SSC-I: result of the Pilot Project
• SSC-II: result of the first challenge

New (850K + GSC)


Timeline of the CALBC project

[Figure: time axis with the labels PMB; SAC; www.calbc.eu; project start; submission site; test runs; alignment of partners' contributions; alignment solutions available; invitation to challenge; access to test data + submission site; harmonised corpus available; training data; active participation; final corpus from the Pilot; closing of the first challenge (40 participants, X submissions); CALBC Silver Standard Corpus]


Timeline of the CALBC project

[Figure: time axis with the labels closing of the 1st challenge (15 participants, 22 submissions); 1st CALBC workshop (27 participants); invitation to challenge; access to test data + submission site; harmonised corpus available; training data; active participation; final corpus from the 1st challenge; closing of the 2nd challenge (16 participants, 54 submissions); CALBC Silver Standard Corpus]


The Challenges

Count | 1st challenge | 2nd challenge
# registered participants | 40 | 28
# active participants | 15 | 16
# submissions | 18 | 54


Partners & Participants

PPs | CPs

Columns: Solution | Use of training data | PRGE | CHED | DISO | SPE

Solutions: dictionary-based concept recognition; indexing of tokens and terms; both trained and rule-based solutions; case-based reasoning; CRF-based, trained NER solution

P01 [ / ] UniProtKb; Jochem; UMLS; NCBI taxonomy
P02 [ / ]
P04 [ / ]
Different resources incl. UniProtKb, EntrezGene
UniProtKb, EntrezGene
Jochem; UMLS; NCBI taxonomy
Jochem
MeSH, MedDRA, NCI, SNOMED-CT
P06 [ / ] UMLS
P10 [ / ]
P13 [ / ]
UniProtKb, EntrezGene
NCI, MeSH, SNOMED-CT
NCBI taxonomy
P15 [ / ] UMLS; UMLS; UMLS; UMLS
P03 [ / ] UniProtKb, EntrezGene; Jochem; UMLS; NCBI taxonomy
P09 [ / ] UMLS
P07 [ / ]
P16 [ / ] Genia; UMLS
P11 YES [ / ] [ / ] [ / ] [ / ]
P12 YES [ / ] [ / ] [ / ] [ / ]
P14 YES [ / ] [ / ] [ / ] [ / ]
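Several of the systems above are listed as dictionary-based concept recognition. A greedy longest-match tagger illustrates the idea. This is a minimal sketch, not any partner's system. The dictionary entries and concept IDs (for example `PRGE:0001`) are made up for illustration. Real entries would be compiled from resources such as UniProtKb, Jochem, UMLS or the NCBI taxonomy. The tokenisation regex is also an assumption.

```python
import re

# Hypothetical toy dictionary: lower-cased term -> (semantic type, made-up concept id).
DICTIONARY = {
    "il-2 receptor": ("PRGE", "PRGE:0001"),
    "il-2": ("PRGE", "PRGE:0002"),
    "aspirin": ("CHED", "CHED:0001"),
    "asthma": ("DISO", "DISO:0001"),
    "homo sapiens": ("SPE", "SPE:0001"),
}
MAX_TERM_TOKENS = max(len(term.split()) for term in DICTIONARY)
TOKEN = re.compile(r"[\w-]+")  # assumption: words and hyphens form tokens


def tag(text: str) -> list[tuple[int, int, str, str]]:
    """Greedy longest-match lookup; returns (start, end, semantic type, concept id)."""
    tokens = [(m.start(), m.end()) for m in TOKEN.finditer(text)]
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first and shrink it until a dictionary hit.
        for j in range(min(len(tokens), i + MAX_TERM_TOKENS), i, -1):
            candidate = " ".join(text[s:e] for s, e in tokens[i:j]).lower()
            if candidate in DICTIONARY:
                sem_type, concept_id = DICTIONARY[candidate]
                annotations.append((tokens[i][0], tokens[j - 1][1], sem_type, concept_id))
                i = j  # continue after the matched term
                break
        else:
            i += 1  # no dictionary term starts at this token
    return annotations
```

On "Aspirin reduced IL-2 receptor levels in asthma." this returns a CHED, a PRGE and a DISO annotation. Preferring the longest match makes "IL-2 receptor" win over the shorter entry "IL-2".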


Assessment: PRGE, CHED

[Figure: recall vs. precision of each system for PRGE and CHED, against SSC-I and against SSC-II]


Assessment: PRGE, CHED

[Figure: F-measure of each system for PRGE and CHEM, against SSC-I and against SSC-II]

• The performance of the partners' systems decreases for PRGE/CHEM from SSC-I to SSC-II
• The performance of the participants' systems increases
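The F-measure values plotted above combine precision and recall against a reference corpus such as SSC-I or SSC-II. This is a minimal sketch, assuming annotations are hashable (doc_id, start, end, semantic_type) tuples compared by exact matching. The nested or cosine measures could be substituted, but the slides do not say which matching was used for these plots.

```python
Annotation = tuple[str, int, int, str]  # (doc_id, start, end, semantic_type)


def evaluate(system: set[Annotation], reference: set[Annotation]) -> tuple[float, float, float]:
    """Precision, recall and F-measure (F1) of a system against a reference corpus, exact matching."""
    true_positives = len(system & reference)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Suppose a system proposes 3 annotations, 2 of them match a reference that contains 4 annotations. Then precision is 2/3, recall is 1/2 and the F-measure is 4/7, about 0.57.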


Assessment: SPE, DISO

[Figure: F-measure of the partners' and the participants' systems for SPE and DISO, against SSC-I and against SSC-II]

• For SPE/DISO, the drop in performance of the PPs' systems is small
• The performance of the participants' systems increases


Partners & Participants

Type | Nr. of annotations in the SSC-I | Nr. of CPs | Nr. of submissions from CPs | Average nr. of annotations from all CPs | Nr. of annotations in the SSC-II
CHED | 228,622 | 6 | 11 | 233,398 | 238,431
PRGE | 275,235 | 9 | 15 | 343,681 | 435,797
DISO | 300,637 | 8 | 11 | 255,599 | 245,524
SPE | 317,211 | 7 | 9 | 277,071 | 304,503

Cos-98 P12 P11 P03 P04 P01 P02 P10 P14 P08 P15 P09 P06 P07 P09 P13 P16
SPE 93% 93% 79% 83% 71% 69% 84% 69% 56% 42% 2%
DISO 87% 89% 71% 69% 82% 76% 78% 62% 51% 32% 3% 73%
CHEM 83% 84% 75% 82% 49% 68% 51% 20% 17% 3% 23%
PRGE 81% 73% 77% 66% 66% 59% 40% 52% 12% 18% 2% 50% 11% 28%
Avg. 86% 85% 76% 75% 67% 68% 68% 58% 35% 27% 2%
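The SSC sizes in the table come from harmonising many contributions into one corpus. The slides do not detail the harmonisation procedure. As a minimal sketch, assume a simple vote in which an annotation enters the SSC when at least `min_votes` contributions propose it, using exact matching on document, span and type. The `harmonise` function and its threshold are hypothetical.

```python
from collections import Counter

Annotation = tuple[str, int, int, str]  # (doc_id, start, end, semantic_type)


def harmonise(contributions: list[set[Annotation]], min_votes: int) -> set[Annotation]:
    """Keep annotations proposed by at least `min_votes` contributions (exact matching)."""
    votes: Counter = Counter()
    for annotations in contributions:
        votes.update(set(annotations))  # each contribution votes at most once per annotation
    return {annotation for annotation, n in votes.items() if n >= min_votes}
```

In this sketch, raising `min_votes` yields a smaller, more conservative corpus. Lowering it admits annotations that fewer contributions agree on.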


Partners & Participants

Confusion matrix, exact matching

Assessment \ reference set | PGN | DISO | SPE | CHEM
PGN | 412,866 | 2,673 | 395 | 106,560
DISO | | 451,175 | 4,024 | 2,126
SPE | | | 474,453 | 912
CHEM | | | | 414,798

Confusion matrix, nested matching

Assessment \ reference set | PGN | DISO | SPE | CHEM
PGN | 412,866 | 2,927 | 695 | 113,546
DISO | 3,055 | 451,175 | 7,910 | 2,826
SPE | 516 | 7,992 | 474,453 | 1,376
CHEM | 107,436 | 2,859 | 1,155 | 414,798
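The two confusion matrices differ only in the matching criterion. This is a minimal sketch of how such a matrix could be counted. It assumes that each cell holds the number of matching (assessment, reference) annotation pairs; the slides do not define the cell counts. The `confusion` helper is hypothetical, and annotations are assumed to be (doc_id, start, end, semantic_type) tuples.

```python
from collections import defaultdict
from collections.abc import Callable

Span = tuple[int, int]
Annotation = tuple[str, int, int, str]  # (doc_id, start, end, semantic_type)


def exact(a: Span, b: Span) -> bool:
    """Both boundaries are identical."""
    return a == b


def nested(a: Span, b: Span) -> bool:
    """One span lies completely inside the other."""
    return (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])


def confusion(assessment: list[Annotation], reference: list[Annotation],
              match: Callable[[Span, Span], bool]) -> dict[tuple[str, str], int]:
    """Count matching annotation pairs per (assessment type, reference type) cell."""
    by_doc: dict[str, list[tuple[Span, str]]] = defaultdict(list)
    for doc, start, end, ref_type in reference:
        by_doc[doc].append(((start, end), ref_type))
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for doc, start, end, sem_type in assessment:
        for span, ref_type in by_doc.get(doc, []):  # only compare within one document
            if match((start, end), span):
                counts[(sem_type, ref_type)] += 1
    return dict(counts)
```

Every exact match is also a nested match, so no cell can shrink when switching from exact to nested matching. The pairwise loop is quadratic per document and is meant only for illustration.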


Stakes

• Corpus alignment is doable: different semantic types, single document
• Different similarity measures for harmonisation: exact, nested, cosine
• SSC generation scales and generalises: iterative generation
• The Partner-SSC outperforms the partners' systems on the GSC
• The all-SSC outperforms the participants' systems on the GSC
• The SSC is the best option when no GSC is available
• Incremental improvement of SSC performance: a larger number of contributions leads to higher performance
• Performance of the SSC on the other semantic types
• Normalisation of the SSC: concept ID annotations for mentions
• Use cases of the SSC



End<br />

