- Page 1 and 2: Automatic functional annotation of
- Page 3 and 4: Summary Kevin Nagel European Bioinf
- Page 5 and 6: Acknowledgements This thesis would
- Page 7 and 8: Contents 1 Introduction 15 1.1 Prot
- Page 9 and 10: 5 Identification of protein residue
- Page 11 and 12: B Examples of extracted functional
- Page 13 and 14: 4.5 Re-discovery of the catalytic t
- Page 15 and 16: 6.3 Evaluation of syntactical langu
- Page 17 and 18: Amino Acid 3-Letter 1-Letter Side-c
- Page 19 and 20: site 1. evolutionary site 1.1. cons
- Page 21 and 22: Figure 1.3: The protein universe an
- Page 23 and 24: oth approaches, results from data m
- Page 25: teins. In contrast, to extract a pr
- Page 29 and 30: Figure 2.1: Data banks in the prote
- Page 31 and 32: various metal binding sites. The Ca
- Page 33 and 34: Key INIT MET SIGNAL PROPEP TRANSIT
- Page 35 and 36: Annotation Sentence Manual GO “Th
- Page 37 and 38: process consists of the following m
- Page 39 and 40: esult actually incurs some bias, be
- Page 41 and 42: the extraction that would describe
- Page 43 and 44: Chapter 3 Mining residue interactio
- Page 45 and 46: 3.1.1 Structural feature extraction
- Page 47 and 48: from the input data set by providin
- Page 49 and 50: factorised, we attempt to approxima
- Page 51 and 52: whereas in a system with one-pair i
- Page 53 and 54: 3.1.3 Grouping and selecting freque
- Page 55 and 56: key features of each dataset. The m
- Page 57 and 58: esults from data mining on OLDFIELD
- Page 59 and 60: Figure 3.5: Comparison of extracted
- Page 61 and 62: Figure 3.6: The effect of varying t
- Page 63 and 64: probabilistic classification approa
- Page 65 and 66: 4.1 Evaluation methods The biologic
- Page 67 and 68: MSDsite Reference Dataset Determine
- Page 69 and 70: PDBID Description Bound metal 1h2r
- Page 71 and 72: PDBID Description Bound metal 1iml
- Page 73 and 74: 3Cys SCOP classification SCOP domai
- Page 75 and 76: 3D pattern (k=2) Cross-validated Pa
- Page 77 and 78: PDBID Asp-His-Ser His-2Ser Ala-His-
- Page 79 and 80: that were found in this study. The
- Page 81 and 82: Figure 5.1: Overview of processes a
- Page 83 and 84: 5.1.2 Entity recognition of protein
- Page 85 and 86: RANGE-TO = ("-"+ ("to" "-+")? | "to
- Page 87 and 88: matches the protein sequence; (2) s
- Page 89 and 90: abstract texts was drawn from the U
- Page 91 and 92: Unique residue entities Reference D
- Page 93 and 94: Unique resi.-prot.-org.-association
- Page 95 and 96: triplet association/UTRP Resource A
- Page 97 and 98: is the set of 18,427 out of 40,750
- Page 99 and 100: fraction [LHC07] in biomedical text
- Page 101 and 102: identifier annotations in combinati
- Page 103 and 104: Figure 6.1: Overview of processes a
- Page 105 and 106: discussion on semantic relation and
- Page 107 and 108: data collation method is necessary
- Page 109 and 110: evaluation study: shallow parser ba
- Page 111 and 112: REL = NP PP* VP. The extracted rela
- Page 113 and 114: MAN FEAT Category Definition Categor
- Page 115 and 116: “The GlyNH2 was removed and the r
- Page 117 and 118: “Mutation K241Q completely abolis
- Page 119 and 120: 6.3.2 Performance analysis of the c
- Page 121 and 122: MAN FEAT Category Precision Recall
- Page 123 and 124: The extracted information is diffic
- Page 125 and 126: Chapter 7 Extraction of functional
- Page 127 and 128: 7.2 Results 7.2.1 Evaluation of the
- Page 129 and 130: has the following sources: a false
- Page 131 and 132: The knowledge base does not provide
- Page 133 and 134: 7.2.3 Cross-validation of mined cat
- Page 135 and 136: Figure 7.3: Cross-validation of tex
- Page 137 and 138: tions indicates, that the informati
- Page 139 and 140: Figure 8.1: Overview of processes a
- Page 141 and 142: PDB UniProtKB PDBID chainID serial
- Page 143 and 144: mapped to 24,500 RID+UID. The ident
- Page 145 and 146: RID+UID Sentence PAS RID+UID Senten
- Page 147 and 148: 8.3.4 General correlation found bet
- Page 149 and 150: Residue Annotations -HSSP +HSSP OPR
- Page 151 and 152: Chapter 9 Conclusions and future wo
- Page 153 and 154: 9.2 Limitations and future works Du
- Page 155 and 156: Bibliography [AGM+90] SF Altschul
- Page 157 and 158: [BMC08] BMC. Biomed central. http:/
- Page 159 and 160: [DCG+04] F Diella, S Cameron, C G
- Page 161 and 162: [Gue96] F Guenthner. Electronic lex
- Page 163 and 164: [JK95] J Justeson and S Katz. Techn
- Page 165 and 166: [MB99] Y Matsuo and SH Bryant. Iden
- Page 167 and 168: [POHS05] M Pesu, J O’Shea, L Henn
- Page 169 and 170: [STB06] MH Saier, CV Tran, and RD B
- Page 171 and 172: [WK07] R Witte and T Kappler. Enhan
- Page 173 and 174: Table A.1: Examples of errors in th
- Page 175 and 176: . RID+UID Table B.1: Comparison of
- Page 177 and 178: . . . continuation of table B.1 PAS
- Page 179 and 180: Table C.1: Examples of literature m
- Page 181 and 182: . . . continuation of table C.1 PMI
- Page 183 and 184: . . . continuation of table C.1 RID
- Page 185 and 186: Table D.1: Examples of literature m
- Page 187 and 188: Appendix E Examples of extracted fu
- Page 189 and 190: . . . continuation of table E.1 Sen
- Page 191 and 192: Table F.1: Identified catalytic tri
- Page 193 and 194: Appendix G Glossary 3D pattern - a