Figure 2.9: Construction of the BLASTmatrix diagram. Proteome similarity between three E. coli genomes. Lower part of the diagram corresponds to intra-proteome similarity.
Genomes shown: Escherichia coli strain K-12, substrain DH10B (4,126 proteins, 3,797 families); substrain W3110 (4,226 proteins, 3,965 families); substrain MG1655 (4,150 proteins, 3,912 families). Intra-proteome values: MG1655 4.3 % (167 / 3,912), W3110 4.3 % (170 / 3,965), DH10B 6.4 % (242 / 3,797). Pairwise values in the diagram: 95.3 % (3,843 / 4,034), 93.1 % (3,742 / 4,020), 91.5 % (3,685 / 4,027).

Figure 2.10: Proteome similarity between ten Campylobacter species. Color encoding corresponds to percentage of shared protein families.
Genomes shown: Campylobacter concisus 13826 (2,080 proteins, 1,972 families); Campylobacter curvus 525.92 (1,931 proteins, 1,885 families); Campylobacter fetus subsp. fetus 82-40 (1,719 proteins, 1,665 families); Campylobacter hominis ATCC BAA-381 (1,687 proteins, 1,623 families); Campylobacter jejuni RM1221 (1,838 proteins, 1,780 families); Campylobacter jejuni subsp. doylei 269.97 (1,731 proteins, 1,650 families); Campylobacter jejuni subsp. jejuni 81-176 (1,758 proteins, 1,702 families); Campylobacter jejuni subsp. jejuni 81116 (1,626 proteins, 1,585 families); Campylobacter jejuni subsp. jejuni NCTC 11168 (1,624 proteins, 1,581 families); Campylobacter lari RM2100 (1,546 proteins, 1,494 families).

Figure 2.11: Proteome comparison of 32 Vibrionaceae genomes. Environmental V. cholerae strains lacking the cholera enterotoxin genes are highlighted in bright green, whilst pathogenic V. cholerae strains are shown in dark green.
Genomes shown: A. salmonicida LFI1238; V. species Ex25; V. campbellii AND4; V. harveyi BAA1116; V. shilonii AK1; P. profundum SS9; V. parahaemolyticus 2210633; V. parahaemolyticus 16; V. vulnificus CMCP6; V. vulnificus YJ016; V. species MED222; V. splendidus LGP32; V. fischeri ES114; V. fischeri MJ11; V. cholerae MO10, BX330286, RC9, MJ1236, B33VCE, 2740-80, AM-19226, MZO-2, 12129, TM11079-80, TMA21, VL426, 1587, N16961, 0395 TEDA, 0395 TIGR, V52 and M66-2.

Large similarities between environmental and pathogenic V. cholerae
The BLAST matrix shown in figure 2.11 includes environmental and pathogenic strains of V. cholerae. The figure shows that the V. cholerae strains share a large number of genes, both within and between these two groups.

Intra- vs. inter-proteome similarity
The lower row of the diagram shows the special case of organism A versus itself, that is, the intra-proteome similarity. If not dealt with separately, this part would appear as 100% similar, since the proteome is BLASTed against itself. However, all self-matching proteins are excluded, so this part reflects the paralogs of the organism. This part also has a separate color encoding (red), whereas the inter-proteome comparison is coded green (see figure 2.10).

2.3.7 BLASTatlas - visualizing whole-genome homology
The BLASTmatrix tool described earlier condenses the similarity between two proteomes into a single number.
This simplification allows for an all-against-all comparison, but it lacks detailed information on which genes are conserved and where they are located. The BLASTatlas method overcomes these issues by comparing the proteomes to a single reference chromosome. Once a single representative chromosome has been selected, all ORFs or proteins of that reference are BLASTed against each of the proteomes to be included in the comparison. The best alignment from each proteome, regardless of its significance, is mapped back to the reference genome. A value of 0 is assigned at mismatches or gaps, 0.5 at conservative mismatches, and 1 at matches. This method has proved powerful because it answers several questions in one diagram: Which reference proteins are found in which query genomes? How well are they conserved? And is there
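The position-wise scoring described above (0 at mismatches or gaps, 0.5 at conservative mismatches, 1 at matches) can be illustrated with a minimal Python sketch. It is not the BLASTatlas implementation. The function name `score_alignment`, its parameters, and the example sequences are hypothetical. The sketch also assumes that a conservative mismatch is what a BLAST-style match line marks with `+`, which the text does not specify.

```python
from typing import Dict


def score_alignment(ref_aln: str, midline: str, ref_start: int) -> Dict[int, float]:
    """Map one pairwise alignment back to reference coordinates.

    ref_aln   -- aligned reference sequence; '-' marks a gap in the reference
    midline   -- BLAST-style match line (assumed convention): a residue letter
                 for an identity, '+' for a conservative substitution, and a
                 space for a mismatch or a gap
    ref_start -- 1-based reference position of the first aligned residue
    """
    if len(ref_aln) != len(midline):
        raise ValueError("aligned reference and match line differ in length")

    scores: Dict[int, float] = {}
    pos = ref_start
    for ref_char, mid_char in zip(ref_aln, midline):
        if ref_char == "-":
            # Extra residue in the query: there is no reference position
            # to attach a value to, so the column is skipped.
            continue
        if mid_char == "+":
            scores[pos] = 0.5   # conservative mismatch
        elif mid_char == " ":
            scores[pos] = 0.0   # mismatch, or gap in the query
        else:
            scores[pos] = 1.0   # identical residue
        pos += 1
    return scores


# Made-up example: the aligned reference starts at position 10.
ref_aln = "MKT-LLVG"
midline = "M+T L+V "
scores = score_alignment(ref_aln, midline, ref_start=10)
```

Running the same function for the best hit of every reference protein in every query proteome would give one track of values per query genome along the reference chromosome, which is the kind of data the diagram displays.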