13.07.2015

Tutorial: Using BWA aligner to identify low-coverage genomes in ...

4901664 Leptothrix_cholodnii_SP-6
1634756 Methanocaldococcus_jannaschii_DSM_2661
1770072 Methanococcus_maripaludis_C5
1650606 Methanococcus_maripaludis_strain_S2,_complete_sequence
459768 Nanoarchaeum_equitans_Kin4-M
2798863 Nitrosomonas_europaea_ATCC_19718
6409848 Nostoc_sp._PCC_7120_DNA
3017983 Pelodictyon_phaeoclathratiforme_BU-1
1929147 Persephonella_marina_EX-H1
2354658 Porphyromonas_gingivalis_ATCC_33277_DNA
2219120 Pyrobaculum_aerophilum_str._IM2
1998126 Pyrobaculum_calidifontis_JCM_11548
1738110 Pyrococcus_horikoshii_OT3_DNA
7145136 Rhodopirellula_baltica_SH_1_complete_genome
4106385 Ruegeria_pomeroyi_DSS-3
5762894 Salinispora_arenicola_CNS-205
5168083 Salinispora_tropica_CNB-440
5229615 Shewanella_baltica_OS185
5145817 Shewanella_baltica_OS223,
3478897 Sulfitobacter_NAS-14.1_scf_1099451320477_
3029105 Sulfitobacter_sp._EE-36_scf_1099451318008_
2666279 Sulfolobus_tokodaii
1520314 Sulfurihydrogenibium_yellowstonense_SS-5
1828687 SulfuriYO3AOP1
2361532 Thermoanaerobacter_pseudethanolicus_ATCC_33223
1884531 Thermotoga_neapolitana_DSM_4359
1823470 Thermotoga_petrophila_RKU-1
1877693 Thermotoga_sp._RQ2
2829658 Treponema_denticola_ATCC_35405
2274638 Treponema_vincentii_ATCC_35580_NZACYH00000000.1
2052189 Zymomonas_mobilis_subsp._mobilis_ZM4

Step 11: We will now use the above file (aln-pe.mapped.bam.perbase.count) together with lengths.genome to calculate, for each genome, the proportion of bases covered (covered bases divided by genome length) using the following one-liner. It reads lengths.genome line by line and assigns the genome identifier to myArray[0] and its length to myArray[1]. It then searches for the identifier in aln-pe.mapped.bam.perbase.count, extracts the base count, and uses bc to calculate the proportion.
[uzi@quince-srv2 ~/TSB_METAGENOMES/prob_genomes]$ while IFS=$'\t' read -r -a myArray; do echo -e "${myArray[0]},$(echo "scale=5;"$(awk -v pattern="${myArray[0]}" '$2==pattern{print $1}' aln-pe.mapped.bam.perbase.count)"/"${myArray[1]}| bc ) "; done < lengths.genome > aln-pe.mapped.bam.genomeproportion

Step 12: Some of the reference genomes were downloaded from NCBI. We will use a file, IDs_metagenomes2.txt, that maps their accession numbers to meaningful genome names.
[uzi@quince-srv2 ~/TSB_METAGENOMES/prob_genomes]$ cat IDs_metagenomes2.txt
gi|220903286|ref|NC_011883.1|,Desulfovibrio_desulfuricans
gi|222528057|ref|NC_012034.1|,Caldicellulosiruptor_bescii
gi|307128764|ref|NC_014500.1|,Dickeya_dadantii
gi|55979969|ref|NC_006461.1|,Thermus_thermophilus
gi|83591340|ref|NC_007643.1|,Rhodospirillum_rubrum

Step 13: Download my annotation script (http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/convIDs.pl) and then use IDs_metagenomes2.txt to annotate column 1.
[uzi@quince-srv2 ~/TSB_METAGENOMES/prob_genomes]$ perl convIDs.pl -i aln-pe.mapped.bam.genomeproportion -l IDs_metagenomes2.txt -c 1 -t comma > aln-pe.mapped.bam.genomeproportion.annotated

Step 14: We have a total of 59 genomes in the reference database.
To see how many genomes we recovered, we will use the following one-liner, which sums the proportions in the last comma-separated column:
[uzi@quince-srv2 ~/TSB_METAGENOMES/prob_genomes]$ awk -F "," '{sum+=$NF} END{print "Total genomes covered:"sum}' aln-pe.mapped.bam.genomeproportion
Total genomes covered:55.8055
Because each proportion is the covered fraction of a single genome, the sum is the number of genome equivalents covered: about 55.8 of the 59.

Step 15: Now we will identify those genomes for which the proportions are less than 0.99.
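One way to do this filtering is sketched below, assuming the input is a comma-separated name,proportion file like the ones produced in Steps 11 and 13 (for example aln-pe.mapped.bam.genomeproportion.annotated). The function name low_coverage and its arguments are hypothetical, not part of the original workflow. It reads the proportion from the last field rather than the second, because some genome names (such as Deinococcus_radiodurans_R1_chromosome_1,_complete_sequence) contain commas themselves.

```shell
# Hypothetical helper (not from the original tutorial): print the rows of a
# comma-separated "name,proportion" file whose proportion is below a threshold.
#   $1  threshold (default 0.99)
#   $2  input file (default: standard input)
# The proportion is taken from the LAST field ($NF), because some genome
# names contain commas themselves; "+ 0" forces a numeric comparison.
low_coverage() {
  awk -F',' -v min="${1:-0.99}" 'NF > 1 && ($NF + 0) < (min + 0)' "${2:--}"
}
```

For example, `low_coverage 0.99 aln-pe.mapped.bam.genomeproportion.annotated` would report, among the values listed below, Desulfovibrio_desulfuricans (.02709), Enterococcus_faecalis_V583 (.78333), Fusobacterium_nucleatum_subsp._nucleatum_ATCC_25586 (.67088) and Rhodospirillum_rubrum (.00181), along with the other genomes under 0.99.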


Deinococcus_radiodurans_R1_chromosome_1,_complete_sequence,.99601
Desulfovibrio_desulfuricans,.02709
Desulfovibrio_piger_ATCC_29098,.98358
Dickeya_dadantii,.99943
Dictyoglomus_turgidum_DSM_6724,.98030
Enterococcus_faecalis_V583,.78333
Fusobacterium_nucleatum_subsp._nucleatum_ATCC_25586,.67088
Gemmatimonas_aurantiaca_T-27_DNA,.99981
Herpetosiphon_aurantiacus_ATCC_23779,.99980
Hydrogenobaculum_sp._Y04AAS1,.99961
Ignicoccus_hospitalis_KIN4/I,.98534
Leptothrix_cholodnii_SP-6,.99842
Methanocaldococcus_jannaschii_DSM_2661,.98185
Methanococcus_maripaludis_C5,.99399
Methanococcus_maripaludis_strain_S2,_complete_sequence,.99366
Nanoarchaeum_equitans_Kin4-M,.93661
Nitrosomonas_europaea_ATCC_19718,.99529
Nostoc_sp._PCC_7120_DNA,.99938
Pelodictyon_phaeoclathratiforme_BU-1,.99991
Persephonella_marina_EX-H1,.99941
Porphyromonas_gingivalis_ATCC_33277_DNA,.99990
Pyrobaculum_aerophilum_str._IM2,.99851
Pyrobaculum_calidifontis_JCM_11548,.99443
Pyrococcus_horikoshii_OT3_DNA,.99977
Rhodopirellula_baltica_SH_1_complete_genome,.99993
Rhodospirillum_rubrum,.00181
Ruegeria_pomeroyi_DSS-3,.99925
Salinispora_arenicola_CNS-205,.99594
Salinispora_tropica_CNB-440,.99705
Shewanella_baltica_OS185,.99998
Shewanella_baltica_OS223,,.99998
Sulfitobacter_NAS-14.1_scf_1099451320477_,.99697
Sulfitobacter_sp._EE-36_scf_1099451318008_,.99921
Sulfolobus_tokodaii,.98943
Sulfurihydrogenibium_yellowstonense_SS-5,.99087
SulfuriYO3AOP1,.99469
Thermoanaerobacter_pseudethanolicus_ATCC_33223,.99945
Thermotoga_neapolitana_DSM_4359,.99998
Thermotoga_petrophila_RKU-1,.99997
Thermotoga_sp._RQ2,1.00000
Thermus_thermophilus,.94016
Treponema_denticola_ATCC_35405,.99523
Treponema_vincentii_ATCC_35580_NZACYH00000000.1,.90457
Zymomonas_mobilis_subsp._mobilis_ZM4,.99797
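The Step 11 loop starts one awk and one bc process per genome and re-reads aln-pe.mapped.bam.perbase.count each time. A minimal single-pass sketch of the same calculation is shown here, assuming lengths.genome is tab-separated (identifier, then length) and that the count file holds the covered-base count in column 1 and the identifier in column 2, as the Step 11 awk expression implies. The function name genome_proportion is hypothetical.

```shell
# Hypothetical helper, not part of the original tutorial: a single-pass
# version of the Step 11 calculation (covered bases / genome length).
# Assumed inputs:
#   $1  genome-length file, <identifier><TAB><length> per line (like lengths.genome)
#   $2  covered-base counts, <count> <identifier> per line
#       (like aln-pe.mapped.bam.perbase.count)
# Output: <identifier>,<proportion> with five decimal places.
genome_proportion() {
  # The counts file is read first with default whitespace splitting; the
  # FS='\t' operand then switches to tab splitting for the lengths file.
  awk '
    # First file (the counts): remember the covered-base count per identifier.
    FILENAME == ARGV[1] { covered[$2] = $1; next }
    # Second file (the lengths): skip lines without a positive length,
    # then divide; genomes absent from the counts file get 0.
    $2 > 0 {
      c = ($1 in covered) ? covered[$1] : 0
      printf "%s,%.5f\n", $1, c / $2
    }
  ' "$2" FS='\t' "$1"
}
```

Usage would be `genome_proportion lengths.genome aln-pe.mapped.bam.perbase.count > aln-pe.mapped.bam.genomeproportion`. The output differs slightly from the bc version: bc with `scale=5` truncates and drops the leading zero (hence values such as .99601), whereas `printf "%.5f"` rounds and prints a leading zero. A genome with no covered bases gets 0.00000 instead of an empty proportion.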
