Tutorial: Using BWA aligner to identify low-coverage genomes in ...
4901664 Leptothrix_cholodnii_SP-6
1634756 Methanocaldococcus_jannaschii_DSM_2661
1770072 Methanococcus_maripaludis_C5
1650606 Methanococcus_maripaludis_strain_S2,_complete_sequence
459768 Nanoarchaeum_equitans_Kin4-M
2798863 Nitrosomonas_europaea_ATCC_19718
6409848 Nostoc_sp._PCC_7120_DNA
3017983 Pelodictyon_phaeoclathratiforme_BU-1
1929147 Persephonella_marina_EX-H1
2354658 Porphyromonas_gingivalis_ATCC_33277_DNA
2219120 Pyrobaculum_aerophilum_str._IM2
1998126 Pyrobaculum_calidifontis_JCM_11548
1738110 Pyrococcus_horikoshii_OT3_DNA
7145136 Rhodopirellula_baltica_SH_1_complete_genome
4106385 Ruegeria_pomeroyi_DSS-3
5762894 Salinispora_arenicola_CNS-205
5168083 Salinispora_tropica_CNB-440
5229615 Shewanella_baltica_OS185
5145817 Shewanella_baltica_OS223,
3478897 Sulfitobacter_NAS-14.1_scf_1099451320477_
3029105 Sulfitobacter_sp._EE-36_scf_1099451318008_
2666279 Sulfolobus_tokodaii
1520314 Sulfurihydrogenibium_yellowstonense_SS-5
1828687 SulfuriYO3AOP1
2361532 Thermoanaerobacter_pseudethanolicus_ATCC_33223
1884531 Thermotoga_neapolitana_DSM_4359
1823470 Thermotoga_petrophila_RKU-1
1877693 Thermotoga_sp._RQ2
2829658 Treponema_denticola_ATCC_35405
2274638 Treponema_vincentii_ATCC_35580_NZACYH00000000.1
2052189 Zymomonas_mobilis_subsp._mobilis_ZM4

Step 11: We will now use the above file together with lengths.genome to calculate the proportions using the following one-liner. It reads lengths.genome line by line, assigns the genome identifier to myArray[0] and its length to
myArray[1]. It then searches for the identifier in aln-pe.mapped.bam.perbase.count, extracts the base count, and uses bc to calculate the proportion.
[uzi@quince-srv2 ~/TSB_METAGENOMES/prob_genomes]$ while IFS=$'\t' read -r -a myArray; do echo -e "${myArray[0]},$(echo "scale=5;"$(awk -v pattern="${myArray[0]}" '$2==pattern{print $1}' aln-pe.mapped.bam.perbase.count)"/"${myArray[1]}| bc ) "; done < lengths.genome > aln-pe.mapped.bam.genomeproportion

Step 12: Some of the reference genomes were downloaded from NCBI. We will use a file, IDs_metagenomes2.txt, that contains the meaningful mapping from accession numbers to genome names.
[uzi@quince-srv2 ~/TSB_METAGENOMES/prob_genomes]$ cat IDs_metagenomes2.txt
gi|220903286|ref|NC_011883.1|,Desulfovibrio_desulfuricans
gi|222528057|ref|NC_012034.1|,Caldicellulosiruptor_bescii
gi|307128764|ref|NC_014500.1|,Dickeya_dadantii
gi|55979969|ref|NC_006461.1|,Thermus_thermophilus
gi|83591340|ref|NC_007643.1|,Rhodospirillum_rubrum

Step 13: Download my annotation script (http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/convIDs.pl) and then use IDs_metagenomes2.txt to annotate column 1.
[uzi@quince-srv2 ~/TSB_METAGENOMES/prob_genomes]$ perl convIDs.pl -i aln-pe.mapped.bam.genomeproportion -l IDs_metagenomes2.txt -c 1 -t comma > aln-pe.mapped.bam.genomeproportion.annotated

Step 14: We have a total of 59 genomes in the reference database.
To see how many genomes we recovered, we will use the following one-liner:
[uzi@quince-srv2 ~/TSB_METAGENOMES/prob_genomes]$ awk -F "," '{sum+=$NF} END{print "Total genomes covered:"sum}' aln-pe.mapped.bam.genomeproportion
Total genomes covered:55.8055

Step 15: Now we will identify those genomes for which the proportions are less than 0.99.
Deinococcus_radiodurans_R1_chromosome_1,_complete_sequence,.99601
Desulfovibrio_desulfuricans,.02709
Desulfovibrio_piger_ATCC_29098,.98358
Dickeya_dadantii,.99943
Dictyoglomus_turgidum_DSM_6724,.98030
Enterococcus_faecalis_V583,.78333
Fusobacterium_nucleatum_subsp._nucleatum_ATCC_25586,.67088
Gemmatimonas_aurantiaca_T-27_DNA,.99981
Herpetosiphon_aurantiacus_ATCC_23779,.99980
Hydrogenobaculum_sp._Y04AAS1,.99961
Ignicoccus_hospitalis_KIN4/I,.98534
Leptothrix_cholodnii_SP-6,.99842
Methanocaldococcus_jannaschii_DSM_2661,.98185
Methanococcus_maripaludis_C5,.99399
Methanococcus_maripaludis_strain_S2,_complete_sequence,.99366
Nanoarchaeum_equitans_Kin4-M,.93661
Nitrosomonas_europaea_ATCC_19718,.99529
Nostoc_sp._PCC_7120_DNA,.99938
Pelodictyon_phaeoclathratiforme_BU-1,.99991
Persephonella_marina_EX-H1,.99941
Porphyromonas_gingivalis_ATCC_33277_DNA,.99990
Pyrobaculum_aerophilum_str._IM2,.99851
Pyrobaculum_calidifontis_JCM_11548,.99443
Pyrococcus_horikoshii_OT3_DNA,.99977
Rhodopirellula_baltica_SH_1_complete_genome,.99993
Rhodospirillum_rubrum,.00181
Ruegeria_pomeroyi_DSS-3,.99925
Salinispora_arenicola_CNS-205,.99594
Salinispora_tropica_CNB-440,.99705
Shewanella_baltica_OS185,.99998
Shewanella_baltica_OS223,,.99998
Sulfitobacter_NAS-14.1_scf_1099451320477_,.99697
Sulfitobacter_sp._EE-36_scf_1099451318008_,.99921
Sulfolobus_tokodaii,.98943
Sulfurihydrogenibium_yellowstonense_SS-5,.99087
SulfuriYO3AOP1,.99469
Thermoanaerobacter_pseudethanolicus_ATCC_33223,.99945
Thermotoga_neapolitana_DSM_4359,.99998
Thermotoga_petrophila_RKU-1,.99997
Thermotoga_sp._RQ2,1.00000
Thermus_thermophilus,.94016
Treponema_denticola_ATCC_35405,.99523
Treponema_vincentii_ATCC_35580_NZACYH00000000.1,.90457
Zymomonas_mobilis_subsp._mobilis_ZM4,.99797
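
The command used for Step 15 does not appear in this excerpt. As a minimal sketch of one way to apply the 0.99 threshold, assuming the input has the same name,proportion layout as the listing above, awk can test the last comma-separated field. The file name sample.genomeproportion.annotated and the variable low_coverage are hypothetical, introduced only for illustration; the sample rows are copied from the listing.

```shell
# Hypothetical sample file: a few rows copied from the listing above.
cat > sample.genomeproportion.annotated <<'EOF'
Desulfovibrio_desulfuricans,.02709
Dickeya_dadantii,.99943
Shewanella_baltica_OS223,,.99998
Methanococcus_maripaludis_strain_S2,_complete_sequence,.99366
Nanoarchaeum_equitans_Kin4-M,.93661
EOF

# Some genome names contain commas, so test the last field ($NF)
# rather than $2. Adding 0 forces a numeric comparison.
low_coverage=$(awk -F "," '($NF + 0) < 0.99 {print $0}' sample.genomeproportion.annotated)
echo "$low_coverage"
```

Using $NF mirrors the Step 14 one-liner, which also reads the proportion from the last field. On the real data the same awk expression would presumably be pointed at aln-pe.mapped.bam.genomeproportion.annotated.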