
Abstract and Program Booklet (XIV EMR 2015)

The Escola de Modelos de Regressão (EMR, School of Regression Models) is a nationally recognized scientific event in Statistics, sponsored by the Associação Brasileira de Estatística (ABE, Brazilian Statistical Association). In 2015 it reaches its 14th edition.




XIV Escola de Modelos de Regressão

March 2 to 5, 2015, Centro de Convenções (Convention Center), Unicamp


Thematic Session 1 (ST1.2): Big Data

Testing Association without Calling Genotypes Allows for Systematic Differences in Read Depth and Sequencing Error Rate Between Data from Case and Control Participants

Glen Satten

Centers for Disease Control and Prevention, Atlanta, USA

Abstract: The quality of genotype calls from next-generation sequencing data depends on read depth. Loci with high coverage can typically be called reliably, while those with low coverage may be difficult to call. In a case-control study, if data from case participants are sequenced to a greater depth than data from controls, the difference in genotype quality can introduce a systematic bias. This can easily occur when historical controls (e.g., data from the 1000 Genomes Project) are used. The imbalance may also occur by design, to reduce genotyping costs among controls.
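A back-of-the-envelope sketch shows why depth matters. Assume (these are simplifying assumptions of ours, not the abstract's) error-free reads, each drawn from either chromosome with probability 1/2, and a naive caller that declares a heterozygote only when it sees both alleles. A true heterozygote at depth d is then miscalled as homozygous with probability 2·(1/2)^d, which is about 3% at 6x (the control depth in the UK10K example) but roughly 1.7 × 10^-18 at 60x (the case depth). The function name is hypothetical.

```python
def het_miss_probability(depth: int) -> float:
    """Probability that a true heterozygote shows reads from only one allele.

    Assumes (hypothetically) error-free reads, each sampling either
    chromosome with probability 1/2, and a naive caller that must see
    both alleles before calling a heterozygote.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    # All d reads come from the first allele, or all d from the second.
    return 2 * 0.5 ** depth


# Compare the two sequencing depths used in the UK10K example.
for depth in (6, 60):
    print(f"{depth}x: P(het miscalled as homozygous) = {het_miss_probability(depth):.3g}")
```

Real callers use error models and priors, so their actual miscall rates differ; the point is only that the miscall rate falls steeply with depth, which is why unequal depth between groups can masquerade as an allele-frequency difference.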

For trios, bias can arise even when coverage is the same in parents and offspring, since errors in parental genotype calls are counted as non-transmissions, whereas errors in offspring genotype calls show up as non-Mendelian transmissions and are therefore detected.
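The trio asymmetry can be illustrated with a minimal sketch of transmission counting on called genotypes. Genotypes are coded as alternate-allele counts (0, 1, 2), transmissions are counted only from heterozygous parents in the style of the transmission disequilibrium test (TDT), and Mendelian-inconsistent trios are dropped. These are illustrative choices of ours, not the abstract's method, and the function names are hypothetical.

```python
from itertools import product


def is_mendelian(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype (alt-allele count 0/1/2) can arise from the parents."""
    def possible(g: int) -> set[int]:
        # A parent with g alt alleles can pass an alt allele (1) if g > 0
        # and a reference allele (0) if g < 2.
        return {a for a in (0, 1) if (a == 1 and g > 0) or (a == 0 and g < 2)}
    return any(f + m == child for f, m in product(possible(father), possible(mother)))


def count_transmissions(father: int, mother: int, child: int):
    """Return (transmitted, untransmitted) alt alleles from heterozygous parents,
    or None if the trio is Mendelian-inconsistent and would be discarded."""
    if not is_mendelian(father, mother, child):
        return None
    hets = [g for g in (father, mother) if g == 1]
    if len(hets) == 2:
        # Each alt allele in the child came from one of the two het parents.
        return child, 2 - child
    if len(hets) == 1:
        other = mother if father == 1 else father
        # A homozygous-alt co-parent always passes an alt allele.
        transmitted = child - (1 if other == 2 else 0)
        return transmitted, 1 - transmitted
    return 0, 0  # no heterozygous parent, nothing informative


# True trio: father, mother and child all homozygous reference (0, 0, 0).
print(count_transmissions(1, 0, 0))  # father miscalled as het -> (0, 1)
print(count_transmissions(0, 0, 1))  # child miscalled as het -> None
```

The same calling error therefore has different consequences: in a parent it yields an apparently valid trio with one spurious non-transmission, while in the child it is flagged as a Mendelian error rather than silently counted.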

Methods: We develop likelihood-based methods for analyzing data from case-control and trio studies that use read data directly, without first making intermediate genotype calls. When the locations of polymorphic loci are known, we show that these likelihood approaches have appropriate size and good power compared with methods that use called genotypes. When the locations of polymorphic loci are not known in advance, we develop screening methods that remove loci estimated to be monomorphic, based on read data alone. We use a bootstrap approach to estimate which of the loci that pass the screen are truly polymorphic. Using these estimates, we then construct bootstrap tests for association that properly account for screening and preserve size. We further show that restricting attention to loci with estimated allele frequency ≥ 1/(2N), so that the expected number of alleles seen is at least one, increases the power of our approach by excluding loci that have negligible effect.
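The central idea, a likelihood that sums over the unobserved genotypes instead of calling them, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: a binomial read model with a per-read error rate that is assumed known and may differ between cases and controls, Hardy-Weinberg genotype frequencies, allele frequencies estimated by EM, and an asymptotic 1-df likelihood-ratio test, which is appropriate only when the locus is known to be polymorphic (the abstract's bootstrap for screened loci is not reproduced). The 1/(2N) screen takes N to be the number of diploid participants, which is also an assumption. All function names and the toy data are hypothetical.

```python
import math

P_FLOOR = 1e-12  # keep allele frequencies inside (0, 1) before taking logs


def genotype_logliks(ref: int, alt: int, eps: float) -> list[float]:
    """log P(reads | g) for g = 0, 1, 2 copies of the alternate allele.

    Assumed model: each read shows the alternate allele with probability
    (g/2)(1 - eps) + (1 - g/2) eps, with per-read error rate 0 < eps < 0.5.
    The binomial coefficient is dropped because it does not depend on g.
    """
    logliks = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - eps) + (1 - g / 2) * eps
        logliks.append(ref * math.log(1 - p_alt) + alt * math.log(p_alt))
    return logliks


def log_hwe_prior(p: float) -> list[float]:
    """log P(g | p) under Hardy-Weinberg equilibrium."""
    p = min(max(p, P_FLOOR), 1 - P_FLOOR)
    return [2 * math.log(1 - p), math.log(2 * p * (1 - p)), 2 * math.log(p)]


def logsumexp(xs: list[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def loglik(p: float, people: list[list[float]]) -> float:
    """Genotype-free log-likelihood: sum_i log sum_g P(reads_i | g) P(g | p)."""
    prior = log_hwe_prior(p)
    return sum(logsumexp([gl[g] + prior[g] for g in range(3)]) for gl in people)


def em_allele_freq(people: list[list[float]], iters: int = 500, tol: float = 1e-10) -> float:
    """EM estimate of the alternate-allele frequency, never calling genotypes."""
    p = 0.1
    for _ in range(iters):
        prior = log_hwe_prior(p)
        expected_alt = 0.0
        for gl in people:
            joint = [gl[g] + prior[g] for g in range(3)]
            norm = logsumexp(joint)
            # E-step: posterior expected number of alternate alleles for this person.
            expected_alt += sum(g * math.exp(joint[g] - norm) for g in range(3))
        new_p = expected_alt / (2 * len(people))  # M-step: binomial MLE over 2n alleles
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p


def read_lrt(cases: list[tuple[int, int]], controls: list[tuple[int, int]],
             eps_case: float, eps_control: float) -> tuple[float, float, float]:
    """1-df likelihood-ratio test of equal allele frequency in cases and controls.

    Each person is a (reference reads, alternate reads) pair. Depth enters through
    the read counts; each group has its own (assumed known) error rate.
    Returns (statistic, asymptotic p-value, pooled allele-frequency estimate).
    """
    gl_case = [genotype_logliks(r, a, eps_case) for r, a in cases]
    gl_ctrl = [genotype_logliks(r, a, eps_control) for r, a in controls]
    gl_all = gl_case + gl_ctrl
    p_case, p_ctrl, p_pool = (em_allele_freq(gl) for gl in (gl_case, gl_ctrl, gl_all))
    ll_alt = loglik(p_case, gl_case) + loglik(p_ctrl, gl_ctrl)
    ll_null = loglik(p_pool, gl_all)
    stat = max(0.0, 2 * (ll_alt - ll_null))  # clip tiny negatives from EM tolerance
    p_value = math.erfc(math.sqrt(stat / 2))  # P(chi-square with 1 df > stat)
    return stat, p_value, p_pool


def passes_frequency_screen(p_hat: float, n_participants: int) -> bool:
    """Keep loci with estimated frequency >= 1/(2N), N = number of diploid participants."""
    return p_hat >= 1 / (2 * n_participants)


if __name__ == "__main__":
    # Hypothetical toy data: deep (~60x) cases, shallow (~6x) controls.
    cases = [(31, 29), (60, 0), (27, 33), (59, 1), (30, 30)]
    controls = [(6, 0), (5, 1), (6, 0), (3, 3), (6, 0), (7, 0)]
    stat, p_value, p_pool = read_lrt(cases, controls, eps_case=0.01, eps_control=0.01)
    keep = passes_frequency_screen(p_pool, len(cases) + len(controls))
    print(f"LRT = {stat:.2f}, p = {p_value:.3g}, passes 1/(2N) screen: {keep}")
```

EM is used here because, under Hardy-Weinberg equilibrium, the complete-data maximum is simply the expected alternate-allele count divided by 2n, which keeps each update one line long. In a real analysis the error rates would be estimated rather than fixed, and screened loci would need the bootstrap calibration described above rather than the χ² approximation.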

Results: We illustrate our approach using data from the UK10K project. Data for 784 cases come from the Severe Childhood Onset Obesity Project and were exome sequenced at 60x coverage. Data for 1702 controls come from the Avon Longitudinal Study of Parents and Children and the TwinsUK study (only one twin per pair used) and were whole-genome sequenced at 6x coverage.

Email: emrxiv@gmail.com

Website: http://www.ime.usp.br/~abe/emr2015
