12.07.2015 Views

DNA for Genealogists (Intro) - Rcasey.net

DNA for Genealogists (Intro) - Rcasey.net

DNA for Genealogists (Intro) - Rcasey.net

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Intro</strong> to <strong>DNA</strong> <strong>for</strong> <strong>Genealogists</strong>Type of tests usedHow common tests workCompanies that offer testsHow does <strong>DNA</strong> get analyzedAutosomal – best fit <strong>for</strong> post-1850 researchY-<strong>DNA</strong> – best fit <strong>for</strong> pre-1850 researchBooks to get up to speedForums, <strong>DNA</strong> projects, etc.Costs of testingFuture Trends in <strong>DNA</strong> testingAugust, 2012 2


Types of tests availableY-<strong>DNA</strong> – uses both Y-STR tests and Y-SNP tests- Limited to all male lines only- 100 to 1,000 years (Y-STRs)- 500 to 50,000+ years (Y-SNPs)Autosomal – 50 % change every generation- Tests all ancestral lines- Limited to 25 to 150 yearsMitochondrial (mt<strong>DNA</strong>)- Limited to all female lines only- 1,000 to 50,000+ years- No fast mutating STRs to complementAugust, 2012 3


How Autosomal worksCovers all ancestral lines but is limited to 100 to 150years in accuracy (reliable <strong>for</strong> 4 or 5 generations)Each generation has 50 % change resulting inshorter and fewer common segmentsRequires multiple tests to assign matching segmentsto various ancestral linesWorks great <strong>for</strong> recent adoptions, breaking recentbrick walls or just starting out with genealogyNot reliable beyond 200 years where most brickwalls existA few random ancestral matches can be found at sixand seven generationsAugust, 2012 5


How Autosomal works(Recombination at work)Parents – 50 % - 1950 (all)Grandparents – 25 % 1925 (all)Great grandparents – 12.5 % - 1900 (all)2G grandparents – 6.3 % - 1870 (90 %)3G grandparents – 3.1 % - 1850 (80 %)4G grandparents – 1.4 % - 1825 (20 %)5G grandparents – 0.8 % - 1800 (5 %)August, 2012 6


GEDMATCH Database - at<strong>DNA</strong>Attempts to merge two major companies massiveat<strong>DNA</strong> databasesGives a combined view database of bothFT<strong>DNA</strong> and 23andme submissionsAncestry.com does not allow access toraw data – do not order their productUn<strong>for</strong>tunately, it is a volunteer downloadfrom each company, so coverage will be inconsistentAlso has leading edge analysis tools as wellSimilar to Y-Search as public repository <strong>for</strong> at<strong>DNA</strong>testsMany test twice to get full access to both databasesAugust, 2012 7


How Y-STR worksOnly works with all male linesRelatively faster mutating <strong>DNA</strong> that matchesgenealogical time frame (100 to 1,000 years)Great <strong>for</strong> answering yes / no / maybe relationshipsTakes a lot of submissions to build a ge<strong>net</strong>ic clusterand determine relationshipsGenerates clusters of related lines but does not showhow lines are connectedOverlapping haplotyes sometimes makes itimpossible to assign to only one ge<strong>net</strong>ic clusterY-SNPs greatly complement some of the shortfalls ofthe Y-STRs by themselvesAugust, 2012 8


How Y-SNPs workDefines ge<strong>net</strong>ic branches between 500 and 5,000+ yearsWith 58,000,000 base pairs <strong>for</strong> possible Y-SNP testing, less thanone percent of Y-SNPs have been discovered / analyzed to dateSeveral new Y-SNPs being discovered every weekY-SNPs create father / son relationships that reveal exactgenealogical relationships between Y-SNPsMutations between Y-SNP and genealogical cluster definefingerprints which show common mutationsMany unrelated Y-STRs have such common marker values thatclose matches are not even related (Y-SNPs help break up theseclusters of unrelated Y-STR submissions)Y-SNPs will grow from 500 to 50,000 where future Y-SNPs willcreate thousands of branches within genealogical clustersAugust, 2012 9


Four primary testing companiesFamily Tree <strong>DNA</strong> provides best overall value with most offerings,largest database and leading edge testing23andme has strong autosomal test and useful Y-SNP test but lackscritical Y-STR testing and advanced Y-SNP testingAncestry.com offers reasonable entry level Y-STR tests but has no Y-SNP testing or high resolution Y-STR testingAncestry.com is beta testing autosomal but will not release actual rawdata – do not order this predatory offeringFamily Tree <strong>DNA</strong> offers unbelievable Y-SNP testing that will eventuallybecome the primary tool <strong>for</strong> future genealogical researchAll three companies offer robust mt<strong>DNA</strong> test but only FT<strong>DNA</strong> offers fullmt<strong>DNA</strong> test (do not recommend testing of mt<strong>DNA</strong> from any company)National Geographic and FT<strong>DNA</strong> recently announced NatGeo 2.0 test(orders being taken – results due in the fall) includes static test ofmassive Y-SNPs, extensive mt<strong>DNA</strong> and limited ethnic autosomal.August, 2012 10


How do Y-STRs workOnly found on Y-<strong>DNA</strong> chromosomeY-STRs are where patterns repeat many times andthe number of repeats vary generation to generationTesting companies scan the Y-<strong>DNA</strong> until they find thelandmark indicating they have arrived at the Y-STRFrom that landmark, they then know how to locatethe repeating patterns and count the number ofrepeats (Short Tandem Repeats)The Y-STR values (numbers of repeats) vary overtime allowing genealogists to track ancestorsAugust, 2012 11


How do Y-SNPs workOnly found on Y chromosomeMost are one time mutationsDiscovers branches from 500 to 50,000 yearsUnlike Y-STRs, have a very hierarchical relationships(father / son relationships)Create true genealogical like descendant treeOnce you find your most recent Y-SNP (usually 500to 2,000 years old) Y-STRs complement Y-SNPs <strong>for</strong>more recent mutationsRecent explosion from 2,000 to 10,000 Y-SNPs, manymore to be discovered in the futureAugust, 2012 12


How Autosomal tests worksCovers all ancestral lines – but limited to post-1850Comes from most chromosomes but not Y-<strong>DNA</strong>Recombines 50 % from each parent every generationEach recombination results in long segments ofcommon <strong>DNA</strong> that is partially passed to childrenSegments get shorter and shorter every generationuntil no longer reliable <strong>for</strong> identification purposesMultiple tests of close relatives required to sort outwhich segments belong to which lineThe total amount of longer segments can estimatethe degree of relationship (3 rd cousin, once removed)August, 2012 13


How mt<strong>DNA</strong> worksMother passes to daughter over many generations(also passes to sons but sons can not pass it on)Over time, mutations occur that allows us to build aall female descendant treeCan answer questions about no / maybe relationshipsIf you do not share a common mutation that is 3,000years old, you obviously are not related in the last300 yearsIf you do share a common mutation that is 3,000years old, supports connection at 300 yearsThe more rare the mutation and the more recent themutation, the more support there is <strong>for</strong> a connectionAugust, 2012 14


Books to get up to speed withTrace Your Roots with <strong>DNA</strong> by Megan Smolenyak &Ann Turner, 2004 solid book – not real deep<strong>DNA</strong> & Genealogy by Colleen Fitzpatrick & AndrewYeiser, 2005 solid book – little more depth<strong>DNA</strong> & Social Networking: A Guide to Genealogy byDebbie Ken<strong>net</strong>t, 2012 - updated (includes autosomal)Family History in the Genes by Chris Pomeroy, 2007 –by far the best on Y-<strong>DNA</strong> – more in depth & complex<strong>DNA</strong> and Family History by Chris Pomeroy & SteveJones, 2004 – both versions worth gettingGo to Amazon.com and search <strong>DNA</strong> & genealogyAugust, 2012 15


Forums, <strong>DNA</strong> ProjectsUnbelievable amount of excellent <strong>DNA</strong> <strong>for</strong>ums & projectsLook at Surname projects – many are excellentJoin Surname project – key to having your <strong>DNA</strong> analyzed byexpertsSkill levels of <strong>for</strong>ums and projects vary a lot, be prepared <strong>for</strong>minimal support from some – specially less common surnamesExpect minimal assistance from testing companies98 % of the analysis is done by amateur researchers and someare extremely skilled (as much as testing companies)Higher skills in Y-SNP projects (but most are biased towardsanthological research but trend is changing)Be respectful of volunteers who help – sugar works better thanvinegar – you should also test as recommendedAugust, 2012 16


Cost of Testing (Retail - FT<strong>DNA</strong> only)Always join project be<strong>for</strong>e ordering to get theproject discounts & use twice a year salesFamily Finder (autosomal) - $289NatGeo 2.0 (Y-SNPs, mt<strong>DNA</strong>, autosomal)-$199 – only from National Geographicextremely robust test – leading edgeY-STR - $169 (37), $268 (67) & $359 (111)Full mt<strong>DNA</strong> - $299, partial $159Special order Y-SNPs - $29 eachWalk the Y - $950August, 2012 17


Cost of testing – other companies23andme – “one size fits all” test - $299 <strong>for</strong> health, autosomaland limited Y-SNPs (no Y-STRs) – first with autosomal23andme - good starting point but many end up migrating toFT<strong>DNA</strong> <strong>for</strong> more Y-<strong>DNA</strong> testing – health markers are uniqueAncestry.com – Y-STR $149 (33) & $179 (46) – No Y-SNPsAncestry.com – 46 marker is good starting test but manymigrate to FT<strong>DNA</strong> <strong>for</strong> more Y-STR & Y-SNP testingAncestry.com – Autosomal $99 to existing customers(no raw data will be provided & scope of test not revealeddo not order – very predatory offering)Ancestry.com – Partial mt<strong>DNA</strong> $179(NatGeo 2.0 better offering)August, 2012 18


Future trendsY-SNP tests will become the most important testsNatGeo 2.0 increased static Y-SNP test from 500 to12,000 Y-SNPs (a lot to absorb)Potential <strong>for</strong> 10,000s of genealogical Y-SNPs &estimates show that 50,000 or more are possibleAutosomal test good <strong>for</strong> post-1850 brick walls butrequires a lot of tests to triangulateFull mt<strong>DNA</strong> is only discovery test <strong>for</strong> all female line(only available from FT<strong>DNA</strong>) – minimal genealogicalapplicabilityAugust, 2012 19


Sample at<strong>DNA</strong> comparisonAugust, 2012 20


War stories – at<strong>DNA</strong> <strong>for</strong> CaseyCommon ancestor believed to be born around 1700(way too early <strong>for</strong> at<strong>DNA</strong>)However, common segments found between all 12 Caseyrelated testsIt is very frustrating that each pair has different segments(not as expected when they descend from the same ancestor)However, does imply that all are probably part of the South CarolinaCasey ge<strong>net</strong>ic cluster (Y-<strong>DNA</strong> cluster that several have tested positive<strong>for</strong> via Y-<strong>DNA</strong> testing)Half of the at<strong>DNA</strong> submissions did not know any male Caseyancestor – only the maiden name of Casey female ancestorWith no Casey males (or Casey descendant males to test), no furtherY-<strong>DNA</strong> research can be conductedMost distant Casey lines showed only modest interest in Casey linesince it was not one of their primary linesAugust, 2012 21


War stories - Y<strong>DNA</strong> provides answersTwo different Pace lines claimed same ancestorEach had different wives, overlapping children and residencesBoth claimed same person who was proven back to JamestownAt least 20 books claimed the same man (including my book)The Jamestown line was connected to a part of London<strong>DNA</strong> proved both lines could not be closely relatedTwo random submissions solved the mysteryOne submission that still lived in London where the Jamestownline resided - traced back <strong>for</strong> five generations within one mileOne submission from Canada traced back to rural England withsupporting parish records and matched the second Pace lineAugust, 2012 22


War stories - Y<strong>DNA</strong> provides answersTwo men named Jordan Brooks resided in common counties andneighboring counties - at least five different areas (GA & AL)Both lines had very similar given namesBoth lines borrowed from each other due to similar residencesMany publications actually turned speculation into firm connectionson the Inter<strong>net</strong> databasesMale descendants were located from each line and both submitted <strong>DNA</strong><strong>for</strong> comparisonFT<strong>DNA</strong> MRCA states that there is less than 1 of 10,000 chance oflines being related in the last 600 yearsGe<strong>net</strong>ically proved these lines are not related as once believedHowever, one line was very closely related to third different lineAugust, 2012 23


War stories - Y<strong>DNA</strong> provides answersThere are believed to be around 40 different Casey linesresiding in four neighboring SC counties from 1760s to 1820sMany years of research has been unable to make muchprogress in tying these Casey lines together<strong>DNA</strong> has proven that about 12 Casey lines are very closelyrelated (and <strong>DNA</strong> contains extremely unique marker values)<strong>DNA</strong> has proven that one Casey line is not closely related<strong>DNA</strong> has allowed the most probable <strong>DNA</strong> Descendancy Chart(connections of these lines based on <strong>DNA</strong> in<strong>for</strong>mation)Around one half of the submissions are part one branch and theremaining half are part of second branchAuthor of 600 page Hanvey book found out that his Hanvey lineis actually a Casey line ge<strong>net</strong>ically and was found not to berelated to their Hanvey ge<strong>net</strong>icallyAugust, 2012 24


War stories - Y<strong>DNA</strong> provides answersWas contacted by Butler submission that had a known NPEevent in the 1850sThis Butler line was an out of wedlock birth of an unknown malewhich would normally be very difficult to make any progress<strong>DNA</strong> showed a match with a Brooks cluster (I am co-admin <strong>for</strong>the Brooks surname as well)This Brooks cluster is has many submissions and very activelyresearched and included possible NPE lines with similar <strong>DNA</strong>The Butler fingerprint closely aligned with a Bradberry NPE lineA Butler sister married a Bradberry and a Bradberry in 1860census just a few households away was one of the BradberryNPEs that had been testedThe conclusion is that the father of Butler boy was very likely aBradberry – but it could be one of many Bradberry malesAugust, 2012 25


War stories - Y<strong>DNA</strong> provides answersMy mother’s line, Brooks, has many genealogical anomalieswhere the two oldest sons can not be confirmed as they werenot included is extensive probate recordsHowever, these two unproven sons were found living in thesame household via personal property tax lists and thesupposed father signed the marriage bond <strong>for</strong> one sonThere were many Wade families residing in the same area andunproven family history stated the oldest ancestor married aWade and this couple also named a son named Wade<strong>DNA</strong> then showed that the very unique <strong>DNA</strong> <strong>for</strong> Brookssubmissions was a common <strong>DNA</strong> pattern <strong>for</strong> many WadesubmissionsThe conclusion is that this oldest male ancestor may havemarried a woman who was previously married to a Wade andthat the two oldest sons may have been in<strong>for</strong>mally adoptedAugust, 2012 26


Questions & AnswersIt takes a while to get up to speed. Ge<strong>net</strong>ic <strong>DNA</strong>takes as many skills as traditional genealogyToo high expectations by manyJust get started – be sure to have well defined goalsso you can later assess if you met those goalsA lot of willing volunteers to assist – make it a twoway interchange (test what they recommend)Be<strong>for</strong>e we talk about a couple of advanced topics orfuture trends, it is time <strong>for</strong> questions & answersAugust, 2012 27


Overlapping HaplotypesThis issue is rarely covered by books & web sitesSome haplotypes contain such common <strong>DNA</strong> markervalues that even very close matches may not relatedAs much of 10 to 20 percent of all submissions fall into the“overlapping haplotype” scenarioGe<strong>net</strong>ic distance (the number of mutations that are different) isnot always reliable by itselfYou want to categorize non-surname matches into twocategories: “overlapping haplotypes” or “possible NPEs”Overlapping haplotypes need to be filtered out by Y-SNP testsNPEs can be a new gold mine of genealogical treasuresThere are methodologies <strong>for</strong> determining the category(too advanced <strong>for</strong> this session)http://www.rcasey.<strong>net</strong>/<strong>DNA</strong>/Casey/Sources/Overlapping_Haplotypes_Jobling_2000.pdfhttp://www.rcasey.<strong>net</strong>/<strong>DNA</strong>/Casey/Analysis/Analysis_South_Carolina.htmlAugust, 2012 28


This issue is rarely covered by books & web sitesSome haplotypes contain such common <strong>DNA</strong> markervalues that even very close matches may not relatedAs much of 10 to 20 percent of all submissions fall into the“overlapping haplotype” scenarioGe<strong>net</strong>ic distance (the number of mutations that are different) isnot always reliable by itselfYou want to categorize non-surname matches into twocategories: “overlapping haplotypes” or “possible NPEs”Overlapping haplotypes need to be filtered out by Y-SNP testsNPEs can be a new gold mine of genealogical treasuresSignificant confusion on distinction of two categoriesThere are methodologies <strong>for</strong> determining the category(too advanced <strong>for</strong> this session)August, 2012 29


Fingerprints are keyMost analysis of Y-STRs depends too much only onge<strong>net</strong>ic distance (number mutations that differ)The common mutations are a much better indicatorof relatednessDetermine the haplotype of your Y-SNP anddetermine the fingerprint of your ge<strong>net</strong>ic cluster (themutations between the Y-SNP and your cluster)Combination of Y-SNP, fingerprint matches, ge<strong>net</strong>icdistance and surname are a power combination ofin<strong>for</strong>mation that must be used in any analysisAugust, 2012 30


August, 2012 31


The futureThe costs of testing of the full genome will be under $1,000 in the nextfew years – so you never have test any donor again just analyzeThe amount of useful data will increase by 1,000,000 fold !at<strong>DNA</strong> is currently under 1,000,000 base pairs – it could be extended to10M or 100M base pairs – but the usefulness of the in<strong>for</strong>mationexponentially decreasesmt<strong>DNA</strong> is only 16,000 base pairs – already being analyze since it is such asmall <strong>DNA</strong> strandY-STRs are estimated to between 400 and 500 useful Y-STRsYou have to double the number to have an impact or use faster mutatingmarkers which require more submissions to analyzeY-SNPs – FTNDA has only around 500 useful Y-SNPs in the haplotree (ifyou ignore duplicate SNPs)NatGeo 2.0 will test 12,000 Y-SNPs – probably doubling 500 to 1,000useful SNPs <strong>for</strong> western European research (majority are Chinese andSardinian)It is believe that useful Y-SNP should exceed 50,000 when it becomeseconomical feasible to scan the entire Y-<strong>DNA</strong> strandAugust, 2012 32


Y-SNP analysis is the futureEvery surname cluster should get several Y-SNPs that create brancheswithin surname clustersThe Y-SNPs have father-son relationships vs. Y-STRs which are onlyclusters of related submissions that overlap with each otherBetween all combinations of Y-STRs and Y-SNPs, most living individual aswell as most deceased ancestors can have unique haplotypes assignedIt will take several years to establish the connections between thethousands of Y-SNPs and most research is done by fellow researchersGenealogical Y-SNPs are already being discovered with only around 500useful Y-SNPs, 50,000 Y-SNPs will produce thousands moreBen<strong>net</strong>t Greenspan (president of FT<strong>DNA</strong>) stated that Y-SNPs will be thegenealogical tree and the Y-STRs will become leaves on this treeFT<strong>DNA</strong> is the only ge<strong>net</strong>ic testing company doing any serious Y-SNPadvancement while others take advantage of their researchNatGeo 2.0 provides a static 12,000 Y-SNP test starting in November andis taking orders now (tests are processed by FT<strong>DNA</strong> and uploadable toFT<strong>DNA</strong> databases at no charge)August, 2012 33


IT costs will drive testing costsAlmost all people really hate this chart and this chart will provide anextreme challenge to the ge<strong>net</strong>ic testing communityTesting costs will continue to decline between 80 % and 90 % per year –most companies will just offer more data vs. lower costsEventually, you will be able to order a full genome test from China <strong>for</strong> afraction of cost of existing testing companies – but you only get raw dataWith amount of data required <strong>for</strong> analysis increasing by ten times eachyear and the testing costs declining by ten times each year is an issueEventually, you will have to pay <strong>for</strong> online analysis separate from testingcosts as existing companies start losing business to Chinese companies23andme had the correct idea – but the wrong strategy to go with it asthe should withdrawn only access <strong>for</strong> those that did not pay vs.withdrawing data which was a huge marketing disasterReally advanced analysis tools are possible, but nobody wants to makemajor investments and not get paid <strong>for</strong> it – so we deserve the tools thatwe get which are marginal at bestAs the data gets so huge in the future, today’s manual methods will notscale to databases that large and tools will be required <strong>for</strong> analysisAugust, 2012 34

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!