Thursday, September 13, 2012

From Cro-Magnon man to Celt

There are now a couple of different themes running through this short series of posts on DNA and what it means in the search for our ancestors. I had started out wondering when Europeans first arrived in America, and found that there appear to be more possible answers to that question than I had first thought. I must also confess that I had not realized until last week the scale of the warfare that was a common feature of life in Paleo-Indian times. (Though I should not have been surprised, since it was still common at the time that the Pilgrims showed up, as the story of the Quapaw peoples, who now live around Baxter Springs, Kansas, can attest.) But while this is going to remain the main theme of the thread, the changes in the analysis of DNA over the last few years have brought some other surprises with them. It is becoming possible to be more refined in tracking ancestral paths than was the case even five years ago.

To illustrate what I mean, let me expand a little using some of my own results to help explain the development, and then this might help in understanding some of the debate when I come back to the main theme later.

There are now three different places where scientists can look in a DNA sample. There is the basic Y-chromosome (Y-DNA), which provides the paternal line, but which is not passed down through women. Then there is the mitochondrial DNA (mtDNA), which comes down from the female line alone, and which goes to all children. With refinement it is now possible to look at the entirety of the DNA string, at what is known as the autosomal DNA (auDNA), using new techniques to focus specifically on different regions and, with the help of historical records, to start to build a broader family history. Those investigations are beginning to show that there may be many medical benefits, coming from a better understanding of how the body works and how DNA acts as the control.

I am going to cover those latter topics in posts that follow this, but for today I am going to return to the Y-DNA and the paternal, or patriarchal, line. This is not meant as a personal history, but some background might help with context for the example. I am a Summers, of whom it is estimated that there are some 15,564 individuals in the UK.

Figure 1. Distribution of the Summers families in the United Kingdom (Dynastree)

Our group comes from the area around the village of Eglingham, in Northumberland where the First Duke of Northumberland (Hugh Smithson, who took the family name of Percy) founded a coal mine at Tarry, about 1750, with 9 workers, including a Grey, an Embleton and my relative George.

The region had, historically, been a mess, since it was the ground over which the Scots and English fought for hundreds of years until 1603 when James VI of Scotland became James I of England, following the death of Elizabeth. Bands of my Scottish ancestors (Littles, Maxwells, Moffats etc) came through the hills to steal the cattle and everything else from anyone they could find. Of course the Percys then led bands back, to burn, loot and destroy in return. Both sides of the current family lived through the time of the Border Reivers, and were likely participants albeit on opposite sides.

The current number of Summers is a little overwhelming, but the density in the County is lower, and as one goes back in time the number of individuals drops considerably, so that the search can focus down to just one or two individuals. Bryan Sykes did a remarkable study which showed that the Celtic hero Somerled, who died in 1164, has had 500,000 descendants since (including the clans McDonald, McDougall, and McAlister). The lineage can be found here.

So how can this be proved, how can the different branches of the clan be distinguished, and how can I apply that to my own case?

Well, the answer lies in the structure of the Y-DNA. Within the Y chromosome are a number of regions where a particular sequence repeats a number of times. For example (to quote Smolenyak) the four-base pattern GATA might be repeated five times: GATA GATA GATA GATA GATA. But sometimes when this occurs the exact number of repetitions changes. This is particularly prone to happen when the number of repetitions gets to be more than ten where, as an example, one might find that a ten-count suddenly, in the next generation, becomes an eleven-count, or perhaps drops to a nine-count. A generational marker thus exists, and it appears to be one where these changes can occur about every 10 to 20 generations (say 200 to 300 years). These “stutters” are known as Short Tandem Repeats (STRs) and they have become the definitive way of identifying much smaller groups of people within the broader classification of a haplogroup.
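As a rough illustration of what "counting the repeats" means, the sketch below finds the longest back-to-back run of a motif in a stretch of sequence. The function name and the flanking bases in the sample fragment are invented for the example; real STR calling in a laboratory is considerably more involved than this.

```python
import re

def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    # (?:MOTIF)+ matches one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Hypothetical fragment: five GATA repeats between made-up flanking bases.
parent = "CCT" + "GATA" * 5 + "TTG"
# A one-step "stutter" in the next generation adds a single repeat.
child = "CCT" + "GATA" * 6 + "TTG"

print(longest_tandem_repeat(parent, "GATA"))
print(longest_tandem_repeat(child, "GATA"))
```

The two counts differ by one, which is the kind of single-step change that is tracked at each marker site.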

Because the numbers can change at different marker sites at random intervals, it is only when comparing across a number of different sites that one can establish who might be related to whom. Consider the results from a group of individuals, looking at the number of repeats which are counted at different markers along the chromosome. (The marker site locations are standardized and given at the top of the table.)

Figure 2. Short Tandem Repeat counts for 12 individuals that all share the same basic haplogroup designation (R1b1a2) but with varying count numbers at different marker sites. The different shades show variations from the numbers of the defined count for this particular lineage, given as the bolder numbers on the top line.
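One simple way to make such a comparison concrete is to add up, marker by marker, how far two sets of counts differ. The sketch below does that; it is only an illustration, and the marker labels and repeat counts in it are invented placeholders, not the values from Figure 2. Summing the step differences is one common convention, assumed here rather than taken from any of the studies mentioned.

```python
from typing import Mapping

def genetic_distance(a: Mapping[str, int], b: Mapping[str, int]) -> int:
    """Sum of absolute differences in repeat counts over the markers both share."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)

def differing_markers(a: Mapping[str, int], b: Mapping[str, int]) -> list:
    """Sorted list of shared markers at which the two counts disagree."""
    return sorted(m for m in a.keys() & b.keys() if a[m] != b[m])

# Invented counts at four placeholder marker sites (not real data).
modal    = {"site_1": 13, "site_2": 24, "site_3": 14, "site_4": 11}
person_a = {"site_1": 13, "site_2": 24, "site_3": 14, "site_4": 10}
person_b = {"site_1": 13, "site_2": 22, "site_3": 15, "site_4": 11}

for name, person in [("A", person_a), ("B", person_b)]:
    print(name, genetic_distance(modal, person), differing_markers(modal, person))
```

A smaller total suggests a closer relationship to the reference pattern, though with only a handful of markers the result is far from conclusive, which is why the comparison is made across many sites.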

To return to the McDonald lineage, as time passed the number of repetitions at different locations changed in an individual, and his children, so that different patterns (as in the above) developed. However, they did not necessarily follow the divisions of the clan, though once the lines had separated any changes in the counts would be restricted to the line in which they occurred.

Figure 3. Variations in the Y-DNA counts as the branches of the Somerled lineage have developed over the centuries. (The values are the sites and the count changes at different marker locations). (Clan Donald)

For the nerds among us it is interesting that the Clan study shows:
The calculated haplotype of Good John is the same as participant &PGTBN to 37 markers. We have descendants of three of John's sons. Nine of these have excellent paper trail pedigrees. As shown above on the two charts, there are 22 single-step mutations in 37 markers in 139 transmission events. This is a mutation rate of 0.0043±.0009 (one standard deviation) per marker per birth. The value calculated by the Webmaster from all available data, both academic papers and surname studies, is 0.0031, only slightly more than one standard deviation off from our Clan Donald value.
Note that the rate is calculated by dividing the number of mutations by the number of births and by the number of sites monitored. If one were looking only at the rate at which a single mutation would occur anywhere across the 37 markers, then this would be 22/139 = 0.16 mutations per birth, or one mutation roughly every 6 generations. With an average generation being around 30 years, this gives a dividing mutation roughly every 200 years per line.
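The arithmetic can be checked with a few lines of code. In the sketch below the inputs (22 mutations, 37 markers, 139 transmission events, 30 years per generation) come from the text above; the function name is made up, and the standard deviation uses a Poisson-style square root of the mutation count, which is an assumption on my part that happens to reproduce the quoted ±0.0009.

```python
import math

def str_mutation_summary(mutations, markers, transmissions, years_per_generation=30):
    """Summarize STR mutation figures from counted single-step mutations."""
    marker_births = markers * transmissions      # marker copies passed on
    return {
        "rate_per_marker_per_birth": mutations / marker_births,
        # Assumed Poisson counting error: sqrt(N) on the mutation count.
        "std_dev": math.sqrt(mutations) / marker_births,
        "mutations_per_birth": mutations / transmissions,
        "generations_per_mutation": transmissions / mutations,
        "years_per_mutation": transmissions / mutations * years_per_generation,
    }

summary = str_mutation_summary(mutations=22, markers=37, transmissions=139)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Run with these inputs, the rate rounds to 0.0043 per marker per birth, and the interval between mutations comes out at a little over six generations, or about 190 years, in line with the rough 200-year figure.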

Figure 4. Division of the Clan Donald into various Septs into the 16th Century (Clan Donald). The divisions have multiplied since.

So, by establishing the basic marker counts for a group (i.e. the Haplogroup) one can then, as with the clan lineage above, establish, over time, the branching of that tree, and, for those on one of the branches, this also provides a path back to the original ancestors. There are several ways in which the different groups can be designated.

Oxford Ancestors (OA) has built on the classification system initially created by Bryan Sykes in Saxons, Vikings and Celts, which divides the results from Y-DNA STR counts at 12 sites (those in the table above) into five different major groups. This seems to now be quite widely accepted as a primary classification system, although the names of the five groups can vary from site to site. I found the Border Reiver Ancestry site useful, since it looks at several ways of assessing my particular ancestors.

The most common is the R1b group (that with the baseline values of Figure 2) and this Sykes called Oisin. It is now being called Celtic and can make up some 67.5% of the DNA profiles from the Border Reiver region, although Sykes found that it designated 62.8% of the samples he took in Northumberland. (The group includes me.) Using the ordering of the STR count marker sites as shown in Figure 2 the values would be:


The second is Group I, called Wodan by Sykes, and now being also referred to as Anglo Saxon/Danish. The STR counts for this Haplogroup, some 20.7% in the Border Reiver sample, 15.9% in Sykes’ Northumberland, would be:


The Haplogroup R1a is the Norse Viking, which Sykes called Sigurd. Interestingly this is also the group into which Somerled falls. The STR counts for the basic member of this Haplogroup, which makes up 3.7% of Border stock but 7.3% of Sykes Northumberland, possibly because of a close Scandinavian tie to that county even today, would be:


The fourth grouping is the E1b1b group, which Sykes called Eshu. This is some 1.6% of the Sykes sample but was not recorded in the Border Reiver analysis. The counts are:


The fifth group is defined by the letter J; Sykes referred to it as Re, and it is also referred to as Ancient Roman. It makes up 3.1% of the Border Reiver sample, and 2.4% of those sampled by Dr Sykes in Northumberland. The counts are:


Even within a single surname and haplogroup there are individual marker count changes that have built up over the centuries that surnames have been around (roughly a thousand years). It turns out that there are Summers whose DNA falls into all five of the groups. But knowing that I am a Celt, although the Summers are reputed to be either Anglo Saxon or perhaps Norman French, does perhaps explain my predilection for bagpipe music, including Spanish and Portuguese.

The story will continue . . . .

