The other day I found myself wondering what proportion of genes cousins would expect to share compared to biological siblings. This took more time to figure out than I would have expected, in part because I knew that siblings who share two parents have 1/2 their genes in common on average, so I thought cousins sharing two grandparents might have a quarter. They don’t, though – it’s half that, or 1/8. In reasoning it out, it turned out to be easiest to think of moving up the biological family tree to a common ancestor, which led to one general formula and a few specific cases:

**The Generalization:**

Given two people A and B, find their closest common ancestor C. If there are n generations from A to C and m generations from B to C, then the expected proportion of shared genes is (½)^{n+m}. If there are two closest common ancestors (for example, both parents) then this number would double.

In the case of a parent and child, for example, there is 1 generation from the child to the parent (the common ancestor) and 0 from the parent to itself, so the proportion of shared genes would be (½)^{1}, or just ½. Cousins would each be 2 generations from common grandparents, leading to (½)^{4}, or 1/16, for cousins with one grandparent in common (sometimes called half cousins) and twice that for cousins with two grandparents in common (sometimes called full cousins). Double cousins — that is, people who are cousins on both sides of the family tree (for example, cousins whose mothers are sisters and whose fathers are brothers) — would still have grandparents as the closest common ancestor, but now it would be up to four common grandparents instead of just one or two: the expected proportion of shared genes between cousins with four common grandparents would be 4·(½)^{4}, or just ¼. Likewise, an aunt and nephew with two parents/grandparents in common would be 1 and 2 generations respectively from this pair of common ancestors, so the expected proportion of shared genes would be 2·(½)^{3}, also ¼.
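The general rule can be sketched as a small Python function. The name `expected_shared` and its parameters are labels made up for this illustration, and the sketch assumes no inbreeding and that all of the closest common ancestors sit at the same distances n and m.

```python
from fractions import Fraction

def expected_shared(n, m, common_ancestors=1):
    """Expected proportion of shared genes: common_ancestors * (1/2)^(n+m).

    n, m: generations from each person up to the closest common ancestor(s).
    common_ancestors: how many closest common ancestors sit at that distance.
    Hypothetical helper for illustration; assumes no inbreeding.
    """
    return common_ancestors * Fraction(1, 2) ** (n + m)

print(expected_shared(1, 0))      # parent and child: 1/2
print(expected_shared(1, 1, 2))   # siblings with both parents in common: 1/2
print(expected_shared(2, 2))      # half cousins: 1/16
print(expected_shared(2, 2, 2))   # full cousins: 1/8
print(expected_shared(2, 2, 4))   # double cousins: 1/4
print(expected_shared(1, 2, 2))   # aunt and nephew: 1/4
```

Using `Fraction` keeps the results as exact fractions, which makes them easy to compare with the values worked out by hand.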

**Special Case 1: great-great-…-great grandparents**

In this case the older relative is the common ancestor, so if “g” is the number of “great”s then the expected proportion of shared genes is (½)^{g+2}. The additional 2 in the exponent is because the number of “great”s counts the generations after grandparents, who are already 2 generations away from their grandchildren. Strictly speaking, only the parent-child proportion is exact, since a child gets half of its genes from each parent. From grandparents onward the figure is only an expected proportion, because which half a parent passes on is random, so a grandchild could carry somewhat more or less than a quarter from any one grandparent. The same goes for all the other cases: siblings could have anywhere from no overlap of genes to complete overlap of genes from each common parent.
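As a quick check of the exponent, here is a minimal sketch that tabulates the first few cases. The function name is hypothetical, chosen only for this example.

```python
from fractions import Fraction

def shared_with_great_grandparent(g):
    # g = number of "great"s; a plain grandparent (g = 0) is 2 generations up,
    # and each "great" adds one more generation.
    return Fraction(1, 2) ** (g + 2)

for g in range(4):
    print(g, shared_with_great_grandparent(g))
```

With g = 0 this gives 1/4 for a grandparent, and each added “great” halves the value.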

**Special Case 2: great-great-…-great aunts and uncles**

In this case the older relative’s parent(s) are the common ancestor(s). With a great-uncle and great-niece, for example, the great-uncle’s parent(s) are the great-grandparent(s) of the great-niece. This means that there is 1 generation from the great-uncle to his parent(s), but 3 from the great-niece to the common ancestor(s), with each additional “great” adding another generation. If “g” is the number of “great”s, then the expected proportion of shared genes would be (½)^{g+3} if there is one parent in common, and (½)^{g+2} if there are two. (I personally find it interesting that you can expect to share the same proportion of genes with a sibling who shares both parents as you do with either of the individual parents, the same proportion with an aunt or uncle who shares both grandparents as you do with either of the individual grandparents, and the same proportion with a great-great-…-great aunt/uncle who shares both great-great-…-great grandparents as you do with either of those great-great-…-great grandparents themselves.)
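The aunt/uncle formulas, and the coincidence with the grandparent values, can be checked with a short sketch. Both function names are made up for this illustration.

```python
from fractions import Fraction

def shared_with_great_aunt(g, common_parents=2):
    # The aunt/uncle is 1 generation below the common ancestor(s);
    # the niece/nephew is g + 2 generations below them.
    return common_parents * Fraction(1, 2) ** (1 + g + 2)

def shared_with_great_grandparent(g):
    # Direct ancestor with g "great"s: g + 2 generations up.
    return Fraction(1, 2) ** (g + 2)

for g in range(4):
    one = shared_with_great_aunt(g, common_parents=1)
    two = shared_with_great_aunt(g, common_parents=2)
    print(g, one, two, two == shared_with_great_grandparent(g))
```

The last column is `True` on every row: doubling for two common parents cancels the one extra generation on the aunt or uncle’s side.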

One clarification: great-aunt is the term I grew up with, but in looking around I just discovered that “grand-aunt” may be the technically correct term, since that person is in the same generation as a grandparent; likewise, the sister of a great-grandparent would be a great-grand-aunt. This appeals to me aesthetically. If you were to use these terms, then you’d have one fewer “great” in describing the relationship, and you’d need to add 1 to the exponent in the formulas above.

**Special Case 3: second cousins once removed (and the like)**

Cousins share at least one grandparent, second cousins share at least one great-grandparent, and x^{th} cousins share at least one great^{(x-1)} grandparent. This means that x^{th} cousins are each (x+1) generations removed from the common ancestor(s), and would expect to share (½)^{2x+2} of their genes if there is one common ancestor and (½)^{2x+1} if there are two. Each removal means that one of the two people is one more generation away from the common ancestor(s), and so increases the power of ½ by 1. This means that x^{th} cousins who are y-times removed would expect to share (½)^{2x+y+2} of their genes if there is one common ancestor and (½)^{2x+y+1} if there are two. Second cousins once removed would share either (½)^{7} or (½)^{6} of their genes, while first cousins twice removed would share (½)^{6} or (½)^{5}.
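The cousin formulas can be sketched the same way, by counting generations on each side. The function name and parameters are hypothetical, and the sketch assumes the cousins are related through only one branch of the family tree.

```python
from fractions import Fraction

def shared_between_cousins(x, y=0, common_ancestors=2):
    # x-th cousins are each x + 1 generations from the common ancestor(s);
    # each of the y removals adds one generation on one side only.
    generations = (x + 1) + (x + 1 + y)
    return common_ancestors * Fraction(1, 2) ** generations

print(shared_between_cousins(2, 1, common_ancestors=1))  # 1/128 = (1/2)^7
print(shared_between_cousins(2, 1, common_ancestors=2))  # 1/64  = (1/2)^6
print(shared_between_cousins(1, 2, common_ancestors=1))  # 1/64  = (1/2)^6
print(shared_between_cousins(1, 2, common_ancestors=2))  # 1/32  = (1/2)^5
```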

For those who like the visual, there is a handy little chart below, which appears to be in the public domain on Wikipedia. It does make some assumptions, however – namely, that siblings, cousins, aunts and nieces, etc. have exactly two closest relatives in common (both parents, two grandparents, etc.).

June 23, 2014 at 9:36 am

So, what if A and B share a father, but A’s mother is B’s grandmother? This does occur in practice, you know.

June 23, 2014 at 5:31 pm

The formula should still apply: you’d apply it twice, once for the father and once for the mother. A and B are each 1 generation away from the father, so that component would be (1/2)^2=1/4, and then on the mother’s side the common ancestor is A’s mother, who is 1 generation away from A and 2 generations away from B, so that component would be (1/2)^3=1/8. The total amount of shared genes would be the sum, so 3/8, which is a little less than siblings.
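Applying the formula once per common ancestor and adding the results can be sketched as below. The helper name is made up for this illustration, and it assumes the two lines of descent are otherwise unrelated, so the components simply add.

```python
from fractions import Fraction

def shared_via_ancestors(paths):
    # paths: one (n, m) pair of generation counts per common ancestor.
    return sum(Fraction(1, 2) ** (n + m) for n, m in paths)

# Shared father: 1 generation from A, 1 from B.
# A's mother: 1 generation from A, 2 from B (she is B's grandmother).
print(shared_via_ancestors([(1, 1), (1, 2)]))  # 3/8
```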

I think where it would get more complicated mathematically would be something like siblings whose parents were siblings — thinking of the Ancient Egyptian kings and queens, because that’s a lot less distasteful than thinking of other kinds of parallels. My suspicion is that siblings would expect to share 3/4 of their genes instead of 1/2, but maybe it’s 5/8.

October 10, 2015 at 1:23 pm

The formula doesn’t consider chromosomal recombination, so it stands for direct ancestors but not for siblings, since they can be more or less than 50% similar depending on what chunks of DNA the sperms and eggs that originated them shared between each other

October 13, 2015 at 6:05 am

True – I tried to hint at that by using the phrase “expected proportion” rather than just proportion and mentioning averages, but didn’t do any more than that.

What I’m not certain of is what the range is. In theory siblings of the same two parents could share anywhere from 0% to 100% of their genes, with 50% being the average, but I don’t know how large or small a 95% confidence interval would be, say. (I started to figure it out with a binomial distribution, but then realized it would need to be applied twice, once for each genetic parent, so it’s a little more complicated than that.)
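One rough way to explore the range is a toy model, sketched here under loud assumptions: each parent’s contribution is treated as `blocks_per_parent` independently inherited blocks, and for each block two siblings received the same copy with probability 1/2. Under that assumption the two binomials (one per parent) simply add up to a single binomial with twice as many trials. The function name and the block counts are hypothetical, and real recombination does not produce equal, independent blocks, so this only illustrates how the interval narrows as the number of blocks grows.

```python
from math import comb

def sibling_sharing_interval(blocks_per_parent, coverage=0.95):
    """Central interval for the fraction of genes two full siblings share.

    Toy model (an assumption, not real genetics): each parent's contribution
    is split into `blocks_per_parent` blocks inherited independently, and for
    each block the siblings received the same copy with probability 1/2.
    """
    trials = 2 * blocks_per_parent          # one binomial per parent, added
    total = 2 ** trials
    pmf = [comb(trials, k) / total for k in range(trials + 1)]

    tail = (1 - coverage) / 2
    cumulative = 0.0
    low = 0
    for k, p in enumerate(pmf):
        if cumulative + p > tail:           # P(X <= k) first exceeds the tail
            low = k
            break
        cumulative += p
    high = trials - low                     # symmetric around trials / 2
    return low / trials, high / trials

for blocks in (1, 10, 50):
    print(blocks, sibling_sharing_interval(blocks))
```

With a single block per parent the interval is the whole range from 0 to 1, and it tightens around 1/2 as the assumed number of blocks increases.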