Friday, November 17, 2017

Calculating time of divergence using genome sequences and mutation rates (humans vs other apes)

There are several ways to report a mutation rate. You can state it as the number of mutations per base pair per year in which case a typical mutation rate for humans is about 5 × 10-10. Or you can express it as the number of mutations per base pair per generation (~1.5 × 10-8).

You can use the number of mutations per generation or per year if you are only discussing one species. In humans, for example, you can describe the mutation rate as 100 mutations per generation and just assume that everyone knows the number of base pairs (6.4 × 109).

Wednesday, November 08, 2017

How much mitochondrial DNA in your genome?

Most mitochondrial genes have been transferred from the ancestral mitochondrial genome to the nuclear genome over the course of 1-2 billion years of evollution. They are no longer present in mitochondria but they are easily recognized because they resemble α-proteobacterial sequences more than the other nuclear genes [see Endosymbiotic Theory].

This process of incorporating mitochondrial DNA into the nuclear genome continues to this day. The latest human reference genome has about 600 examples of nuclear sequences of mitochondrial origin (= numts). Some of them are quite recent while others date back almost 70 million years—the limit of resolution for junk DNA [see Mitochondria are invading your genome!].

Tuesday, November 07, 2017

Lateral gene transfer in eukaryotes - where's the evidence?

Lateral gene transfer (LGT), or horizontal gene transfer (HGT), is widespread in bacteria. It leads to the creation of pangenomes for many bacterial species where different subpopulations contain different subsets of genes that have been incorporated from other species. It also leads to confusing phylogenetic trees such that the history of bacterial evolution looks more like a web of life than a tree [The Web of Life].

Bacterial-like genes are also found in eukaryotes. Many of them are related to genes found in the ancestors of modern mitochondria and chloroplasts and their presence is easily explained by transfer from the organelle to the nucleus. Eukaryotic genomes also contain examples of transposons that have been acquired from bacteria. That's also easy to understand because we know how transposons jump between species.

Contaminated genome sequences

The authors of the original draft of the human genome sequence claimed that hundreds of genes had been acquired from bacteria by lateral gene transfer (LGT) (Lander et al., 2001). This claim was abandoned when the "finished" sequence was published a few years later (International Human Genome Consortium, 2004) because others had shown that the data was easily explained by differential gene loss in other lineages or by bacterial contamination in the draft sequence (see Salzberg, 2017).

Thursday, November 02, 2017

Parental age and the human mutation rate

Theme

Mutation

-definition
-mutation types
-mutation rates
-phylogeny
-controversies

Mutations are mostly due to errors in DNA replication. We have a pretty good idea of the accuracy of DNA replication—the overall error rate is about 10-10 per bp. There are about 30 cell divisions in females between zygote and formation of all egg cells. In males, there are about 400 mitotic cell divisions between zygote and formation of sperm cells. Using these average values, we can calculate the number of mutations per generation. It works out to about 130 mutations per generation [Estimating the Human Mutation Rate: Biochemical Method].

This value is similar to the estimate from comparing the sequences of different species (e.g. human and chimpanzee) based on the number of differences and the estimated time of divergence. This assumes that most of the genome is evolving at the rate expected for fixation of neutral alleles. This phylogenetic method give a value of about 112 mutations per generation [Estimating the Human Mutation Rate: Phylogenetic Method].

The third way of measuring the mutation rate is to directly compare the genome sequence of a child and both parents (trios). After making corrections for false positives and false negatives, this method yields values of 60-100 mutations per generation depending on how the data is manipulated [Estimating the Human Mutation Rate: Direct Method]. The lower values from the direct method call into question the dates of the split between the various great ape lineages. This controversy has not been resolved [Human mutation rates] [Human mutation rates - what's the right number?].

It's clear that males contribute more to evolution than females. There's about a ten-fold difference in the number of cell divisions in the male line compared to the female line; therefore, we expect there to be about ten times more mutations inherited from fathers. This difference should depend on the age of the father since the older the father the more cell divisions required to produce sperm.

This effect has been demonstrated in many publications. A maternal age effect has also been postulated but that's been more difficult to prove. The latest study of Icelandic trios helps to nail down the exact effect (Jónsson et al., 2017).

The authors examined 1,548 trios consisting of parents and at least one offspring. They analyzed 2.682 Mb of genome sequence (84% of the total genome) and discovered an average of 70 mutations events per child.1 This gives an overall mutation rate of 83 mutations per generation with an average generation time of 30 years. This is consistent with previous results.

Jónsson et al. looked at 225 cases of three generation data in order to make sure that the mutations were germline mutations and not somatic cell mutations. They plotted the numbers of mutations against the age of the father and mother to produce the following graph from Figure 1 of their paper.


Look at parents who are 30 years old. At this age, females contribute about 10 mutations and males contribute about 50. This is only a five-fold difference—much lees than we expect from the number of cell divisions. This suggests that the initial estimates of 400 cell divisions in males might be too high.

An age effect on mutations from the father is quite apparent and expected. A maternal age effect has previously been hypothesized but this is the first solid data that shows such an effect. The authors speculate that oocyotes accumulate mutations with age, particularly mutations due to strand breakage.


Of these, 93% were single nucleotide changes and 7% were small deletions or insertions.

Jónsson, H., Sulem, P., Kehr, B., Kristmundsdottir, S., Zink, F., Hjartarson, E., Hardarson, M.T., Hjorleifsson, K.E., Eggertsson, H.P., and Gudjonsson, S.A. (2017) Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature, 549:519-522. [doi: 10.1038/nature24018]

Tuesday, October 31, 2017

The history of DNA sequencing

This year marks the 40th anniversary of DNA sequencing technology (Gilbert and Maxam, 1977; Sanger et al., 1977)1 The Sanger technique soon took over and by the 1990s it was the only technique used to sequence DNA. The development of reliable sequencing machines meant the end of those large polyacrylamide gels that we all hated.

Pyrosequencing was developed in the mid 1990's and by the year 2000 massive parallel sequencing using this technique was becoming quite common. This "NextGen" sequencing technique was behind the massive explosion in sequences in the early part of the 21st century.2

Even newer techniques are available today and there's a debate about whether they should be called Third Generation Sequencing (Heather and Chain, 2015).

Nature has published a nice review of the history of DNA sequencing (Shendure et al., 2017). I recommend it to anyone who's interested in the subject. The figure above is taken from that article.


1. Many labs were using the technology in 1976 before the papers were published.

2. New software and enhanced computer power played an important, and underappreciated, role.

Heather, J.M., and Chain, B. (2015) The sequence of sequencers: The history of sequencing DNA. Genomics, 107:1-8. [doi: 10.1016/j.ygeno.2015.11.003]

Maxam, A.M., and Gilbert, W. (1980) Sequencing end-labeled DNA with base-specific chemical cleavages. Methods in enzymology, 65:499-560. [doi: 10.1016/S0076-6879(80)65059-9]

Sanger, F., Nicklen, S., and Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74:5463-5467. [PDF]

Shendure, J., Balasubramanian, S., Church, G.M., Gilbert, W., Rogers, J., Schloss, J.A., and Waterston, R.H. (2017) DNA sequencing at 40: past, present and future. Nature, 550:345-353. [doi: 10.1038/nature24286]


Escape from X chromosome inactivation

Mammals have two sex chromosomes: X and Y. Males have one X chromosome and one Y chromosome and females have two X chromosomes. Since females have two copies of each X chromosome gene, you might expect them to make twice as much gene product as males of the same species. In fact, males and females often make about the same amount of gene product because one of the female X chromosomes is inactivated by a mechanism that causes extensive chromatin condensation.

The mechanism is known as X chromosome inactivation. The phenomenon was originally discovered by Mary Lyon (1925-2014) [see Calico Cats].

Saturday, October 28, 2017

Creationists questioning pseudogenes: the GULO pseudogene

This is the second post discussing creationist1 papers on pseudogenes. The first post addressed a paper by Jeffrey Tomkins on the β-globin pseudogene [Creationists questioning pseudogenes: the beta-globin pseudogene]. This post covers another paper by Tomkins claiming that the GULO pseudogenes in various primate species are not derived from a common ancestor but instead have been deactivated independently in each lineage.

The Tomkins' article was published in 2014 in Answers Research Journal, a publication that describes itself like this:
ARJ is a professional, peer-reviewed technical journal for the publication of interdisciplinary scientific and other relevant research from the perspective of the recent Creation and the global Flood within a biblical framework.

Saturday, October 14, 2017

Creationists questioning pseudogenes: the beta-globin pseudogene

Jonathan Kane recently (Oct. 6, 2017) posted an article on The Panda's Thumb where he claimed that Young Earth Creationists often don't get enough credit for raising serious issues about evolution [Five principles for arguing against creationism].

He mentioned some articles about pseudogenes as prime examples. I asked him for references and he responded with two articles by Jeffrey Tomkins that were published on the Answers in Genesis website. The first was on the β-globin pseudogene and the second was on the GULO pseudogene. Both articles claim that these DNA sequences aren't really pseudogenes because they have functions.

I'll deal with the β-globin pseudogene in this post and the GULO pseudogene in a subsequent post.

Wednesday, October 11, 2017

Historical evolution is determined by chance events

Modern evolutionary theory is based on the idea that alleles become fixed in a population over time. They can be fixed by natural selection if they confer selective advantage or they can be fixed by random genetic drift if they are nearly neutral or slightly deleterious [Learning about modern evolutionary theory: the drift-barrier hypothesis]. Alleles arise by mutation and the path that a population follows over time depends on the timing of mutations [Mutation-Driven Evolution]. That's largely a chance event.

Wednesday, September 13, 2017

Sequencing human diploid genomes

Most eukaryotes are diploid, including humans. They have two copies of each autosome. Thousands of human genomes have been sequenced but in almost all cases the resulting genome sequence is a mixture of sequences from homologous chromosomes. If a site is heterogeneous—different alleles on each chromosome—then these are entered as variants.

Monday, September 11, 2017

What's in Your Genome?: Chapter 4: Pervasive Transcription (revised)

I'm working (slowly) on a book called What's in Your Genome?: 90% of your genome is junk! The first chapter is an introduction to genomes and DNA [What's in Your Genome? Chapter 1: Introducing Genomes ]. Chapter 2 is an overview of the human genome. It's a summary of known functional sequences and known junk DNA [What's in Your Genome? Chapter 2: The Big Picture]. Chapter 3 defines "genes" and describes protein-coding genes and alternative splicing [What's in Your Genome? Chapter 3: What Is a Gene?].

Chapter 4 is all about pervasive transcription and genes for functional noncoding RNAs. I've finally got a respectable draft of this chapter. This is an updated summary—the first version is at: What's in Your Genome? Chapter 4: Pervasive Transcription.

Saturday, September 09, 2017

Cold Spring Harbor tells us about the "dark matter" of the genome (Part I)


This is a podcast from Cold Spring Harbor [Dark Matter of the Genome, Pt. 1 (Base Pairs Episode 8)]. The authors try to convince us that most of the genome is mysterious "dark matter," not junk. The main theme is that the genome contains transposons that could play an important role in evolution and disease.

Wednesday, August 30, 2017

Experts meet to discuss non-coding RNAs - fail to answer the important question

The human genome is pervasively transcribed. More than 80% of the genome is complementary to transcripts that have been detected in some tissue or cell type. The important question is whether most of these transcripts have a biological function. How many genes are there that produce functional non-coding RNA?

There's a reason why this question is important. It's because we have every reason to believe that spurious transcription is common in large genomes like ours. Spurious, or accidental, transcription occurs when the transcription initiation complex binds nonspecifically to sites in the genome that are not real promoters. Spurious transcription also occurs when the initiation complex (RNA plymerase plus factors) fires in the wrong direction from real promoters. Binding and inappropriate transcription are aided by the binding of transcription factors to nonpromoter regions of the genome—a well-known feature of all DNA binding proteins [see Are most transcription factor binding sites functional?].