Molecular Evidence 2: DNA Functional Redundancy
The second piece of evidence is DNA functional redundancy.
The basic concept behind this piece of evidence is very similar to the one I discussed last week, which should be pretty obvious since the two have very similar names. They’re so similar that I’ll go ahead and review the basic argument behind last week’s evidence, since it has relevance here. All organisms share a number of proteins which are universally necessary for basic life processes; these proteins are called “ubiquitous proteins.” Because of the functional redundancy implicit in the structure/function relationship of amino acid sequences, there are a vast number of potential sequences for any given ubiquitous protein. Since so many different sequences would work equally well, the only mechanism for sequence similarity between organisms is common ancestry, and similar amino acid sequences imply a phylogenetic relationship. As a specific example, I pointed out the ubiquitous protein cytochrome C, which has the exact same amino acid sequence in humans and chimpanzees; that strongly indicates common ancestry between the two species. The sequence similarity of cytochrome C between humans and just about every other species is higher than would be predicted if evolution were not a valid hypothesis. Thus, protein functional redundancy is strong evidence supporting evolutionary theory.
DNA functional redundancy is basically the same phenomenon, but instead of comparing the amino acid sequences of proteins, we compare the underlying DNA sequences. Now, you’ll remember from the Molecular Biology Primer two weeks ago that the amino acid sequence of a protein is determined by the nucleic acid (that is, DNA) sequence found in the corresponding gene. The nucleotide sequence of a gene is transcribed into an RNA message, which is then translated into an amino acid sequence, forming a functional protein. All right, and I’m sure you also remember that the genetic code, which translates nucleotides to amino acids, reads the nucleotide sequence in groups of three, called codons. Since each codon specifies exactly one amino acid, there’s a 3:1 relationship between nucleotides and amino acids. If you haven’t already guessed it by now, the short answer to the question of DNA functional redundancy is that you take the strength of the evidence for protein functional redundancy and raise it to the power of 3.
I’ll try to explain a few of the details of this before getting into specific examples again. You’ll remember from the Molecular Biology Primer that I said that there are 64 different codons. You get this by raising the number of different nucleotides, 4, to the power of 3, which is the number of individual nucleotides in a codon. You’ll also remember that I said that there are only 20 different amino acids that are used to make proteins. Obviously, this means that you have 44 more codons than you actually need, if you were trying to be as efficient as possible. Theoretically, codons could be assigned completely at random, and the genetic code could be different for different organisms. If that were true, it would be an excellent refutation of evolutionary theory, but this is not what we observe. Interestingly, we find that for any three-nucleotide codon, the identity of the third nucleotide is less important for determining the corresponding amino acid than the first two. This phenomenon is referred to as codon degeneracy. Degeneracy means that for just about any codon, the third nucleotide can be changed to something different without affecting the corresponding amino acid that will result from translation. For example, the amino acid alanine has a four-fold degenerate codon, since any codon starting with guanine followed by cytosine will result in the translation of alanine. That is, you can find GCT, GCC, GCA, or GCG in the sequence of a gene, and all four will eventually be translated as an alanine. Other amino acids are less degenerate. Tyrosine, for example, is only translated by codons beginning with thymine and adenine and ending with thymine or cytosine (TAT and TAC). The other two codons in that family, the ones ending in adenine or guanine (TAA and TAG), are reserved as signals telling the translation machinery to stop; they’re called “stop codons.”
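If you’d like to check these numbers yourself, here’s a minimal sketch in Python that builds the standard genetic code and counts the codons for each amino acid. The variable names are my own for illustration, and the table is written in DNA letters (T rather than U) to match the examples above; the “*” symbol marks a stop codon.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# T, C, A, G order at each position (third position varies fastest).
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4**3 = 64 codons to its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they encode.
synonyms = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    synonyms[aa].append(codon)

print(len(CODON_TABLE))          # total number of codons
print(sorted(synonyms["A"]))     # alanine: four-fold degenerate
print(sorted(synonyms["Y"]))     # tyrosine: two codons
print(sorted(synonyms["*"]))     # stop codons
```

Notice that the standard code actually has three stop codons: TAA and TAG from the tyrosine family, plus TGA.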
So what does all this coding redundancy imply? Well, when all is said and done, it basically means that there are an astronomical number of ways that one could encode just about any given gene, without changing a single amino acid of the final protein sequence. Thus, there is no reason to assume, a priori, that any two organisms would have the same nucleotide sequence for any particular gene, even if they had the exact same amino acid sequence. Let me stress that again. Two different species with the exact same amino acid sequence for a protein have no biological reason, outside of common ancestry, to have high similarity between their corresponding nucleotide sequences. There isn’t even a name (I think) for the number of different possible nucleotide sequences. You just have to use exponents and powers of 10.
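To get a feel for how fast that number grows, here’s a rough sketch in Python. It counts the possible DNA sequences for a protein by multiplying together the number of codons available for each amino acid. The function name and the six-residue peptide are made up for illustration; this is not a real gene, and it is not how I arrived at the figures in this episode.

```python
from collections import Counter
from itertools import product
from math import log10, prod

BASES = "TCAG"
# Standard genetic code in T, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons encode each amino acid, e.g. 4 for "A", 2 for "Y".
CODONS_PER_AA = Counter(CODON_TABLE.values())


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences that translate to `protein`.

    `protein` uses one-letter amino acid codes; a letter that is not in
    the genetic code contributes a factor of 0.
    """
    return prod(CODONS_PER_AA[aa] for aa in protein)


peptide = "MAYLSR"  # hypothetical peptide, chosen only for illustration
n = count_encodings(peptide)
print(n)                    # 1 * 4 * 2 * 6 * 6 * 6 possible encodings
print(round(log10(n), 2))   # the same number as a power of 10
```

Even this tiny six-residue peptide can be written 1,728 different ways. For a sense of scale, if every position in a 100-residue protein had three synonymous codons (round numbers assumed purely for illustration), the count would be 3^100, or roughly 5 x 10^47.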
Well, let’s go back to the same example I used last week: cytochrome C. This, again, is a ubiquitous gene; it’s found in all living organisms. For this gene, the number of possible nucleotide sequences for any given amino acid sequence is higher than 10^49. That’s quite a lot. And remember, the human and chimpanzee cytochrome C amino acid sequences are exactly the same. So, there are more than 10^49 different nucleotide sequences that could exist for the human and chimpanzee genes. Now, what happens when we compare the human and chimp sequences? We find they’re only different by 4 nucleotides. That’s only 1.2% different between them. The chance of this happening without common ancestry is infinitesimally small. And this evidence supports the existing fossil evidence. Most fossil evidence estimates that humans and chimpanzees separated from a common lineage somewhere around 10 million years ago, maybe more recently. We can measure the background mutation rate in humans (and other mammals), and we’ve shown it to be about 1 to 5 mutations per 100 million nucleotides per generation. Since the average primate generation is 20 years, 10 million years works out to about 500,000 generations, and 1 to 5 mutations per 100 million nucleotides in each of those generations adds up to somewhere between 0.5% and 2.5%. So the predicted difference between a chimpanzee gene and a human gene is less than 3%. For cytochrome C, this prediction is undoubtedly fulfilled. And this is true for most other genes too; it’s true of every gene that I’ve looked at, no less. In fact, I’d like to challenge anyone who’d like to disprove this evidence to find a gene that shows more than 3% difference. I’ll even do the work for you, even though it’s easy to do by yourself.
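The back-of-the-envelope prediction can be written out as a short Python sketch, using only the figures quoted above. The constant and function names are mine, and the calculation is deliberately simple: it adds up mutations along a single line of descent and ignores things like selection and repeated changes at the same site.

```python
# Figures quoted in the episode.
YEARS_SINCE_SPLIT = 10_000_000     # approximate human/chimpanzee split
GENERATION_YEARS = 20              # average primate generation time
RATE_LOW = 1 / 100_000_000         # mutations per nucleotide per generation
RATE_HIGH = 5 / 100_000_000
OBSERVED_PERCENT = 1.2             # cytochrome C difference quoted above

generations = YEARS_SINCE_SPLIT // GENERATION_YEARS


def expected_percent_difference(rate_per_generation: float) -> float:
    """Percent of nucleotides expected to differ after `generations`.

    Simplifying assumption: mutations just accumulate linearly, so the
    result is rate * generations, expressed as a percentage.
    """
    return rate_per_generation * generations * 100


low = expected_percent_difference(RATE_LOW)
high = expected_percent_difference(RATE_HIGH)

print(generations)                       # number of generations
print(f"{low:.1f}% to {high:.1f}%")      # predicted range of difference
print(OBSERVED_PERCENT <= high)          # does the observation fit?
```

You can change the split time or the generation length to see how sensitive the prediction is to those estimates.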
Fortunately, the good people at the American National Center for Biotechnology Information (which can be found at the difficult-to-remember address of “ncbi.nlm.nih.gov”; I’d recommend just typing “NCBI” into your Google search) have done everyone the service of publishing the entire human genome and the entire chimpanzee genome online. You can, if you like, download the entire human genome right to your computer. Burn it on a CD. Upload it to your iPod. Whatever. But the great thing is that you can use tools that they provide on their site to directly compare the sequences for yourself. You don’t need to take my word for it. There’s not enough time now for me to tell you exactly how to use the website, so you can either spend some time fooling around with it on your own, or you can see what I’ve done with it. There’s a new resource website that I’ve started that has some gene comparisons, including cytochrome C, that you can look at for yourself. Just go to: http://www.drzach.net/evolution101/. I know, I know, another website to remember. I’ve tried to keep links reasonably redundant between the Freethought Media site, the blog, and this resource site, so you can decide which one you want to bookmark. I’ll be updating the resource page with more information eventually, including a tutorial on how to use NCBI’s tools to analyze sequence similarity on your own. I’ve also tried to compare genes between as many organisms as are available, including orangutan, gorilla, cow, pig, dog, zebrafish, mouse, rat, etc. Comparing all these different organisms allows me to construct a genetic cladogram, and the predictions based on genetic similarity reinforce the phylogenetic relationships predicted by anatomy.
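If you do download a pair of gene sequences, the comparison itself is simple once the two sequences are lined up. Here’s a minimal Python sketch of the idea. To be clear, this is not NCBI’s tool, which also handles aligning the sequences for you; it assumes the two sequences are already aligned and the same length. The function name and the two short sequences are made up for illustration.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two aligned sequences differ.

    Assumes the sequences are already aligned, with no gaps, so they
    must be non-empty and of equal length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100 * mismatches / len(seq_a)


# Hypothetical 12-nucleotide sequences (not real genes). Both encode
# alanine-tyrosine-alanine-alanine; they differ only in the third
# position of the first codon (GCT versus GCC).
seq_one = "GCTTATGCAGCG"
seq_two = "GCCTATGCAGCG"

print(f"{percent_difference(seq_one, seq_two):.1f}% different")
```

The two toy sequences differ at one nucleotide out of twelve, yet they code for exactly the same four amino acids, which is the kind of silent difference this whole argument rests on.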
So, to review, DNA functional redundancy shows that the extra layer of redundancy implicit in the coding of DNA reinforces the evidence from protein functional redundancy, and makes it even less likely that organisms share similar DNA sequences for any reason other than common ancestry. This one-two punch of protein and DNA evidence has hopefully been convincing. Next week we’re going to leave the strong evidence within the coding part of the genome and look at some equally strong, if not stronger, evidence within the noncoding part of the genome.