Molecular Biology Primer
Molecular biology is a fascinating branch of the biological sciences. It was born in the early part of the twentieth century out of a desire to unite the related fields of biological chemistry, microbiology (the study of microorganisms such as bacteria), genetics, and virology. The goal of molecular biology is to study biological systems by analyzing their macromolecular components. I’ll assume that most of you know that a molecule is nothing more than the smallest amount of any substance that still retains its properties, but what are macromolecules? They’re called “macro” molecules because, unlike a molecule of, say, water, which is made up of only three atoms, macromolecules are composed of anywhere from dozens to millions of atoms, depending on the molecule. There are essentially four classes of macromolecules in biology: proteins, carbohydrates (sugars), nucleic acids (DNA and RNA), and lipids (fats). Most of them are also notable for their ability to form polymers, or long chains of repeating subunits. The longer the chain is, the larger the molecule.
Proteins perform most of the basic biological tasks in organisms: they form the internal structural support of cells, link cells together, cut up and assemble other proteins or nucleic acids, provide communication pathways between the inside and outside of a cell, immobilize and target invading microbes for destruction, and convert energy currencies to run the whole show. Carbohydrates and lipids are used primarily for energy storage, although they do a number of other things as well. I don’t want to slight those people who are interested in lipid biology. I come from a lipid background myself, and I know how essential lipids are, but I’d like to jump ahead to the final class of macromolecule, nucleic acids, and their connection with protein expression.
Nucleic acids are central to the replication of life. DNA is a nucleic acid, and it is the beginning of the process that ends in the production of a particular protein. DNA is a polymer, which means that it’s a long chain of subunits. These subunits, or nucleotides, come in four types, called adenine, cytosine, guanine, and thymine. These are usually abbreviated to the first letters of their names: A, C, G, and T. A DNA molecule is made up of only these four nucleotides, and they can be placed in any order. DNA molecules are millions of nucleotides long, which basically makes them very long, string-like molecules. Unless they’re being copied, DNA molecules are usually wound up tightly around themselves, sort of like a telephone cord that’s been stretched too far and too many times. These wound-up DNA molecules are called chromosomes, and humans have 23 pairs of them, or 46 total. The sequence of nucleotides that makes up a chromosome is copied every time a cell gets ready to divide in the process called mitosis. Mitosis occurs whenever new cells are being made, and this is happening in your body all the time. Skin cells, hair follicles, liver cells, muscle cells, bone marrow cells: all these cells are undergoing mitosis as you listen to this.
Mutations are changes in the DNA sequence, most often mistakes made during DNA replication. The molecular machinery that copies DNA before a cell divides is not perfect, and the DNA itself is susceptible to damage from a number of factors, including radiation, certain chemicals, and viruses. Radiation, especially ultraviolet radiation, tends to affect adjacent thymine bases, so it’s not completely random, but it’s very close. But there is also a base rate of mutation that occurs randomly but at a measurable average rate, and that results in one base being switched with another during copying. In humans, estimates of this rate are on the order of 1 to 2.5 mistakes per 100 million base pairs every generation. Across the roughly 6 to 7 billion base pairs in a full set of human chromosomes, that works out to somewhere between about 60 and 175 new mutations per individual. If one of these mutations occurs in one of the cells that is transferred to the next generation (we call these “germ cells,” and they would be either sperm in the male or eggs in the female), then the mutation is incorporated into the genome of the next generation.
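To see where a per-individual mutation count comes from, here is a minimal sketch of the arithmetic: the expected number of new mutations is just the per-base-pair rate multiplied by the number of base pairs copied. The genome sizes used below (6 to 7 billion base pairs for a full, two-copy set of human chromosomes) are assumptions for illustration, and the function name is made up for this example. A rate of 1 per 100 million gives a total of about 60, while a total near 175 corresponds to a higher rate of about 2.5 per 100 million.

```python
def expected_new_mutations(rate_per_bp: float, genome_size_bp: float) -> float:
    """Expected new mutations per generation = rate per base pair x base pairs."""
    return rate_per_bp * genome_size_bp


# Assumed inputs (illustrative, not measured here):
# rate of 1 per 100 million base pairs, genome of about 6 billion base pairs.
low_estimate = expected_new_mutations(1e-8, 6e9)

# Assumed inputs: rate of 2.5 per 100 million base pairs, about 7 billion base pairs.
high_estimate = expected_new_mutations(2.5e-8, 7e9)

print(round(low_estimate), round(high_estimate))
```

The point of the sketch is only that the total scales directly with both the rate and the genome size, so different published rates give noticeably different totals.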
This is an important concept. Since we observe time and time again that inheritance is the mechanism for transfer of mutations from one generation to the next, we can infer genetic relationships between organisms based on shared mutations. For example, let’s say that your grandfather was the first person to have a unique and dominant mutation, call it “Mutation X”, which was passed on to all of his children, including your father, and then on to you. You happen to meet someone who claims to be a long-lost cousin, but how do you know? If you were to compare your DNA sequence to this supposed cousin’s and find that they had Mutation X as well, that would be strong genetic evidence that you share the same grandfather. Thus, shared DNA sequence implies shared ancestry.
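The cousin comparison can be sketched in a few lines. This is a toy illustration, not how real relatedness testing is done: the sequences are invented, they are assumed to be already lined up position by position, and the names `reference`, `you`, `cousin`, and `stranger` are hypothetical. The idea is to list where each person differs from a common reference sequence and then look for differences that two people share.

```python
def variants(reference: str, sample: str) -> set[tuple[int, str]]:
    """Return (position, new_base) for every place the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length (already aligned)")
    return {(i, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s}


# Invented 15-nucleotide sequences for illustration only.
reference = "ATGGCCTTAGACCGT"
you       = "ATGGCCTTAGTCCGT"  # A -> T at position 10: our stand-in for "Mutation X"
cousin    = "ATGGCATTAGTCCGT"  # carries Mutation X plus a private change at position 5
stranger  = "ATGGCCTTAGACCGA"  # one unrelated change at position 14

shared_with_cousin = variants(reference, you) & variants(reference, cousin)
shared_with_stranger = variants(reference, you) & variants(reference, stranger)

print(shared_with_cousin)    # the shared Mutation X
print(shared_with_stranger)  # nothing shared
```

Here the supposed cousin shares your one distinctive change and the stranger shares none, which is the pattern you would expect if the change was inherited from a common ancestor.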
OK, so that’s how DNA works, but how do you get protein from DNA? Well, as I’ve mentioned before, the DNA sequence of most organisms is divided up into transcribed and non-transcribed parts. The transcribed parts are called “genes.” Gene transcription is the process by which an RNA copy is made of a DNA sequence. RNA is similar in structure to DNA, but it isn’t used as the genetic storage molecule. Instead, it’s used as an intermediate to ferry copies of the DNA sequence out of the nucleus of the cell, where the DNA is stored, and into the main part of the cell, where protein is made. Transcription is kind of like a librarian who goes into the basement of the library, makes a photocopy of a book, and then brings the photocopy to the person who requested it; the RNA is the photocopy. It’s basically an exact copy of the original gene, but constructed out of RNA nucleotides instead of DNA nucleotides. In particular, RNA uses uracil (U) wherever DNA uses thymine (T). These copies are called transcripts, because we talk about RNA being transcribed from DNA.
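As a rough sketch of what "an exact copy, but in RNA nucleotides" means, the snippet below rewrites a DNA sequence as RNA. RNA uses uracil (U) where DNA uses thymine (T), so the copy is the same string of letters with every T replaced by U. This is a simplification: it assumes you are given the strand whose sequence matches the RNA, whereas the cell actually builds the RNA by pairing against the opposite strand. The function name `transcribe` is just a label chosen for this example.

```python
def transcribe(dna: str) -> str:
    """Return the RNA version of a DNA sequence (same letters, with T replaced by U)."""
    dna = dna.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"not a DNA sequence, found: {sorted(unexpected)}")
    return dna.replace("T", "U")


print(transcribe("AGTCTCGAATCC"))  # AGUCUCGAAUCC
```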
Proteins are also polymers, or long chains of subunit molecules. Instead of being made out of nucleotides, however, proteins are made out of amino acids. Now, whereas there are only four different nucleotides that are incorporated into DNA, there are twenty different amino acids that are incorporated into proteins. That means that it would be impossible to have a 1:1 relationship between a nucleotide and an amino acid; there are just too many amino acids. So what’s the solution? The solution is that there is a kind of code in the nucleotide sequence that requires it to be subdivided into groups of three nucleotides. This way, a sequence such as AGTCTCGAATCC would be read as AGT, CTC, GAA, TCC. These groups of three nucleotides are called codons, because they are the individual units of the genetic code. With four possible nucleotides at each of three positions, there are 4 × 4 × 4 = 64 possible codons, which makes plenty of possible amino acid counterparts; too many, in fact. Since there are 64 possible codons but only 20 possible amino acids, multiple codons correspond to the same amino acid, and a few codons serve as stop signals instead. The RNA sequence is used directly to make the amino acid sequence, in a process called translation.
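Here is a small sketch of reading a sequence three nucleotides at a time, using the example sequence from above. It also counts the possible codons (four choices at each of three positions) and translates the example with a deliberately partial lookup table. The four table entries are taken from the standard genetic code, but the table is incomplete on purpose, so any other codon raises an error; the function and variable names are made up for this illustration.

```python
from itertools import product


def split_codons(sequence: str) -> list[str]:
    """Split a sequence into consecutive 3-letter codons, ignoring leftover letters."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


# Every possible codon: 4 choices at each of 3 positions.
all_codons = ["".join(letters) for letters in product("ACGU", repeat=3)]

# Partial table (assumption: only the codons needed for this example are included).
PARTIAL_CODON_TABLE = {"AGU": "Ser", "CUC": "Leu", "GAA": "Glu", "UCC": "Ser"}


def translate(rna: str) -> list[str]:
    """Look up each codon; raises KeyError for codons missing from the partial table."""
    return [PARTIAL_CODON_TABLE[codon] for codon in split_codons(rna)]


dna = "AGTCTCGAATCC"
rna = dna.replace("T", "U")  # RNA uses U where DNA uses T

print(split_codons(dna))  # ['AGT', 'CTC', 'GAA', 'TCC']
print(len(all_codons))    # 64
print(translate(rna))     # ['Ser', 'Leu', 'Glu', 'Ser']
```

Notice that two different codons in the example, AGU and UCC, both give serine. That is the redundancy in the code showing up in a sequence only twelve letters long.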
Amino acids are themselves somewhat similar in structure to nucleotides, in that they all share a common core. That core is composed of an amino group and a carboxylic acid group, hence the name amino acid. But each amino acid also has room for another group, called a side chain, and it’s the various structures of the side chain that make one amino acid different from another. Some amino acids are electrically charged, and some have no charge. Some amino acids associate well with water, and others are repelled by water. Some amino acids are very large, and others are very small. All of these factors come into play in the final product, the protein molecule. Ultimately, a protein is just a long chain of amino acids, just like DNA is a long chain of nucleotides. But instead of staying a long, floppy string of amino acids, proteins fold up into specific conformations, depending on the specific amino acids that are used to make them. Chemical bonds between different amino acids cause parts of the chain to stick together, and specific orders of amino acids can cause the chain to fold back and forth or spiral around itself, much like DNA does. Because of all this folding, each protein has a different shape, or what we call a structure. And it’s this structure that makes a protein able to do the specific things that it can do, all the things that I mentioned at the beginning of this episode.
All right, that’s a lot of information to soak up. Let me just go over the basics again. DNA is made up of a chain of four different nucleotides. The nucleotide sequence is transcribed into RNA, which is then translated into an amino acid sequence. The translation is carried out by virtue of the genetic code, in which 64 different 3-nucleotide codons specify 20 different amino acids, with a few codons acting as stop signals. The specific order of amino acids confers physical and chemical properties on the final protein, influencing the way it is folded up into its final structure. And the structure of the protein is directly related to its function.