The Genetic Code

Overview

This module will examine how information is encoded in DNA, and how that information is interpreted to bring about changes in cells and tissues.

Objectives

1. Understand the triplet nature of the genetic code, and know the meaning of the term codon.
2. Know that the code is degenerate, and what that means.
3. Know that the code is unambiguous, and what that means.
4. Know the identities of the start and stop codons, and understand how they work.

The Genetic Code

It has been mentioned in a variety of modules that DNA stores genetic information.
That much was clear from the experiments of Avery, MacLeod, and McCarty and of Hershey and Chase. However, these experiments did not explain how DNA stores genetic information. Elucidation of the structure of DNA by Watson and Crick did not offer an obvious explanation of how the information might be stored. DNA was constructed from nucleotides containing only four possible bases (A, G, C, and T). The big question was: how do you code for all of the traits of an organism using only a four-letter alphabet?

Recall the central dogma of molecular biology.
The information stored in DNA is ultimately transferred to protein, which is what gives cells and tissues their particular properties. Proteins are linear chains of amino acids, and there are 20 amino acids found in proteins. So the real question becomes: how does a four-letter alphabet code for all possible combinations of 20 amino acids? By constructing multi-letter “words” out of the four letters in the alphabet, it is possible to code for all of the amino acids. One-letter words give only 4 possibilities and two-letter words only 16 (4 × 4), both too few for 20 amino acids. Three-letter words give 64 possibilities (4 × 4 × 4), which covers the 20 amino acids easily.
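The counting argument above can be checked with a short Python sketch. The names `BASES` and `count_words` are made up for this illustration; the only assumption is a four-letter alphabet, written here with the RNA bases.

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases (illustrative choice; DNA would use T instead of U)


def count_words(length: int) -> int:
    """Number of distinct words of the given length over a four-letter alphabet."""
    return len(BASES) ** length


# Word lengths 1 and 2 give fewer than 20 words; length 3 gives more than enough.
for length in (1, 2, 3):
    print(length, count_words(length))

# Enumerate every three-letter word explicitly as a cross-check on the arithmetic.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons), len(set(codons)))
```

Three is the smallest word length whose count (64) reaches the 20 needed, which is the reasoning behind a triplet code.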
This kind of reasoning led to the proposal of a triplet genetic code. Experiments involving in vitro translation of short synthetic RNAs eventually confirmed that the genetic code is indeed a triplet code. The three-letter “words” of the genetic code are known as codons. This experimental approach was also used to work out the relationship between individual codons and the various amino acids. After this “cracking” of the genetic code, several properties of the genetic code became apparent:

* The genetic code is composed of nucleotide triplets. In other words, three nucleotides in mRNA (a codon) specify one amino acid in a protein.
* The code is non-overlapping. This means that successive triplets are read in order. Each nucleotide is part of only one triplet codon.
* The genetic code is unambiguous. Each codon specifies a particular amino acid, and only one amino acid. In other words, the codon ACG codes for the amino acid threonine, and only threonine.
* The genetic code is degenerate. Although each codon specifies only one amino acid, most amino acids can be specified by more than one codon.
* The code is nearly universal. Almost all organisms in nature (from bacteria to humans) use exactly the same genetic code. The rare exceptions include some changes in the code in mitochondria, and in a few protozoan species.

A Non-overlapping Code

The genetic code is read in groups (or “words”) of three nucleotides. After reading one triplet, the “reading frame” shifts over three letters, not just one or two. For example, the sequence GACUGACUGACU… would not be read as the overlapping triplets GAC, ACU, CUG, UGA…

Rather, the code would be read GAC, UGA, CUG, ACU…

Degeneracy of the Genetic Code

There are 64 different triplet codons, and only 20 amino acids. Unless some amino acids are specified by more than one codon, some codons would be completely meaningless. Therefore, some redundancy is built into the system: some amino acids are coded for by multiple codons. In some cases, the redundant codons are related to each other by sequence; for example, leucine is specified by the codons CUU, CUA, CUC, and CUG (as well as UUA and UUG). Note how the four CU codons are the same except for the third nucleotide position. This third position is known as the “wobble” position of the codon.
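As a rough illustration of degeneracy and unambiguity, the sketch below builds the standard codon table and groups codons by amino acid. The one-letter amino acid abbreviations (L = leucine, T = threonine, M = methionine, W = tryptophan, `*` = stop) and the compact table layout are not part of the text above, and the names `CODON_TABLE` and `codons_for` are made up for this example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# UCAG order at each position (UUU, UUC, UUA, UUG, UCU, ...). "*" marks a stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Unambiguous: a dictionary maps each codon to exactly one meaning.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list:
    """Degenerate: return every codon that specifies the given amino acid."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


print(CODON_TABLE["ACG"])                # the single meaning of codon ACG
print(codons_for("L"))                   # all codons for leucine
print(codons_for("M"), codons_for("W"))  # amino acids with only one codon
```

Looking up a codon always gives a single answer, while looking up an amino acid usually gives several codons; methionine and tryptophan are the two amino acids with only one codon each.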
The wobble position is so named because, in a number of cases, the identity of the base at the third position can vary (“wobble”), and the same amino acid will still be specified. This property allows some protection against mutation: if a mutation occurs at the third position of a codon, there is a good chance that the amino acid specified in the encoded protein won’t change.

Reading Frames

Because the genetic code is triplet based, there are three possible ways a particular message can be read, depending on whether reading starts at the first, second, or third nucleotide, as shown in the following figure:

[Figure: the three possible reading frames of a message]

Clearly, each of these would yield completely different results.
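The three possible reading frames can be sketched in a few lines of Python. The helper name `read_frame` and the example sequence are made up for this illustration; incomplete triplets at the end of the message are simply dropped.

```python
def read_frame(rna: str, offset: int) -> list:
    """Split rna into non-overlapping triplets, starting at offset 0, 1, or 2.

    Leftover nucleotides that do not fill a complete triplet are ignored.
    """
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]


message = "GACUGACUGACU"  # arbitrary example sequence
for offset in range(3):
    print(offset, " ".join(read_frame(message, offset)))
```

Each offset produces a different list of triplets from the same letters, which is why the choice of reading frame matters.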
To illustrate the point using an analogy, consider the following set of letters:

theredfoxatethehotdog

If this string of letters is read three letters at a time, there is one reading frame that works:

the red fox ate the hot dog

and two reading frames that produce nonsense:

t her edf oxa tet heh otd og
th ere dfo xat eth eho tdo g

Genetic messages work much the same way: there is one reading frame that makes sense, and two reading frames that are nonsense.

So how is the reading frame chosen for a particular mRNA? The answer is found in the genetic code itself.
The code contains signals for starting and stopping translation of the code. The start codon is AUG. AUG also codes for the amino acid methionine, but the first AUG encountered signals for translation to begin. The start codon sets the reading frame: AUG is the first triplet, and subsequent triplets are read in the same reading frame. Translation continues until a stop codon is encountered. There are three stop codons: UAA, UAG, and UGA. To be recognized as a stop codon, the triplet must be in the same reading frame as the start codon. A reading frame between a start codon and an in-frame stop codon is called an open reading frame.
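The start and stop rules just described can be written as a minimal sketch. It follows the simplified "first AUG" rule used in this module and ignores everything else that influences where translation begins in a real cell. The function name `find_orf` and the example sequence are hypothetical.

```python
START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def find_orf(mrna: str):
    """Return the codons from the first AUG through the first in-frame stop.

    Returns None if there is no start codon, or if no stop codon is found
    in the reading frame set by the start codon.
    """
    start = mrna.find(START)  # the first AUG, reading 5' to 3'
    if start == -1:
        return None
    codons = []
    for i in range(start, len(mrna) - 2, 3):  # step three letters at a time
        codon = mrna[i:i + 3]
        codons.append(codon)
        if codon in STOPS:
            return codons
    return None  # ran off the end without meeting an in-frame stop


# Hypothetical message: the letters UGA occur inside it, but out of frame.
print(find_orf("CAAUGGUGACCUAAGC"))
```

In this example the UGA that straddles the codons GUG and ACC is skipped, because the loop only ever examines triplets in the frame set by the AUG.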
Let’s see how a sequence would be translated by considering the following sequence:

5′-GUCCCGUGAUGCCGAGUUGGAGUCGAUAACUCAGAAU-3′

First, the code is read in a 5′ to 3′ direction. The first AUG read in that direction sets the reading frame, and subsequent codons are read in frame (AUG, CCG, AGU, UGG, AGU, CGA) until the stop codon, UAA, is encountered. Note that the three nucleotides UGA that overlap the beginning of the start codon (GUGAUG) would otherwise constitute a stop codon, except that they are out of frame and are not recognized as a stop. In this sequence, there are nucleotides at either end that are outside of the open reading frame.
Because they are outside of the open reading frame, these nucleotides are not used to code for amino acids. This is a common situation in mRNA molecules. The region at the 5′ end that is not translated is called the 5′ untranslated region, or 5′ UTR. The region at the 3′ end is called the 3′ UTR. These sequences, even though they do not encode any polypeptide sequence, are not wasted: in eukaryotes these regions typically contain regulatory sequences that can affect when a message gets translated, where in a cell an mRNA is localized, and how long an mRNA lasts in a cell before it is destroyed.
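To make the division into 5′ UTR, open reading frame, and 3′ UTR concrete, here is a hedged sketch applied to the example sequence from the text. The function names `split_mrna` and `translate` and the one-letter amino acid abbreviations are made up for this illustration. Counting the stop codon as part of the open reading frame slice is a choice made here for convenience.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order at each codon position; "*" marks a stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_mrna(mrna: str):
    """Return (five_prime_utr, orf, three_prime_utr).

    The ORF runs from the first AUG through the first in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    for i in range(start, len(mrna) - 2, 3):
        if CODON_TABLE[mrna[i:i + 3]] == "*":
            end = i + 3  # position just past the stop codon
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


def translate(orf: str) -> str:
    """Translate an ORF into one-letter amino acid codes, omitting the stop."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    return "".join(CODON_TABLE[c] for c in codons if CODON_TABLE[c] != "*")


mrna = "GUCCCGUGAUGCCGAGUUGGAGUCGAUAACUCAGAAU"
utr5, orf, utr3 = split_mrna(mrna)
print(utr5, orf, utr3)
print(translate(orf))
```

For the example sequence this gives GUCCCGUG as the 5′ UTR, CUCAGAAU as the 3′ UTR, and a six-residue peptide (Met-Pro-Ser-Trp-Ser-Arg) from the open reading frame.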
A detailed examination of the untranslated regions is beyond the scope of this course.

The Genetic Code: Summary of Key Points

* The genetic code is a triplet code, with codons of three bases coding for specific amino acids. Each triplet codon specifies only one amino acid, but an individual amino acid may be specified by more than one codon.
* A start codon, AUG, sets the reading frame, and signals the start of translation of the genetic code. Translation continues in a non-overlapping fashion until a stop codon (UAA, UAG, or UGA) is encountered in frame. The nucleotides between the start and stop codons make up an open reading frame.