13.09.2022 Views

Molecular Biology of the Cell by Bruce Alberts, Alexander Johnson, Julian Lewis, David Morgan, Martin Raff, Keith Roberts, Peter Walter

ANALYZING AND MANIPULATING DNA

With the possible exception of identical twins, the genome of each human differs in DNA sequence from that of every other person on Earth. Using primer pairs targeted at genome sequences that are known to be highly variable in the human population, PCR makes it possible to generate a distinctive DNA fingerprint for any individual (Figure 8–39). Such forensic analyses can be used not only to help identify those who have done wrong, but also, equally important, to exonerate those who have been wrongfully accused.
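The idea of a PCR-based fingerprint can be illustrated with a small in silico sketch. It assumes that the variable region is a short tandem repeat whose copy number differs between alleles, which is one common kind of highly variable sequence. The flanking sequences, the repeat unit, and the helper names (`make_allele`, `amplicon`, `fingerprint`) are all invented for illustration and do not correspond to any real locus or to the primers shown in the figure.

```python
# All sequences below are hypothetical; they are not real human loci.
LEFT_FLANK = "GGATCCAGTC"    # conserved sequence upstream of the repeat
RIGHT_FLANK = "TTCGAGCTAA"   # conserved sequence downstream of the repeat
REPEAT_UNIT = "GATA"         # unit repeated a variable number of times


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def make_allele(repeat_count):
    """Build a toy allele: padding, flank, N repeats, flank, padding."""
    return "TTACG" + LEFT_FLANK + REPEAT_UNIT * repeat_count + RIGHT_FLANK + "CCGTA"


def amplicon(template, forward_primer, reverse_primer):
    """Return the region a primer pair would amplify, or None if a primer has no site."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the top strand, so its binding site
    # reads as the primer's reverse complement.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


FORWARD_PRIMER = LEFT_FLANK
REVERSE_PRIMER = reverse_complement(RIGHT_FLANK)


def fingerprint(alleles):
    """Sorted amplicon lengths for the two alleles a person carries at this locus."""
    return sorted(len(amplicon(a, FORWARD_PRIMER, REVERSE_PRIMER)) for a in alleles)


person_a = fingerprint([make_allele(5), make_allele(8)])
person_b = fingerprint([make_allele(6), make_allele(6)])
print(person_a)  # [40, 52]
print(person_b)  # [44, 44]
```

The two toy individuals give different sets of product lengths at this single locus. Combining several such loci would make a chance match between unrelated people correspondingly less likely.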

Both DNA and RNA Can Be Rapidly Sequenced

Most current methods of manipulating DNA, RNA, and proteins rely on prior knowledge of the nucleotide sequence of the genome of interest. But how were these sequences determined in the first place? And how are new DNA and RNA molecules sequenced today? In the late 1970s, researchers developed several strategies for determining, simply and quickly, the nucleotide sequence of any purified DNA fragment. The one that became the most widely used is called dideoxy sequencing or Sanger sequencing (Panel 8–1). This method was used to determine the nucleotide sequence of many genomes, including those of E. coli, fruit flies, nematode worms, mice, and humans. Today, cheaper and faster methods are routinely used to sequence DNA, and even more efficient strategies are being developed (see Panel 8–1). The original “reference” sequence of the human genome, completed in 2003, cost over $1 billion and required many scientists from around the world working together for 13 years. The enormous progress made in the past decade makes it possible for a single person to complete the sequence of an individual human genome in less than a day.

The methods summarized in Panel 8–1 for rapidly sequencing DNA can also be applied to RNA. Although methods are being developed to sequence RNA directly, RNA sequencing is most commonly carried out by converting the RNA to complementary DNA (using reverse transcriptase) and then using one of the methods described for DNA sequencing. It is important to keep in mind that although genomes remain the same from cell to cell and from tissue to tissue, the RNA produced from the genome can vary enormously. We will see later in this chapter that sequencing the entire repertoire of RNA from a cell or tissue (known as deep RNA sequencing, or RNA-seq) is a powerful way to understand how the information present in the genome is used by different cells under different circumstances. In the next section, we shall see how RNA-seq has also become a valuable tool for annotating genomes.
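The RNA-to-cDNA conversion mentioned above can be sketched at the level of sequence alone. This minimal example treats reverse transcription as base-pairing only (A with T, U with A, G with C, C with G) and ignores primers, enzyme errors, and incomplete extension. The function names and the example RNA are hypothetical.

```python
def first_strand_cdna(rna):
    """Return the first-strand cDNA (written 5' to 3') complementary to an RNA."""
    pairing = str.maketrans("ACGU", "TGCA")  # RNA base -> complementary DNA base
    # Complement every base, then reverse so the result reads 5' to 3'.
    return rna.upper().translate(pairing)[::-1]


def second_strand_cdna(first_strand):
    """Return the DNA strand complementary to the first-strand cDNA (5' to 3')."""
    pairing = str.maketrans("ACGT", "TGCA")
    return first_strand.upper().translate(pairing)[::-1]


rna = "AUGGCCUAA"  # invented example transcript
first = first_strand_cdna(rna)
second = second_strand_cdna(first)
print(first)   # TTAGGCCAT
print(second)  # ATGGCCTAA
```

The second strand carries the same sequence as the original RNA with T in place of U, which is why sequencing the cDNA reveals the sequence of the RNA.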

To Be Useful, Genome Sequences Must Be Annotated

Long strings of nucleotides, at first glance, reveal nothing about how this genetic information directs the development of a living organism, or even what types of DNA, protein, and RNA molecules are produced by a genome. The process of genome annotation attempts to mark out all the genes (both protein-coding and noncoding) in a genome and ascribe a role to each. It also seeks to understand more subtle types of genome information, such as the cis-regulatory sequences that specify the time and place that a given gene is expressed and whether its mRNA undergoes alternative splicing to produce different protein isoforms. Clearly, this is a daunting task, and we are far short of completing it for any form of life, even the simplest bacterium. For many organisms, we know the approximate number of genes, and, for very simple organisms, we understand the functions of about half their genes.

In this section, we discuss broadly how genes are identified in genome sequences and what clues we can discern about their roles from simply inspecting their sequences. Later in the chapter, we turn to the more difficult problem of experimentally determining gene function.

How does one begin to make sense of a genome sequence? The first step is usually to translate in silico the entire genome into protein. There are six different reading frames for any piece of double-stranded DNA (three on each strand).
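A six-frame translation can be sketched in a few lines. This minimal example assumes the standard genetic code and an input of one strand written 5' to 3'. The function names and the short test sequence are hypothetical, and real annotation pipelines do considerably more than this.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with unknown bases
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate all six reading frames: three on each strand."""
    dna = dna.upper()
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
# +1 MA*
# +2 WP
# +3 GL
# -1 LGH
# -2 *A
# -3 RP
```

Each frame yields a different string of amino acids and stop codons from the same DNA, so the output of this step is six candidate translations that still have to be examined for plausible protein-coding stretches.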

We saw in Chapter 6 that a random sequence of nucleotides, read in frame, will
