
Genome sequencing pioneer: How biology entered the information age

It took a trinity of biochem, genetics, and molecular biology to spur progress.

Eric Lander

STOCKHOLM, SWEDEN—Eric Lander was one of the leaders behind the effort to sequence the human genome. He has also continued to work on various follow-up projects through his involvement with the Broad Institute, a leading sequencing center. So, Lander makes an excellent choice to provide some perspective about how the growing availability of genomes has driven the biological sciences over the last decade. He did just that at a Nobel Week Dialogue talk.

But Lander didn't stop at ten years. Instead, he backed up all the way to the start of the 20th century and ran through the history of biology since. His reasoning was that it can take decades for the impact of scientific discoveries to become clear. And, according to Lander, the story parallels that of the 20th century as a whole: the rise of the information age. Information was a theme that pervaded the rest of his talk, and Lander blamed life itself. "Life was fundamentally about information."

Biology reaches the information age

At the end of the 1800s, vitalism—the idea that life had features that were distinct from the rest of the physical world—was still popular. It was the parallel progress in genetics and biochemistry that helped bring vitalism to a close. It took until the 1940s for the two fields to overlap, as researchers started to study the genetics of biochemical pathways and the biochemistry of heritable material. This is when the central role of DNA became clear. And that's what inspired Watson and Crick (plus a number of others) to think that understanding the structure of DNA would be critical.

The double helix moved biology into the information age, as Lander put it, since it answered some key questions: how can you store information, faithfully replicate it, but still allow room for the new variants that drive evolution? The identification of the genetic code was the next (and obvious) step in understanding how information was stored in a string of nucleotides. The problem, Lander said, was that you couldn't actually read any of this—at least not as it exists in a cell.
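To make those three properties concrete, here is a minimal Python sketch; it is an illustration, not anything Lander presented. The toy sequence, the function names, and the deliberately tiny codon table (a handful of entries from the standard genetic code) are all assumptions chosen for the example.

```python
# Watson-Crick base pairing: each base determines its partner on the other strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# A small subset of the standard genetic code, written as DNA codons on the
# coding strand. A real table has 64 entries; this one covers only the toy gene.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start codon
    "AAA": "K",  # lysine
    "GAA": "E",  # glutamate
    "TGG": "W",  # tryptophan
    "TAA": "*",  # stop
}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def translate(coding_strand: str) -> str:
    """Read codons three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        amino_acid = CODON_TABLE[coding_strand[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical toy gene, invented for this example.
gene = "ATGAAAGAATGGTAA"

# Storage and faithful replication: either strand is enough to rebuild the other.
partner = reverse_complement(gene)
assert reverse_complement(partner) == gene

# Room for variants: a single-base change (G -> A at index 6) alters one amino acid.
mutant = gene[:6] + "A" + gene[7:]

print(translate(gene))    # MKEW
print(translate(mutant))  # MKKW
```

The point of the sketch is only that the sequence itself carries the message: copying follows from base pairing, and a one-letter change in the string yields a different protein.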

From a biochemical perspective, all DNA was more or less the same, so you couldn't deal with that information by the standard methodology of the time. It took the development of molecular biology and recombinant DNA to start dealing with the actual information content.

"How do you ask how to read DNA?" Lander asked. "You ask the master—the cell. The cell is in the business of reading the information in DNA." Molecular biology and biotechnology developed around the purification and use of the proteins used by the cells themselves to manipulate DNA.

The next hurdle, according to Lander, was that we had no idea how to apply the principles of molecular biology to something as big as the human genome in a systematic way. That problem first started to crack with the development of RFLP (restriction fragment length polymorphism) mapping, when human genes were first associated with a nearby DNA feature that could only be revealed using the techniques of molecular biology. In principle, these methods worked. But Lander said that, without the genome sequence, each research group essentially had to start from scratch.

Conceptually, he said the key step was the development of a hierarchical map. Lay out genetic markers on a map, identify the DNA associated with those markers, and then dig down into the actual DNA sequences. The first human genetic map appeared in 1987. That set the stage for genome sequencing to kick off in earnest in the 1990s. The completed sequence was announced in 2003, on the 50th anniversary of the Watson and Crick paper.

Maps on top of maps

Once we had the sequence, we could start making notes about the specific features of different genome regions, a concept Lander referred to as mapping. By comparing genomes, you could map the parts of the genome conserved through evolution; the sites of common human variations; the sites where proteins bind to DNA to regulate genes; and so on. These maps have been essential to understanding the meaning of the sequences that were identified in the genome sequencing project.
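As a rough illustration of what a conservation map means, the Python sketch below marks the positions where every species in a small alignment carries the same base. The sequences are made up for the example, the function name is hypothetical, and the sequences are assumed to be already aligned and of equal length; real comparative genomics relies on alignment and statistical scoring that this toy skips entirely.

```python
def conservation_map(aligned: dict[str, str]) -> str:
    """Mark each alignment column '*' if all species agree, '.' otherwise."""
    sequences = list(aligned.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be pre-aligned to the same length")
    # zip(*sequences) walks the alignment one column at a time.
    return "".join(
        "*" if len(set(column)) == 1 else "."
        for column in zip(*sequences)
    )

# Invented toy data: not real genomic sequence from any of these species.
toy_alignment = {
    "human":   "ATGGCCAAGT",
    "mouse":   "ATGGCTAAGT",
    "chicken": "ATGACCAAGA",
}

for species, seq in toy_alignment.items():
    print(f"{species:>8} {seq}")
print(f"{'map':>8} {conservation_map(toy_alignment)}")
```

The resulting string of marks is itself a simple map layered on top of the sequence, which is the sense in which annotations such as conservation, variation, or protein binding sites stack on the genome.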

All that information helped us start to make some progress in medicine. Over the last 25 years, we've gone from knowing the causative mutations behind 20 human diseases to knowing them for about 3,500. Back then, we knew something about just one polygenic disorder; now we know something about more than 2,000.

Having reached the present, Lander joked it was a mistake to ever talk about the future. But that didn't stop him from trying. He kept his discussion to problems that are already obvious. One was the challenge of integrating all the information that's now flowing in such a way that respects privacy while providing access to researchers. We still don't know how to consistently translate knowledge about biological systems to effective medicine. And we're only just starting to learn how to effectively "write" using the genetic code, to create novel proteins or combinations of them.

And with that, Lander wrapped up, leaving it for the rest of the speakers to fill in the details of the future.

This piece is part of our ongoing coverage of the Nobel Week Dialogue, and first appeared on the Nobel site.

Listing image by Amin Tabrizi
