How did genome sequencing change so rapidly?
From sloshing around in radioactive material to high-throughput advanced genomics, Dr Daniel Swan tells us about the changing face of genome sequencing.
For those of us who have been in genomics for a while, the face of sequencing has changed beyond recognition.
During my PhD, I spent a good proportion of my student days in the ‘hot lab’, a room dedicated to the radioisotope work that then dominated genomic investigation.
I dealt with radioactive Southern blots [1] to identify DNA fractions for size selection, and radioactive probing of lambda phage libraries to identify genomic clones of interest, followed by 33P Sanger sequencing [2] on polyacrylamide gels, uncovering novel sequences as I went (we were precariously short of fully sequenced genomes at the time).
Even as I came to the end of my PhD lab work in the late 1990s, the way we generated sequence data was already changing.
I was outsourcing publication-quality work to be run on LI-COR 4200 automated DNA sequencers. These were still run on gels, but lasers detected fluorescent chain terminators, so hot lab work was no longer required.
Sanger sequencing dominated for a quarter of a century with variations on a theme, and capillary sequencing delivered the throughput required by the Human Genome Project. The Maxam-Gilbert technique [3], although an automation protocol was published [4], was never developed as a commercial concern.
The Platforms & Pipelines Group at the Earlham Institute now has a mix of platforms in its labs, having closed our capillary-based Sanger sequencing operation two years ago. There is still demand for the technology, but it no longer supports the kind of investigations that our customers and Science Faculty need.
It’s not just the Sanger genome sequencing platforms that we have carefully mothballed, but also earlier iterations of ‘next-generation’ sequencing machines. Our Roche 454s and Illumina GAII are museum pieces already, kept around the lab as a visual reminder to visitors of the march of progress.
What takes their place is a mix of platforms: the Life Technologies Ion Proton, a large suite of Illumina machines, the Pacific Biosciences RSII and the Oxford Nanopore MinIONs, which aren’t currently commercially available.
EI’s advanced suite of platforms represents a mix of sequencing-by-synthesis and direct sensing, covering both optical and semiconductor detection.
But why do we need this proliferation of platforms?
Quite simply, they all target different applications in the lab. The MiSeqs are well suited to 16S amplicon studies and small bacterial genomes. The HiSeqs support high-throughput BAC sequencing (the modern equivalent of the genome projects, generating sequence across minimal tiling paths), RNA-Seq and exome sequencing across a range of plants.
The PacBio is another ideal platform for generating and supporting genome assemblies, but it also has exciting applications via the Iso-Seq protocol, which uses full-length transcript sequencing to aid genome annotation efforts.
There are also other machines sitting in the lab whose purpose is not to sequence DNA, but to provide scaffolding information for more complex genome assembly projects.
They look familiar in that they have optical detection and flow cells, but they take images of strands of copied and labelled DNA as the molecules migrate across the field of view.
Platforms such as the OpGen Argus and BioNano Genomics Irys peer into the structure of single molecules of DNA, allowing us to polish existing reference assemblies, create new ones, and obtain information on larger structural variation from one organism to another.
One of the challenges of utilising these platforms is their input DNA requirements. The DNA may need to be heavily fragmented (e.g. in exome sequencing), handled very gently to retain high molecular weight (for optical mapping and the PacBio), or finely size-separated for mate-pair libraries.
With a profusion of read lengths across the Illumina platforms, each library needs to be tailored to both its application and its destination machine. The ability to accurately generate DNA fragments of the right size, and subsequently QC them, requires investment in a number of instruments: Diagenode Megaruptors, Covaris AFA ultrasonicators, Agilent TapeStations, Sage ELF and BluePippin systems are critical parts of the lab workflow.
The software application landscape is now changing to reflect this.
Bioinformatics is transitioning out of its period of short-read introspection and into a world where Sanger-length reads (and beyond!) can be generated in huge volumes.
EI's Platforms & Pipelines bioinformaticians are required to support an ever-expanding range of genome sequencing platforms, protocols and investigation types, all with their own protocol-specific quirks.
When I walk around our lab, I see automation and specialisation everywhere I go; technology and techniques that, twenty years ago at the start of my PhD, would have seemed entirely otherworldly. I’m acutely aware that the few kilobases of mouse genome that I sequenced by hand would barely be a footnote in the daily output of any of our modern genome sequencing machines at EI. It’s a great time to be a scientist in the genomics field, and without a drop of radioactivity in sight!
Where will genome sequencing be in another 20 years?