LITE–ing the way to game-changing technology
Genome sequencing has revolutionised biological sciences, allowing researchers to tackle completely new questions. But that doesn't mean there's universal access.
At the Earlham Institute, we are developing protocols to drastically reduce the cost of library preparation for next-generation sequencing. By designing bespoke approaches to whole-genome sequencing, we are contributing to worldwide efforts to generate sets of genomic data at scale.
Sequencing genomes — whether of humans, animals, plants, or microorganisms — provides invaluable information about species evolution, biodiversity, and biological function.
The COVID-19 pandemic has highlighted the importance of global genomic surveillance to monitor SARS-CoV-2 for genetic mutations that could make the virus more contagious, more virulent, or more resistant to the available vaccines.
The WHO has endorsed genomic surveillance as a worldwide priority to inform public health decisions and improve preparedness for future pandemics. Advances in genomic technologies (and high demand for them) over the past 25 years have led to substantial reductions in the cost of sequencing.
Nevertheless, the cost of establishing and running sequencing facilities is unaffordable for many low- and middle-income countries (LMICs).
With on-instrument sequencing costs falling rapidly, an increasing proportion of the total cost of sequencing large numbers of DNA or RNA samples is incurred during library preparation, the step in which samples are prepared so they can be loaded onto a sequencer.
The cost of library preparation has remained stubbornly high, largely due to the cost of reagents and the time required for precise pipetting.
To make the process more affordable, Darren Heavens – at the time a Research Assistant at the Earlham Institute – set about designing a low-cost custom library construction method for large-scale sequencing projects.
Low Input, Transposase Enabled – commonly called LITE – was born.
He is now Dr Darren Heavens, and LITE is used in laboratories all around the world.
“Twelve years ago, we were approached to help sequence the barley genome, which, at just over 5 gigabases, is almost twice the size of that of humans and has a lot of repeat content,” he explains.
Heavens and colleagues optimised Illumina’s Nextera library construction kit to maximise fragmentation across the largest possible size range using minimal amounts of DNA from bacterial artificial chromosomes (BACs).
The LITE protocol allows researchers to sequence individual BACs in one go for only £2-3 per BAC (total reagent cost), making cost reductions of 90% possible.
Using LITE alongside other complementary techniques, Heavens and colleagues developed reference genomes for cultivated barley (Hordeum vulgare) and perennial ryegrass.
“The possibility of generating large, high-quality genome assemblies with reduced DNA input, time, and cost was soon spotted as an opportunity to tackle ambitious resequencing projects such as the 10,000 Salmonella genomes project supported by the Global Challenges Research Fund,” Heavens says.
Salmonella enterica is a major cause of illness and death globally, particularly in Africa. The severity of disease depends on both host factors and the distinctive surface structures found on the bacterium that can be identified by genomics.
In 2016, researchers at the Earlham Institute and the University of Liverpool, along with scientists from 16 countries across the world, embarked on an ambitious project to sequence and analyse 10,000 Salmonella enterica genomes from Africa and Latin America to study the epidemiology, drug resistance and virulence factors of isolates.
Thanks to the LITE protocol, library construction, DNA sequencing, and basic bioinformatic analysis were done with a total reagent cost of less than US$10 (around £7) per isolate.
“The 10KSG project has established an efficient and relatively inexpensive pipeline for the worldwide collection and sequencing of bacterial genomes that will be an enormous asset to public health and surveillance in LMICs,” Heavens says.
Using the same approach as in the 10KSG project, over 1,000 genomes of Shigella spp. from 7 countries in sub-Saharan Africa and South Asia have been sequenced and analysed. Shigella spp. are the leading bacterial cause of severe childhood diarrhoea in LMICs and they are becoming increasingly resistant to key antibiotics.
The results of this project will help guide treatment decisions and the development of an effective preventative vaccine.
Scientists at the Earlham Institute are also using the LITE method to work out, from mixed samples of pollen, which plant species bees are visiting. Reverse metagenomics (RevMet) allows bee-collected pollen to be characterised without requiring the plants’ reference genomes. RevMet reverses the normal metagenomic protocol: the reference sequences are mapped to the query sequences, rather than the other way round.
They have used LITE to generate ‘reference skims’: sequences that only partially cover a plant’s complete genome but can be mapped to individual long reads sequenced from a mixed-species sample of pollen, identifying the plant species each read belongs to.
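The idea of mapping reference skims onto individual long reads can be illustrated with a small Python sketch. It is a toy illustration under stated assumptions, not the Earlham Institute's pipeline: it assumes an external read mapper has already produced alignments as `(read_id, species, start, end)` tuples, and the function names, the species labels, and the `min_fraction` threshold are all hypothetical. Each long read is assigned to the species whose skim reads cover the largest fraction of it, and is left unassigned if that fraction is too small.

```python
from collections import defaultdict


def covered_length(intervals):
    """Return the number of bases covered by half-open (start, end) intervals,
    counting overlapping regions only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def assign_reads(read_lengths, alignments, min_fraction=0.15):
    """Assign each long read to the species whose skim covers most of it.

    min_fraction is a hypothetical cut-off chosen for illustration only.
    Reads whose best coverage falls below it are assigned None.
    """
    by_read = defaultdict(lambda: defaultdict(list))
    for read_id, species, start, end in alignments:
        by_read[read_id][species].append((start, end))

    assignments = {}
    for read_id, length in read_lengths.items():
        fractions = {
            species: covered_length(ivs) / length
            for species, ivs in by_read.get(read_id, {}).items()
        }
        best = max(fractions, key=fractions.get) if fractions else None
        if best is not None and fractions[best] < min_fraction:
            best = None
        assignments[read_id] = best
    return assignments


# Made-up example data: three long reads and the skim alignments on them.
read_lengths = {"read1": 1000, "read2": 800, "read3": 500}
alignments = [
    ("read1", "species_A", 0, 300),
    ("read1", "species_A", 200, 500),  # overlaps the previous alignment
    ("read1", "species_B", 600, 650),
    ("read2", "species_B", 100, 500),
    ("read3", "species_A", 0, 25),     # too little coverage to assign
]
result = assign_reads(read_lengths, alignments)
print(result)
```

Counting the reads assigned to each species then gives a rough picture of the composition of the pollen sample. In this sketch the threshold trades sensitivity against false assignments, so it would need tuning on real data.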
Understanding what plants bees like to visit is important to combat their decline. So far, the results indicate that different bees prefer different plants, but that individual bees show a high fidelity to one species of plant.
At a total reagent cost of £90 per genome skim, RevMet could readily be used to study a wide range of mixed samples from species without reference genomes. Studies like this could provide new insights into animal diets, plant–fungi interactions and algae communities.
As DNA sequencing technology advances to generate more data at speed, the workflow bottlenecks have shifted. “The demand for protocols like LITE highlights the need for cost-effective and robust solutions for sequencing large amounts of DNA from very little starting material,” says Heavens, “but we are not resting on our laurels. We’re always looking for ways to do things better and meet the requirements of customers and collaborators.”
The Earlham Institute's Genomics Pipelines Group, led by Dr Karim Gharbi, processes between 10,000 and 15,000 samples per year using the LITE protocol. “The numbers increase year on year, which is a clear testament to its usefulness,” he says.
Since April 2017, the Genomics Pipelines Group has worked with over 50 external collaborators on 125 projects that used LITE in a wide range of species, including fish, insects, and plants.
The Group also works closely with the Institute’s Core Bioinformatics Group, who have developed software for automated assembly and annotation of bacterial genomes from LITE data.
By introducing minor but important changes, the Genomics Pipelines Group are continuously improving the quality of the data that can be obtained using LITE. They are aiming to launch a major update of the protocol by the end of the year that increases reproducibility and reduces bias against regions of the genome that are difficult to sequence.
“In addition to applying the new LITE protocol to bulk DNA extracted from tissues or cultures, we’re introducing the same protocol across our production pipelines for single-cell analysis to allow us to embark on even more large-scale projects such as The Darwin Tree of Life Project, for which EI is developing methods to sequence the DNA of every single-celled eukaryote in the UK,” Gharbi says.
Authored by Monica Hoyos Flight, writing for the Earlham Institute
The Earlham Institute is committed to driving open science and developing methods, tools, algorithms and e-infrastructures to advance genomics research in the UK and beyond.
If you are interested in using the LITE protocol or finding out more, get in touch with the Genomics Pipelines Group.