We're ten years old: leading the field in decoding complex, non-human genomes
Ten ways our expertise has led the way in how we decode complex, non-human genomes.
We might be a young Institute, but we’ve punched above our weight over the last decade - staying at the cutting edge of bioscience research and leading the world in decoding complex, non-human genomes.
Chop it up, sequence it, stick it back together again. How hard can it be? We’ve already got a fully sequenced human reference genome, right?
If only it were that easy. DNA sequencing is just the start, and life on Earth is as complex as it is abundant. Not only that, many genomes pose huge challenges when it comes to assembling them in the right order and finding the important genes.
At the Earlham Institute, we’ve been leading the way in decoding non-human genomes of great importance - from wheat to koalas. What’s more, we’ve been developing and building upon the resources that enable us to assemble and annotate the most complex of them, polyploids and all, and ensure that the product is of high quality and accuracy.
So, what are ten things that we’ve brought you in our ten years so far, and how are we enabling the genomics community to tackle complex genomes?
We were the first institute to put together a comprehensive assembly of the wheat genome, with the Clavijo Group's w2rap software leading the field in decoding this complex genome, and we’ve been involved at every step of the way since. As part of the International Wheat Genome Sequencing Consortium, the Swarbreck Group provided the software tools that helped to annotate it - putting the genes where they should be. Not only that, a year previously, the KAT tool (we’ll come to this later), developed here by Bernardo Clavijo and his group, was crucial in the publication of the most complete wheat genome sequence to date.
Our scientists have applied their expertise to understanding this highly complex genome, which has helped to push forward global research on a crop that provides around a fifth of the calories consumed worldwide and is essential for improving food security.
We have also been leading projects that help us to delve into the complexities of this crucial crop. Now, not only do we have a wheat genome but, thanks to Anthony Hall’s Group, we can also better understand its epigenetics - the heritable changes, often shaped by the environment, that contribute to such abundant landraces and potential variation.
Furthermore, we have led on projects that allow us to look at the diversity of wheat, so that we can breed better varieties that can tolerate a changing climate and the rampant onslaught of emerging pests and diseases.
Who’d have thought that the koala, with a genome of a similar size to that of yours or mine, would have been so difficult to decode?
Our bioinformaticians and Genomics Pipelines team at EI helped bring this project to fruition, providing their expertise and guidance on assembling a genome that was hugely complex to put together due to the many repeated regions and other elements. It was also a landmark genome, the most complete marsupial reference genome to date, and one that will prove invaluable in Australia’s battle against devastating biodiversity decline.
Along with the improved assembly, the Di Palma Group’s expertise in bioinformatics contributed to one of the breakthrough findings of the koala genome release - the discovery of genes that might help koalas to eat eucalyptus leaves that would leave most of us with an incredibly upset stomach. Further research on koalas will help us understand other threats to their survival, including a loss of genetic diversity and the spread of diseases such as chlamydia and the HIV-like Koala Retrovirus (KoRV).
As part of the NORNEX consortium, set up as a fast-response unit to the emergence of ash dieback disease in the UK, EI has contributed its expertise to the sequencing and understanding of both the ash dieback fungus and the resistant trees that may hold clues to how we save our iconic woodlands from destruction.
Using our unique capacity for high throughput DNA sequencing and our series of tools that complement and enhance the quality of genome sequences and assemblies, we helped to release the sequence of the “survivor” ash tree as an open access resource for the research community within two months.
More recently, work in the Neil Hall Group at EI has contributed greatly to our knowledge of how ash dieback came to be such a devastating pathogen, and the potential devastation of trees that might result from the introduction of just one more spore from the fungus’ native range in East Asia.
Tilapia for supper? Thanks to research in EI’s Di Palma Group, that might increasingly be the case among East African communities, who look to expand aquaculture in a sustainable way for food security and better socioeconomic wellbeing.
To better understand how to improve native tilapia, it has been important to delve into cichlid fish diversity - unravelling the genetic mechanisms that give cichlids their incredible range of shapes, forms and adaptations that help them to survive. This information is interesting from an evolutionary perspective, and it is also valuable for understanding how to breed fish that are better adapted to local environments.
Assembling high quality, complex, non-human genomes is impossible without the right tools for the job.
At the Earlham Institute, we have the right people to make those tools, who have brought the scientific community w2rap, KAT, PORTCULLIS, Mikado, GrassRoots Genomics, SalmoNET, GeneSeqToFamily and many more besides. These software tools help scientists to better quality control their DNA assemblies, perform the assemblies themselves, find the genes among the splice junctions, or better understand the relationships between different genes and regulatory networks within and across species.
KAT, developed in the Clavijo Group, and PORTCULLIS and Mikado, both developed by the Swarbreck Group, have all been an integral part of the two most recent updates to the long-awaited wheat genome, while the Davey Group’s GeneSeqToFamily was used extensively in helping us to decode the complex koala genome. NanoOK RT, developed on site by the Leggett Group to help assemble metagenomic datasets using the Oxford Nanopore MinION, is helping to achieve rapid, diagnostic-level analysis of environmental samples that could revolutionise healthcare and science in the field.
Importantly, not only do we develop such tools and resources here at EI, we also adapt the tools of others so that they can be more readily applied to non-human genomes. Furthermore, through CyVerseUK and other projects such as Elixir and Galaxy, we also make these tools easily accessible to the wider community of life scientists in the UK and internationally - a recognised National Capability.
We have been at the heart of the modern genomics revolution that has seen DNA sequencing rapidity, capacity and affordability increase year on year - even month to month. We have led the development and adoption of automation and protocols that have minimised the costs associated with large-scale, high throughput genome sequencing experiments.
We were among the first institutes to adopt several third generation genome sequencing technologies, such as the PacBio RSII platform and recently the PacBio Sequel II, and were instrumental in helping to develop the Oxford Nanopore MinION as part of the open community of researchers testing and improving the latest in the DNA sequencing armoury.
Our genome sequencing capacity helped us to deliver a rapid response genome as part of the NORNEX consortium fighting ash dieback in 2013. It helped us to deliver the bread wheat draft genome in 2014, aided by advances in automation led by EI. In the same year, we were the first institute to publish the genome of the naked mole rat - an important model organism.
We continue to develop our capacity to innovate in Genomics Pipelines, through the adoption of the latest high-throughput and cost effective platforms. A unique capacity of the Earlham Institute is our expertise in single cell analysis - understanding species not only from the perspective of the organism as a whole, but during each and every stage of development, within and between different tissues, under different environmental pressures over time.
Our supercomputing power and capacity are more than OK - in fact, when we embraced the data-driven revolution in our early days, we had the largest supercomputer dedicated to life science research in Europe. We are constantly updating and augmenting this vital resource, which allows us to analyse complex, non-human genomes efficiently here at EI.
We are all about sharing at EI, and not only is much of our work data-driven, it’s also open access to the community - with a National Capability in e-Infrastructure led by the Davey Group. We have computing resources reserved that are accessible via CyVerseUK, so that other researchers, who don’t have the sort of access that we do to high performance computing (HPC) infrastructure, can perform the experiments they require remotely using virtual machines that we provide free of charge.
We’re also helping scientists around the world embrace HPC and think about what a research cyberinfrastructure might mean for them. As part of our work on GROW in Colombia, our team has delivered a white paper on how the country might embrace HPC for the benefit of increasing research capacity to understand and conserve its unique biodiversity.
It’s not just about reading and assembling DNA sequences and then finding the genes. How does this relate to what happens in the field?
Phenomics is a field that is pushing forward the boundaries of agricultural technology, and EI scientists in the Zhou Group have been developing novel algorithms that use robotics and artificial intelligence to help us improve agriculture.
CropSight, for example, is a clever method that combines images taken from aircraft with other data collected closer to the ground, allowing us to predict when it is best to harvest lettuce in a given part of the farm. In this way, we might reduce losses and maximise the harvest potential of the crop.
CropQuant is another exciting step forward, which allows us to visualise what’s going on in a field over the course of an entire growth season. Combining image data with environmental measurements and then analysing the growth of crops with different genotypes, we can begin to understand what’s going on at different growth stages to better inform breeding and farming practices.
Thousands of people have passed through our doors and come out of the other side better equipped to perform bioinformatics research. From our hugely popular course in de novo sequencing through to advanced training in Python and Software Carpentry - there’s something for everyone looking to get to grips with modern genomics research.
Our training courses have been increasing research capacity worldwide, with important collaborations ensuring that we can train the next generation of bioinformaticians in East Africa, Vietnam and Colombia.
We also have a very successful Year in Industry programme. Its 12 participants have contributed some great work while here at EI, and have gone on to do PhDs or land great jobs in industry.
As part of our tenth anniversary celebrations, we will be launching five highly complex, non-human genomes sequenced and assembled using the latest technologies and algorithms developed at EI.
Building on our expertise in decoding non-human genomes, we are advancing graph assembly - a method that will help to decode polyploid, non-human genomes more effectively and accurately than ever before.
The five anniversary genomes represent important species from across the tree of life, including the highly heterozygous diploid Trifolium pratense (clover), the alkaline cichlid Alcolapia grahami, the triploid, single-celled green alga Euglena gracilis, the allotetraploid Nicotiana benthamiana and, finally, the immensely complex octoploid genome of the strawberry, Fragaria x ananassa.
Led by the Clavijo Group, and combining a number of DNA sequencing platforms through our Genomics Pipelines team, the five anniversary genomes will be a step forward in how we decode complex, non-human genomes, using the SDG software that marks the future of DNA assembly.
This is a timely step that will help to advance the global Earth BioGenome Project, which aims to sequence the genomes of all known animals, plants, fungi and protists on Earth, and in which EI is playing a major role.