Decoding Biodiversity: bridging the gap between data and discoveries.
A new era in our relationship with nature is dawning.
Faster and more accurate DNA sequencing means we are generating vast amounts of biological data about living organisms.
As we document the DNA of life around us, we are creating the largest and most rapidly growing resource we have ever had on Earth’s biodiversity. The potential for new science and discovery is equally enormous.
Realising this potential will be no simple matter, and the ability to meaningfully integrate and interrogate datasets has proved elusive.
Different research groups use different data standards, the data aren’t always findable, and access may be restricted in some way. Even when the data are available, new and bespoke tools may need to be developed to analyse them.
Data-intensive bioscience - which includes existing large genome collections and related initiatives - could revolutionise our understanding of biology and evolution, create new benefits for society, improve human welfare, and help us better conserve, protect, and restore biodiversity.
The Earlham Institute’s Decoding Biodiversity strategic research programme addresses the urgent need for tools, standards, and expertise to translate this flood of complex biological information into global benefit.
“There has been a revolution in our ability to generate and assemble high-quality genomes over the last decade or so,” says Head of Plant Genomics and Decoding Biodiversity programme lead Professor Anthony Hall.
“The question is, how do we turn these genomes into scientific discoveries?
“With Decoding Biodiversity, we’ll be creating tools that allow us to look at multiple genomes in new ways. We’ll be able to compare them and identify new types of structural rearrangements, or hybridisations, we haven’t been able to see before.
“In the case of crops, for example, these can then be associated with important agricultural traits. It’s going to give us a new level of biological understanding.”
These ambitious plans build on existing research from the Earlham Institute. Professor Hall says the last few years have laid the foundations for this new five-year programme of work.
“Over the last five years we've had a research programme that has already started to develop some of these tools. We’ve had a whole series of publications exploring the genomes of fish, plants, and microbial communities.
“Now we're scaling up. We’re moving from analysing single genomes to comparing multiple genomes.”
The Earlham Institute is uniquely placed to do this thanks to its talented mix of scientists from different disciplines - mathematicians, computer scientists, plant scientists, microbiologists, and geneticists - all working together on common problems. The Institute also boasts two National Bioscience Research Infrastructures - Transformative Genomics and the Earlham Biofoundry.
Professor Hall also believes one of the real strengths of the Institute is its in-house infrastructure and expertise with cutting-edge technology platforms.
“It’s that combination - generating sequence data and working with the latest datasets produced by the very best platforms - that gives us the chance to develop and use tools to make discoveries.
“We also have compute infrastructure which has been specifically developed to do this kind of analysis for the life sciences.”
To ensure the wider bioscience community benefits, the Earlham Institute provides access to its technology platforms, as well as advanced training - available to both academic and industry scientists.
“We don’t just generate the tools,” explains Professor Hall. “We train researchers to use them, and the technology we’ve developed, so they can apply them to their own research.”
The Decoding Biodiversity programme has three key areas: creating the tools for the job, using the tools to understand gene function or variation in key agricultural and aquacultural traits, and applying the tools to complex microbiomes.
The first strand of work is led by Dr David Swarbreck, Core Bioinformatics Group Leader. It will involve developing, testing, and refining bioinformatic workflows and computational tools for the analysis of genomes.
“We have an existing suite of tools and pipelines, but we need to optimise these as well as develop new approaches that can work with large and complex plant genomes or highly diverse microbial communities,” Dr Swarbreck says.
“As well as the tools themselves, we’ll also be working hard to promote them and offer training to the community so they can make the maximum impact.”
Dr Jose De Vega, Group Leader at the Institute, is leading the second strand. This focuses on analysis of diversity, exploring gene function and biosynthetic pathways, as well as variation in agricultural and aquacultural traits.
This will link the tools and methods to genomic data, potentially identifying gene functions in crops, farmed animals, and fish - flagging the genes which could make crops more resilient to warming climates, or farmed fish more tolerant of salty water.
“The exciting advances in high-throughput sequencing and phenotyping platforms will enable us to connect vast genomic data with biological function,” explains Dr De Vega.
“In domains where we have a well-established track record of research - such as unravelling plant bioactives, improving crop resilience, or inferring new gene functions - we’ll contribute to new biotech solutions, food security, and generally understanding the rules of life better.”
Pictured from left to right: Dr Seanna McTaggart, Programme Manager for Decoding Biodiversity; Prof Anthony Hall (left), Head of Plant Genomics at the Earlham Institute and Programme Lead on Decoding Biodiversity; and Dr Jose De Vega, Group Leader at the Earlham Institute and work package lead on Decoding Biodiversity.
The final strand of work will be led by Dr Chris Quince, High-Resolution Microbiomics Group Leader at the Earlham Institute and Quadram Institute.
The focus of this work is on linking fine-scale microbial diversity to ecosystem functions by applying newly-developed tools to complex microbiomes.
Microbes often live in diverse communities. Within these, they interact with and influence each other.
Interactions might include passing genes that confer resistance to antimicrobials around the group, or commensal behaviour, where one species of microbe benefits from the activity of another. For example, one species of microbe may feed on the waste products of another.
Studying these dynamics is crucial to understanding these groups and how microbes are affected by changes to their environment.
“We’ve historically struggled to look beyond the species level when analysing complex microbial communities,” says Dr Quince. “That means we miss all of the subtlety and nuance that comes from looking at different strains.
“There’s a wealth of information we can potentially uncover, transforming our understanding of anything from soil health to the dynamics of gut microbiota.”
Pictured from left to right: Dr Ilia Leitch from the Royal Botanic Gardens, Kew, a partner organisation on Decoding Biodiversity; Dr Chris Quince, Group Leader at the Earlham Institute and work package lead on Decoding Biodiversity; and Dr Daniel Read (right) from the UK Centre for Ecology & Hydrology and Dr Rafal Gutaker (left) from the Royal Botanic Gardens, Kew, both partners on the Decoding Biodiversity programme.
Earlham Institute Programme Manager Dr Seanna McTaggart works day to day on the Decoding Biodiversity strategic programme. Her invaluable work behind the scenes ensures everything goes smoothly.
“I’ve been working on the plans for this for a long time,” she reflects. “There’s some very exciting research at the Earlham Institute and I’m delighted to see work on this new programme starting.”
The programme is highly interdisciplinary and will benefit from collaboration with partners both within and outside the Institute.
As well as the close involvement of the National Bioscience Research Infrastructures at the Institute, the programme includes collaborators at IBM Research; the Royal Botanic Gardens, Kew; the UK Centre for Ecology & Hydrology; and the Institute for Biological, Environmental and Rural Sciences (IBERS).
These partners combine expertise across machine learning, shared computer resources, synthetic biology, software development, advanced sequencing, bioinformatics and genome annotation, and dataset integration and interrogation.
“The next five years are going to be very exciting,” concludes Dr McTaggart.