The Earlham Institute was involved in sequencing, assembly, and analysis of five pilot genomes, contributed to project management, and updated COPO - a data-brokering tool recommended by ERGA to ensure assemblies are linked and standardised from the point of collection.
The project leads believe the continental effort will set the stage for a new, inclusive, and equitable model for biodiversity genomics.
Genomic data holds immense potential to inform conservation actions for endangered species, as well as driving discovery in the fields of human health, the bioeconomy, biosecurity, and other applications.
As the global scientific community strives to unlock the full potential of genomic data, the establishment of a Europe-wide collaborative network under ERGA - the European node of the global Earth BioGenome Project - accelerates scientific progress and facilitates its translation into tangible benefits for biodiversity.
The ERGA Pilot Project was conceived as a route to creating accessible, high-quality reference genomes for all European animals, plants, and fungi.
Other projects, such as the UK’s Darwin Tree of Life project, have shown the power of a centralised approach to biodiversity genomics, with all samples sent to a single sequencing centre. ERGA’s challenge has been to achieve the same success using a decentralised framework.
The Pilot Project has provided valuable lessons and highlighted key challenges, positioning ERGA as a model for decentralised, inclusive, and equitable biodiversity genomics initiatives around the world.
For many of the participating countries and researchers, the project has also offered the first opportunity to actively engage in the generation of state-of-the-art reference genomic resources for their local native biodiversity.
Supported by BBSRC strategic funds, research expertise and infrastructure from across the Earlham Institute has been leveraged, leading to new reference genomes for species from Iceland, Malta, the Czech Republic, the Azores, and the UK.
Contributors include Dr Wilfried Haerty and Dr Will Nash, who pieced together the genome of the violet carpenter bee; Professor Neil Hall, who worked on the fungus; Professor Anthony Hall, who worked on orange foxtail grass; and Dr Karim Gharbi, who has overseen the projects to sequence the Arctic char and the brassica Cardamine caldeirarum.
“We wanted to use our expertise and infrastructure in biodiversity genomics to help demonstrate the feasibility of coordinating the assembly of reference-quality genomes for multiple species from across Europe,” Dr Gharbi said.
“We were particularly interested in capturing a diverse set of genomes across the European tree of life. Most existing technologies and algorithms for reference assembly are optimised for human or human-like genomes.
“Cataloguing the sheer diversity of genomes on a continental scale adds a huge level of complexity, which requires a flexible and collaborative approach.”
The ERGA Pilot Project has helped to identify and address the many challenges of working at the international scale. These include the legal and logistical hurdles of shipping biological samples across borders, resource disparities between countries, and the need to balance decentralisation against the standardisation required to guarantee that only the highest possible quality reference genome assemblies are produced.
This is where the Earlham Institute’s Collaborative OPen Omics (COPO) platform has proved itself invaluable. It helps researchers with uploading, labelling, and tagging their work in a consistent way and is designed to make it easy to share both results and the metadata around them.
Dr Felix Shaw, a research software engineer at the Earlham Institute, said: “Data should be findable, accessible, interoperable, and reusable - FAIR. It’s a fundamental principle, essential for reproducibility of experiments. Metadata is what makes FAIR possible.
“COPO makes it much easier to prepare metadata for uploading alongside research data. It makes it findable and describable according to agreed terms.”
The Earlham Institute has contributed unique expertise in working with non-model organisms and the integration of wet-lab techniques and bioinformatics across its National Bioscience Research Infrastructure (NBRI) in Transformative Genomics.
This has also included contributions from Dr David Swarbreck, Head of Core Bioinformatics at the Institute, who has a seat on the ERGA genome annotation committee.
Dr Seanna McTaggart, ERGA representative and Programme Manager at the Earlham Institute, said: “ERGA was designed to be collaborative and interdisciplinary, allowing scientists with different specialities from all over Europe to work together easily.
“It allows research organisations like the Earlham Institute - primarily working with technology and data - to collaborate with scientists such as ecologists working in the field, and do so in a really natural and open way.”
Professor Ann McCartney, Member of the ERGA Pilot Committee, Assistant Researcher at the University of California Santa Cruz, and adjunct Associate Professor at University College Dublin, said: “The ERGA Pilot Project attempted to scale the generation of high-quality reference genomes across an entire continent.
“An endeavour of such magnitude was made possible only through its commitment to the principles of inclusion, equity, and collaboration - as well as the dedication of its diverse, transdisciplinary, and cross-sectoral participants.
“I feel incredibly lucky to have worked alongside such an amazing group of colleagues to help kickstart the construction of a genomics encyclopaedia of European species.”
An article and accompanying collection of research publications about the European Reference Genome Atlas Pilot Project are published today in npj Biodiversity.