Bread and butter: decoding wheat

Decoding this complex, hexaploid crop has been at the heart of what we do at Earlham Institute, which was integral to the international effort that brought us the bread wheat genome. That work has run from sequencing to assembly and annotation, and even to mapping its epigenome.

In January 2014, the International Wheat Genome Sequencing Consortium (IWGSC) whole-genome dataset for bread wheat was made available on Ensembl Plants. This gave plant breeders and researchers an unprecedented tool for improving crop yields, along with access to a genome sequence containing over 100,000 genes.

Then known as The Genome Analysis Centre (TGAC), Earlham Institute was integral to this effort, sequencing 14 of the 21 wheat chromosomes and generating all of the chromosome assemblies. This was made possible by advances in automation at EI, which allowed rapid and cost-effective sequencing of crop genomes.

Since then, Earlham Institute researchers have developed algorithms and tools at every step of the way. That work has brought us to the almost-complete bread wheat genome we have today. As part of the Designing Future Wheat programme, we are now using this knowledge to help ensure that wheat yields keep pace with the demands of a rapidly growing global population that relies heavily on this crucial crop.

Because the bread wheat genome was so complex to tackle, the tools and methods developed at EI are useful not only for decoding wheat. They can also be applied to a swathe of similar problems in other organisms with complex, polyploid genomes.

So, what have been our major contributions to wheat, and what is the future of wheat research at the Earlham Institute?

Earlham Institute has been involved at every stage of decoding, assembling and annotating the wheat genome, working towards global food security.

Building on the bread wheat genome

Bread wheat presents many complications for genome assembly, starting with the fact that it actually combines three plant genomes into one. The sheer size of the genome, five times that of our own, adds to the challenge. So does the abundance of repetitive elements, which make up about 80% of the DNA sequence. A third difficulty is the set of duplicated, highly similar genes present across three sets of seven chromosomes.

After the initial release of the wheat genome in 2014, Earlham Institute were quick to follow up with an improved assembly in 2015. For this assembly, Bernardo Clavijo made major modifications to the DISCOVAR software, which was developed by the Broad Institute in the USA for the analysis of human genomes. His changes allowed the software to distinguish repeat sections and maximise coverage of the genome.

This required newly generated, high-quality input data, made possible by the high capacity and skill of EI's Genomics Pipelines team. It also required significant computing power. EI's high-performance computing facilities provided this and were specially configured to run the three-week-long assembly. In fact, we won a supercomputing award for our wheat research, for the 'best use of HPC application in life sciences'.

Sharing data amongst the research community is in the spirit of open science and a founding principle of the UK Wheat Initiative. To support this, the new genome was first made available for BLAST searches on EI's Grassroots Genomics platform. The full dataset, including annotated genes, was then released on EBI's Ensembl Plants.

Grassroots Genomics is still going strong. In work led by Rob Davey, Xingdong Bian and Simon Tyrell design and manage the platform. It provides a versatile data repository and analytical services, and enables marker-assisted breeding, through a 100% open-source infrastructure that is freely available to researchers and the public.

Another outcome of EI's work on the first wheat genome was the release of w2rap, a bioinformatics pipeline that can decipher complex genomes, not just wheat's. Used in combination with the best next-generation sequencing methods, it produces robust assemblies.

Separating the wheat from the chaff

EI made further advances in our understanding of the bread wheat genome in 2017. That year, we published an improved assembly and annotation of the bread wheat genome that drew on the latest genome sequencing and assembly technologies. It identified 104,901 protein-coding genes, a fifth of which had been absent from previous assemblies or present only as fragments. We also published a further six reference wheat genomes reflecting some of the diversity in UK varieties.

The successful annotation of these genes, described as a satnav for wheat, was made possible by combining Illumina short reads with longer, full-length cDNAs obtained using the PacBio RSII platform. EI were the first in the UK to adopt the RSII, and are now the first to adopt PacBio's latest platform, the Sequel II. This combined approach is vital when assembling and annotating complex genomes, because the longer reads help to bridge gaps and better align the shorter reads, particularly across repeat regions of DNA.

Long reads also make it possible to identify duplicated genes with very similar sequences. Assemblers can overlook these when the copies differ by perhaps only a single nucleotide polymorphism. Such techniques, including RenSeq SMRT, were developed through collaborations on the Norwich Research Park (NRP) and are useful for sequencing whole gene regions. They have been applied in projects that aim to clone disease resistance genes from wild plants and introduce them into crops.

The IWGSC effort culminated in a publication in 2018, which brought EI's expertise in assembling and annotating complex genomes further to the fore. The previous year, Bernardo Clavijo's KAT (the K-mer Analysis Toolkit) had been used in a groundbreaking international collaboration that changed how we tackle large genome sequencing projects. That collaboration also contributed the first near-complete bread wheat genome assembly.

The final IWGSC release came with a detailed annotation of bread wheat, made possible by Portcullis and Mikado, two pieces of software produced in EI's Swarbreck Group. Portcullis distinguishes genuine splice junctions from false ones with a precision of ~95-99%. Mikado helps assemblers correctly identify genes that are sometimes lost in the process because of highly similar (in some places almost identical) regions of DNA.

EI is at the forefront of using the latest sequencing platforms, including the Illumina NovaSeq for short reads and the PacBio Sequel II for long, full-length cDNAs.

To infinity and beyond!

Now that we have a nearly complete bread wheat genome, the challenge does not stop there. EI is part of the BBSRC's Designing Future Wheat programme, in which we lead on 'improved data access and analysis'. Our work in this area will help build a 'genomic supermarket' (albeit a free one with no restrictions). It will bring together the vast and dispersed wheat datasets required for large-scale, complex comparative analyses.

To gain a better understanding of wheat, we are also part of collaborative efforts such as the 10+ wheat genomes project. This project is based on EI's w2rap assembly algorithms and pushes us towards the pan-genomics era for wheat. The goal is a description of all the genes and genetic variation in every wheat cultivar, which would greatly advance our understanding of how to improve the crop from a global perspective.

EI is also advancing wheat genomics through insights into the wheat epigenome. This means the regions of the genome that undergo DNA methylation in response to environmental cues, which can give us a better picture of how local wheat varieties are adapted to specific conditions. The Anthony Hall group found geographical patterns of epigenetic differences among 100 landraces of wheat in the UK. This exciting finding opens up new possibilities for breeders looking to develop more climate-resilient crops.

Each incremental advance in wheat genomics increases the scope for improving crops. The goal remains to breed, quickly and stably, varieties that are resilient to a rapidly changing climate and to emerging pests and diseases. These varieties must also maintain good nutrition and deliver increased yields for a skyrocketing global population.

Though it’s a tough challenge, it’s one that EI, along with partners on the Norwich Research Park and beyond, is taking in its stride. Armed with such abundant data and the increasing capacity to analyse and share the insights gained, hopefully we’re but a step away from the next green revolution.
