It once took thirteen years and three billion dollars to sequence the human genome. Now, for a fraction of that cost, we can sequence a wheat genome, five times larger and vastly more complex than our own, in just under three weeks.
However, the complexities of putting the pieces of the jigsaw together remain immense. The wheat genome is full of repeated elements, which confound any attempt to arrange the sequence fragments in a logical order. It is also composed of the genomes of three ancestral species: like having three parents from three different species, so sometimes it’s simply too difficult to tell which piece of DNA belongs to which ancestor.
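To see why repeats cause so much trouble, consider a toy sketch of the graph-based approach many assemblers use. The reads, the shared "GATTACA" repeat, and the tiny k-mer size below are all invented for illustration; real assemblers face the same branching ambiguity across billions of reads.

```python
# A minimal de Bruijn graph: every k-mer links a (k-1)-mer prefix
# to a (k-1)-mer suffix. Repeats create nodes with several possible
# successors, so more than one genome is consistent with the reads.
from collections import defaultdict

def debruijn(reads, k):
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Two toy reads that share the repeated element "GATTACA".
reads = ["ACCGATTACATTG", "TCAGATTACAGGC"]
graph = debruijn(reads, k=4)

# Any node with more than one successor is an ambiguous branch point:
# the assembler cannot tell which path the true genome follows.
for node, successors in sorted(graph.items()):
    if len(successors) > 1:
        print(f"branch at {node}: {sorted(successors)}")
```

Running this prints branch points around the repeat; each branch multiplies the number of reconstructions consistent with the data, which is exactly why a genome riddled with repeats is so hard to piece together.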
This isn’t restricted to wheat. Many of the genetic elements we want to identify, in order to select for disease resistance for example, are highly conserved, so picking them apart presents a similar challenge. Then there is non-coding DNA, the stuff between the genes, which we now know can also have evolutionarily relevant functions. So how do we identify meaning amidst the noise?
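A small, made-up example shows the problem with conserved sequences: a short sequencing read drawn from a conserved region matches two near-identical gene copies equally well, so there is no way to say which copy it came from. The sequences below are invented, not real genes.

```python
# Two hypothetical gene copies that differ at a single base.
gene_a = "ATGGCTTTGCGTGACCTGAAA"
gene_b = "ATGGCTTTGCGTCACCTGAAA"  # one-base difference from gene_a

# A short read taken entirely from the conserved region.
read = "GCTTTGCGT"

# The read is a perfect match to both copies, so its placement
# is ambiguous.
print(read in gene_a, read in gene_b)  # -> True True
```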
There is also the matter of the sheer volume of data that must be both stored and analysed. The output of DNA sequencing laboratories worldwide requires enormous amounts of computation to turn raw data into usable information, with annual data growth predicted to exceed 4,000% by 2020. We have supercomputers that can process millions of times more data than the average laptop, yet even this capacity is being overwhelmed, while powering and maintaining these systems presents a growing problem of energy and cost efficiency.
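Some rough, back-of-envelope arithmetic makes the scale concrete. Taking the commonly cited figure of roughly 17 billion bases for the wheat genome, and assuming an illustrative sequencing depth of 30x with about two bytes of storage per base (the base itself plus a quality score), a single wheat genome already implies around a terabyte of raw data before compression:

```python
# Back-of-envelope estimate of the raw data behind one wheat genome.
# The ~17 Gb genome size is the commonly cited figure; the coverage
# depth and per-base overhead are illustrative assumptions.
GENOME_SIZE_BP = 17e9   # hexaploid wheat, ~17 billion bases
COVERAGE = 30           # assumed sequencing depth (30x is typical)
BYTES_PER_BASE = 2      # one base plus one quality score, roughly

raw_bases = GENOME_SIZE_BP * COVERAGE
storage_bytes = raw_bases * BYTES_PER_BASE

print(f"raw bases sequenced: {raw_bases:.1e}")
print(f"approx. storage:     {storage_bytes / 1e12:.1f} TB before compression")
```

Multiply that by the thousands of genomes sequenced worldwide each year and the strain on storage and compute becomes clear.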
To respond to these significant challenges, we are at the forefront of adopting and developing the latest technologies, from DNA sequencing equipment to novel supercomputers, while fostering open science through international collaborations to advance genome analysis and life sciences research.