Code for crops: Training Vietnamese Rice Breeders at EI
At EI, six breeders from Hanoi learned the basics of bioinformatics, which will help them breed drought- and salt-resistant rice in Vietnam. Here’s their story.
After the success of our first trip to Vietnam to introduce ‘big data’ analysis to the rice breeders of the Agricultural Genetics Institute (AGI), three months on we have been delighted to host six of our collaborators and friends from Hanoi in order to teach them how to code and apply bioinformatics to their work on rice breeding (and give them a taster of an English pub lunch).
Read on to discover how we have been training Vietnamese rice breeders, what they learnt, and what they think of English food ...
For three weeks, researchers from AGI have been closely shadowing and learning from some of our best bioinformaticians, who have been passing on their computing knowledge across a range of applications to help our Vietnamese collaborators, in the words of Graham Etherington, “to confidently make the leap from biology to bioinformatics.”
On day one of the visit, we all congregated in the Wallace Room at EI, where our researchers introduced the topics they’d be teaching for the next few weeks. Each trainee would be assigned a specific project, so our bioinformaticians pitched their ideas to the group.
From polecats to Arabidopsis, sugar beet, wheat and rice, the projects covered a range of research that we perform at the Earlham Institute, much of which is directly relevant to the genomics research that will be undertaken in Hanoi.
Techniques to be covered included: assembly and assessment of sequence data; SNP analysis and mapping of genes to reference genomes; analysing mutations and their biases; identifying motifs and protein analysis; genome annotation; studying segregating populations; comparing the divergence of crops from their wild relatives; and employing more efficient methods to sequence multiple rice genomes.
But before commencing their projects, on the morning of day two, our Scientific Computing Specialist Luis Yanes presented an introduction to scientific computing, during which our visiting scientists were led through the different high-performance computing tools they would be using at EI, and how to access the computer cluster to run tasks.
This was also a nice lesson for the author, who should probably learn to code at some point while working at an institute such as this. (Lessons booked in with Luis and Jessica Jordan, who will ease me in with some JavaScript - keep your eyes peeled for the journey ahead).
The trainees were also given a guided tour of EI’s unique high-performance computing facilities, and the supercomputers that would help them align and annotate genomic sequences during their training projects.
The first of the projects to be selected was that of Graham Etherington, Dina Raats and Paul Bailey, who would collectively deliver training in a three-part project. It was taken up by Duy Duong Tran, who has a background in rice breeding and biotechnology and aims to learn how to use bioinformatics tools to identify and compare traits in rice plants, such as disease resistance and stress tolerance.
Part one, with Graham, would look at assembly of the European polecat genome and assessment using gene orthologues. Dina would then introduce SNP analysis of the wheat genome, using tools to map genes to a reference and then identify and analyse mutations. Part three, with Paul, would then look at exome capture and identifying homologous regions in the three chromosome copies of wheat’s obscurely repetitive genome, as well as variation in gene copy number.
Van Tuan Pham was then very keen to take on Luca Venturini’s project, which involves wheat genome annotation. In this project, Luca would first go through how to get an annotation using the model species Arabidopsis thaliana, including RNA sequencing assemblies. He would then go on to show how to understand what information is real and what is most likely false, before later looking at applying these techniques to the - much more complex - wheat genome.
José de Vega’s project was taken on by two of our guests, Thi Thanh Ha Dang and Thuy Diep Nguyen, who would probe the very rice genomes that EI and AGI have been collaborating on, supported by the Newton Fund.
At EI, we have sequenced many samples of Vietnamese rice in different batches. To speed up the process of sequencing and assembling hundreds of samples, José would guide the trainees through two methods: one getting high coverage of just 20 samples, and the second using many samples but at low coverage. The aim would be to see whether the second method could be used in conjunction with the first to inform results (in a much shorter time).
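To give a feel for the trade-off between the two methods, here is a minimal Python sketch of the arithmetic behind it. It assumes the same sequencing yield is split evenly across samples. The genome size (roughly 400 million bases for rice), the total yield and the 400-sample figure are illustrative assumptions, not numbers from the project.

```python
def mean_coverage(total_bases: float, n_samples: int, genome_size: float) -> float:
    """Expected mean depth per sample when a fixed yield is shared evenly."""
    if n_samples <= 0 or genome_size <= 0:
        raise ValueError("n_samples and genome_size must be positive")
    return total_bases / (n_samples * genome_size)


# Assumed values, for illustration only.
GENOME_SIZE = 400e6   # approximate rice genome size in bases (assumption)
TOTAL_BASES = 400e9   # hypothetical total sequencing yield in bases

# Method one: few samples, high coverage.
high_coverage = mean_coverage(TOTAL_BASES, 20, GENOME_SIZE)

# Method two: many samples, low coverage (400 is a made-up sample count).
low_coverage = mean_coverage(TOTAL_BASES, 400, GENOME_SIZE)

print(f"20 samples:  {high_coverage:.1f}x per sample")
print(f"400 samples: {low_coverage:.1f}x per sample")
```

With a fixed yield, every twenty-fold increase in sample number costs a twenty-fold drop in depth per sample, which is why it is worth checking the low-coverage results against a well-covered subset.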
Thi Thuy Tran chose to work with Ricardo Ramirez, a PhD student at EI, who is interested in identifying polymorphisms in wheat from segregating populations. He takes lines that are resistant or susceptible to disease, crosses them, then analyses the resulting populations. This project would introduce exome capture, checking polymorphisms and scoring by phenotype - linking single nucleotide polymorphisms to a specific trait.
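As a rough illustration of what "scoring by phenotype" can mean, the Python sketch below compares how often an alternate allele turns up in resistant versus susceptible plants. This is a simplified stand-in, not Ricardo's actual pipeline. The SNP names, the genotype coding (0, 1 or 2 copies of the alternate allele in a diploid-style call) and the data are all hypothetical.

```python
def alt_frequency(genotypes):
    """Fraction of alternate alleles among genotypes coded 0, 1 or 2."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    return sum(genotypes) / (2 * len(genotypes))


def score_snps(resistant, susceptible):
    """Difference in alternate-allele frequency for SNPs seen in both groups."""
    return {
        snp: alt_frequency(resistant[snp]) - alt_frequency(susceptible[snp])
        for snp in resistant
        if snp in susceptible
    }


# Made-up genotype calls: one list entry per plant.
resistant = {"snp_A": [2, 2, 1, 2], "snp_B": [1, 0, 1, 2]}
susceptible = {"snp_A": [0, 0, 1, 0], "snp_B": [1, 1, 0, 2]}

scores = score_snps(resistant, susceptible)

# The SNP with the largest absolute difference is the strongest candidate.
best = max(scores, key=lambda snp: abs(scores[snp]))
print(best, scores[best])
```

A SNP whose allele frequency differs sharply between the two groups is a candidate for being linked to the trait. A real analysis would also need far more plants and a proper statistical test.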
Another exciting project under way at EI is the analysis of sugar beet, in particular looking at differences between cultivated and wild varieties during the process of domestication. Thi Loan Nguyen, who is particularly looking to apply bioinformatics to rice breeding, took on this project. Led by one of our most experienced and talented bioinformaticians, Janet Higgins, it would use the exact same pipeline that AGI and EI would be using to analyse rice, and therefore presented a highly useful introduction to bioinformatics techniques.
After a busy first week of learning bioinformatics from scratch - coding as they went - it was time to relax in Norwich. Several trainers and the trainees all met in the city centre, then had a wander around the castle.
On a gloriously sunny day, it would have been rude not to appreciate the finer side of life in Norwich, namely a pub lunch and a pint - followed by a tour of Norwich’s spectacular cathedral. This behemoth is one of the largest in the country, and our guests were duly impressed by its beautiful stone features. You sometimes don’t appreciate what’s on your own doorstep.
From the cathedral, we had a wander down the banks of the River Wensum - strongly recommended for a canoe trip of an afternoon - and we admired the various bits and bobs remaining from the old medieval city walls, which add a delicious touch to the centre of Norwich, in addition to its plethora of flint-strewn churches and pubs.
Speaking of pubs, we then hit what must be one of the oldest licensed premises in the country. The Adam and Eve pub, though the building is only a few hundred years old, has been a licensed beer-vending institute since the thirteenth century ...
… I’ll have a flagon of mead, please! (We actually had Kronenbourg, but I’ve always wanted to say that).
This experience was a novel one for the trainees and trainers alike, and both parties will go away having had a very positive experience.
So what did the trainers want their students to take away from their three weeks in Norwich at EI?
According to José de Vega, “bioinformatics is not difficult once you control a few basics. Their impact (the trainees) has to be in the results and their analysis, and not in the methods - that’s research for other guys.
“Tools aren’t important, often there are multiple options of doing something and it is easy to find new ways if the basic ideas are clear.”
As far as basics go, Janet Higgins, Ricardo Ramirez and Paul Bailey aimed to equip trainees with tools that they could apply to rice breeding back in Hanoi, going through how to map reads to a reference and use tools to help spot variants, which is one of the major applications for rice breeding projects.
Ricardo said, “I want my student to be able to at least find a lot of candidate SNPs. So, it’s from QC in the sample to filtering an SNP list, using various bioinformatics tools to do this. The training project started slowly, but I think we have got into a bit of a rhythm!”
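For readers wondering what "filtering an SNP list" looks like in practice, here is a minimal Python sketch. It assumes each candidate comes with a call quality and a read depth. The field names, thresholds and records are hypothetical and are not taken from the tools used in the training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float   # call quality score
    depth: int    # number of reads covering the position


def filter_snps(snps, min_qual=30.0, min_depth=10):
    """Keep single-base substitutions that pass quality and depth cut-offs."""
    return [
        s for s in snps
        if len(s.ref) == 1 and len(s.alt) == 1
        and s.qual >= min_qual
        and s.depth >= min_depth
    ]


# Made-up candidates for illustration.
candidates = [
    Snp("chr1", 1200, "A", "G", 55.0, 24),
    Snp("chr1", 3400, "C", "T", 12.0, 30),   # fails the quality cut-off
    Snp("chr2", 880, "G", "A", 48.0, 4),     # fails the depth cut-off
    Snp("chr3", 150, "T", "TA", 60.0, 40),   # an insertion, not a SNP
]

kept = filter_snps(candidates)
for s in kept:
    print(s.chrom, s.pos, s.ref, s.alt)
```

The cut-offs here are arbitrary defaults. In a real project they would be chosen by looking at the quality and depth distributions of the data first.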
Janet also stressed “the importance of understanding your data and ensuring it is of high quality before doing any downstream analysis,” while Paul agreed, aiming to ensure “an understanding of the process, and command line operations that should be useful in the future for in-depth data analysis.”
Janet believes that the trainees will leave better equipped in using bioinformatics, adding, “they will be able to take away the skills learnt here at the Earlham Institute and apply these to their research projects back in Vietnam.”
José agrees, “definitely, they can now run commands and work in a HPC (high performance computing) system. They can read a pipeline in a paper and understand what the authors are talking about.”
José also found the experience very rewarding, adding, “when you try to explain something to a student, it forces you to think through the important bits of information and the best way to teach it.”
Janet had a similar experience, telling us, “it is very rewarding being given the opportunity to train students from countries where there is a skill shortage in specialist and rapidly developing fields such as bioinformatics. This will enable the students to take back state of the art skills which can be used to carry out important research relevant to their country.”
On their last Thursday, our guests from Hanoi treated us to a home-cooked feast of delicious Vietnamese food, with a variety of spring rolls, cooked meats and rice dishes.
We gathered around the pool at the rec centre and enjoyed our meal in the late-summer (and surprisingly warm) English sunshine, which was a fitting end to a great experience for all involved.
Tran has also sent us 64 pages of Vietnamese recipes, which we’ll translate for the next instalment of our food guides - something we know the world has been waiting for...
Tran Duy Duong said, “I have enjoyed Norwich. Actually, people here are very nice and the weather has been beautiful!” (We won’t break it to him that this is something of an anomaly, mainly so that he’ll come back.)
“I feel better equipped to do bioinformatics research, as it is very important to understand molecular biology and DNA sequencing.
“The most interesting thing I have learnt is how to do mapping in wheat, command lines, as well as mutant calling, alignments and identifying SNP candidates.
“When I go back to Vietnam, I will apply this knowledge to my work - using mapping and variant calling in my rice research.”
Thuy Diep Nguyen added, “I’ve enjoyed my stay in Norwich - I feel warm and safe when I’m here.
“Your computer system is very good and appropriate for bioinformatics research. From this training course, I have learned and understood pipelines for SNP calling and genome analysis, while using and understanding some command lines in Ubuntu.
“I feel that bioinformatics is a giant person - it can solve a huge workload in a short time.
“I will use my knowledge in this training course to find SNPs and perform genome analysis on local rice varieties in Vietnam to help serve our rice breeding programme.”
Importantly - I asked our guests what they thought of English food (important considering how much better the food I just ate was … ).
Tran said, “It’s ok, I enjoyed it - most of it is clean and safe.” (Better than nothing).
Thuy Diep added, “The food here is very diverse and safe, but there is some that doesn’t suit my tastes, such as bread, butter and fatty food.” (No pies then … )
This is not the end of our collaboration with AGI. In January 2017, we will continue to build upon our training both in Vietnam and at EI by revisiting Hanoi. This time, we will be training our newly-inspired trainees in how to pass on what they have learnt to their colleagues.
This will ensure that the basic toolkit we have provided can be carried on at the AGI, in order for rice breeders to continue to use modern computing and bioinformatics techniques, going forwards. Considering the rapidity of climate change, as well as the increasing encroachment of salt water into the river deltas of Vietnam, the ability of rice breeding programmes to adapt more quickly to these changes is incredibly important.
By understanding and applying the principles of bioinformatics to identify unique variants that are tolerant to a suite of stresses, including salt stress, drought, flooding and emerging pathogens, the AGI can help to keep Vietnam’s rice yields abundant in the future.
This training was part of a project run under the British Council programme Institutional Links, with funding from the Newton Fund, which is managed by the Department for Business, Energy and Industrial Strategy.