Could long-read RNA sequencing be the future of drug discovery?
New research using long-read RNA sequencing provides a robust platform for discovering novel gene isoforms and, in future, new medicines.
It’s the result of a successful collaboration between the Earlham Institute and the University of Oxford, which draws on crucial strategically funded UKRI-BBSRC capabilities in Genomics and Advanced Scientific Training here at EI.
Long-read sequencing technologies enable us to see far more than meets the eye in any genome or transcriptome and have the potential to add many novel gene isoforms to existing annotations.
In a paper published in BMC Genomics, Earlham Institute and University of Oxford researchers show that quality control steps are vital to ensure that what we see truly represents the transcripts that are actually expressed, revealing some interesting new transcript variation in the process.
Genes contain many elements, including exons joined at various splice junctions, which can be combined in different ways to give any one gene multiple possible mRNA transcripts.
Regulation of this alternative splicing is key to cell differentiation and complex tissue formation, and different transcripts can have very different, even opposing, effects. Some are strongly implicated in disease, so it’s useful to identify and understand them.
Just how many potential transcripts there are is very much up for debate, however, particularly since the dawn of long-read sequencing technology such as the Oxford Nanopore MinION.
Able to capture whole gene isoforms, long reads are leading to thousands of novel transcripts being added to the potentially-functional pool with each and every new study.
But how many of these are real?
Dr Wilfried Haerty and Dr David Wright have been working with collaborator Professor Liz Tunbridge at the University of Oxford, as well as the Genomics Pipelines and Core Bioinformatics Group here at EI, to demonstrate how we might answer that question.
“Everybody's using long reads now,” says Wright, first author of a recent study investigating potential novel transcript diversity in neuroblastoma cells. “But few studies take the time to check the sensitivity - how well long-read sequencing can actually quantify expression of isoforms.”
The team used the TALON pipeline, published by Wyman et al. of the University of California, to generate a custom annotation of 3274 novel transcripts, which were then validated using short reads.
Unlike many recent papers that reveal novel transcripts - occasionally in the tens of thousands - the study reports a relatively modest final figure of 2567, one-third of which are putatively protein-coding.
While not boasting huge numbers, the method demonstrates how to generate realistic and conservative estimates of novel transcripts - offering a better chance of finding isoforms of significance.
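To make the idea of short-read validation more concrete, here is a minimal Python sketch of one common approach: keep a novel transcript only if every one of its splice junctions is supported by enough short reads. The exact criteria used in the paper are not described in this article, so the `Transcript` class, the `filter_novel` function, the three-read threshold and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical representation: a splice junction is (donor, acceptor) in genome coordinates.
Junction = Tuple[int, int]


@dataclass
class Transcript:
    transcript_id: str
    junctions: List[Junction]


def is_supported(transcript: Transcript, junction_reads: Dict[Junction, int], min_reads: int = 3) -> bool:
    """True if every splice junction is covered by at least `min_reads` short reads.

    Note: a transcript with no junctions (single exon) passes trivially here;
    a real filter would need a separate rule for those.
    """
    return all(junction_reads.get(j, 0) >= min_reads for j in transcript.junctions)


def filter_novel(transcripts: List[Transcript], junction_reads: Dict[Junction, int],
                 min_reads: int = 3) -> List[Transcript]:
    """Keep only candidate transcripts whose junctions all have short-read support."""
    return [t for t in transcripts if is_supported(t, junction_reads, min_reads)]


if __name__ == "__main__":
    # Toy candidates and junction read counts, invented for illustration.
    candidates = [
        Transcript("novel_1", [(100, 200), (300, 400)]),
        Transcript("novel_2", [(100, 200), (350, 400)]),
        Transcript("novel_3", [(100, 250)]),
    ]
    junction_reads = {(100, 200): 12, (300, 400): 5, (350, 400): 1}
    kept = filter_novel(candidates, junction_reads)
    print([t.transcript_id for t in kept])  # ['novel_1']
```

Raising `min_reads` makes the filter more conservative: fewer candidates survive, but those that do are more likely to be real, which is the trade-off the study leans towards.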
The team employed a suite of control steps and some stringent filtering, including the Sequin ‘spike-in’ platform: synthetic transcripts of known sequence and abundance that are added to samples, helping to account for error in RNA sequencing and providing a measure of sensitivity.
“Spike-in has been around for decades, but not many have looked at its validity for this kind of work,” explains Wright.
“We spent a lot of time checking how well it can catch the biological reality of what we're looking at. We wanted a well-documented study on how to really think about what you’re doing.”
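Because the input amount of each spike-in transcript is known in advance, it can be compared with what the sequencer actually reports. The minimal Python sketch below shows two simple checks of that kind: the fraction of spike-in isoforms detected at all (sensitivity), and how well observed counts track known concentrations on a log scale. The spike-in names, concentrations and counts are invented for illustration, and this is not the study's own analysis code.

```python
import math

# Hypothetical toy data: each spike-in isoform maps to
# (known input concentration, long-read count observed for it).
# Names and numbers are illustrative only, not values from the study.
SPIKE_INS = {
    "spikeA_iso1": (1000.0, 850),
    "spikeA_iso2": (250.0, 190),
    "spikeB_iso1": (60.0, 41),
    "spikeC_iso1": (15.0, 9),
    "spikeC_iso2": (4.0, 0),
    "spikeD_iso1": (1.0, 0),
}


def detection_sensitivity(spike_ins, min_count=1):
    """Fraction of spiked-in isoforms observed with at least `min_count` reads."""
    detected = sum(1 for _, count in spike_ins.values() if count >= min_count)
    return detected / len(spike_ins)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, using the standard library only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def quantification_accuracy(spike_ins):
    """Correlation of log2 input concentration vs log2 count, over detected isoforms only."""
    pairs = [(math.log2(conc), math.log2(count))
             for conc, count in spike_ins.values() if count > 0]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


if __name__ == "__main__":
    print(f"sensitivity: {detection_sensitivity(SPIKE_INS):.2f}")
    print(f"log-log correlation: {quantification_accuracy(SPIKE_INS):.3f}")
```

In this toy set the two least abundant spike-ins are never seen, which is exactly the kind of detection limit that spike-ins are meant to expose before novel, low-abundance transcripts are trusted.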
This project would never have been possible without a combination of strategically funded UKRI-BBSRC ‘National Capabilities’ at the Earlham Institute working alongside our research faculty and external collaborators.
The project was conceived as a collaboration between the Genomics Pipelines Group, the Core Bioinformatics Group, and the Evolutionary Genomics Group at the Earlham Institute, while the University of Oxford’s Professor Liz Tunbridge came on board to provide expert biological insights into the idea of using long reads for isoform detection and quantification.
Our Advanced Scientific Training team provided the catalyst for work to begin, with a Year in Industry student, Will Glynn, working with our Genomics Pipelines Group, experts in technical development and advanced sequencing methods.
This combination of facilities is unique. Combined with our other capabilities in e-Infrastructure and the DNA Biofoundry, it offers an unparalleled opportunity to bring together diverse expertise to truly advance and innovate impactful, data-driven bioscience.
A particular finding of interest from the study was a novel isoform of CACNA2D2, a voltage-gated calcium channel subunit. The isoform was found by looking for differential expression of transcripts as neuroblastoma cells began to differentiate.
Findings such as this are highly relevant, as calcium channels play a vital role in cell signalling and neuronal function. Mutations in the same subunit have been studied because of their links to epilepsy and ataxia.
It is yet to be determined whether this particular isoform is functional, as it lacks a signal peptide important for membrane integration. However, the isoform was validated in the lab by RT-PCR, and appears to be downregulated as cells differentiate.
Perhaps switching between active and inactive subunits could play a role in regulating signalling via calcium channels, preventing channel activation in undifferentiated cells.
It’s this study of transcript diversity - rather than focusing purely on gene-level expression - that Haerty believes can massively augment our understanding, and is where long-read sequencing comes into its own.
“When you look at the gene level, you miss that dynamic expression and regulation between different transcripts within a gene,” he explains.
“Transcripts can have dramatically different sequence composition and totally different consequences. Some of the transcripts will regulate very specific downstream pathways. In fruit flies or fish, it can mean the difference between generating a male or a female.
“With long reads, you can capture that variation. What might be surprising is that we’ve been able to find transcripts that are differentially expressed, even in genes that are not.”
Of the more than 4000 genes in which the study identified differentially expressed isoforms, 1276 were not themselves differentially expressed at the gene level.
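How can a transcript change while its gene does not? The toy Python example below shows the simplest case: a gene with two isoforms whose usage flips between two cell states while the total stays the same. Summed at the gene level the fold change is zero; per isoform, both change about eightfold. The counts, names and pseudocount are hypothetical, and real analyses use replicates and statistical models rather than raw fold changes.

```python
import math

# Hypothetical counts for one gene with two isoforms in two cell states.
# Numbers are illustrative only.
COUNTS = {
    "undifferentiated": {"isoform_1": 90.0, "isoform_2": 10.0},
    "differentiated": {"isoform_1": 10.0, "isoform_2": 90.0},
}


def log2_fold_change(before, after, pseudocount=1.0):
    """log2(after / before), with a pseudocount to avoid division by zero."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def gene_level_lfc(counts, a, b):
    """Fold change of the summed gene count: an isoform switch cancels out here."""
    return log2_fold_change(sum(counts[a].values()), sum(counts[b].values()))


def isoform_level_lfc(counts, a, b):
    """Per-isoform fold changes, which keep the switch visible."""
    return {iso: log2_fold_change(counts[a][iso], counts[b][iso]) for iso in counts[a]}


def isoform_usage(counts, state):
    """Proportion of the gene's total expression contributed by each isoform."""
    total = sum(counts[state].values())
    return {iso: c / total for iso, c in counts[state].items()}


if __name__ == "__main__":
    a, b = "undifferentiated", "differentiated"
    print("gene-level log2FC:", gene_level_lfc(COUNTS, a, b))  # 0.0
    for iso, lfc in isoform_level_lfc(COUNTS, a, b).items():
        print(f"{iso}: log2FC = {lfc:+.2f}")
    print("usage before:", isoform_usage(COUNTS, a))
    print("usage after:", isoform_usage(COUNTS, b))
```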
While the transcripts themselves have been identified, there’s a lot of work down the line that will go into discovering which play important roles in health and disease.
That’s where collaboration is key, including projects focused on marrying blue-sky research with the industries that can make an impact, such as the recent Psychiatry Consortium-funded project between EI, Oxford, and industrial partners.
“There’s a lot to be said about the nature of good collaboration with this project,” says Wright. “We’re very lucky to work with Liz and her team at Oxford. It’s a real joy.”
It’s a feeling shared by Haerty, who is grateful for the fantastic UKRI-BBSRC-funded resources we have here at the Earlham Institute to enable such research.
“This work is something that could not have been done without the strategic funding we get from UKRI-BBSRC. Thanks to the capabilities in Genomics and Single Cell Analysis, and Advanced Scientific Training, we were able to deliver something that otherwise would not have been possible if it was just us alone.
“We need that knowledge and expertise of Genomics Pipelines, and the opportunities for developing young talent that the Training team offers. It’s a special combination of resources that can advance our field and contribute important impacts now and in the future.”
Wright DJ, Hall NAL, Irish N, Man AL, Glynn W, Mould A, Angeles AL, Angiolini E, Swarbreck D, Gharbi K, Tunbridge EM, Haerty W. (2022) Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes. BMC Genomics 23(1):42.