Shaping the future of data-intensive bioscience
Professor Irene Papatheodorou explores how living systems work using computer science, developing and implementing new approaches to turn data into novel biological insights.
As the Earlham Institute’s Head of Data Science, Professor Irene Papatheodorou is spearheading strategic initiatives that bridge the gap between biology and data science.
In addition to leading one of three strands of the Cellular Genomics strategic research programme, she oversees the Institute’s data science strategy.
Irene's journey into the world of data science began with a deep-seated curiosity about nature, coupled with a passion for mathematics and computing.
“Bioinformatics immediately felt like the natural field for me to work in as it enables me to pursue both interests,” she reflects.
Prior to joining the Earlham Institute in February this year, Irene spent eight years at the European Bioinformatics Institute (EMBL-EBI) in Cambridge, where she led the Gene Expression team and oversaw the development of large-scale data integration projects including the Single Cell Expression Atlas.
Irene was drawn to the Earlham Institute’s strengths in both experimental biology research and data science.
“Most organisations primarily specialise in computational research or focus on generating biological data,” Irene explains.
“The Earlham Institute is unique in that it is meeting the demands of both, developing computational tools, infrastructure, metadata standards, and using AI applications to glean new insights from the data it generates.”
In her role as Head of Data Science, Irene is able to work at the forefront of developing services and databases that benefit the research community, as well as contribute to the Institute’s strategic research programmes.
As part of the BBSRC-funded Cellular Genomics research programme, she is continuing her own research on cell atlasing.
“Cell atlases provide a comprehensive view of gene expression patterns in cells within a particular organ or tissue,” she says. “They’re a rich resource for understanding cellular diversity, dynamics and evolution.”
Irene’s group is responsible for building a data infrastructure and ecosystem that is FAIR: Findable, Accessible, Interoperable, and Reusable.
“We’re developing resources, tools, and databases to be able to store the data that we generate and label it accurately,” she says.
“We want to capture all the metadata needed to reanalyse and share data generated by the Institute, or used by the Institute but generated by others.
“And we want to make the infrastructure for analysing the data easy to use, flexible, and scalable, as we keep generating more datasets and different types of data.”
For example, the Papatheodorou group manages the Collaborative OPen Omics (COPO) platform, which helps researchers describe, store, and retrieve data more easily. COPO is used widely by researchers across the UK and Europe.
Irene is also heavily involved in the work of the core bioinformatics team, who are responsible for genomic and imaging data analyses.
This team develops accessible tools for both computational and experimental scientists, enabling them to apply machine learning and AI approaches and to visualise their results in a clear way.
“By bringing together different data scientists and software engineers, we can respond to the needs and requirements of the biologists we are supporting at every stage, from data acquisition and management to data analysis and visualisation,” Irene says.
The Institute’s commitment to open science and collaboration means that all the data and analysis tools generated are freely accessible to the wider scientific community.
Irene is heavily involved in the Earlham Institute’s Cellular Genomics strategic research programme, which is investigating the effects of changes in gene expression at the cellular level in plants and animals.
The results of this 5-year, BBSRC-funded programme will help to expand our understanding of how cells function and organise themselves within different tissues.
Characterising the cellular responses to developmental and/or environmental cues, and how they contribute to organismal responses, is fundamental to learning more about processes such as ageing, disease, and even the development of resilience to climate change.
The Data Science for Cellular Genomics strand of the programme that Irene leads is developing methods for single-cell computational genomics.
Her wider research group, which is in the process of transferring from EMBL-EBI, is working on multiple collaborative projects, including a human Gut Cell Atlas for Crohn’s disease and the Biodiversity Cell Atlas.
“With researchers at the University of Edinburgh, we’re looking at RNA expression in single cells from gut samples from both Crohn’s disease patients and healthy individuals to understand the cell types and activity driving the disease, which could lead to new therapies,” Irene says.
The Biodiversity Cell Atlas is an ambitious project led by the Centre for Genomic Regulation in Barcelona, Spain.
It aims to chart the diversity of the different cell types within multicellular organisms. The project has recently kicked off by building cell atlases for five aquatic species.
“I’m really looking forward to looking at the similarities and differences between cell types in different species and mapping how anatomical function changes during evolution,” she adds.
Irene’s day typically involves multiple meetings with team members and collaborators, preparing presentations, and writing and reviewing grants and manuscripts.
“What I enjoy the most is working in a team to develop computational tools that are used by the community to advance research,” she says.
Irene also sits on the Earlham Institute’s Executive Team, where she is keen to support initiatives that foster a positive research culture and promote women in science.
The field of computational science has traditionally been seen as male-dominated, particularly when it comes to more senior research roles. But Irene is helping to change that, and is buoyed by the Institute’s Inclusivity, Diversity, Equality, and Accessibility agenda, as well as its Athena Swan Bronze award.
Balancing all of her roles, responsibilities, and interests can be a challenge, but she is the first to admit that it is an exciting one.
“The field is advancing so fast that it can be difficult to decide which question to tackle first!”
Authored by Monica Hoyos Flight, writing for the Earlham Institute.