Why IBD happens: It's complicated
Inflammatory bowel disease (IBD) is on the rise in the UK, but what’s behind it? It’s a complex problem, requiring data-driven platforms to answer.
Inflammatory bowel disease (IBD) is on the rise in the UK and across the world. In fact, between 2000 and 2016, the prevalence of IBD in the UK almost doubled, now affecting 418 of every 100,000 people. But what’s behind this increase? It’s a complex problem, requiring sophisticated platforms and systems-based, data-driven approaches to answer.
Dr Paddy Sudhakar is a postdoctoral scientist in the Korcsmaros Group at EI and recently co-authored a review, led by collaborators at KU Leuven, on why it is challenging to understand the roots of IBD and how data-driven and systems-based approaches can help us to untangle this complex disease.
You can read the review, ‘Big Data in IBD: big progress for clinical practice’, in Gut - a leading international journal in gastroenterology.
IBD refers to diseases which are characterised by inflammation of the bowel. It can be divided into two major subtypes, namely Crohn’s disease (CD) and ulcerative colitis (UC). In CD, the inflammation can occur in any part of the intestinal tract, while in UC the inflammation is confined to the colon - as the name suggests.
Studies have indicated a combination of extrinsic (diet, lifestyle, smoking, use of antibiotics etc.), intrinsic (genetics, immune system), and intermediary factors (gut microbiota) which may contribute to the development of the disease.
Although associations have been made between various extrinsic, intrinsic and intermediary factors and the prevalence of IBD, we don’t yet know the sequence or exact combination of events which drive it. For example, it is known that the microbiome of IBD patients has reduced diversity compared to healthy subjects.
However, despite the many studies reporting this observation, it is still not clear whether this disruption to the gut microbes is a cause or a symptom of IBD. In the same vein, genetic mutations which have been linked to IBD susceptibility do not explain disease inheritance.
Recent studies have explored the role of early childhood events, such as exposure to antibiotics or various dietary components, and linked them to IBD development in early or mid-adulthood. In addition, there are population- and ethnicity-specific differences.
Hence, going by current evidence, it seems likely that multiple factors at different stages of the lifespan are involved in complex interactions for the disease to manifest itself. So, from a pragmatic point of view, understanding the root causes of IBD becomes a daunting task, which in turn has a direct effect on how the disease can be treated, monitored and managed.
Current treatments include a range of anti-inflammatories and small molecules. These medicines target certain molecular pathways for reducing inflammation. However, most therapeutic regimens are not very effective, with treatments failing on average in about 70% of IBD patients. On top of this, patients often experience undesirable and painful side effects, such as skin lesions.
The disease itself is very complex. For example, based on disease behaviour, CD can be classified into three categories: inflammatory, fibrostenotic and penetrating. Usually, CD starts with the inflammatory phenotype and progresses towards the fibrostenotic or penetrating phenotype. To complicate it further, CD can also be classified based on its anatomical location: colon, ileum or both.
Since the mechanisms driving the different disease subclasses and the transitions between classes are not entirely clear, current treatments are not tailored to handle disease heterogeneity. This makes it difficult for clinicians and physicians to come up with effective and sustainable plans for disease treatment and management.
There is certainly a genomic contribution to the disease but we cannot understand the disease by looking at genomics alone. The analogy I tend to use is that of the dam.
There may be cracks in the dam but whether or not the cracks make the dam burst depends on the volume of water and debris imparting pressure on the dam. The cracks can be likened to the mutations at the genetic level and the water or debris to the environmental factors.
Also, based on observations from previous research, mutations in DNA do not in most cases explain why, or how, certain people develop IBD. Another complexity is that mutations can occur in a cell-type-specific manner.
We would gain a much better understanding of IBD by using larger sample sizes and datasets. These would include genetic mutations, transcriptomics, proteomics, and metabolomics, along with information about the microbiome and environmental factors.
Various groups have started to collect such multi-omic datasets from tissue and blood samples, as well as patient-derived organoids - a recently developed system which allows us to test personalised medicine approaches through making a miniature version of an IBD patient’s own gut.
Of course, having more data and bringing all of this together to paint a picture of whole systems at work requires sophisticated analysis to tease out causes and effects. Thankfully, we now have access to advanced computational analytics, such as machine learning, which helps us to uncover hidden patterns in complex datasets with many different factors at play and relate those patterns to clinical information.
While machine learning provides us with a wide range of tools and resources to analyse the large complex datasets, systems biology - looking at the big picture - lends the biological context needed to infer and interpret the functional significance of the results.
Machine learning approaches can generally be classified into supervised and unsupervised models. Supervised methods use the sample labels to filter features, while unsupervised methods do not use sample labels. Machine learning methods also allow users to analyse not only single data-types in isolation but also several data-types in an integrated manner.
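To make the distinction concrete, here is a minimal sketch of the two kinds of feature filtering. The toy data, the thresholds and the function names are all invented for illustration, and the two filters (variance across samples, difference in group means) are just simple stand-ins for the many methods used in practice.

```python
from statistics import mean, pvariance

# Hypothetical toy data: rows are samples, columns are features
# (e.g. the abundance of three molecules). All values are invented.
samples = [
    [5.0, 1.0, 2.0],
    [5.1, 1.2, 9.0],
    [4.9, 3.0, 2.1],
    [5.0, 3.1, 9.2],
]
# Sample labels (0 = control, 1 = IBD); only the supervised filter uses them.
labels = [0, 0, 1, 1]


def columns(rows):
    """Transpose a list of samples into a list of feature columns."""
    return [list(col) for col in zip(*rows)]


def unsupervised_filter(rows, min_variance):
    """Keep features whose variance across all samples exceeds a threshold."""
    return [i for i, col in enumerate(columns(rows)) if pvariance(col) > min_variance]


def supervised_filter(rows, labels, min_difference):
    """Keep features whose mean differs between the two labelled groups."""
    kept = []
    for i, col in enumerate(columns(rows)):
        group0 = [v for v, y in zip(col, labels) if y == 0]
        group1 = [v for v, y in zip(col, labels) if y == 1]
        if abs(mean(group1) - mean(group0)) > min_difference:
            kept.append(i)
    return kept


variable_features = unsupervised_filter(samples, min_variance=0.5)
discriminating_features = supervised_filter(samples, labels, min_difference=1.0)
```

In this toy example the unsupervised filter keeps every feature that varies a lot, whether or not that variation has anything to do with disease, while the supervised filter keeps only the feature that separates the two labelled groups.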
One example of such integration is the recently published Multi-Omics Factor Analysis (MOFA), which is based on a probabilistic Bayesian framework and is tailored to identify hidden factors that capture the orthogonal variance in multiple data-types. The values such factors assign to the samples (which can be either patients or in-vitro specimens such as organoids) can be linked to binary or continuous variables representing the clinical phenotypes.
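The last step, linking a hidden factor to clinical phenotypes, can be sketched as follows. This is not MOFA itself: it assumes the per-sample factor values have already been produced by such a method, and it uses a plain Pearson correlation as one simple way of testing the association. All numbers and variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values of one hidden factor for six samples
# (patients or organoids), as a factor model might return them.
factor_values = [-1.2, -0.8, -0.5, 0.4, 0.9, 1.2]

# Hypothetical binary phenotype (0 = inflammatory, 1 = fibrostenotic).
binary_phenotype = [0, 0, 0, 1, 1, 1]

# Hypothetical continuous phenotype (e.g. a clinical activity score).
continuous_score = [2.0, 3.0, 2.5, 6.0, 7.5, 8.0]

# With a 0/1 variable, Pearson correlation is the point-biserial correlation.
r_binary = pearson(factor_values, binary_phenotype)
r_continuous = pearson(factor_values, continuous_score)
```

Here both correlations come out strongly positive, which would flag this factor as worth a closer biological look. A real analysis would also need significance testing and correction for the number of factors examined.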
Systems biology approaches typically involve the use and integration of biological networks consisting of molecular interactions, -omic readouts and phenotypic information of the samples. They can be broadly classified as top-down data-driven approaches, bottom-up hypothesis-driven approaches or a combination of the two.
Systems approaches can also be categorised as being qualitative or quantitative: the former representing methodologies which assign binary (present or absent; stimulatory or inhibitory, etc.) relationships among variables (proteins, genes, etc.), while the latter approach assigns magnitude (strength) to such relationships.
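The difference between the two can be illustrated with a small sketch. The node names and edge strengths below are invented, and multiplying values along a path is only one simple convention for combining effects, assumed here for illustration.

```python
# Qualitative network: each edge records only the sign of the relationship
# (+1 = stimulatory, -1 = inhibitory). Node names are hypothetical.
qualitative_edges = {
    ("gene_A", "protein_B"): +1,
    ("protein_B", "gene_C"): -1,
}

# Quantitative network: the same edges with an invented strength attached.
quantitative_edges = {
    ("gene_A", "protein_B"): 0.8,
    ("protein_B", "gene_C"): -0.5,
}


def path_effect(edges, path):
    """Multiply edge values along a path of nodes to get the net effect."""
    effect = 1
    for source, target in zip(path, path[1:]):
        effect *= edges[(source, target)]
    return effect


path = ["gene_A", "protein_B", "gene_C"]
net_sign = path_effect(qualitative_edges, path)       # direction only
net_strength = path_effect(quantitative_edges, path)  # direction and magnitude
```

The qualitative network can only say that gene_A ultimately inhibits gene_C, whereas the quantitative one also estimates how strong that inhibition is.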
Generally, a common thread with systems biology approaches involves the use of quality-controlled biological networks which provide the biological and functional context to interpret the -omic datasets.
Since IBD is a complex disease, which develops over time, it would be ideal to generate molecular profiling datasets from patients over an extended period. However, since this needs to be performed prospectively, it presents a challenge of identifying potential patients for the profiling.
One non-ideal solution would be to profile those patients with early symptoms of IBD. Publicly-available repositories with historical records of molecular profiles (though most of these are low-throughput) over time do exist. However, some of these datasets are sparse (not all molecular layers profiled) and often characterised by missing data points.
The lack of cell-type-specific datasets from IBD patients is an even greater conundrum, especially given that IBD is associated with distinct signatures in various cell-types. Other challenges include the difficulty of using and interpreting the algorithms in a clinical setting, uncertainties associated with biological networks and the lack of datasets big enough to enable proper training and testing. In cases where only machine learning is used, there is also a lack of mechanistic explanations, because most machine learning models are like black boxes.
Progress is being made in disentangling the molecular drivers and mechanisms which make IBD such a complex, multi-dimensional disease, as reported in our review and in various studies performed by our group and our collaborators at KU Leuven, who led this review.
Despite the challenges I’ve described, we now have access to clinical repositories with stored specimens - and the combination of these with big data, machine learning and systems biology provides us with the necessary tools to properly investigate, and eventually treat, IBD.
Most importantly, we need clinicians and clinical researchers to be open-minded to such multi-omics data collection and advanced data analysis. We are fortunate to work with the IBD team in Leuven for this reason.