The Frozen Zoo: thawing mammalian secrets for human health
Comparative genomics analysis of hundreds of mammals puts human genome in greatest resolution yet, with medical applications from COVID to cancer.
Deep in the vaults of San Diego Zoo, a resource has been sitting untapped: the frozen tissues of mammals both extant and extinct. A global collaboration has drawn on these samples to sequence and assemble the genomes of more than a hundred mammals, providing an invaluable open-access resource for comparative genomics research.
The resulting dataset lets us view the human genome at greater resolution than ever before, with countless medical applications from COVID to cancer.
A walk through San Diego Zoo is a journey through some of life’s most vibrant yet endangered biodiversity. Leopards skulk among branches, leering as you pass. A mandrill yawns, baring his long canines in the lazy morning sun. Rhinos, almost extinct in the wild, swish their paintbrush-like tails as they roam, kicking up small clouds of dust. Yet these enclosures offer just a brief glimpse into the mammalian myriad.
Behind a doorway, hidden from the rest of the zoo, dozens of vats lie shrouded in mist. Filled with liquid nitrogen, these chambers, far colder than Antarctica, contain the cryogenic preserves of thousands of species, maintained in tens of thousands of living tissue samples. Many of the animals represented in these tanks are precariously holding on. One, the po’ouli, is forever gone.
This is the San Diego Frozen Zoo, the largest collection of its type in the world. A precious resource for conservation, it also has another, perhaps less obvious, use: the secrets thawed from its frosted vaults will help us to understand the human genome in greater resolution than ever before.
The Beluga Whale, Delphinapterus leucas, is an Arctic and sub-Arctic cetacean.
The human genome has allegedly been ‘sequenced to death.’ Yet although many years have passed since the Human Genome Project was completed, there are still gaping holes in our understanding, as well as in the DNA sequence itself (particularly around the hard-to-reach bits near the middle of chromosomes).
It turns out that unravelling the complexities of any one genome can help us to understand a lot more about all the others, including our own. Thanks to incredible advances over the last decade that have made DNA sequencing cheap and widely accessible, that’s exactly what researchers worldwide have increasingly been doing, particularly here at the Earlham Institute (EI).
From colourful cichlid fish and chlamydia-ridden koalas to obscure plankton, bread wheat and extraordinarily complex strawberries, we are developing novel molecular and computational approaches to investigate every piece of the DNA puzzle across each and every branch of life. That expertise enables our scientists to join world-leading collaborations, including the Darwin Tree of Life and the Earth BioGenome Project, to document and understand all of life’s dazzling diversity.
An international team that features EI researchers Dr Will Nash and Dr Wilfried Haerty has recently published the initial findings of an analysis of one of those branches of life in particular: the mammals.
Announced in Nature, the Zoonomia project includes data from more than 240 mammal genomes, 131 of which were newly sequenced and assembled. This immense effort was made possible by San Diego’s Frozen Zoo, whose blood and tissue samples provided ample material for DNA sequencing.
The result is one of the great advances in comparative genomics in recent years: a unique, open-access resource that will be widely used for years to come to identify therapeutic targets for a range of diseases, from COVID-19 to cardiovascular disease, cancer and even schizophrenia.
“The Zoonomia project is revolutionary to our understanding of the mammals,” Dr Nash says. “It has created a huge number of new genomes for us to study, increasing the proportion of mammalian families with a representative genome from 50% to 80%.
“That means our understanding of the mammals is broader: we have a wide, comparative dataset that allows us to understand which DNA bases in the human genome are under evolutionary conservation. Previously, the smallest regions we could confidently say were conserved were about 12 bases long, but with this dataset we can go down to a single nucleotide. This is essential, as over 90% of the mutations associated with biochemical, physiological and morphological variations or with disease are found within regions of the genome that have been preserved over millions of years of evolution.”
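To make ‘conserved down to a single nucleotide’ more concrete, here is a minimal Python sketch that scores each column of a toy alignment by how many other species share the human base, then averages those scores over a window to mimic a coarser resolution. It is a deliberate simplification under stated assumptions: real studies use phylogenetic conservation models (such as phyloP) that account for the evolutionary tree, and the function names and sequences here are hypothetical.

```python
from typing import List

# Hypothetical helpers: a simple identity score stands in for the
# phylogenetic conservation models used in real comparative genomics.


def column_conservation(alignment: List[str], ref_index: int = 0) -> List[float]:
    """Per-column fraction of non-reference rows that share the reference base."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    ref = alignment[ref_index]
    others = [row for i, row in enumerate(alignment) if i != ref_index]
    scores = []
    for col in range(length):
        base = ref[col]
        if base == "-":
            # Gap in the reference row: score the position as not conserved.
            scores.append(0.0)
            continue
        matches = sum(1 for row in others if row[col] == base)
        scores.append(matches / len(others))
    return scores


def window_average(scores: List[float], width: int) -> List[float]:
    """Average per-base scores over sliding windows, i.e. a coarser resolution."""
    return [sum(scores[i:i + width]) / width for i in range(len(scores) - width + 1)]


# Toy alignment: row 0 plays the role of the human sequence.
alignment = [
    "ACGTAC",  # human
    "ACGAAC",
    "ACGTTC",
    "ACCTAC",
]
per_base = column_conservation(alignment)
print([round(s, 2) for s in per_base])  # [1.0, 1.0, 0.67, 0.67, 0.67, 1.0]
print([round(s, 2) for s in window_average(per_base, 3)])  # [0.89, 0.78, 0.67, 0.78]
```

The windowed scores blur fully and partially conserved bases together, while the per-base scores keep them distinct; that difference is what finer resolution buys.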
Aiding this improved resolution, the project has taken a new approach to one of the hardest computational problems genome biologists must solve: alignment.
To compare species, and to understand which nucleotides are the most important, we have to work out how the sections of DNA that make up one species’ genome relate to those of another. DNA can change repeatedly over time, sometimes in large chunks, which means tracing such relationships through the evolutionary tree can be very hard.
A well-established method for calculating whole genome alignments is to line all species up with a known ‘reference’ genome. There is a major limitation with this approach, however, in that it doesn’t really help us to identify the genes that are not already present in the reference genome - the human or mouse genome in the overwhelming majority of studies.
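Why a single reference misses things can be shown with a toy example. The Python sketch below (the function name is hypothetical) projects a pairwise alignment onto reference coordinates, which is in effect what a reference-based analysis does: any base the other species has in a column where the reference has a gap has no reference coordinate, so it simply drops out. Real pipelines are far more elaborate, but the blind spot is the same in spirit.

```python
from typing import Dict, List, Tuple


def project_onto_reference(ref_row: str, query_row: str) -> Tuple[Dict[int, str], List[str]]:
    """Map reference positions to aligned query bases; collect query-only bases.

    Both rows come from one pairwise alignment, with '-' marking gaps.
    """
    if len(ref_row) != len(query_row):
        raise ValueError("aligned rows must have the same length")
    projected: Dict[int, str] = {}
    dropped: List[str] = []
    ref_pos = 0  # coordinate along the ungapped reference sequence
    for ref_base, query_base in zip(ref_row, query_row):
        if ref_base == "-":
            if query_base != "-":
                # Sequence found only in the query has no reference coordinate.
                dropped.append(query_base)
            continue
        projected[ref_pos] = query_base
        ref_pos += 1
    return projected, dropped


# The query species carries an insertion ("TTT") that the reference lacks.
projected, dropped = project_onto_reference("ACG---TA", "ACGTTTTA")
print("".join(projected[i] for i in sorted(projected)))  # ACGTA
print("".join(dropped))  # TTT
```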
Mammals have evolved a phenomenal range of adaptations to life in different environments, which may be dictated in large part by genetic elements that are shared by some mammals but not others. If we use a single reference to try to prise apart all of this diversity, we’re missing a trick.
The Zoonomia project therefore aligns the genomes using a new, reference-free method called Cactus, which takes quite a different approach.
Blood samples are among the various tissues stored in the Frozen Zoo.
For each pair of closely related genome assemblies it aligns, the software infers a likely ‘ancestral’ genome for their common ancestor. It then aligns these reconstructed ancestors with their own relatives, working back through the evolutionary tree and inferring ever-older ancestral genomes as it goes.
The result is an alignment organised around a tree of life, in which similar sequences are compared with each other rather than against a single reference genome. The ‘tree’, however, might be better described as a complex web, with similar genes appearing in some lineages and disappearing in others.
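This progressive strategy (align close relatives, infer their ancestor, then move deeper into the tree) amounts to processing a guide tree from the leaves up. The Python sketch below only builds that schedule of subproblems; it is a simplified illustration under assumptions, since real Cactus runs also use outgroup genomes and do the actual alignment work inside each step, and the tree encoding and function name are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is a species name; an internal node is
# (ancestor_name, left_subtree, right_subtree).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def alignment_schedule(tree: Tree) -> List[Tuple[str, List[str]]]:
    """List (ancestor, children) subproblems so that every child comes first."""
    jobs: List[Tuple[str, List[str]]] = []

    def visit(node: Tree) -> str:
        if isinstance(node, str):
            return node  # an extant genome: nothing to reconstruct
        name, left, right = node
        children = [visit(left), visit(right)]  # post-order: children first
        jobs.append((name, children))
        return name

    visit(tree)
    return jobs


guide_tree = ("Anc0", ("Anc1", "human", "chimp"), ("Anc2", "mouse", "rat"))
for ancestor, children in alignment_schedule(guide_tree):
    print(f"align {children} -> infer {ancestor}")
```

In a scheme like this, the Anc1 and Anc2 steps depend on disjoint sets of genomes, so they could run in parallel before Anc0 is inferred.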
By reconstructing the common ancestral DNA sequences of homologous (equivalent) genes, we get an idea of shared ancestry. These genes can then be compared across the tree that has been generated and we can identify those which are widely spread across the mammals, or those which are well-conserved within a specific lineage, such as carnivores or primates.
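Reconstructing an ancestral base from its descendants can be illustrated with Fitch’s small-parsimony algorithm, a classic textbook method. This is a swapped-in illustration, not the procedure Cactus itself uses, and the tree, sequences and function names below are hypothetical. Only the bottom-up pass is shown, which is enough to infer the root ancestor: each internal node keeps the bases its two children share if there are any, otherwise the union of both, counting one change.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: a leaf is a species name; an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, bases: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Bottom-up Fitch pass: candidate ancestral bases and minimum number of changes."""
    if isinstance(tree, str):
        return frozenset({bases[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, bases)
    right_set, right_cost = fitch(right, bases)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Children disagree: either base is equally parsimonious, at the cost of one change.
    return left_set | right_set, left_cost + right_cost + 1


def reconstruct_root(tree: Tree, sequences: Dict[str, str]) -> Tuple[str, int]:
    """Infer a root sequence column by column (ties broken alphabetically, arbitrarily)."""
    length = len(next(iter(sequences.values())))
    root, total = [], 0
    for col in range(length):
        candidates, cost = fitch(tree, {name: seq[col] for name, seq in sequences.items()})
        root.append(min(candidates))
        total += cost
    return "".join(root), total


tree = (("human", "chimp"), ("mouse", "rat"))
sequences = {"human": "ACGT", "chimp": "ACGA", "mouse": "ATGA", "rat": "ATGA"}
print(reconstruct_root(tree, sequences))  # ('ACGA', 2)
```

Column 1 shows the limit of this toy: the primates say C, the rodents say T, and parsimony alone cannot choose between them. Sampling more species, including outgroups, is one way to break such ties.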
Genes that are conserved only within a particular lineage would suggest innovations specific to that group. On the flip side, regions that are present in most mammals but missing in one group would suggest the loss of a characteristic.
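The gain-and-loss reasoning above can be written as a simple presence/absence rule. This Python sketch is a hypothetical heuristic for illustration only: the function name, the tolerance parameter and the species list are assumptions, and published analyses use phylogenetic models rather than a fixed cutoff.

```python
from typing import Dict, Set


def classify_element(presence: Dict[str, bool], clade: Set[str], tolerance: float = 0.1) -> str:
    """Label a genomic element as a candidate gain or loss in a clade.

    presence maps species name -> whether the element is found in that genome.
    tolerance is the fraction of species outside the clade allowed to break the pattern.
    """
    inside = [present for species, present in presence.items() if species in clade]
    outside = [present for species, present in presence.items() if species not in clade]
    if not inside or not outside:
        raise ValueError("need species both inside and outside the clade")
    share_outside = sum(outside) / len(outside)
    if all(inside) and share_outside <= tolerance:
        return "candidate gain in clade"
    if not any(inside) and share_outside >= 1 - tolerance:
        return "candidate loss in clade"
    return "no clade-specific pattern"


cetaceans = {"beluga", "dolphin"}
species = ["beluga", "dolphin", "human", "mouse", "cow", "bat"]

# Present in every land mammal listed but absent from both cetaceans.
lost = {s: s not in cetaceans for s in species}
# Present only in the cetaceans.
gained = {s: s in cetaceans for s in species}

print(classify_element(lost, cetaceans))  # candidate loss in clade
print(classify_element(gained, cetaceans))  # candidate gain in clade
```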
“We can start to think about adaptations that are specific to groups,” explains Nash. “An exciting one is whales and dolphins - mammals that have re-entered an aquatic environment. They have a whole suite of adaptations to be able to dive deeply, or to have different thermoregulation.
“We can think about it not only in terms of single nucleotide regulatory changes in these species, but also whole genome-scale changes - such as gene duplications.”
One of the biggest limiting factors in running such analyses is the computationally intense nature of comparing 241 genomes side by side. You need a whopping great computer to do that: aligning all the genomes together would require one hundred thousand processors running for an entire day. However, Cactus allows new genomes to be added to an existing alignment rather than starting from scratch, which makes the approach more efficient.
In the name of democratised access, the Cactus software has also been made ‘cloud-first’. This means research is not limited to the centres with the biggest and most powerful computers: it can be performed by individual research groups, who can rent cloud-based services such as those run by Google and Amazon.
Bats are among the most heavily sequenced mammalian groups in the project, with more than 30 species included so far. Pictured is the black flying fox, Pteropus alecto.
“Some members of the consortium have already published a comparative analysis of the key genes associated with the response to COVID,” says Nash.
“Not only does the dataset provide a far more comprehensive understanding of how mammals are affected by coronaviruses in general, it also sheds light on how viruses affect mammalian physiology across a wide range of species. Among the 240 mammal genomes are those of bats and pangolins, which have been implicated in the current pandemic.”
Global pandemics aside, there are many more potential medical applications of the resource. Brown fat, and its role in human health, is one possibility.
Brown fat was long thought to be largely absent in adult humans. It is best known from hibernating mammals such as the thirteen-lined ground squirrel and the little brown bat, which prepare for their winter dormancy by loading up on this resource. In people, this fat is thought to help protect against cardiovascular disease and type 2 diabetes, a potential avenue for combating the rise of these age- and lifestyle-related diseases.
“This is a fantastic resource,” says Nash of the Zoonomia dataset. “It’s relevant to all human pathologies. We can think about how the genes that underlie cancers, or other diseases or disorders, have travelled down different evolutionary trajectories across the mammals.
“We can see which animals are less likely to develop cancers, or which species are immune to certain diseases. We can then look at how they differ at the nucleotide level (single points of the DNA code) and perhaps identify therapeutic targets from those genetic differences.”
The thirteen-lined ground squirrel, Ictidomys tridecemlineatus, prepares for its winter dormancy by loading up on brown fat.
The Zoonomia Project is already well on the way to helping researchers tap into the wealth of information in diverse genomes. Another grand, global collaboration is the Earth BioGenome Project, which aims to sequence the DNA of every single animal, plant, fungus and protist species on the planet to help understand and conserve the biodiversity on which we rely.
Many of the 131 mammal genomes sequenced and assembled as part of Zoonomia have fed directly into that effort. As we have mentioned, the resource has also been used to understand how COVID-19 affects mammals. Dr Wilfried Haerty is already applying it to his work on the genetic causes of psychiatric conditions such as schizophrenia. It’s likely to be a resource that keeps on giving.
“This is a resource that will be used for years and years to come,” says Haerty. “The precursor to this project, which looked at 29 mammal genomes, came out in 2011 and we have been using it since then.
“It’s had a lot of impact. I think we’ll see the same with this.”
To find out more about the Zoonomia project, including the species whose DNA was analysed, visit the project website.