genomic data helped us during covid-19. what other data can we use in the future?
posted on september 11, 2023 by dr ash porter
dr ash porter takes us behind the scenes of their latest publication, 'the importance of utilizing travel history metadata for informative phylogeographical inferences: a case study of early sars-cov-2 introductions into australia' published in microbial genomics.
my name is dr ash porter, and i am a postdoctoral researcher with the duchene group at the doherty institute within the university of melbourne, australia. i’m an evolutionary biologist with a particular fascination for wildlife disease and zoonotic viruses. i combine biological knowledge with computational methodology to study how viruses emerge and evolve in new hosts. my research revolves around data: data can help us unravel and understand pathogens from hundreds of years ago. data can help us monitor outbreaks and inform public health policy. data can even be used to explore what might happen in the future and help us prepare for the next pandemic.
as i mainly work on zoonotic viruses, you might think it’s strange that i don’t think viruses are our enemies. they exist, like we do, in the global ecosystem, and they have played their part in shaping human evolution. in fact, most of the recent outbreaks of viruses have been caused by human activities, such as climate change, land use change, and breaching wildlife habitats. viruses infect your pet, the veggies you eat, the bacteria that live in your gut, and you (they are even in your dna!). most of the time they don’t cause any harm, but sometimes they cause catastrophic pandemics. i believe in taking a proactive approach to understanding as much as we can about zoonotic viruses, to better prepare us for “the next covid-19”.
when i started my work at the doherty institute, it was 2020 and many of us were attempting to understand sars-cov-2. one of the first project ideas that i suggested was to use some novel methods (lemey et al., 2020) to explore how the virus moved into australia during the first wave of the pandemic.
australia is an interesting case study when it comes to pandemics. being an island nation, it is much easier to close our borders and to potentially halt the introduction of pathogens. for example, during the deadly influenza pandemic of 1918-1919, sometimes misleadingly referred to as the “spanish flu”, australia enforced both maritime and land quarantine measures to attempt to slow the spread of the virus. the city of sydney even mandated the use of masks, along with the closure of schools and places of entertainment (sound familiar?).
at the beginning of the covid-19 pandemic, australia enforced a range of measures to reduce the impact of sars-cov-2 on the population (figure 1). due to the social distancing measures, there were probably lower levels of community transmission during this stage of the pandemic (people weren’t spreading it to each other). however, there was still a rise in new cases being detected – and some of them were travellers from outside the country. importantly, in victoria, those who tested positive upon entry to australia had their recent travel history recorded.
the microbial diagnostic unit public health laboratory (mdu phl) and i were very lucky to have access to an extremely special dataset: the travel history of these travellers, which was linked to the genomic sequence of their sars-cov-2 infection.
what makes this special? having extra data that is linked with genomic sequences is not very common – sometimes, all we have is what date the sample was collected. other useful information can be the location of collection, what species it was sampled from, or if there was any important medical history of the individual (e.g. chronic infection, vaccination status).
sometimes having just the sequence data and the date of collection is enough information for evolutionary biologists to figure out how the virus is spreading and evolving. however, sars-cov-2 evolves pretty slowly (for a virus), and unfortunately, most of the sequences that were generated were collected from high-income countries. this means that there is a major bias in the dataset towards sequences from countries like australia, which had a lower caseload, but had the resources to sequence many of the detected cases. as an example, during the same period, australia had a sequencing proportion (the proportion of detected cases that were sequenced) of over 50%, whereas south america sequenced 0.3% of cases.
if we were to try and model the movement of sars-cov-2 based on sequence data alone, it would appear that countries with a higher sequencing proportion (such as australia) were the “epicentres” of the pandemic. having access to the travel history metadata enabled us to build that extra information into our model and so produce more realistic estimates.
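to make the bias concrete, here is a minimal sketch of the arithmetic. the case counts are invented purely for illustration (they are not figures from the study); only the two sequencing proportions echo the numbers quoted above. it compares each region's share of cases with its share of sequences.

```python
# hypothetical case counts, chosen only for illustration (not from the study).
# the sequencing proportions echo the ones quoted in the post.
regions = {
    "australia": {"cases": 7_000, "seq_proportion": 0.50},
    "south america": {"cases": 500_000, "seq_proportion": 0.003},
}


def sequenced(region):
    """number of sequences = detected cases x sequencing proportion."""
    return round(region["cases"] * region["seq_proportion"])


def shares(counts):
    """turn raw counts into each region's fraction of the total."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


sequences = {name: sequenced(r) for name, r in regions.items()}
case_share = shares({name: r["cases"] for name, r in regions.items()})
sequence_share = shares(sequences)

for name in regions:
    print(f"{name}: {case_share[name]:.1%} of cases, "
          f"{sequence_share[name]:.1%} of sequences")
```

with these made-up numbers, the region holding a small fraction of the cases contributes most of the sequences, which is how a model fed sequences alone could mistake it for an epicentre.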
to use a metaphor, imagine that my friend and i both went out and bought a copy of the 2020 booker prize winner “shuggie bain” by douglas stuart to discuss at our monthly book club. because we love books and data, we also track how many people come to our book clubs, and therefore, how many copies of each book are present. likening this to an infectious disease outbreak – each “book” is the virus infecting a person.
upon meeting, we might be slightly confused as to why our two copies of the same book had different covers and punctuation styles. it might take a discussion to unravel that my friend had bought their version while on a recent holiday to the usa, whereas i had sourced mine from a local bookstore. although both books were present at our local book club in victoria, the novels are slightly different versions of the same book, based on their origin of publication. we can compare this to two people in the same current location having slightly different lineages of sars-cov-2, based on where each person was infected in the past.
if we took our usual tally at our book club of how many people were present that day (and how many copies of the “book” were present), the history of the origin of each book would be lost – we would just know how many copies of the book were present at our victorian book club. obviously, it wouldn’t really matter for a book club to know about the origin of publication for the books brought by the participants. however, this kind of information can be very useful for phylogeography.
to translate that to our project, we were able to tell our model that some sequences, even though they were collected in australia, were most likely the result of an infection overseas (as indicated by recent travel history). we thought this extra layer of information would help generate more accurate results – but as it turns out, the travel history metadata was essential for the model to work.
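as a rough illustration of what the metadata adds, the sketch below tallies a handful of invented samples twice: once by where they were collected, and once by where the infection most likely happened. the records, field names and helper function are all hypothetical. this is not the method used in the paper, which builds travel history into a bayesian phylogeographic model (lemey et al., 2020); it only shows the information that a collection-location tally throws away.

```python
from collections import Counter

# hypothetical records: every sample was collected in victoria, and some carry
# a recorded travel origin (None means no recent travel was recorded).
samples = [
    {"id": "s1", "collected_in": "victoria", "travel_origin": "usa"},
    {"id": "s2", "collected_in": "victoria", "travel_origin": None},
    {"id": "s3", "collected_in": "victoria", "travel_origin": "uk"},
    {"id": "s4", "collected_in": "victoria", "travel_origin": "usa"},
    {"id": "s5", "collected_in": "victoria", "travel_origin": None},
]


def likely_infection_location(sample):
    """use the travel origin when one was recorded, else the collection location."""
    return sample["travel_origin"] or sample["collected_in"]


# the "book club tally": where was each sample collected?
by_collection = Counter(s["collected_in"] for s in samples)

# the same samples, counted by where the infection most likely occurred.
by_infection = Counter(likely_infection_location(s) for s in samples)

print("by collection location:", dict(by_collection))
print("by likely infection location:", dict(by_infection))
```

in the first tally every sample looks local; in the second, the samples with a travel history point back to where they were probably acquired.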
so, what did our analysis tell us? as you might expect, australia had plenty of sars-cov-2 imported from other countries (figure 2). as we anticipated, it didn’t really play a role as an “exporter” (i.e., australia didn’t spread the virus as much to other locations, figure 2a). you can see there was a “spike” in importations around mid-april before there was a rapid reduction. we can’t say for certain what caused the slowing of importations – but it was likely a combination of the control measures introduced, such as the mandatory isolation requirements (figures 1 and 2b).
this project had a range of people from different backgrounds working together: our collaborators at the victorian department of health, epidemiologists, microbiologists, and experts in public health and phylodynamics. however, the point i want to make is: even with this innovative methodology and the range of experts we have in australia, we wouldn’t have been able to generate these results without the travel history metadata.
unfortunately, collecting and sharing metadata has many barriers (e.g. ethics, data ownership). the limited quality and availability of public metadata are obstacles to its use in most sars-cov-2 research.
we strongly recommend caution when applying phylodynamic and phylogeographic models to the global sars-cov-2 dataset without using relevant metadata, as we know that there is sampling bias present, and that it will potentially produce misleading estimates. for the ongoing covid-19 pandemic and future outbreaks, we need to be sure that the estimates we are producing can inform the most appropriate public health policies.
we hope that collecting and sharing metadata will become more common practice, and that we can work towards a global, coordinated response for data collection and modelling. this will be key for managing the ongoing covid-19 pandemic, along with preparing for future pandemics.
reference
lemey p, hong sl, hill v, baele g, poletto c, colizza v, et al. accommodating individual travel history and unsampled diversity in bayesian phylogeographic inference of sars-cov-2. nature communications. 2020;11(1):1-14.