needles and haystacks: understanding the hbv transcriptome

posted on may 25, 2023   by james harris

james harris takes us behind the scenes of his latest publication 'an enrichment protocol and analysis pipeline for long read sequencing of the hepatitis b virus transcriptome' published in journal of general virology.

hello, i’m james harris, a post-doctoral research fellow in jane mckeating’s group in the nuffield department of medicine at the university of oxford, uk.

hepatitis b virus (hbv) infects hepatocytes in the liver, and chronically infected subjects are at lifelong risk of liver fibrosis, cirrhosis or even hepatocellular carcinoma. hbv is one of the smallest known dna viruses; its 3.2 kb covalently closed circular genome (cccdna) encodes 4 overlapping open reading frames and is transcribed into 6 major viral rnas that have unique 5’ transcriptional start sites but share a common 3’ terminus and poly-adenylation sequence. these transcripts are translated in different frames, which makes interpreting the viral transcriptome from short read sequencing almost impossible.

a further complication is that hbv can integrate into the cellular chromosomal dna. these integrants are defective and only transcribe a subset of viral rnas that differ at their 3’ terminus. any curative therapy will require the silencing or ablation of replicating cccdna; however, ongoing transcription from integrants makes the task of identifying successful treatments very challenging. clearly identifying the source of viral rnas is important for future clinical developments, and that’s what got us interested!

hbv infection efficiency in vitro is low, with large amounts of virus required to establish infection. to circumvent this, many studies use plasmid transfection to deliver hbv into cells. to allow transcription of the full repertoire of viral rnas, the plasmid-encoded genome is overlength (1.3x) with a repetition of some loci and promoters. we compared the hbv transcriptome produced from this synthetic viral dna construct with authentic genomes produced during infection.

our previous work showed that hbv rnas comprise a very small portion of the cellular transcriptome, and this is further diluted by the presence of large numbers of uninfected cells. however, we knew that our colleagues azim ansari and philippa matthews were developing an enrichment protocol to study hbv dna in clinical samples, and thought we could apply this technique to increase the abundance of viral rnas in our library. all we had to do was decide on our sequencing strategy…
© istock/miromiro

finding a needle in a haystack: enriching hbv rnas allowed us to overcome the dilution of rare viral rnas by abundant host cell transcripts.

in late 2021 pacbio ran an online presentation with oxford genomics centre to publicise the new sequel ii long read sequencing platform, which would be available the following spring. everyone who registered for the event got a free travel mug, so of course we all signed up! in addition to their presentations, pacbio held a competition to win a free run on their brand new machine. the entry form was a single box on their website with space for a title and 200 words. we filled it in, and got this message three days later:

smrt grant winner
mckeating group, nuffield department of medicine, university of oxford
alternative transcription in hepatitis b virus


our sequencing strategy was decided! sequencing entire transcripts rather than fragments solves the problem of identifying the transcripts in our library, and we were delighted when our pacbio sequel ii run yielded several hundred thousand reads, allowing us to assess the abundance of spliced rnas and to compare the infection and transfection delivery methods.
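to make the point concrete, here is a minimal python sketch of why a full-length read can be assigned to a transcript while a short fragment cannot. it is a toy model only: the transcript names, coordinates and tolerance are invented for illustration, the genome is treated as linear, splicing is ignored, and this is not the analysis pipeline from the paper.

```python
# Toy model: every transcript ends at the same 3' terminus but starts at a
# different 5' site. All names and coordinates are hypothetical.
SHARED_3P_END = 1000
TOY_TRANSCRIPTS = {"rna_a": 0, "rna_b": 300, "rna_c": 600}  # name -> 5' start


def compatible_transcripts(read_start, read_end, transcripts=TOY_TRANSCRIPTS):
    """Return every transcript that fully contains a read fragment."""
    return sorted(
        name
        for name, start in transcripts.items()
        if start <= read_start and read_end <= SHARED_3P_END
    )


def assign_full_length(read_start, read_end, transcripts=TOY_TRANSCRIPTS, tol=20):
    """Assign a read spanning 5' start to 3' end to one transcript, else None."""
    if abs(read_end - SHARED_3P_END) > tol:
        return None  # read does not reach the shared 3' terminus
    matches = [
        name for name, start in transcripts.items()
        if abs(read_start - start) <= tol
    ]
    return matches[0] if len(matches) == 1 else None


short_fragment = (700, 850)  # lies in the region shared by all three transcripts
long_read = (305, 998)       # covers a whole transcript, end to end

print(compatible_transcripts(*short_fragment))  # ['rna_a', 'rna_b', 'rna_c']
print(assign_full_length(*long_read))           # rna_b
```

the short fragment is compatible with all three toy transcripts, whereas the long read is pinned to one of them by its 5’ start. a real analysis would also have to handle the circular genome, spliced isoforms and reads truncated at the 5’ end.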

© lines, istock/paperkites; questions icon, istock/milkghost

sequencing whole transcripts, not rna fragments: traditional short read methods offer great depth, but it’s impossible to determine which rna transcript each read originated from. this is the reason we chose to go with the pacbio platform.

mug photo credits to the proud owner of the pacbio travel mug, dr peter balfe.

esther ng, a talented computational biologist, worked to streamline and rewrite the conventional analysis pipelines to interrogate the unusual canonical and non-canonical (spliced) viral transcripts. we were pleasantly surprised to find broadly similar transcript patterns in both infection and transfection models, suggesting that the model systems in widespread use are not as different as we originally thought. as the cost of sequencing continues to drop and more long-read instruments are installed, the approach we’ve described is becoming increasingly feasible and routine (indeed, our sequencing service now charges less for long-read than short-read sequencing). we set about making all the required reagents and analysis methods as transparent and accessible as possible, and everything we did is available online.

we hope this enrichment, sequencing and analysis pipeline will be of interest to groups working on hbv and other viruses - the only changes needed would be to design new capture probes and work out the transcription pattern files for the analysis pipeline.