microbiota from the computational perspective of a non-computer scientist

posted on october 16, 2023 by eric nayman

eric nayman takes us behind the scenes of their latest publication, 'microbiome depiction through user-adapted bioinformatic pipelines and parameters', published in the journal of medical microbiology.


my name is eric nayman, and i am currently a fourth-year md candidate at the herbert wertheim college of medicine at florida international university (fiu) in miami, fl. for the last four years, i have been working under the aegis of professor kalai mathee as part of the fiu bioinformatics research group, which focuses on developing state-of-the-art techniques for multi-omics analysis of the human microbiome.

microbiome science is an exciting and relatively new field in the biomedical sciences. our resident microbial flora constantly interacts with our own cells, and this exchange shapes the structure and function of both partners. thus, the microbiome plays a significant role in human health and disease. as a future physician and pathologist, i hope to further our understanding of how the microbiome contributes to pathophysiologic processes and to use that knowledge to develop novel diagnostics and therapeutics, an area of study that is already underway.

however, studying the microbiome is quite challenging because it is difficult to consistently identify how microbiota change and adapt across a wide spectrum of conditions. the work my team recently published in the journal of medical microbiology describes how we sought to address this gap through bioinformatics and, ultimately, improve how accurately we can characterize microbiota.


from left to right: professor kalai mathee, eric nayman (fiu medical student) and alayna gumabong (fiu undergraduate).

for the first part of our project, we had to set up four very different bioinformatic pipelines, following the recommendations of each pipeline's developers. given my biomedical background, this task was initially far beyond my comfort zone. however, over time and through much trial and error, we learned as a team, data scientists and biological scientists alike. we first performed several trial runs on various genomic datasets to benchmark our workflows. it was through these initial quality control steps that we realized the crux of our work would be how these well-established but error-prone pipelines are commonly used.
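one simple quality control check during trial runs like these is to track how many reads survive each step of a workflow. the sketch below is a minimal, hypothetical illustration, not the pipelines or code used in the study: the step names (`quality_filtered`, `denoised`, `pairs_merged`, `chimeras_removed`), the read counts and the 50% per-step threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical record type: (step name, number of reads remaining after that step).
StepCount = Tuple[str, int]


def read_retention(step_counts: List[StepCount]) -> List[Tuple[str, float, float]]:
    """Return (step, fraction kept vs. previous step, fraction kept vs. raw input)."""
    if not step_counts:
        return []
    raw = step_counts[0][1]
    if raw <= 0:
        raise ValueError("the first entry must be the (positive) raw read count")
    report = []
    previous = raw
    for name, count in step_counts:
        # Guard against division by zero if an earlier step removed every read.
        step_fraction = count / previous if previous > 0 else 0.0
        report.append((name, step_fraction, count / raw))
        previous = count
    return report


def flag_lossy_steps(report, min_step_retention: float = 0.5) -> List[str]:
    """Names of steps that kept less than `min_step_retention` of their input."""
    return [name for name, step_fraction, _ in report if step_fraction < min_step_retention]


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not data from the study.
    counts = [
        ("raw", 100_000),
        ("quality_filtered", 60_000),
        ("denoised", 55_000),
        ("pairs_merged", 20_000),
        ("chimeras_removed", 19_000),
    ]
    report = read_retention(counts)
    for name, step_fraction, cumulative in report:
        print(f"{name:18s} step {step_fraction:6.1%}  cumulative {cumulative:6.1%}")
    print("steps to revisit:", flag_lossy_steps(report))
```

reporting both the per-step and the cumulative fraction makes it easier to see whether a large overall loss comes from one step (here the hypothetical merging step) or accumulates gradually across the workflow.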

early on, our workflows discarded up to 80% of the input genomic data and, as a result, generated largely inaccurate microbial compositions. after realizing this, we tweaked several parameters across the workflows and benchmarked these “user-adapted” pipeline runs against the way these pipelines are commonly used, which largely follows their online tutorials. while the tutorials are valuable starting points, they cannot account for the particularities of each user's dataset. we were able to compare our adapted workflows with the tutorial-recommended versions because we used a microbial mock community of known composition, which gave us a ground truth against which to judge each run's output.
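with a mock community, each pipeline run can be scored against the known composition. the sketch below is a minimal illustration under stated assumptions: the post does not say which metric the authors used, so bray-curtis dissimilarity on relative abundances (plus lists of missed and spurious taxa) is an assumed, commonly used choice; the taxon names and abundances are hypothetical; and the expected and observed profiles are assumed to be expressed in comparable units.

```python
from typing import Dict

Profile = Dict[str, float]  # taxon name -> abundance (counts or proportions)


def to_relative(profile: Profile) -> Profile:
    """Rescale abundances so they sum to 1."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain a positive total abundance")
    return {taxon: value / total for taxon, value in profile.items()}


def bray_curtis(expected: Profile, observed: Profile) -> float:
    """Bray-Curtis dissimilarity on relative abundances: 0 = identical, 1 = no shared taxa."""
    e, o = to_relative(expected), to_relative(observed)
    taxa = set(e) | set(o)
    numerator = sum(abs(e.get(t, 0.0) - o.get(t, 0.0)) for t in taxa)
    denominator = sum(e.get(t, 0.0) + o.get(t, 0.0) for t in taxa)  # equals 2 for relative data
    return numerator / denominator


def compare_to_mock(expected: Profile, observed: Profile) -> Dict[str, object]:
    """Score one pipeline run against the known mock community composition."""
    present_expected = {t for t, v in expected.items() if v > 0}
    present_observed = {t for t, v in observed.items() if v > 0}
    return {
        "bray_curtis": bray_curtis(expected, observed),
        "missed": sorted(present_expected - present_observed),      # expected but not detected
        "spurious": sorted(present_observed - present_expected),    # detected but not in the mock
    }


if __name__ == "__main__":
    # Hypothetical mock community and two hypothetical runs, for illustration only.
    mock = {"taxon_a": 25, "taxon_b": 25, "taxon_c": 25, "taxon_d": 25}
    tutorial_run = {"taxon_a": 70, "taxon_b": 20, "taxon_e": 10}
    adapted_run = {"taxon_a": 27, "taxon_b": 24, "taxon_c": 26, "taxon_d": 23}
    for label, run in [("tutorial", tutorial_run), ("adapted", adapted_run)]:
        print(label, compare_to_mock(mock, run))
```

a lower dissimilarity and shorter missed/spurious lists indicate a run that better recovers the known composition; in this toy example the hypothetical adapted run scores about 0.03 and the hypothetical tutorial run 0.55.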

adjusting the parameters within each workflow required an extensive review of both the formal literature (pubmed and google scholar) and informal sources (online forums). by reading what others had tried and demonstrated, we were able to arrive at the best versions of our workflows. at the beginning, i distinctly remember the online forums being much richer in information than the published literature. the literature grew over the several years this project took to complete, but i still appreciate how dedicated the community members in those corners of the internet were.

we hope that our work helps those studying the microbiome better characterize the composition of their samples. more than that, our work highlights how important it is for biologists and clinically oriented scientists to strive for a deeper understanding of the computational protocols that so significantly affect their work. we demonstrate how striking a few keys on the keyboard can drastically change the outcome of an analysis with increasingly promising clinical potential.