Sovacool has focused her research interests along three directions. First, she is interested in how microbial communities are characterized. She considers the microbiome as a system and studies the entirety of the taxonomic and functional composition of the gut microbiota. In the Schloss lab, the team uses amplicon sequencing techniques to categorize bacteria and cluster them into taxonomic groups. The first part of Sovacool’s dissertation explains how she developed an algorithm that analyzes bacterial DNA sequences and clusters bacteria into groups that can be studied with supervised machine learning. This new method allows researchers to add new sequences to an already referenced dataset and can also be applied to an entirely different dataset.
In the second part of her dissertation, Sovacool shows how this technique can be used to study Clostridioides difficile infection (CDI). This potentially life-threatening disease occurs when antibiotics create a major imbalance in the gut microbiota, which allows C. difficile to infect the gut, a condition that can be treated by taking more antibiotics. If the microbiota disorder persists, the patient can have recurring infections. In up to 10% of CDI cases, patients might need to have their colon removed, and 8% of patients might die. The Schloss lab is researching predictive factors of CDI to inform antibiotic and other treatment options. The overall weakening of the defense system that happens with aging or with other immune-related conditions is known to be a factor. But the team hypothesized that the composition and organization of the microbiota itself, at the onset of the disease, might be used to predict negative outcomes. Sovacool developed software for this application and, using machine learning, found that the gut microbiome composition could indeed be used as a biomarker to predict severe CDI outcomes. The next step for this research is to make the model robust and reliable enough to eventually bring it to the clinic.
Reproducibility is one important criterion for robustness. It is of great importance to Sovacool that other scientists are able to use her software with their own microbiome data, or other types of data. She is a strong advocate for sharing data and code to advance science. “The way you write your code, structure your project, and disseminate your work plays a big role in how easy it is for other scientists to use your software and to reproduce your work. I write my code with this in mind, and this is what I call democratizing data science,” she said.
Passionate about democratizing data science, Sovacool participated in the University of Michigan local chapter of Girls Who Code, founded by Bioinformatics Ph.D. alumnae Brooke Wolford and Zena Lapp. The goal of this club is to get high school girls interested in coding for data science. In particular, Sovacool contributed to a new curriculum focused on Python for data science to attract youth in local high schools. The club runs year-round for high school students and also offers a two-week summer program in Detroit. The students learn the very basics of Python, analyze a dataset, and present their findings to the group. “This program continues on thanks to the efforts of Audrey Drotos, Hayley Falk, and other graduate students at U-M. We’ve had overwhelmingly positive feedback about the students’ experience in the club,” she said. The club shifted to a virtual format due to the COVID-19 pandemic and will return to the in-person format starting this fall.
Sovacool also contributes to the U-M organization Software Carpentry, which offers two-day workshops to anyone from any field at U-M who is interested in data science. The workshop teaches the basics of a programming language, how to use a command line, and how to use Git, software that tracks changes and facilitates collaboration. With this group, Sovacool developed a curriculum that integrates these three topics.
She published an article about responsibly using machine learning with an emphasis on reproducibility. The team developed an R package and a pipeline of computational tools that follow best practices, with the goal of being user-friendly and also customizable to users’ research needs.
Sovacool will join a team of bioinformaticians as a bioinformatics software engineer at Frederick National Laboratory for Cancer Research. She will develop software and workflows to support other bioinformaticians who analyze a variety of datasets from scientists at the National Cancer Institute. She was looking for a position that would allow her to do more engineering development to support data analysis, and she chose a governmental institution in order to continue her work in open science and on the democratization of data science.
Sovacool found her calling for bioinformatics in her hometown high school in Noblesville, Indiana. She was offered several courses in biomedical sciences through Project Lead the Way and discovered that she loved the idea of doing research that would impact healthcare downstream. As a biology major in college at the University of Kentucky, she took a course where she analyzed DNA sequence data and learned that the integration of computer science with biology is called “bioinformatics!” She sought out undergraduate research opportunities where she could do bioinformatics, and she has continued on this path.
Outside the lab, Sovacool enjoys running, biking, rock climbing, and volunteering for her church’s worship team as a sound engineer and guitarist. Keeping up with her hobbies outside of science has been key to maintaining a healthy work-life balance, which she learned from her mentor, Pat.