Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are developing an eHealth system that will be able to automatically distinguish lymphoma subtypes to help with cancer diagnosis and treatment.
PhD student Yuan Luo and MIT Professor Peter Szolovits have collaborated with a team of experts from Massachusetts General Hospital (MGH) to develop the eHealth system, which analyses pathology reports held in existing medical records and then automatically suggests cancer diagnoses.
In a paper published in the Journal of the American Medical Informatics Association, the researchers explained how they focused on the three most prevalent subtypes of lymphoma, a common cancer with more than 50 distinct subtypes that are often difficult to distinguish. According to the director of the Center for Lymphoma at MGH and one of the paper’s co-authors, Dr Ephraim Hochberg, up to 15% of lymphoma cases are misdiagnosed, which could cause unnecessary delays in treatment.
The MIT researchers realised they could tap into MGH’s archive of pathology reports to develop automated tools that could improve doctors’ understanding of how to diagnose lymphomas. “It is important to ensure that classification guidelines are up-to-date and accurately summarised from a large number of patient cases,” said Luo. “Our work combs through detailed medical cases to help doctors more comprehensively capture the subtle distinctions between lymphomas.”
Luo added that such machine-learning models need to be not only accurate but also interpretable to clinicians. The researchers therefore converted sentences from pathology reports into a graph representation where graph nodes are medical concepts and graph edges are syntactic/semantic dependencies.
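As a rough illustration of that graph representation, the Python sketch below stores one sentence as labelled edges between concepts. The sentence ("large cells express CD20"), the relation names and the helper functions are all hypothetical and hand-written for this example; they are not taken from the MGH reports or from the researchers' parser.

```python
# Hypothetical, hand-written dependencies for an illustrative sentence:
# "large cells express CD20". A real system would obtain these from a parser.
edges = [
    ("express", "cells", "subject"),
    ("express", "CD20", "object"),
    ("cells", "large", "modifier"),
]


def build_graph(edges):
    """Return an adjacency map: node -> list of (dependent, relation)."""
    graph = {}
    for head, dependent, relation in edges:
        graph.setdefault(head, []).append((dependent, relation))
        graph.setdefault(dependent, [])  # make sure leaf nodes are listed too
    return graph


def single_edge_subgraphs(graph):
    """List the smallest subgraphs: one labelled edge each."""
    return sorted(
        (head, relation, dependent)
        for head, dependents in graph.items()
        for dependent, relation in dependents
    )


graph = build_graph(edges)
subgraphs = single_edge_subgraphs(graph)
```

Here each node is a concept and each edge carries the relation between two concepts, so a small subgraph such as `("express", "object", "CD20")` can stand in for a single finding in a report.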
“Clinicians’ diagnostic reasoning is based on multiple test results simultaneously,” said Luo. “Thus it is necessary for us to automatically group subgraphs in a way that corresponds to the panel of test results. This makes the model interpretable to clinicians instead of being a black-box, as they often complain about many other machine-learning models.”
The researchers used a technique called Subgraph Augmented Non-negative Tensor Factorization (SANTF), which organised data from about 800 medical cases as a three-dimensional table to link test results to lymphoma subtypes.
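To make the idea of a three-dimensional table concrete, the Python sketch below counts how often a subgraph and a word occur together in each case. The choice of axes (case, subgraph, word), the sample observations and the function name are assumptions made for illustration only; the sketch covers just the table-building step and does not perform the factorization itself.

```python
from collections import Counter

# Hypothetical observations: (case id, subgraph label, co-occurring word).
observations = [
    ("case_1", "express->CD20", "diffuse"),
    ("case_1", "express->CD20", "diffuse"),
    ("case_1", "cells->large", "diffuse"),
    ("case_2", "express->CD20", "nodular"),
]


def build_table(observations):
    """Build a case x subgraph x word table of non-negative counts."""
    cases = sorted({case for case, _, _ in observations})
    subgraphs = sorted({subgraph for _, subgraph, _ in observations})
    words = sorted({word for _, _, word in observations})
    counts = Counter(observations)  # missing combinations count as 0
    table = [
        [[counts[(case, subgraph, word)] for word in words] for subgraph in subgraphs]
        for case in cases
    ]
    return table, cases, subgraphs, words


table, cases, subgraphs, words = build_table(observations)
```

Every entry is a count and therefore non-negative, which is the precondition for a non-negative factorization: such a method would then look for a small number of groups that jointly explain the counts along all three axes.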
“The promise of Luo’s work, if applied to very large data sets, is that the criteria that would then help to identify these clusters can inform doctors about how to understand the range of lymphomas and their clinical relationships to each other,” said Szolovits, adding that he is confident that the model could lead to more accurate lymphoma diagnoses and be incorporated into future WHO guidelines.
“Our ultimate goal is to be able to focus these techniques on extremely large amounts of lymphoma data, on the order of millions of cases,” said Szolovits. “If we can do that, and identify the features that are specific to different subtypes, then we’d go a long way towards making doctors’ jobs easier — and, maybe, patients’ lives longer.”