Natural Language Processing Methods Allow Reliable Extraction of Banff Scores and Grades.

K. Reilly,¹ L. Lenert,¹ H. Anwar,² J. Thompson,¹ D. Taber,¹ P. Mauldin,¹ T. Srinivas.¹

¹MUSC, Charleston
²IBM, NY.

Meeting: 2016 American Transplant Congress

Abstract number: A160

Keywords: Biopsy, Histology, Kidney transplantation, Resource utilization

Session Information

Session Name: Poster Session A: Kidney: Acute Cellular Rejection

Session Type: Poster Session

Date: Saturday, June 11, 2016

Session Time: 5:30pm-7:30pm

Presentation Time: 5:30pm-7:30pm

Location: Halls C&D

Objective: National data on the pathology of rejection are sparse. Incorporating biopsy pathology grades and lesion scores into analyses requires manual extraction and entry into databases, a resource-intensive and error-prone exercise. We examined the utility of natural language processing (NLP) methods for reliably retrieving Banff scores and grades from transplant pathology reports.

Methods: In a quality initiative, IBM Watson Content Analytics Studio was used to develop NLP parsers targeting pathology reports. First, we built parsers that collected Banff scores by targeting alphanumeric pairs following morphological variations of the string “Banff Score:” found within the Diagnosis sections of reports. Annotators catalogued alphanumeric scores, and other tokens (e.g., i?, iN/A) were reviewed for data quality. Results were validated against a manually populated database maintained by the transplant center. Subsequent parsers retrieved Banff grades (e.g., IA, IIB), and these values were compared with grades auto-calculated from the previously extracted Banff scores.
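The score-extraction step can be illustrated with a minimal Python sketch. This is not the IBM Watson Content Analytics Studio parser used in the study; it is a plain regular-expression approximation. The label pattern, the list of lesion codes, the 0-3 value range, and the function name extract_banff_scores are all assumptions made for illustration.

```python
import re

# Assumed label variants: "Banff Score:", "Banff scores -", "BANFF LESION SCORE", etc.
LABEL = re.compile(r"banff\s+(?:lesion\s+)?scores?\s*[:\-]?\s*", re.IGNORECASE)

# Assumed lesion codes (longer codes listed first so "ti" is tried before "t"),
# each followed by a value: a digit 0-3, "?", or "N/A".
PAIR = re.compile(
    r"\b(ptc|ci|ct|cg|cv|mm|ah|ti|i|t|v|g)\s*(N/A|[0-3]|\?)",
    re.IGNORECASE,
)


def extract_banff_scores(diagnosis_text):
    """Return (scores, review): numeric scores by lesion code, plus
    non-numeric tokens such as 'i?' or 'iN/A' set aside for manual review."""
    scores, review = {}, []
    for label in LABEL.finditer(diagnosis_text):
        # Read only the remainder of the line that carries the label.
        line_end = diagnosis_text.find("\n", label.end())
        if line_end == -1:
            line_end = len(diagnosis_text)
        segment = diagnosis_text[label.end():line_end]
        for code, value in PAIR.findall(segment):
            code = code.lower()
            if value.isdigit():
                scores[code] = int(value)
            else:
                review.append(code + value)
    return scores, review


report = "DIAGNOSIS:\nAcute cellular rejection.\nBanff Scores - i2, t3, v0, g?, ciN/A, ct1\n"
scores, review = extract_banff_scores(report)
```

In this sketch, tokens that do not carry a numeric value are kept separately rather than discarded, mirroring the data-quality review described in the Methods.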

Results: Banff lesion scores were collected for 1623 sample patients from 1993 to 2015; 61% of the pathology reports with Banff scores were matched to the Banff quality database by joining on MRN, report date, and biopsy date. Comparison of NLP-extracted scores with manually recorded scores showed 98% accuracy for our NLP methods. By comparison, a sample of manually entered data demonstrated an error rate of 10%. Banff rejection grades were found in 542 reports, 453 of which also included scores that were used to calculate grades for quality comparison. Initial results showed 79% accuracy between NLP-extracted and calculated grades. Further analysis revealed syntax discrepancies; after semantic equivalence was applied (e.g., 2A = IIA), accuracy between NLP-retrieved and calculated grades rose to 93%. Overall, NLP methods were measured at 98% accuracy for Banff scores and 93% for assigned grades.
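The semantic-equivalence step (e.g., 2A = IIA) can be sketched in Python as a normalization applied before comparing extracted and calculated grades. The accepted spellings, the canonical Roman-numeral form, and the function names normalize_grade and agreement are assumptions for illustration; the abstract does not describe the actual rules used.

```python
import re

# Assumed mapping from Arabic to Roman grade levels.
ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III"}


def normalize_grade(raw):
    """Map variants such as '2A', 'iia', or 'II-A' to a canonical form like 'IIA'.
    Returns None when the token is not recognized as a grade."""
    token = re.sub(r"[\s\-\.]", "", raw).upper()
    match = re.fullmatch(r"(III|II|I|[123])([AB]?)", token)
    if match is None:
        return None
    level, suffix = match.groups()
    return ARABIC_TO_ROMAN.get(level, level) + suffix


def agreement(pairs):
    """Fraction of (extracted, calculated) grade pairs that agree after normalization."""
    matches = 0
    for extracted, calculated in pairs:
        left = normalize_grade(extracted)
        if left is not None and left == normalize_grade(calculated):
            matches += 1
    return matches / len(pairs)


# Toy pairs for illustration only; these are not study data.
toy_pairs = [("2A", "IIA"), ("IB", "IB"), ("IA", "IIA"), ("3", "III")]
rate = agreement(toy_pairs)
```

Comparing raw strings would count "2A" versus "IIA" as a mismatch; normalizing both sides first separates purely syntactic differences from genuine disagreements.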

Conclusion: Natural language processing techniques allow a Big Data approach to extracting and curating the unstructured text of biopsy reports in electronic medical records, enabling their use in analytics databases for predictive modeling with a low error rate. This process is automatable and deployable through the electronic health record for both clinical and analytic purposes. The technique can also detect errors in the transcription of grades and may be applied to improve pathology reporting processes.

CITATION INFORMATION: Reilly K, Lenert L, Anwar H, Thompson J, Taber D, Mauldin P, Srinivas T. Natural Language Processing Methods Allow Reliable Extraction of Banff Scores and Grades. Am J Transplant. 2016;16 (suppl 3).

To cite this abstract in AMA style:

Reilly K, Lenert L, Anwar H, Thompson J, Taber D, Mauldin P, Srinivas T. Natural Language Processing Methods Allow Reliable Extraction of Banff Scores and Grades. [abstract]. Am J Transplant. 2016; 16 (suppl 3). https://atcmeetingabstracts.com/abstract/natural-language-processing-methods-allow-reliable-extraction-of-banff-scores-and-grades/. Accessed July 2, 2025.