EMERSE (the text processing/searching tool) updates for 2024


Abstract

EMERSE (https://project-emerse.org) is a text processing and searching system designed for analyzing free text clinical notes from electronic health records (EHRs). EHR notes contain vital data necessary for multiple research tasks, ranging from patient cohort identification to data abstraction for case report forms. EMERSE was designed to be user-friendly and easily accessible by non-technical researchers.

Our upcoming release (expected in summer 2024) will have built-in natural language processing (NLP) capabilities, including annotations for negation, uncertainty, subject (patient versus nonpatient), and prior history. With these annotations and an updated user interface, users can search for detailed information, such as instances where it is specifically mentioned that a family member did not have breast cancer or notes where a patient had prostate cancer, excluding mentions that are negated, uncertain, or mentioned in the context of a prior history. EMERSE will also support labeling of (and searching for) terms based on the Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). Incorporating this functionality while maintaining ease-of-use, as well as fine-grained-control for users, has required substantial changes to the user interface.

We have also recently completed an application programming interface (API) that allows for audited, back-end access to clinical notes in a HIPAA-compliant manner (e.g., audit trails of all queries are maintained). This has been beneficial to research teams building their own data analytic pipelines, including those training artificial intelligence (AI) models.

Our team focuses our work on two primary groups that we interact with: (1) end-user researchers who want to use the software and (2) information technology/informatics teams that must implement the enterprise software locally at their institutions. These two groups have very different needs and expectations. In addition to a user guide, we have multiple technical implementations and ongoing operational guides. We have also developed a general security document for distribution that contains answers to many questions asked during software security reviews, which are now almost universally required by implementing sites.

EMERSE is now live at 12 sites and is undergoing implementation at 7 more (see https:// project-emerse.org/community.html). Additionally, EMERSE has supported over 670 peer-reviewed papers and abstracts. In addition to completing the NLP work for our next release, we are also working to (1) enable the incorporation of customized NLP labeling in lieu of, or in addition to, what EMERSE provides out-of-the-box; (2) better understanding of security concerns about our network feature so that we can address these in a future release; (3) surveying end users to better understand how they have been using EMERSE and the kind of work it has supported; and (4) a process for extracting data from templated notes. We are also beginning to explore how large language models might be incorporated to provide novel capabilities for our user base.

Acknowledgment: This was work supported by the National Cancer Institute of the National Institutes of Health under Award U24CA269315 and the National Center for Advancing Translational Sciences of the National Institutes of Health under Award UM1TR004404.

Poster
non-peer-reviewed

EMERSE (the text processing/searching tool) updates for 2024


Author Information

David A. Hanauer Corresponding Author

Learning Health Sciences, University of Michigan, Ann Arbor, USA

Lisa A. Ferguson

Department of Learning Health Sciences, University of Michigan, Ann Arbor, USA

Kellen J. McClain

Informatics, Michigan Medicine, Ann Arbor, USA

Guan Wang

Department of Learning Health Science, Michigan Medicine, Ann Arbor, USA


PDF Share