Can AI disrupt a >$1 billion market of manually processing low-quality clinical EHR data?
Almost 10 years ago, the health information landscape looked very different. The majority of hospitals and medical practices were still using paper records, and the idea of leveraging real-world patient data for clinical decision making seemed far off. The 2009 HITECH Act created Meaningful Use as an incentive program for electronic health records (EHR) based on optimistic views on digitizing healthcare. A digital record would surely provide major ROI in the form of population health, value-based care, and improved billing efficiency. It’s now clear with EHR adoption rates above 90 percent that the program was effective at forcing modernization, but it also had unexpected consequences.
First, hundreds of EHR vendors, both established and newly formed, entered the market. With them came hundreds of different data standards and data collection systems. Then, purchasers, different institutions, and physician specialties drove further customization of data elements, and mergers of EHR companies and hospital systems led to even more bridging of historical formats to new blended data structures. While government and industry-led initiatives continue to work on fixing interoperability, the reality is that the majority of clinical data today lacks consistent structure, has a high degree of incompleteness, and is not readily usable by most analytic processes. Almost two-thirds of hospitals cannot integrate a patient’s summary of care records into their EHR systems without manual entry.
EHR adoption digitized clinical information while also further fragmenting it:
EHR adoption by hospital size. Source: KLAS Research
The result of this fragmentation has been the rise of manual curation of EHR data to prepare it for clinical data analysis. Manual curation involves a disease expert, often a nurse, physician assistant, or graduate or medical student, reading through the associated notes, test results, imaging files, or other relevant documents and entering controlled, structured information. The type of information a curator will focus on depends on the objective, but in general, they codify things like adverse events, disease staging, the progression of disease, and other medical and clinical concepts.
While these types of variables are clinically focused, there is also a great need for curation in response to the simple incompleteness of data fields. Dates of treatment, lab values, and even things like inconsistent taxonomy create demand for manual or rules-based curation.
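To make the rules-based end of that spectrum concrete, here is a minimal Python sketch. It is an illustration only: the field names, the list of date formats, and the synonym table are hypothetical and do not come from any particular EHR system or vendor. The idea is that rules handle the values they recognize and flag everything else for a human curator.

```python
from datetime import datetime

# Hypothetical date formats a source system might emit; a real list would
# come from profiling the actual data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

# Hypothetical synonym table mapping free-text terms to one controlled label.
TAXONOMY = {
    "her2+": "HER2 positive",
    "her2 positive": "HER2 positive",
    "her-2 pos": "HER2 positive",
}


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    if raw is None:
        return None
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_term(raw):
    """Map a free-text term to its controlled label, or None if unknown."""
    if raw is None:
        return None
    return TAXONOMY.get(raw.strip().lower())


def curate(record):
    """Apply the rules and list the fields that still need a human curator."""
    curated = {
        "treatment_date": normalize_date(record.get("treatment_date")),
        "biomarker": normalize_term(record.get("biomarker")),
    }
    needs_review = [name for name, value in curated.items() if value is None]
    return curated, needs_review
```

Calling `curate({"treatment_date": "03/14/2018", "biomarker": "HER-2 pos"})` resolves both fields, while a record with a date like "spring 2018" comes back with that field set to `None` and listed for review. Rules of this kind cover the easy cases cheaply; the fields they cannot resolve are where manual effort concentrates.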
The biggest driver of all is the growing use of real-world data (RWD) in clinical research, as a replacement for or an addition to controlled trials. As regulatory agencies grapple with how to include RWD-based evidence in label decisions and payers refine how they interpret RWD evidence for economic value, the level of curation can have a massive effect on the clinical interpretation. Recent commentary by the FDA indicates that they are highly focused on this issue and that consistency of data curation and analytic methods is a priority. In the graphic below, three different approaches to the calculation of progression-free survival in breast cancer shift the potential median time to progression by months. It is this type of variation and systematic bias that has driven the need for clinical-level EHR curation as a standard for clinical studies and regulatory submissions.
Progression-free survival in first-line metastatic breast cancer patients receiving treatment for HR+ disease. Source: Vector Oncology, PH.AI internal research
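A small sketch can show why the curation rule matters so much for this kind of endpoint. The Python below is a simplified illustration, not the method behind the figure: the patient rows are made-up numbers, the two event definitions are hypothetical, and the estimator is a bare-bones Kaplan-Meier median that censors patients at their last recorded visit. It compares counting only imaging-confirmed progression against counting the earlier of imaging-confirmed or clinician-noted progression.

```python
from fractions import Fraction

# Synthetic rows (days from start of treatment). Purely illustrative;
# None means that kind of progression was never recorded.
PATIENTS = [
    {"imaging": 200, "clinical": 150, "last_seen": 400},
    {"imaging": None, "clinical": 180, "last_seen": 300},
    {"imaging": 320, "clinical": None, "last_seen": 500},
    {"imaging": None, "clinical": None, "last_seen": 250},
    {"imaging": 260, "clinical": 240, "last_seen": 600},
]


def observations(patients, sources):
    """Build (time, event_observed) pairs using only the named event sources."""
    pairs = []
    for patient in patients:
        days = [patient[s] for s in sources if patient[s] is not None]
        if days:
            pairs.append((min(days), True))  # earliest qualifying event
        else:
            pairs.append((patient["last_seen"], False))  # censored
    return pairs


def km_median(pairs):
    """Kaplan-Meier median: first event time where survival is <= 1/2.

    Returns None if the estimated survival never drops that far.
    """
    survival = Fraction(1)
    for t in sorted({time for time, event in pairs if event}):
        at_risk = sum(1 for time, _ in pairs if time >= t)
        events = sum(1 for time, event in pairs if event and time == t)
        survival *= 1 - Fraction(events, at_risk)
        if survival <= Fraction(1, 2):
            return t
    return None


imaging_only = km_median(observations(PATIENTS, ["imaging"]))
imaging_or_clinical = km_median(observations(PATIENTS, ["imaging", "clinical"]))
```

On these synthetic rows the imaging-only rule gives a median of 320 days and the broader rule gives 240 days, from exactly the same underlying records. A production analysis would need a more careful censoring rule and a real survival library, but the direction of the effect is the point: the definition a curator applies becomes part of the result.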
Several firms have emerged with a specific focus on manual curation to address this >$1 billion a year market demand. Largely, this has been driven by disease areas that require significant clinical interpretation such as oncology, radiology, and neurology. Several companies such as Vector Oncology, Tempus, Flatiron Health, and Precision Digital Health have curation-as-a-service (CaaS) offerings; they reportedly employ hundreds of nurse, graduate student, and physician curators. In addition, EHR companies, generalist analytic firms, and academic institutions reportedly leverage technology in EHR curation; however, it is unclear whether this has been validated to the same level as manual curation. Manual curation is currently a necessary part of the EHR analytic value chain, but it is costly, lacks standards, and is difficult to do at scale.
A rising wave of artificial intelligence (AI) and natural language processing (NLP)-based approaches looks poised to disrupt the manual curation sector with a scalable and more precise approach. The majority of these technologies still struggle with disease context, the difficulty of obtaining training sets, and the interpretation of medical language and taxonomy. Imagine training an AI to determine, from physician notes alone, whether a “cough” was an adverse event from a medication or just a winter cold.
There is progress, though, and I am confident that CaaS will begin to standardize, scale, and move from completely manual to technology-enhanced. At Precision Health AI, we’re preparing to publish the results of a study that shows our AI module, pretrained on real-world data, can effectively identify the correct stage of cancer in incomplete oncology health records. As a result, we can help researchers better understand treatment during each stage of the disease and reduce documentation work for oncologists. We can also reduce the burden on manual curators and provide validated and consistent metrics to clinical and regulatory bodies. Stay tuned at ASCO for details on that research.
Demand for curation is rising. Recent approvals from real-world data studies in Europe and comments by the FDA about the importance of high-quality curation raise the bar for CaaS and validate new technological approaches to it. While CaaS advances, the volume of uncurated data continues to expand every day. Smart new interoperability policy may clear up some data processing issues. Blockchain and related technologies could eventually help unlock currently siloed records. But until then, the question remains: How soon can we convert messy, incomplete EHR data into usable, clinically valid regulatory data?