Integrating a lab data extraction API with an existing Electronic Health Record (EHR) system is a project that requires careful planning across three dimensions: technical, organizational, and regulatory. This guide provides a practical walkthrough of the design decisions, integration patterns, and common pitfalls that clinical informatics teams should consider when connecting an API like MedExtract to their EHR infrastructure.
Integration patterns
There are three main patterns for integrating an extraction API with an EHR. The choice depends on the existing architecture, document volume, and latency requirements.
Pattern 1: direct point-to-point integration
In this pattern, the EHR calls the extraction API directly when it receives a new document. It is the simplest pattern but also the most tightly coupled.
User uploads PDF → EHR → MedExtract API → EHR stores results
Advantages: simplicity, low latency, rapid implementation. Disadvantages: tight coupling, no fault tolerance if the API fails, and the EHR itself must handle the asynchronous flow when extraction takes more than a few seconds.
This pattern works well for low volumes (fewer than 50 daily reports), provided the EHR can make outbound HTTP calls.
Pattern 2: message queue integration
Integration middleware (an integration engine) receives documents from the EHR, sends them to the extraction API, and returns the results to the EHR. This pattern decouples the two systems.
EHR → Message queue → Worker → MedExtract API → Queue → EHR
Advantages: decoupling, fault tolerance, automatic retries, horizontal scalability. Disadvantages: greater complexity, additional latency, messaging infrastructure required.
This is the recommended pattern for medium and high volumes (50 or more daily reports). Healthcare integration engines like Mirth Connect, Rhapsody, or Iguana implement this pattern natively.
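The worker step of this pattern can be sketched with an in-process queue. This is a minimal illustration, not how any particular integration engine works: `extract` and `deliver` are hypothetical callables standing in for the extraction API call and the step that returns results to the EHR, and the retry count and delays are arbitrary.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical callables: `Extract` wraps the extraction API call,
# `Deliver` wraps the step that returns results to the EHR.
Extract = Callable[[bytes], Awaitable[dict]]
Deliver = Callable[[dict], Awaitable[None]]


async def process_with_retries(
    document: bytes,
    extract: Extract,
    deliver: Deliver,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Try extraction and delivery, backing off exponentially between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = await extract(document)
            await deliver(result)
            return True
        except Exception:
            # For simplicity a delivery failure also repeats the extraction.
            if attempt < max_attempts:
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return False


async def worker(
    queue: asyncio.Queue,
    extract: Extract,
    deliver: Deliver,
    dead_letters: list,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> None:
    """Consume documents forever; park the ones that exhaust their retries."""
    while True:
        document = await queue.get()
        try:
            ok = await process_with_retries(
                document, extract, deliver, max_attempts, base_delay
            )
            if not ok:
                dead_letters.append(document)
        finally:
            queue.task_done()
```

Running several `worker` tasks against the same queue is what gives this pattern its horizontal scalability. In production the in-memory queue and dead-letter list would be replaced by the messaging infrastructure's own durable equivalents.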
Pattern 3: native FHIR integration
If the EHR supports FHIR R4 natively, the integration can be done entirely through FHIR resources. The extraction API produces a FHIR Bundle that is sent directly to the EHR's FHIR endpoint.
Document → MedExtract API → FHIR Bundle → EHR's FHIR Server
Advantages: full standardization, portability, and alignment with the interoperability requirements of the EHDS. Disadvantages: requires the EHR to have an operational FHIR server, which is not always the case.
This pattern is ideal for organizations with mature FHIR infrastructure and is the model that the European healthcare ecosystem will converge toward with EHDS implementation.
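As a rough sketch of the last hop in this pattern, the function below prepares the HTTP request that submits a Bundle to a FHIR server. It assumes the Bundle is of type `transaction`, which FHIR servers accept as a POST to their base URL. The function name, the bearer-token authentication, and the example URL are illustrative assumptions; only the standard library is used.

```python
import json
import urllib.request


def build_bundle_request(
    bundle: dict,
    fhir_base_url: str,
    token: str,
) -> urllib.request.Request:
    """Prepare a POST of a transaction Bundle to the FHIR server base URL."""
    if bundle.get("resourceType") != "Bundle" or bundle.get("type") != "transaction":
        raise ValueError("expected a FHIR Bundle of type 'transaction'")
    return urllib.request.Request(
        url=fhir_base_url.rstrip("/"),  # transactions go to the base URL
        data=json.dumps(bundle).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/fhir+json",
            "Accept": "application/fhir+json",
            # Assumption: the FHIR server accepts a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )
```

The request would then be sent with, for example, `urllib.request.urlopen(request, timeout=30)`. The server's response is itself a Bundle reporting the outcome of each entry, which is worth logging for the audit trail.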
Detailed data flow
Document reception
The first step is document capture. Lab reports can arrive at the EHR through multiple channels:
- Integrated scanner: healthcare staff scan the paper report directly from the EHR interface.
- Manual upload: the user uploads a PDF or image file through a web form.
- Email: a monitored mailbox receives reports from external labs and redirects them to the system.
- HL7 integration: the lab sends an HL7 v2 message with the report attached as an embedded document.
- FHIR DocumentReference: the lab publishes a DocumentReference resource with the report as an attachment.
Sending to the extraction API
Once the document is captured, it is sent to the MedExtract API for processing. The basic call is a multipart HTTP POST request:
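import os

import httpx

# Assumption: the API key is supplied via an environment variable;
# the variable name is illustrative.
API_KEY = os.environ["MEDEXTRACT_API_KEY"]


async def send_to_extraction(
    file_bytes: bytes,
    filename: str,
    content_type: str,
) -> dict:
    """Send a document to the MedExtract API."""
    # Generous timeout: extraction can take more than a few seconds.
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "https://api.medextract.io/v1/extract",
            headers={
                "Authorization": f"Bearer {API_KEY}",
            },
            files={
                "file": (filename, file_bytes, content_type),
            },
            data={
                "output_format": "fhir_bundle",
                "include_confidence": "true",
                "language": "es",
            },
        )
        # Raise httpx.HTTPStatusError on 4xx/5xx responses.
        response.raise_for_status()
        return response.json()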
Processing the response
The API returns structured results that include:
- Identified patient data (name, ID, date of birth)
- Issuing laboratory data
- Report date
- List of results with LOINC code, value, unit, reference range, and abnormality indicator
- Overall and per-result confidence scores
- Optionally, a FHIR Bundle ready for import
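To make the shape of these results concrete, the sketch below converts a response payload into typed records. The JSON key names (`results`, `loinc_code`, `reference_range`, and so on) are assumptions made for illustration and are not taken from the API documentation; they should be checked against the actual response schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabResult:
    loinc: Optional[str]
    value: str
    unit: Optional[str]
    reference_range: Optional[str]
    abnormal: bool
    confidence: float


def parse_results(payload: dict) -> list:
    """Convert the raw JSON payload into LabResult records.

    All key names below are hypothetical placeholders.
    """
    parsed = []
    for item in payload.get("results", []):
        parsed.append(
            LabResult(
                loinc=item.get("loinc_code"),
                value=str(item["value"]),  # keep the value as reported
                unit=item.get("unit"),
                reference_range=item.get("reference_range"),
                abnormal=bool(item.get("abnormal", False)),
                # A missing score is treated as zero confidence.
                confidence=float(item.get("confidence", 0.0)),
            )
        )
    return parsed
```

Treating a missing confidence score as zero is a deliberately conservative choice: such results fall below any review threshold instead of slipping through unexamined.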
Patient reconciliation
Before storing results in the EHR, it is necessary to reconcile the patient identity extracted from the report with the patient registered in the EHR. This step is critical to avoid storing results in the wrong clinical record.
Reconciliation can be performed via:
- Exact identifier: if the report contains the EHR medical record number, matching is direct.
- Demographics: comparison of name, date of birth, and other identifiers.
- MPI (Master Patient Index): query the organization's master patient index.
- Manual confirmation: in cases of ambiguity, an operator confirms the association.
def reconcile_patient(
    extracted_name: str | None,
    extracted_id: str | None,
    extracted_dob: str | None,
    ehr_client: "EHRClient",  # the site's own EHR client wrapper (not defined here)
) -> str | None:
    """Reconcile the extracted patient with the EHR record."""
    # Attempt 1: search by exact identifier
    if extracted_id:
        patient = ehr_client.find_patient_by_id(extracted_id)
        if patient:
            return patient.id

    # Attempt 2: search by demographics
    if extracted_name and extracted_dob:
        candidates = ehr_client.search_patients(
            name=extracted_name,
            birthdate=extracted_dob,
        )
        # Accept only a single unambiguous candidate
        if len(candidates) == 1:
            return candidates[0].id

    # No unambiguous match: requires manual review
    return None
Duplicate detection
Lab reports may arrive through multiple channels (fax, email, lab portal), which can generate duplicates. The system must detect and manage duplicates before storing results:
- Compare the combination of laboratory + date + test list with existing results
- Calculate a hash of the document content to detect identical documents
- Check if a FHIR DiagnosticReport with the same external identifier already exists
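The first two checks can be sketched as follows. This is a minimal in-memory illustration with hypothetical names; a real deployment would persist the hashes and keys, and would add the DiagnosticReport identifier lookup against the FHIR server.

```python
import hashlib
from typing import Optional


def content_hash(file_bytes: bytes) -> str:
    """SHA-256 of the raw document, to detect byte-identical copies."""
    return hashlib.sha256(file_bytes).hexdigest()


def report_key(lab: str, report_date: str, loinc_codes: list) -> str:
    """Composite key: laboratory + date + sorted, de-duplicated test list."""
    tests = ",".join(sorted(set(loinc_codes)))
    return f"{lab.strip().lower()}|{report_date}|{tests}"


class DuplicateDetector:
    """In-memory index of reports already seen (illustrative only)."""

    def __init__(self) -> None:
        self._hashes = set()
        self._keys = set()

    def check_and_register(
        self,
        file_bytes: bytes,
        lab: str,
        report_date: str,
        loinc_codes: list,
    ) -> Optional[str]:
        """Return why the report looks like a duplicate, or None if it is new."""
        digest = content_hash(file_bytes)
        key = report_key(lab, report_date, loinc_codes)
        if digest in self._hashes:
            return "identical_document"
        if key in self._keys:
            return "same_lab_date_tests"
        self._hashes.add(digest)
        self._keys.add(key)
        return None
```

A content-hash match is a safe signal, since the bytes are identical. A composite-key match is only a hint: a rescanned copy produces it, but so could a legitimate repeat test on the same day, so it is better routed to review than discarded silently.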
Storage in the EHR
Validated and reconciled results are stored in the EHR. Depending on the EHR architecture, this may involve:
- FHIR insertion: send the FHIR Bundle to the EHR's FHIR server via a POST transaction.
- HL7 v2 insertion: generate an ORU^R01 message with the results and send it to the EHR's interface engine.
- Proprietary API: use the EHR's specific API to create lab observations.
- Direct database: in exceptional cases, insert directly into the EHR's result tables (not recommended).
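For the HL7 v2 option, the sketch below assembles a bare-bones ORU^R01 message with one OBX segment per numeric result. It is a simplified illustration under several assumptions: the application and facility names in the MSH segment are placeholders, version 2.5 is assumed, delimiter escaping is not handled, and many fields a real interface would require are left empty. The dictionary keys used for each result are hypothetical.

```python
from datetime import datetime
from typing import Optional


def build_oru_r01(
    mrn: str,
    panel_code: str,
    panel_name: str,
    results: list,
    control_id: str,
    now: Optional[datetime] = None,
) -> str:
    """Build a minimal ORU^R01 message (no escaping, illustrative only)."""
    timestamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    segments = [
        # MSH: sender/receiver names are placeholders.
        f"MSH|^~\\&|MEDEXTRACT|LAB|EHR|HOSPITAL|{timestamp}||ORU^R01|{control_id}|P|2.5",
        # PID-3 carries the patient identifier.
        f"PID|1||{mrn}",
        # OBR-4 identifies the ordered panel (LN = LOINC).
        f"OBR|1|||{panel_code}^{panel_name}^LN",
    ]
    for index, r in enumerate(results, start=1):
        # OBX: value type NM (numeric), then identifier, value, unit,
        # reference range, abnormal flag, and result status F (final).
        segments.append(
            f"OBX|{index}|NM|{r['loinc']}^{r['name']}^LN||"
            f"{r['value']}|{r['unit']}|{r['range']}|{r['flag']}|||F"
        )
    # HL7 v2 segments are terminated by carriage returns.
    return "\r".join(segments) + "\r"
```

In practice the integration engine usually generates and validates these messages, so hand-built strings like this are mainly useful for tests and prototypes.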
Error handling and special cases
Low-confidence results
When the extraction API reports low confidence for a result, the system should route that case to a human review queue instead of automatically inserting it into the EHR. The review interface should display the original document alongside the extracted results, allowing the operator to correct or confirm each field.
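A minimal sketch of this routing step follows. The 0.90 threshold and the `confidence` key are assumptions for illustration; the appropriate cut-off should be set from the organization's own validation data.

```python
def route_results(results: list, threshold: float = 0.90) -> tuple:
    """Split results into (auto_insert, human_review) by confidence score."""
    auto_insert = []
    human_review = []
    for result in results:
        confidence = result.get("confidence")
        # A missing score is treated as low confidence.
        if confidence is not None and confidence >= threshold:
            auto_insert.append(result)
        else:
            human_review.append(result)
    return auto_insert, human_review
```

For example, with the default threshold a result scored 0.97 goes to automatic insertion, while one scored 0.62 or lacking a score goes to the review queue.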
Partially processed reports
If the API cannot process some pages or sections of the report (for example, due to insufficient image quality), the system should store successfully processed results and mark the report as "partially processed," generating an alert for an operator to review the unprocessed sections.
Unrecognized report formats
Some lab reports may have formats that the extraction system does not recognize. In these cases, the flow should degrade gracefully: the document is stored as a patient attachment in the EHR and flagged for manual processing, without blocking the overall flow.
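The handling of partially processed and unrecognized reports can be expressed as a small triage function. The payload keys (`results`, `unprocessed_pages`) and the action names are hypothetical; the point is only that every outcome maps to an action and none of them blocks the flow.

```python
from enum import Enum


class Action(Enum):
    STORE = "store_results"
    STORE_AND_ALERT = "store_results_and_alert_operator"
    ATTACH_FOR_MANUAL = "attach_document_for_manual_processing"


def triage(extraction: dict) -> Action:
    """Decide what to do with an extraction outcome."""
    results = extraction.get("results", [])
    unprocessed = extraction.get("unprocessed_pages", [])
    if not results:
        # Nothing usable was extracted: keep the document, flag it.
        return Action.ATTACH_FOR_MANUAL
    if unprocessed:
        # Some pages failed: store what succeeded and raise an alert.
        return Action.STORE_AND_ALERT
    return Action.STORE
```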
Security and compliance
Authentication and authorization
Communication between the EHR and the extraction API must be protected with:
- TLS 1.2+: mandatory encryption in transit.
- API Key / OAuth 2.0: client authentication.
- IP whitelisting: access restriction by source IP (recommended in hospital environments).
- Rate limiting: protection against abusive or accidental overuse.
Audit trail
Every extraction operation must generate an audit record that includes:
- Input document identifier
- Request and response timestamps
- Identifier of the user session that initiated the process
- Extraction result (success, partial failure, error)
- Patient reconciliation result
- EHR storage result
This audit trail supports the GDPR's accountability principle (Art. 5(2)) and its security-of-processing obligations (Art. 32), and it is a best practice for healthcare data governance.
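One way to capture these fields is an immutable record serialized as one JSON line per operation. The field and function names below are illustrative assumptions, not part of any API.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


def utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AuditRecord:
    document_id: str
    requested_at: str
    responded_at: str
    session_id: str
    extraction_outcome: str  # "success", "partial_failure", or "error"
    reconciliation_outcome: str
    storage_outcome: str


def to_log_line(record: AuditRecord) -> str:
    """Serialize the record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

The record deliberately holds identifiers and outcomes only, with no patient demographics or result values, so the audit log does not become a second copy of the clinical data.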
Data processing agreement
Integration with an external API to process health data requires a data processing agreement compliant with Art. 28 of the GDPR. This agreement must establish the responsibilities of each party, applicable security measures, and data retention and deletion conditions.
Testing and validation
Testing phase
Before putting the integration into production, a testing period covering the following is essential:
- Unit tests: verify that each integration component works correctly in isolation.
- Integration tests: verify the complete flow from document upload to EHR storage.
- Load tests: verify that the system handles expected volume without degradation.
- Failure tests: verify behavior when the API is unavailable, when the network fails, or when the EHR rejects the insertion.
Parallel validation
The best practice is to run the system in parallel mode for 1-3 months: reports are processed both manually and automatically, and results are compared to measure the concordance rate. A concordance rate above 98% on the main fields (LOINC, value, unit) indicates the system is ready for production.
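The concordance rate can be computed per field from paired manual and automatic entries. The sketch below assumes the two lists are already aligned result by result and that each entry is a dictionary with `loinc`, `value`, and `unit` keys; both are assumptions for illustration.

```python
def concordance_by_field(
    manual: list,
    automatic: list,
    fields: tuple = ("loinc", "value", "unit"),
) -> dict:
    """Fraction of paired results that agree, computed per field."""
    if len(manual) != len(automatic):
        raise ValueError("manual and automatic lists must be paired")
    if not manual:
        raise ValueError("no results to compare")
    rates = {}
    for field in fields:
        matches = sum(
            # Compare as trimmed strings; no numeric normalization is done.
            str(m.get(field)).strip() == str(a.get(field)).strip()
            for m, a in zip(manual, automatic)
        )
        rates[field] = matches / len(manual)
    return rates


def ready_for_production(rates: dict, minimum: float = 0.98) -> bool:
    """True when every field exceeds the minimum concordance rate."""
    return all(rate > minimum for rate in rates.values())
```

Because values are compared as strings, `95` and `95.0` count as a mismatch; a real comparison would normalize numbers and units before matching.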
Continuous monitoring
In production, the system should be continuously monitored to detect performance degradation:
- Extraction success rate
- Average processing latency
- Percentage of results sent to human review
- Concordance rate with manual entries (if parallel flow exists)
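The first three metrics can be derived from a list of per-document events, as in the sketch below. The event keys (`success`, `latency_s`, `results`, `sent_to_review`) are hypothetical; in practice these figures would usually come from the monitoring stack or the audit log.

```python
from statistics import mean


def summarize(events: list) -> dict:
    """Compute success rate, mean latency, and review percentage."""
    if not events:
        raise ValueError("no events to summarize")
    successes = sum(1 for e in events if e["success"])
    total_results = sum(e["results"] for e in events)
    reviewed = sum(e["sent_to_review"] for e in events)
    return {
        "success_rate": successes / len(events),
        "avg_latency_s": mean(e["latency_s"] for e in events),
        # Share of individual results routed to human review.
        "review_pct": 100.0 * reviewed / total_results if total_results else 0.0,
    }
```

Tracking these values over a rolling window and alerting on a sustained change is generally more useful than watching single data points.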
Conclusion
Connecting an EHR to a lab extraction API like MedExtract is a project with a clear return: it automates a time-intensive, error-prone manual process and enables the interoperability that regulations like the EHDS are making mandatory. The keys to success are choosing the right integration pattern for the existing architecture, implementing robust patient reconciliation, and maintaining traceability and security at every step of the data flow.
For organizations just getting started, the Python pipeline tutorial provides a functional starting point that can be adapted to the specific needs of each EHR.
Related Articles
FHIR R4 Integration Guide for EHR Systems
A practical overview of integrating FHIR R4 resources into EHR systems, focusing on DiagnosticReport and Observation bundles from lab data.
Building a Lab Report Processing Pipeline in Python
Step-by-step tutorial to build a lab data extraction pipeline with Python, from PDF to FHIR R4.
Complete Guide to LOINC Code Extraction
Everything about automated LOINC code extraction from lab reports: process, challenges, dictionaries, and best practices.