Connecting Your EHR to a Lab Extraction API

Integrating a lab data extraction API with an existing Electronic Health Record (EHR) system is a project that requires careful planning across three dimensions: technical, organizational, and regulatory. This guide provides a practical walkthrough of the design decisions, integration patterns, and common pitfalls that clinical informatics teams should consider when connecting an API like MedExtract to their EHR infrastructure.

Integration patterns

There are three main patterns for integrating an extraction API with an EHR. The choice depends on the existing architecture, document volume, and latency requirements.

Pattern 1: direct point-to-point integration

In this pattern, the EHR calls the extraction API directly when it receives a new document. It is the simplest pattern but also the most tightly coupled.

User uploads PDF → EHR → MedExtract API → EHR stores results

Advantages: simplicity, low latency, rapid implementation. Disadvantages: tight coupling, no fault tolerance when the API fails, and the EHR itself must handle the asynchronous flow if extraction takes more than a few seconds.

This pattern works well for low volumes (fewer than 50 daily reports) and when the EHR has the ability to make outbound HTTP calls.

Pattern 2: message queue integration

An integration middleware (integration engine) receives documents from the EHR, sends them to the extraction API, and returns results to the EHR. This pattern decouples both systems.

EHR → Message queue → Worker → MedExtract API → Queue → EHR

Advantages: decoupling, fault tolerance, automatic retries, horizontal scalability. Disadvantages: greater complexity, additional latency, messaging infrastructure required.

This is the recommended pattern for medium and high volumes (50 or more reports per day). Healthcare integration engines like Mirth Connect, Rhapsody, or Iguana implement this pattern natively.
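The worker step of this pattern can be sketched with the standard library alone. This is a minimal illustration, not how any particular integration engine works: the `Job` class, the `run_worker` function, the in-memory queues, and the retry parameters are all assumptions made for the example, and `extract` stands in for whatever function actually calls the extraction API.

```python
import queue
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    """A document waiting for extraction (hypothetical shape)."""
    document_id: str
    payload: bytes
    attempts: int = 0


def run_worker(
    jobs: "queue.Queue[Job]",
    results: "queue.Queue[tuple[str, dict]]",
    dead_letters: list[Job],
    extract: Callable[[bytes], dict],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> None:
    """Drain the job queue, retrying failed extractions with backoff."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left to process
        try:
            result = extract(job.payload)
        except Exception:
            job.attempts += 1
            if job.attempts >= max_attempts:
                # Give up and park the job for an operator to inspect.
                dead_letters.append(job)
            else:
                # Exponential backoff: base_delay, 2x, 4x, ...
                time.sleep(base_delay * 2 ** (job.attempts - 1))
                jobs.put(job)
        else:
            results.put((job.document_id, result))
```

In a real deployment the in-memory queues would be replaced by a durable broker or by the integration engine's own channels, so that jobs survive a worker restart.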

Pattern 3: native FHIR integration

If the EHR supports FHIR R4 natively, the integration can be done entirely through FHIR resources. The extraction API produces a FHIR Bundle that is sent directly to the EHR's FHIR endpoint.

Document → MedExtract API → FHIR Bundle → EHR's FHIR Server

Advantages: full standardization, portability, EHDS compliance. Disadvantages: requires the EHR to have an operational FHIR server, which is not always the case.

This pattern is ideal for organizations with mature FHIR infrastructure and is the model that the European healthcare ecosystem will converge toward with EHDS implementation.

Detailed data flow

Document reception

The first step is document capture. Lab reports can arrive at the EHR through multiple channels:

Integrated scanner: healthcare staff scan the paper report directly from the EHR interface.
Manual upload: the user uploads a PDF or image file through a web form.
Email: a monitored mailbox receives reports from external labs and redirects them to the system.
HL7 integration: the lab sends an HL7 v2 message with the report attached as an embedded document.
FHIR DocumentReference: the lab publishes a DocumentReference resource with the report as an attachment.

Sending to the extraction API

Once the document is captured, it is sent to the MedExtract API for processing. The basic call is a multipart HTTP POST request:

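import os

import httpx

# Read the API key from the environment rather than hard-coding it.
# The variable name MEDEXTRACT_API_KEY is an assumption for this example.
API_KEY = os.environ["MEDEXTRACT_API_KEY"]

async def send_to_extraction(
    file_bytes: bytes,
    filename: str,
    content_type: str,
) -> dict:
    """Send a document to the MedExtract API."""
    # Generous timeout: extraction of a multi-page report can take a while.
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "https://api.medextract.io/v1/extract",
            headers={
                "Authorization": f"Bearer {API_KEY}",
            },
            # Multipart upload of the original document.
            files={
                "file": (filename, file_bytes, content_type),
            },
            # Extraction options sent as form fields.
            data={
                "output_format": "fhir_bundle",
                "include_confidence": "true",
                "language": "es",
            },
        )
        # Raise httpx.HTTPStatusError on 4xx/5xx responses.
        response.raise_for_status()
        return response.json()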

Processing the response

The API returns structured results that include:

Identified patient data (name, ID, date of birth)
Issuing laboratory data
Report date
List of results with LOINC code, value, unit, reference range, and abnormality indicator
Overall and per-result confidence scores
Optionally, a FHIR Bundle ready for import
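As an illustration of how a client might turn that response into typed records, here is a small sketch. The JSON field names (`results`, `loinc_code`, `value`, `unit`, `confidence`, `abnormal`) are assumptions made for this example and are not taken from the MedExtract API reference, so check them against the actual response schema before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabResult:
    """One extracted result line (hypothetical field names)."""
    loinc_code: str
    value: str          # kept as text: results may be numeric or qualitative
    unit: str
    confidence: float
    abnormal: bool = False


def parse_results(payload: dict) -> list[LabResult]:
    """Convert the assumed 'results' array into typed records."""
    parsed = []
    for item in payload.get("results", []):
        parsed.append(
            LabResult(
                loinc_code=item["loinc_code"],
                value=str(item["value"]),
                unit=item.get("unit", ""),
                # A missing score is treated as zero confidence.
                confidence=float(item.get("confidence", 0.0)),
                abnormal=bool(item.get("abnormal", False)),
            )
        )
    return parsed
```

Keeping `value` as text avoids losing qualitative results such as "positive"; numeric conversion can happen later, once the unit and LOINC code are known.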

Patient reconciliation

Before storing results in the EHR, it is necessary to reconcile the patient identity extracted from the report with the patient registered in the EHR. This step is critical to avoid storing results in the wrong clinical record.

Reconciliation can be performed via:

Exact identifier: if the report contains the EHR medical record number, matching is direct.
Demographics: comparison of name, date of birth, and other identifiers.
MPI (Master Patient Index): query the organization's master patient index.
Manual confirmation: in cases of ambiguity, an operator confirms the association.

def reconcile_patient(
    extracted_name: str | None,
    extracted_id: str | None,
    extracted_dob: str | None,
    ehr_client: "EHRClient",  # placeholder type: your own EHR client wrapper
) -> str | None:
    """Reconcile the extracted patient with the EHR record."""
    # Attempt 1: search by exact identifier
    if extracted_id:
        patient = ehr_client.find_patient_by_id(extracted_id)
        if patient:
            return patient.id

    # Attempt 2: search by demographics
    if extracted_name and extracted_dob:
        candidates = ehr_client.search_patients(
            name=extracted_name,
            birthdate=extracted_dob,
        )
        if len(candidates) == 1:
            return candidates[0].id

    # No unambiguous match: requires manual review
    return None

Duplicate detection

Lab reports may arrive through multiple channels (fax, email, lab portal), which can generate duplicates. The system must detect and manage duplicates before storing results:

Compare the combination of laboratory + date + test list with existing results
Calculate a hash of the document content to detect identical documents
Check if a FHIR DiagnosticReport with the same external identifier already exists
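The first two checks can be sketched as follows. This is one possible approach, assuming SHA-256 fingerprints and an in-memory set of previously seen values; the function and class names are invented for the example, and a production system would keep the index in a database.

```python
import hashlib


def content_fingerprint(file_bytes: bytes) -> str:
    """Hash of the raw document: catches byte-identical copies."""
    return hashlib.sha256(file_bytes).hexdigest()


def report_key(laboratory: str, report_date: str, loinc_codes: list[str]) -> str:
    """Hash of laboratory + date + test list: catches rescans of the same report."""
    parts = [laboratory.strip().lower(), report_date, *sorted(set(loinc_codes))]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class DuplicateIndex:
    """In-memory record of fingerprints already seen (sketch only)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_duplicate(self, *fingerprints: str) -> bool:
        """Return True if any fingerprint was seen before; remember all of them."""
        duplicate = any(fp in self._seen for fp in fingerprints)
        self._seen.update(fingerprints)
        return duplicate
```

The content hash only matches identical files, so a faxed copy and an emailed PDF of the same report will differ; the laboratory + date + test list key is what catches those cases, at the cost of occasional false positives that are better routed to review than silently dropped.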

Storage in the EHR

Validated and reconciled results are stored in the EHR. Depending on the EHR architecture, this may involve:

FHIR insertion: send the FHIR Bundle to the EHR's FHIR server as a transaction (an HTTP POST of the Bundle to the server's base URL).
HL7 v2 insertion: generate an ORU^R01 message with the results and send it to the EHR's interface engine.
Proprietary API: use the EHR's specific API to create lab observations.
Direct database: in exceptional cases, insert directly into the EHR's result tables (not recommended).
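For the FHIR option, a transaction Bundle needs a `request` element on every entry. The sketch below shows one way to prepare a Bundle for submission, assuming the extraction output is a Bundle whose entries do not yet carry `request` elements; whether that holds for MedExtract's output is an assumption, and the function name is invented for the example.

```python
def to_transaction(bundle: dict) -> dict:
    """Return a copy of the Bundle typed as a transaction of creates."""
    entries = []
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        # Each transaction entry states what the server should do with it.
        request = {"method": "POST", "url": resource["resourceType"]}
        identifiers = resource.get("identifier") or []
        if identifiers and "system" in identifiers[0] and "value" in identifiers[0]:
            first = identifiers[0]
            # Conditional create: skip if a resource with this identifier exists.
            request["ifNoneExist"] = f"identifier={first['system']}|{first['value']}"
        entries.append({**entry, "request": request})
    return {**bundle, "resourceType": "Bundle", "type": "transaction", "entry": entries}
```

The resulting Bundle is then POSTed to the FHIR server's base URL with the `Content-Type: application/fhir+json` header. The conditional create on the first identifier is a design choice that doubles as a last line of defense against duplicates; servers differ in how completely they support it, so it is worth verifying against the target EHR.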

Error handling and special cases

Low-confidence results

When the extraction API reports low confidence for a result, the system should route that case to a human review queue instead of automatically inserting it into the EHR. The review interface should display the original document alongside the extracted results, allowing the operator to correct or confirm each field.
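A simple way to express this routing is a threshold on the per-result confidence score. The sketch below assumes each result is a dictionary with a `confidence` key between 0 and 1; the key name and the 0.90 threshold are illustrative assumptions, and the right cut-off should come from the parallel validation data of each site.

```python
REVIEW_THRESHOLD = 0.90  # hypothetical value; tune per site


def route_results(
    results: list[dict],
    threshold: float = REVIEW_THRESHOLD,
) -> tuple[list[dict], list[dict]]:
    """Split results into (auto-insert, human review) lists."""
    accepted: list[dict] = []
    needs_review: list[dict] = []
    for result in results:
        confidence = result.get("confidence")
        # A missing score is treated like a low score: a human decides.
        if confidence is not None and confidence >= threshold:
            accepted.append(result)
        else:
            needs_review.append(result)
    return accepted, needs_review
```

Routing per result rather than per report keeps the review queue short, but some teams prefer to send the whole report to review when any single result falls below the threshold, so that the operator sees it in context.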

Partially processed reports

If the API cannot process some pages or sections of the report (for example, due to insufficient image quality), the system should store successfully processed results and mark the report as "partially processed," generating an alert for an operator to review the unprocessed sections.

Unrecognized report formats

Some lab reports may have formats that the extraction system does not recognize. In these cases, the flow should degrade gracefully: the document is stored as a patient attachment in the EHR and flagged for manual processing, without blocking the overall flow.

Security and compliance

Authentication and authorization

Communication between the EHR and the extraction API must be protected with:

TLS 1.2+: mandatory encryption in transit.
API Key / OAuth 2.0: client authentication.
IP whitelisting: access restriction by source IP (recommended in hospital environments).
Rate limiting: protection against abusive or accidental overuse.
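The TLS requirement can be enforced on the client side rather than left to defaults. The following is a minimal sketch using Python's standard `ssl` module; the function name is an assumption made for the example.

```python
import ssl


def build_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The context can be handed to the HTTP client, for example `httpx.AsyncClient(verify=build_tls_context())`, so that a misconfigured proxy or server cannot silently downgrade the connection.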

Audit trail

Every extraction operation must generate an audit record that includes:

Input document identifier
Request and response timestamps
Identifier of the user session that initiated the process
Extraction result (success, partial failure, error)
Patient reconciliation result
EHR storage result

This audit trail supports the accountability and security-of-processing obligations of the GDPR (Arts. 5(2) and 32) and is a best practice for healthcare data governance.
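One way to capture these fields is an immutable record serialized as one JSON line per operation. The sketch below is an illustration only: the class name, field names, and outcome strings are assumptions, and the record deliberately holds identifiers rather than patient data.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One audit entry per extraction operation (hypothetical schema)."""
    document_id: str
    session_id: str
    requested_at: str            # ISO 8601, UTC
    responded_at: str            # ISO 8601, UTC
    extraction_outcome: str      # "success", "partial_failure" or "error"
    reconciliation_outcome: str  # e.g. "matched", "manual_review"
    storage_outcome: str         # e.g. "stored", "rejected", "skipped"


def utc_now() -> str:
    """Current time as a timezone-aware ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def to_audit_line(record: AuditRecord) -> str:
    """Serialize the record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Writing the lines to append-only storage with restricted access keeps the trail tamper-evident enough for most internal audits; stronger guarantees, such as signed or hash-chained entries, are outside the scope of this sketch.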

Data processing agreement

Integration with an external API to process health data requires a data processing agreement compliant with Art. 28 of the GDPR. This agreement must establish the responsibilities of each party, applicable security measures, and data retention and deletion conditions.

Testing and validation

Testing phase

Before putting the integration into production, a testing period covering the following is essential:

Unit tests: verify that each integration component works correctly in isolation.
Integration tests: verify the complete flow from document upload to EHR storage.
Load tests: verify that the system handles expected volume without degradation.
Failure tests: verify behavior when the API is unavailable, when the network fails, or when the EHR rejects the insertion.

Parallel validation

The best practice is to run the system in parallel mode for 1-3 months: reports are processed both manually and automatically, and results are compared to measure the concordance rate. A concordance rate above 98% on the main fields (LOINC, value, unit) indicates the system is ready for production.
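The concordance rate can be computed field by field over results that were entered both ways. The sketch below is one possible definition, assuming both data sets are dictionaries keyed by a shared result identifier and that a result missing from the automatic set counts as a mismatch on every field; the key and field names are illustrative.

```python
MAIN_FIELDS = ("loinc", "value", "unit")


def concordance_rate(
    manual: dict[str, dict],
    automatic: dict[str, dict],
    fields: tuple[str, ...] = MAIN_FIELDS,
) -> float:
    """Share of manually entered fields that the automatic flow reproduced."""
    total = 0
    matches = 0
    for key, manual_row in manual.items():
        auto_row = automatic.get(key, {})
        for field in fields:
            total += 1
            if field in auto_row and auto_row[field] == manual_row.get(field):
                matches += 1
    return matches / total if total else 0.0


manual = {
    "r1": {"loinc": "718-7", "value": "13.5", "unit": "g/dL"},
    "r2": {"loinc": "2345-7", "value": "92", "unit": "mg/dL"},
}
automatic = {
    "r1": {"loinc": "718-7", "value": "13.5", "unit": "g/dL"},
    "r2": {"loinc": "2345-7", "value": "95", "unit": "mg/dL"},
}
rate = concordance_rate(manual, automatic)  # 5 of 6 fields agree
```

Values are compared as exact strings here, so "13.5" and "13.50" would count as a mismatch; normalizing numbers and units before comparison is usually worthwhile. Note also that manual entry has its own error rate, so disagreements should be adjudicated against the source document rather than assumed to be extraction errors.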

Continuous monitoring

In production, the system should be continuously monitored to detect performance degradation:

Extraction success rate
Average processing latency
Percentage of results sent to human review
Concordance rate with manual entries (if parallel flow exists)
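The first three indicators can be derived from per-request events that the integration already has at hand. The sketch below assumes a hypothetical `ExtractionEvent` record; the names and the shape of the summary are illustrative, and in practice these numbers would usually be exported to the organization's existing monitoring stack.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ExtractionEvent:
    """Outcome of one extraction request (hypothetical shape)."""
    succeeded: bool
    latency_seconds: float
    results_total: int
    results_to_review: int


def summarize(events: list[ExtractionEvent]) -> dict[str, float]:
    """Aggregate success rate, average latency, and human-review share."""
    if not events:
        return {"success_rate": 0.0, "avg_latency_seconds": 0.0, "review_share": 0.0}
    total_results = sum(e.results_total for e in events)
    to_review = sum(e.results_to_review for e in events)
    return {
        "success_rate": sum(e.succeeded for e in events) / len(events),
        "avg_latency_seconds": mean(e.latency_seconds for e in events),
        "review_share": to_review / total_results if total_results else 0.0,
    }
```

Computing the summary over a sliding window (for example, the last 24 hours) and alerting on changes relative to a baseline tends to be more useful than fixed thresholds, since a sudden rise in the review share often signals a new report layout from one laboratory.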

Conclusion

Connecting an EHR to a lab extraction API like MedExtract is a project with a clear return: it automates a time-intensive, error-prone manual process and enables the interoperability that regulations like the EHDS are making mandatory. The keys to success are choosing the right integration pattern for the existing architecture, implementing robust patient reconciliation, and maintaining traceability and security at every step of the data flow.

For organizations just getting started, the tutorial "Building a Lab Report Processing Pipeline in Python" provides a functional starting point that can be adapted to the specific needs of each EHR.

