Validating Supplier Data Payloads with Pydantic Models

Supplier payloads rarely arrive clean. EDI translations, CSV exports, and REST responses from tier-1 and tier-2 vendors consistently introduce type drift, missing mandatory fields, malformed dates, and precision mismatches in quantities and unit costs. This page addresses one precise scenario: a purchase-order or advance-ship-notice line item has already been structurally parsed into a dictionary, and you need a runtime contract that rejects the bad rows at the payload boundary — before they reach three-way matching or your inventory ledger — without stalling the rest of the batch. It is the implementation-level companion to the contract-design decisions in the parent Schema Validation Using Pydantic reference, which itself sits inside the broader Ingestion & Parsing Workflows for Supply Chain Data architecture.

Operational Trigger Signals

Add an explicit Pydantic validation layer at the payload boundary — rather than scattering isinstance guards through business logic — when your ingestion telemetry shows any of these measurable conditions across consecutive runs:

Reconciliation rejects exceed ~2% of received lines for reasons that trace back to type or format drift (a quantity arriving as "150.00" instead of 150, a date as 06/15/24 instead of ISO 8601) rather than genuine business disputes.
Undocumented fields appear in supplier feeds. A vendor adds promo_code or pallet_id to their API contract without notice; without extra="forbid" these pass silently and mask a version bump you needed to know about.
Over-receipt slips into the ledger. A quantity_shipped greater than quantity_ordered reaches allocation because no cross-field rule fired at ingestion, forcing manual reversal downstream.
GTIN/SKU corruption surfaces in master data — identifiers that fail their GS1 check digit reach the catalog and break joins against the ERP item master.
Batch latency climbs past your SLA because line items are validated one object at a time in a Python loop instead of in a single compiled pass.
A transient crash mid-batch produces duplicate inventory allocations because reprocessing is not idempotent and the same payload is applied twice.

When the constraint is feed volume (thousands of concurrent small payloads) rather than per-record correctness, pair this contract with the fan-out model in Async Batch Processing for High-Volume Feeds; the procedure here governs the correctness of each record once it lands.

Step-by-Step Implementation

Work through the steps in order. Each one closes a specific failure vector, and together they turn an untrusted dict into a record the matching and inventory stages are allowed to consume.

Step 1 — Define a strict base model for supplier feeds

Establish a base configuration that enforces explicit type checking and disables silent coercion that corrupts reconciliation math. The model_config below is the policy every supplier line item must satisfy.

PYTHON

import logging
from datetime import date
from decimal import Decimal
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator

logging.basicConfig(level=logging.INFO, format="%(asctime)s | %(levelname)s | %(message)s")
logger = logging.getLogger("ingestion.pydantic")


class SupplierLineItem(BaseModel):
    model_config = ConfigDict(strict=True, extra="forbid", validate_default=True)

    line_number: int = Field(ge=1, description="Sequential PO/ASN line number")
    sku: str = Field(pattern=r"^[A-Z0-9\-]{6,20}$", description="Internal SKU format")
    supplier_sku: Optional[str] = Field(default=None, max_length=25)
    quantity_ordered: Decimal = Field(ge=0, description="Requested quantity")
    quantity_shipped: Optional[Decimal] = Field(default=None, ge=0)
    unit_cost: Decimal = Field(ge=0, description="Per-unit cost in base currency")
    currency_code: str = Field(pattern=r"^[A-Z]{3}$", description="ISO 4217 currency")
    expected_delivery: date
    actual_delivery: Optional[date] = None

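    @field_validator("quantity_ordered", "unit_cost", mode="before")
    @classmethod
    def quantize_financials(cls, v: object) -> Decimal:
        """Normalize incoming numerics to Decimal with 4 decimal places."""
        # bool is a subclass of int; reject it so True never becomes Decimal("1.0000").
        if isinstance(v, bool) or not isinstance(v, (int, float, str, Decimal)):
            raise ValueError("expected a numeric value or numeric string")
        try:
            # str() first so a float such as 4.99 keeps its printed digits.
            return Decimal(str(v)).quantize(Decimal("0.0001"))
        except ArithmeticError as exc:  # decimal.InvalidOperation, e.g. "abc" or "Infinity"
            raise ValueError("not a finite decimal number") from exc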

    @model_validator(mode="after")
    def reconcile_quantities(self) -> "SupplierLineItem":
        """Ensure shipped quantity never exceeds ordered quantity."""
        if self.quantity_shipped is not None and self.quantity_shipped > self.quantity_ordered:
            logger.warning("over_receipt sku=%s line=%d", self.sku, self.line_number)
            raise ValueError("quantity_shipped cannot exceed quantity_ordered")
        return self

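The three configuration choices carry most of the weight. strict=True blocks implicit conversions such as string-to-int; for dict input passed to model_validate it also rejects ISO date strings, whereas JSON text passed to model_validate_json or validate_json may still carry dates as strings, because JSON has no date type. extra="forbid" rejects payloads carrying undocumented fields, which usually signal contract drift. validate_default=True makes Pydantic validate default values as well, which it otherwise skips, so a bad default fails instead of slipping through; values the supplier sends explicitly, including None, are always validated. The @field_validator(..., mode="before") hook runs ahead of Pydantic's own type validation, so it can normalize a numeric string into a Decimal even under strict mode, while model_validator(mode="after") is the natural place for a cross-field rule because it sees quantity_ordered and quantity_shipped together on the fully validated instance.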
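
The behavior of two of these settings can be sketched with a hypothetical two-field model. `Demo` and `error_types` are illustrative names, not part of the contract above, and the sketch assumes Pydantic v2. It shows that strict mode treats a Python dict and raw JSON text differently for dates, and that an undocumented key is reported rather than dropped.

```python
from datetime import date

from pydantic import BaseModel, ConfigDict, ValidationError


class Demo(BaseModel):
    """Hypothetical model used only to illustrate the config flags."""

    model_config = ConfigDict(strict=True, extra="forbid")

    line_number: int
    expected_delivery: date


def error_types(payload: dict) -> list[str]:
    """Return the sorted Pydantic error types raised for a dict payload."""
    try:
        Demo.model_validate(payload)
    except ValidationError as exc:
        return sorted(e["type"] for e in exc.errors())
    return []


# Dict input: strict mode refuses both the numeric string and the ISO date string.
python_mode = error_types({"line_number": "1", "expected_delivery": "2024-06-15"})

# Correctly typed input plus one undocumented key.
extra_field = error_types(
    {"line_number": 1, "expected_delivery": date(2024, 6, 15), "pallet_id": "P-9"}
)

# JSON text: the ISO date string is accepted even in strict mode.
json_mode = Demo.model_validate_json('{"line_number": 1, "expected_delivery": "2024-06-15"}')
```

If a feed reaches the model as an already-parsed dict, dates therefore need to be converted upstream or the row validated as JSON text.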

Step 2 — Add supply-chain-specific validators

Standard type checks are insufficient for procurement and logistics data. Enforce domain rules — valid GTIN/EAN check digits, ISO 4217 currency codes, cross-field reconciliations — directly inside the parsing phase. The GTIN check below can live in the base model or in a dedicated subclass when only some feeds carry barcode-grade identifiers.

PYTHON

import re


class SupplierLineItemWithGTIN(SupplierLineItem):
    """Extends the base contract with GTIN-12/13/14 check-digit enforcement."""

    @field_validator("sku")
    @classmethod
    def validate_gtin_checksum(cls, v: str) -> str:
        """Validate a GTIN check digit using the GS1 modulo-10 algorithm."""
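        if not re.fullmatch(r"\d{12,14}", v):
            raise ValueError("Invalid GTIN format: expected 12-14 digits")
        # Weights alternate 3, 1 starting from the digit next to the check digit,
        # so walk the data digits right to left (correct for GTIN-12, -13, and -14).
        digits = [int(d) for d in reversed(v[:-1])]
        total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits))
        check_digit = (10 - (total % 10)) % 10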
        if check_digit != int(v[-1]):
            raise ValueError("GTIN checksum mismatch")
        return v

Validating the check digit at the boundary prevents master-data corruption at the catalog level and keeps the identifier aligned with GS1 expectations so downstream ERP and WMS systems accept it without manual cleanup. The same contract-first discipline applies when the source is an invoice rather than a barcode — see How to Map EDI 810 Invoices to Internal PO Schemas for the field-mapping layer that feeds these models.
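
As a standalone illustration of the GS1 modulo-10 rule, the sketch below computes the check digit outside any model. The function names are hypothetical. The point it makes is that the 3/1 weighting is anchored at the right-hand end of the number, which is why the same routine works for 12-, 13-, and 14-digit identifiers.

```python
def gs1_check_digit(data_digits: str) -> int:
    """Return the GS1 modulo-10 check digit for the digits that precede it."""
    if not (data_digits.isascii() and data_digits.isdigit()):
        raise ValueError("data digits must be ASCII digits")
    total = 0
    # Walk right to left: the digit next to the check digit always has weight 3.
    for position, char in enumerate(reversed(data_digits)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(gtin: str) -> bool:
    """True when a 12-, 13-, or 14-digit GTIN carries the correct check digit."""
    if not (gtin.isascii() and gtin.isdigit() and len(gtin) in (12, 13, 14)):
        return False
    return gs1_check_digit(gtin[:-1]) == int(gtin[-1])


# Sample identifiers of each length, plus one with a corrupted final digit.
samples = {
    "036000291452": is_valid_gtin("036000291452"),      # 12 digits
    "4006381333931": is_valid_gtin("4006381333931"),    # 13 digits
    "10614141000415": is_valid_gtin("10614141000415"),  # 14 digits
    "4006381333932": is_valid_gtin("4006381333932"),    # wrong check digit
}
```

Weighting from the left instead gives the right answer only for 13-digit codes, so a left-anchored loop would quietly reject valid 12- and 14-digit identifiers.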

Step 3 — Extract structured errors and route to quarantine

When validation fails, Pydantic raises ValidationError. Parse its structured .errors() output into a quarantine record instead of letting a single bad row abort the batch. This is what makes field-level triage and the recovery taxonomy in the next section possible.

PYTHON

import json

from pydantic import ValidationError


def parse_validation_error(err: ValidationError, raw_payload: dict) -> dict:
    """Convert a Pydantic ValidationError into a structured quarantine record."""
    return {
        "raw_payload": raw_payload,
        "error_count": len(err.errors()),
        "failed_fields": [
            {
                "field": ".".join(str(loc) for loc in e["loc"]),
                "error_type": e["type"],   # e.g. 'string_pattern_mismatch', 'greater_than_equal'
                "message": e["msg"],
                "input_value": e["input"],
            }
            for e in err.errors()
        ],
    }


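raw_payload = {  # one row pulled from a supplier feed
    "line_number": 1,
    "sku": "ACME-12345",
    "quantity_ordered": "10",
    "unit_cost": "4.99",
    "currency_code": "USD",
    # Under strict=True, dict input must carry a real date object; this ISO string
    # is rejected with a date_type error, so the row exercises the quarantine path.
    "expected_delivery": "2024-06-15",
}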
try:
    validated_item = SupplierLineItem.model_validate(raw_payload)
except ValidationError as e:
    quarantine_record = parse_validation_error(e, raw_payload)
    logger.error("quarantined line=%s fields=%d",
                 raw_payload.get("line_number"), quarantine_record["error_count"])
    # push to a Kafka dead-letter topic or an S3 quarantine prefix
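
One practical detail when shipping the quarantine record to a queue or object store: the captured input values can be arbitrary Python objects (a Decimal, a date), which the default JSON encoder refuses. A minimal sketch, assuming string renderings are acceptable in the dead-letter message; `to_dlq_message` is a hypothetical helper and the record is hand-built in the shape `parse_validation_error` returns.

```python
import json
from datetime import date
from decimal import Decimal


def to_dlq_message(quarantine_record: dict) -> bytes:
    """Serialize a quarantine record to UTF-8 JSON, stringifying unknown types."""
    # default=str renders Decimal, date, and similar values as strings
    # instead of raising TypeError during encoding.
    return json.dumps(quarantine_record, default=str, sort_keys=True).encode("utf-8")


# Hand-built record mirroring the structure produced in the step above.
record = {
    "raw_payload": {
        "line_number": 7,
        "quantity_shipped": Decimal("-2"),
        "expected_delivery": date(2024, 6, 15),
    },
    "error_count": 1,
    "failed_fields": [
        {
            "field": "quantity_shipped",
            "error_type": "greater_than_equal",
            "message": "Input should be greater than or equal to 0",
            "input_value": Decimal("-2"),
        }
    ],
}

message = to_dlq_message(record)
decoded = json.loads(message)
```

The trade-off is that type information is lost on the way out, so a replay job has to re-validate the stored payload rather than trust the serialized values.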

Step 4 — Validate batches for throughput

Validating thousands of line items one object at a time adds unacceptable latency in high-volume EDI or polling scenarios. Use a TypeAdapter with validate_json to run the whole array through Pydantic’s Rust-backed core in a single pass, then fall back to per-record validation only to isolate the failures inside a rejected batch.

PYTHON

from typing import List

from pydantic import TypeAdapter

LineItemBatch = TypeAdapter(List[SupplierLineItem])  # compile the adapter once, reuse it


def validate_batch(json_payload: bytes) -> tuple[List[SupplierLineItem], List[dict]]:
    """Validate a JSON array of line items, isolating any failures to the DLQ."""
    valid_items: List[SupplierLineItem] = []
    quarantined: List[dict] = []
    try:
        valid_items = LineItemBatch.validate_json(json_payload)  # fast path
    except ValidationError:
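        for item in json.loads(json_payload):  # slow path: isolate the bad rows
            try:
                # Re-validate each row as JSON text so strict mode applies the same
                # rules as the fast path (ISO date strings are valid JSON input but
                # would be rejected as dict input).
                valid_items.append(SupplierLineItem.model_validate_json(json.dumps(item)))
            except ValidationError as ve:
                quarantined.append(parse_validation_error(ve, item))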
    logger.info("batch_validated valid=%d quarantined=%d", len(valid_items), len(quarantined))
    return valid_items, quarantined
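
Because one bad row sends the whole array down the slow path, it can help to bound how many rows share a fate. The sketch below splits an iterable of rows into JSON-array payloads of a fixed size that a batch validator can consume one at a time. `iter_json_chunks` is a hypothetical helper, and the default of 10,000 rows is the figure from the configuration reference on this page rather than a measured optimum.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def iter_json_chunks(rows: Iterable[dict], chunk_size: int = 10_000) -> Iterator[bytes]:
    """Yield JSON-array payloads containing at most chunk_size rows each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield json.dumps(chunk).encode("utf-8")


# Seven rows in chunks of three: two full chunks and one partial chunk.
rows = [{"line_number": n} for n in range(1, 8)]
chunks = list(iter_json_chunks(rows, chunk_size=3))
sizes = [len(json.loads(chunk)) for chunk in chunks]
```

With chunking in place, a single malformed line costs per-record re-validation for its own chunk only, not for the entire feed.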

The Decimal quantization in Step 1 keeps cost arithmetic free of floating-point drift; carry that precision through to the match engine so it lines up with the bands in Setting Quantity and Price Tolerance Windows. When the upstream payload is a nested barcode or shipment document, the flattening step in XML to JSON Conversion with xmltodict produces the dict these models validate, and tabular feeds arrive via Parsing CSV and Excel Feeds with Pandas.

Step 5 — Make recovery idempotent

Partial failures require deterministic, replay-safe recovery so a transient crash never double-applies an allocation.

Hash-based deduplication. Compute a SHA-256 over the canonical raw payload before validation and store it behind a unique index (Redis or a Postgres constraint). Reject re-submissions with an identical hash unless the row is explicitly flagged as a correction.
Stateful quarantine tables. Persist each quarantined record with a status of PENDING_REVIEW, CORRECTED, or REJECTED. Let procurement ops patch fields, then re-run model_validate against the corrected payload.
Versioned audit trail. Log every pass and fail with the exact schema version used, so when a supplier changes their contract the drift is immediately traceable to a specific model revision.
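
The hash-based gate can be sketched as follows. This is a minimal in-memory version: the `DedupGate` class and its `admit` method are hypothetical, and the Python set stands in for the unique index that a Redis key or Postgres constraint would provide in production. Canonicalizing with sorted keys is an assumption about what "canonical" should mean for these payloads.

```python
import hashlib
import json


def payload_hash(raw_payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(raw_payload, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupGate:
    """In-memory stand-in for a unique index in Redis or Postgres."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def admit(self, raw_payload: dict, is_correction: bool = False) -> bool:
        """Return True if the payload should be processed, False if it is a replay."""
        digest = payload_hash(raw_payload)
        if digest in self._seen and not is_correction:
            return False
        self._seen.add(digest)
        return True


gate = DedupGate()
first = gate.admit({"line_number": 1, "sku": "ACME-12345"})
# Same content with a different key order hashes identically, so it is a replay.
replay = gate.admit({"sku": "ACME-12345", "line_number": 1})
# An explicitly flagged correction is let through despite the matching hash.
correction = gate.admit({"sku": "ACME-12345", "line_number": 1}, is_correction=True)
```

Hashing the raw payload rather than the validated model means the gate also works for rows that later fail validation, which keeps quarantine inserts replay-safe too.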

Configuration Reference

These are the knobs that decide whether the contract rejects genuine errors without flooding your exception backlog with false positives. Pin them per feed: an internal service emitting typed JSON tolerates strict=True, while a legacy EDI portal needs lax coercion at the field boundary.

Parameter	Recommended default	Accepted values	Rationale
`strict`	`True`	`True` / `False`	`True` for typed internal JSON; `False` to coerce string numerics from legacy EDI
`extra`	`forbid`	`forbid` / `ignore` / `allow`	`forbid` surfaces supplier contract drift instead of swallowing it
`validate_default`	`True`	`True` / `False`	Validates default values too, which Pydantic skips unless this is set
`Decimal` quantize	`0.0001`	`0.01` / `0.0001`	Match the precision band of the reconciliation engine
`quantity_ordered`	`Field(ge=0)`	`ge=0` / `gt=0`	`gt=0` rejects zero-qty lines if your feed never sends them
`currency_code`	`^[A-Z]{3}$`	ISO 4217 pattern	Blocks malformed or lowercase currency codes
Batch size	`10_000`	`1_000`–`50_000`	Chunk size for `validate_json`; lower it for wide line items
GTIN check	per-feed subclass	base / subclass	Apply check-digit validation only to barcode-grade feeds

Never relax extra to ignore or flip strict to False just to clear a backlog — that converts a contract-drift alert into silent data loss.
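
Pinning the policy per feed can be done with subclassing, since a child model's config is merged with its parent's in Pydantic v2. The sketch below uses hypothetical names (`StrictLine`, `LegacyEdiLine`, `accepts`) and a cut-down two-field model. It relaxes only strictness for a legacy feed while drift detection and the range constraints stay in force.

```python
from decimal import Decimal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class StrictLine(BaseModel):
    """Hypothetical contract for a typed internal feed."""

    model_config = ConfigDict(strict=True, extra="forbid")

    line_number: int = Field(ge=1)
    quantity_ordered: Decimal = Field(ge=0)


class LegacyEdiLine(StrictLine):
    """Same fields for a legacy feed that sends numerics as strings."""

    # Only strictness changes; extra="forbid" is inherited from the parent config.
    model_config = ConfigDict(strict=False)


def accepts(model: type[BaseModel], payload: dict) -> bool:
    """True when the payload validates against the given model."""
    try:
        model.model_validate(payload)
    except ValidationError:
        return False
    return True


edi_row = {"line_number": "12", "quantity_ordered": "150.00"}

strict_result = accepts(StrictLine, edi_row)
lax_result = accepts(LegacyEdiLine, edi_row)
lax_with_extra = accepts(LegacyEdiLine, {**edi_row, "promo_code": "X"})
lax_negative = accepts(LegacyEdiLine, {"line_number": "12", "quantity_ordered": "-1"})
```

Keeping the relaxation in a named subclass makes the decision reviewable per feed, instead of hiding it in a global flag.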

Debugging & Recovery

When validation rejects climb or a batch stalls, isolate the failure vector deterministically and route the offending record to a dead-letter queue (DLQ) rather than re-running the whole feed by hand. Tag every quarantined record with one reason from the taxonomy below and emit the audit fields alongside it.

TYPE_DRIFT: int_type, decimal_type, or date_type errors under strict mode; int_parsing, decimal_parsing, or date_from_datetime_parsing under lax mode. A supplier started sending "00045" where an int is expected. Decide per field whether to relax to lax coercion or push the fix upstream.
SCHEMA_DRIFT: extra_forbidden errors. The vendor added a field; review the contract, extend the model deliberately, and bump the schema version rather than switching extra to ignore.
CONSTRAINT_VIOLATION: greater_than_equal, string_pattern_mismatch. A negative quantity, a malformed SKU, or a bad currency code. These are genuine data-quality rejects and belong in the review queue.
CROSS_FIELD_FAIL: the reconcile_quantities over-receipt rule fired, which surfaces as a value_error with no field location. Hold the line for procurement confirmation; do not auto-allocate.
GTIN_CHECKSUM: check-digit mismatch, which surfaces as a value_error on the sku field. Almost always a transcription error in the vendor's master data; flag for correction, never auto-pad.
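
Tagging can be automated from the structured error entries. The sketch below maps one `failed_fields` entry (with the `field` and `error_type` keys produced in Step 3) to a reason. The mapping sets are assumptions rather than an exhaustive list of Pydantic error types, the `UNCLASSIFIED` fallback is not part of the taxonomy above, and the two `value_error` branches assume the validators from Steps 1 and 2: a model-level validator reports an empty location, while the GTIN validator reports `sku`.

```python
TYPE_DRIFT_TYPES = {
    "int_type", "int_parsing",
    "decimal_type", "decimal_parsing",
    "date_type", "date_parsing", "date_from_datetime_parsing",
    "string_type",
}
CONSTRAINT_TYPES = {
    "greater_than_equal", "greater_than",
    "string_pattern_mismatch", "string_too_long",
}


def classify(failed_field: dict) -> str:
    """Map one failed-field entry to a quarantine reason."""
    error_type = failed_field["error_type"]
    field = failed_field["field"]
    if error_type == "extra_forbidden":
        return "SCHEMA_DRIFT"
    if error_type in TYPE_DRIFT_TYPES:
        return "TYPE_DRIFT"
    if error_type in CONSTRAINT_TYPES:
        return "CONSTRAINT_VIOLATION"
    if error_type == "value_error":
        if field == "":     # raised by a model-level validator
            return "CROSS_FIELD_FAIL"
        if field == "sku":  # raised by the GTIN field validator
            return "GTIN_CHECKSUM"
    return "UNCLASSIFIED"   # hypothetical fallback: route to manual triage


reasons = [
    classify({"field": "pallet_id", "error_type": "extra_forbidden"}),
    classify({"field": "line_number", "error_type": "int_type"}),
    classify({"field": "unit_cost", "error_type": "greater_than_equal"}),
    classify({"field": "", "error_type": "value_error"}),
    classify({"field": "sku", "error_type": "value_error"}),
    classify({"field": "currency_code", "error_type": "missing"}),
]
```

A record with several failed fields can carry the reason of its first entry or the full set; which one is a policy choice for the review queue.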

Emit these audit fields to append-only storage so a SOX or internal review can replay any ingestion decision: payload_hash, schema_version, line_number, failed_fields, error_type, status, quarantined_at, and corrected_at. The hash-dedup gate from Step 5 is what makes recovery cheap — reprocessing skips already-applied payloads and only re-validates the corrected tail. Monitor quarantine_rate (rejected ÷ received) per supplier and per error reason; a sudden spike in SCHEMA_DRIFT for one vendor is the earliest signal of an undocumented API change.
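
A rough sketch of the audit record and the quarantine-rate metric follows. The field names come from the list above; the `AuditRecord` dataclass, the `quarantine_rates` function, and the sample counts are illustrative assumptions, not data from a real feed.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One append-only audit entry for a quarantined line."""

    payload_hash: str
    schema_version: str
    line_number: int
    failed_fields: tuple
    error_type: str
    status: str = "PENDING_REVIEW"
    quarantined_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    corrected_at: Optional[datetime] = None


def quarantine_rates(
    received: dict[str, int], rejections: list[tuple[str, str]]
) -> dict[tuple[str, str], float]:
    """Rejected / received, keyed by (supplier, reason)."""
    counts = Counter(rejections)
    return {
        (supplier, reason): count / received[supplier]
        for (supplier, reason), count in counts.items()
        if received.get(supplier, 0) > 0  # skip suppliers with nothing received
    }


record = AuditRecord(
    payload_hash="0" * 64,  # placeholder digest
    schema_version="v1",
    line_number=7,
    failed_fields=("pallet_id",),
    error_type="extra_forbidden",
)

# Made-up counts: lines received per supplier, and one tuple per rejected line.
received = {"acme": 200, "globex": 50}
rejections = (
    [("acme", "SCHEMA_DRIFT")] * 10
    + [("acme", "TYPE_DRIFT")] * 2
    + [("globex", "GTIN_CHECKSUM")]
)
rates = quarantine_rates(received, rejections)
```

Keying the rate by supplier and reason together is what lets a single vendor's SCHEMA_DRIFT spike stand out from the background reject rate.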

FAQ

Should I use strict mode for every supplier feed?

No. Reserve strict=True for internal services and modern partners that already emit correctly typed JSON. Legacy EDI translators and supplier portals overwhelmingly send numerics as strings ("1200.00", "00045"), and strict mode rejects all of them as type errors. For those feeds, set strict=False so trivial formatting is coerced at the field boundary, and keep your business rules in @field_validator/@model_validator so genuine violations still fail loudly. Decide per feed, not globally.

How do I keep a single bad line from failing the whole batch?

Validate the array with a TypeAdapter.validate_json fast path, and on any ValidationError fall back to per-record model_validate so each failure is isolated to a quarantine record while the valid rows proceed. The fast path gives you Pydantic’s compiled-core throughput in the common all-valid case; the slow path runs only for batches that actually contain a bad row, so you pay the per-object cost just when you need precise blame.

Why are my Decimal quantities drifting in cost calculations?

Because a value entered Python as a float before reaching the model. Always coerce through Decimal(str(v)) in a mode="before" validator and quantize to a fixed scale, as in Step 1. Then carry that same scale all the way to the match engine so the precision your contract guarantees survives into the tolerance comparison rather than being silently re-widened to binary float.
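
The drift is easy to reproduce with the standard library alone. In this small sketch, `to_decimal` is a hypothetical helper that mirrors the `Decimal(str(v))` plus quantize approach, and the four-decimal scale matches Step 1.

```python
from decimal import Decimal

SCALE = Decimal("0.0001")  # same four-decimal scale as the Step 1 validator


def to_decimal(value: object) -> Decimal:
    """Convert via str() so a float keeps its printed digits, then fix the scale."""
    return Decimal(str(value)).quantize(SCALE)


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
float_total = 0.1 + 0.2

# Constructing a Decimal directly from a float copies that binary error.
direct = Decimal(0.1)

# Going through str() and quantizing gives exact decimal arithmetic.
decimal_total = to_decimal(0.1) + to_decimal(0.2)
```

Here `float_total` is not equal to 0.3 and `direct` is not equal to `Decimal("0.1")`, while `decimal_total` is exactly `Decimal("0.3000")`.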
