Async Batch Processing for High-Volume Feeds

Supply chain reconciliation pipelines routinely ingest millions of line items from EDI 850/810 transactions, ASN manifests, supplier-portal exports, and 3PL polling endpoints. The work is overwhelmingly I/O-bound: the pipeline spends most of its wall-clock time waiting on HTTP responses, SFTP transfers, and database round-trips rather than burning CPU. Synchronous, thread-blocking ingestion collapses under this profile — connection pools exhaust, network latency stacks linearly across thousands of suppliers, and a single slow trading partner stalls the entire run.

This is a throughput-versus-stability decision, not a one-line await rewrite. Asynchronous batch processing multiplexes thousands of in-flight requests on a single event loop, but uncontrolled concurrency simply moves the failure from “too slow” to “too aggressive”: you trip supplier rate limits, saturate the database connection pool, and OOM the worker by buffering every payload at once. The patterns below are implementation-ready — bounded asyncio concurrency, a connection pool sized to match the semaphore, streaming to avoid heap blowups, idempotent batch keys, and the dead-letter-queue (DLQ) recovery flow that keeps a noisy multi-supplier stream auditable. Within the broader Ingestion & Parsing Workflows for Supply Chain Data reference, async batching is the high-throughput acquisition layer that feeds downstream validation and the matching engine without saturating shared infrastructure.

Core Concept & Decision Criteria

Async batch processing relies on Python’s cooperative multitasking model to interleave I/O-bound operations without the memory and context-switch overhead of OS threads or process forks. Instead of processing records sequentially or spawning unbounded workers, you partition each feed into fixed-size chunks and dispatch them concurrently with asyncio.TaskGroup (Python 3.11+) or asyncio.gather(), while a bounded asyncio.Semaphore caps the number of simultaneously open connections. The semaphore carries the safety guarantee: it pins concurrency to whatever the most constrained downstream resource allows (supplier API quota, database pool size, or egress bandwidth), so the loop runs hot without thrashing.

The decision signal that governs whether you reach for async at all is the ratio of wait time to compute time per record. If your pipeline is dominated by network or disk wait (the usual case for multi-supplier ingestion), async wins decisively. If it is dominated by CPU-bound transformation (heavy regex, large dataframe joins, decompression), async alone buys nothing, because every coroutine runs on the event loop’s single thread; that work belongs in a ProcessPoolExecutor, dispatched with loop.run_in_executor. The second signal is concurrency safety: the moment you have a per-supplier rate limit or a fixed database pool, you need bounded async, never a naive gather() that puts ten thousand coroutines on the wire with no semaphore in front of them.

Dimension	Synchronous (thread-per-feed)	Async batch (bounded event loop)
Best for	CPU-bound parsing, <50 small feeds	I/O-bound acquisition, 100s–1000s of feeds
Concurrency cost	~8 MB stack + OS context switch per thread	~KB per coroutine, no kernel switch
Throughput ceiling	GIL + thread-pool size	Event loop + semaphore limit
Failure isolation	Thread crash can leak the pool	`return_exceptions=True` isolates per chunk
Backpressure	Manual, queue-based	Native via `Semaphore` / bounded `Queue`
Connection reuse	New session per thread is common	Single pooled `ClientSession`, keep-alive
Right tool when	Compute dominates wait	Wait dominates compute

A useful sizing heuristic comes from Little’s Law: the number of concurrent in-flight requests L you need to hit a target arrival rate λ (requests/sec) at average per-request latency W (seconds) is

L = \lambda \cdot W

So to sustain 400 chunk fetches/sec against suppliers that average 250 ms latency, you need roughly L = 400 × 0.25 = 100 concurrent slots — but only if every downstream pool (HTTP connector, DB) can also absorb 100. The semaphore limit should be the minimum of that computed L and the smallest downstream pool, never larger.
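The same sizing rule as a small helper, a sketch in which semaphore_limit is a hypothetical name and the downstream pool sizes are whatever your HTTP connector and database pool are actually configured to:

```python
import math


def semaphore_limit(arrival_rate: float, latency_s: float, *downstream_pools: int) -> int:
    """Little's Law target L = lambda * W, capped by the smallest downstream pool.

    arrival_rate:     target requests per second (lambda)
    latency_s:        average per-request latency in seconds (W)
    downstream_pools: sizes of the HTTP connector, DB pool, etc.
    """
    target = max(1, math.ceil(arrival_rate * latency_s))
    # Never exceed the tightest downstream pool, or the extra slots just queue.
    return min((target, *downstream_pools))


print(semaphore_limit(400, 0.25))           # 100, the worked example above
print(semaphore_limit(400, 0.25, 100, 40))  # 40 when the DB pool only has 40 connections
```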

Implementation

The ingestor below is a bounded acquisition stage. It takes the chunk URLs for one feed (one URL per page of chunk_size rows), fetches them concurrently under a semaphore, isolates per-chunk failures with return_exceptions=True, and emits a typed BatchResult carrying the audit counts the recovery section depends on. The TCPConnector limit is deliberately bound to the same value as the semaphore so the connection pool can never become the silent bottleneck.

PYTHON

import asyncio
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List

import aiohttp

logger = logging.getLogger("supply_chain.ingest.async")


@dataclass
class BatchResult:
    batch_id: str
    success_count: int = 0
    error_count: int = 0
    failed_records: List[Dict[str, Any]] = field(default_factory=list)
    processed_bytes: int = 0


class AsyncFeedIngestor:
    """Bounded-concurrency acquisition stage for high-volume supplier feeds."""

    def __init__(
        self, concurrency_limit: int = 20, chunk_size: int = 500, timeout_s: float = 45
    ) -> None:
        self.concurrency_limit = concurrency_limit
        # Rows per request; callers use it when building the paginated chunk URLs.
        self.chunk_size = chunk_size
        self.timeout = aiohttp.ClientTimeout(total=timeout_s)
        self.semaphore = asyncio.Semaphore(concurrency_limit)
        self.session: aiohttp.ClientSession | None = None

    async def __aenter__(self) -> "AsyncFeedIngestor":
        # Bind the connector limit to the semaphore so the pool is never the bottleneck.
        connector = aiohttp.TCPConnector(
            limit=self.concurrency_limit, keepalive_timeout=30
        )
        self.session = aiohttp.ClientSession(connector=connector, timeout=self.timeout)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        if self.session:
            await self.session.close()

    async def _fetch_chunk(self, url: str, headers: Dict[str, str]) -> bytes:
        """Fetch one chunk under the concurrency cap; raise on HTTP error."""
        if self.session is None:
            raise RuntimeError("enter the ingestor with 'async with' before fetching")
        async with self.semaphore:
            # The context manager releases the connection even if raise_for_status() throws.
            async with self.session.get(url, headers=headers) as resp:
                resp.raise_for_status()
                return await resp.read()

    async def process_feed(
        self, feed_urls: List[str], headers: Dict[str, str], batch_id: str
    ) -> BatchResult:
        """Dispatch all chunk fetches concurrently with per-chunk exception isolation.

        One coroutine is created per URL; the semaphore bounds how many are in flight.
        gather() holds every body in memory until it returns, so for large feeds call
        this per window of URLs or stream bodies to disk instead.
        """
        result = BatchResult(batch_id=batch_id)

        tasks = [self._fetch_chunk(url, headers) for url in feed_urls]
        responses = await asyncio.gather(*tasks, return_exceptions=True)

        for url, resp in zip(feed_urls, responses):
            # BaseException, not Exception: a cancelled chunk is returned as CancelledError.
            if isinstance(resp, BaseException):
                result.error_count += 1
                result.failed_records.append(
                    {
                        "url": url,
                        "error": str(resp),
                        "type": type(resp).__name__,
                        # ClientResponseError carries .status; connection/timeout errors do not.
                        "http_status": getattr(resp, "status", None),
                    }
                )
                logger.warning("chunk_fetch_failed batch=%s url=%s err=%s", batch_id, url, resp)
            else:
                result.success_count += 1
                result.processed_bytes += len(resp)

        logger.info(
            "batch_complete batch=%s ok=%d err=%d mb=%.1f",
            batch_id, result.success_count, result.error_count,
            result.processed_bytes / (1024 ** 2),
        )
        return result

return_exceptions=True is the load-bearing detail: without it, the first failed chunk’s exception propagates out of gather() immediately, the sibling fetches keep running with their results discarded, and one bad endpoint costs you the per-chunk accounting for the entire batch. With it, gather() waits for every chunk, each chunk’s outcome is recorded independently, and the failures land in failed_records ready for DLQ routing. For a deeper walkthrough of the event loop mechanics, task scheduling, and lower-level coroutine primitives, see Implementing Asyncio for Concurrent Batch Ingestion. The official asyncio documentation is the authoritative reference for event loop configuration and coroutine scheduling.

Configuration & Threshold Calibration

Concurrency is not a single global constant; it must be tiered per trading partner, because supplier infrastructure varies by orders of magnitude. A tier-1 distributor’s API absorbs hundreds of concurrent requests, while a small supplier’s portal returns 429 Too Many Requests above five. Drive these from a per-supplier config table rather than hard-coded constants, and size the chunk_size so a single chunk fits comfortably in memory after parsing.
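One way to express such a per-supplier table in code, assuming hypothetical tier names and values picked from the recommended ranges below; in practice the mapping would be loaded from a config store rather than defined as a module constant. Suppliers without an assignment fall back to a conservative tier.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SupplierTier:
    concurrency_limit: int
    chunk_size: int
    timeout_s: float


# Hypothetical tier table; values sit inside the recommended ranges.
TIERS: Mapping[str, SupplierTier] = {
    "tier1_api": SupplierTier(concurrency_limit=50, chunk_size=1000, timeout_s=60.0),
    "tier3_portal": SupplierTier(concurrency_limit=5, chunk_size=250, timeout_s=30.0),
}

# Suppliers with no tier assignment yet start conservatively.
DEFAULT_TIER = SupplierTier(concurrency_limit=3, chunk_size=250, timeout_s=30.0)


def tier_for(supplier_id: str, assignments: Mapping[str, str]) -> SupplierTier:
    """Resolve a supplier's tier from a supplier_id -> tier-name mapping."""
    return TIERS.get(assignments.get(supplier_id, ""), DEFAULT_TIER)
```

The resolved concurrency_limit and chunk_size would then be passed to AsyncFeedIngestor per supplier.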

Parameter	Recommended range	Rationale
`concurrency_limit` (tier-1 API)	50–100	Match Little’s Law `L`, capped by DB pool size
`concurrency_limit` (tier-3 portal)	3–8	Avoid `429`; small portals throttle aggressively
`chunk_size` (rows/request)	250–1000	Bound per-chunk memory after JSON/dataframe expansion
`TCPConnector.limit`	== `concurrency_limit`	Pool must not be smaller than the semaphore
`ClientTimeout.total`	30–60 s	Long enough for large ASN payloads, short enough to fail fast
`keepalive_timeout`	30 s	Reuse sockets across chunks; avoid handshake churn
Retry attempts	3, exponential backoff	Recover transient `5xx`/timeout without hammering
`iter_chunked` threshold	stream if payload > 50 MB	Prevents heap blowup on multi-year PO histories
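The retry row can be sketched as a generic wrapper with capped exponential backoff. The name fetch_with_retry and the is_transient predicate are assumptions for illustration, and the random "full jitter" is an addition beyond the table, used so that many chunks failing together do not retry in lockstep.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[T]],
    is_transient: Callable[[Exception], bool],
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
) -> T:
    """Call fetch() up to `attempts` times, backing off exponentially between tries.

    Errors the predicate rejects (for example a 400 or 404) are re-raised immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await fetch()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise
            # Capped exponential backoff (0.5 s, 1 s, 2 s, ...) with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")
```

With the ingestor above, fetch could be functools.partial(ingestor._fetch_chunk, url, headers) (a partial rather than a lambda, to avoid late binding of url in a loop), and is_transient could accept aiohttp.ClientConnectionError, asyncio.TimeoutError, or an aiohttp.ClientResponseError with a 5xx status. Because _fetch_chunk acquires the semaphore internally, the backoff sleep does not hold a concurrency slot.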

A practical rule for the relationship between the connector and the semaphore: the connector limit must always be greater than or equal to the semaphore value. If the connector is smaller, coroutines acquire a semaphore slot and then immediately block waiting for a connection, producing latency that looks like a remote slowdown but is entirely self-inflicted. For per-supplier price and quantity comparison after acquisition, calibrate the downstream policy with Setting Quantity and Price Tolerance Windows so the throughput tier and the matching tier stay consistent.
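The self-inflicted queueing is easy to reproduce without a network. In this sketch a second asyncio.Semaphore stands in for TCPConnector.limit (an illustrative assumption; aiohttp’s pool is not literally a semaphore) and asyncio.sleep stands in for supplier latency. Ten requests behind a semaphore of 10 finish in roughly one latency period when the pool is also 10, but take roughly five periods when the pool is 2, although the simulated supplier never got slower.

```python
import asyncio
import time


async def simulate(semaphore_limit: int, pool_size: int, n_requests: int, latency_s: float) -> float:
    """Wall-clock seconds to finish n_requests through a semaphore plus a socket pool."""
    semaphore = asyncio.Semaphore(semaphore_limit)
    pool = asyncio.Semaphore(pool_size)  # stand-in for TCPConnector.limit

    async def request() -> None:
        async with semaphore:  # the coroutine now holds a concurrency slot...
            async with pool:  # ...but may still queue here waiting for a socket
                await asyncio.sleep(latency_s)  # simulated supplier latency

    start = time.perf_counter()
    await asyncio.gather(*(request() for _ in range(n_requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"pool == semaphore: {asyncio.run(simulate(10, 10, 10, 0.05)):.2f}s")
    print(f"pool <  semaphore: {asyncio.run(simulate(10, 2, 10, 0.05)):.2f}s")
```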

Orchestration & Integration

Async batching is the acquisition stage of the pipeline; it should hand off typed, validated payloads and nothing more. Upstream, an orchestrator (Airflow, Prefect, a cron-driven worker, or a queue consumer) enumerates the feed URLs for a run and assigns a batch_id. Downstream, the raw bytes returned by each chunk still require format-specific normalization before reconciliation: tabular exports route through the vectorized readers in Parsing CSV and Excel Feeds with Pandas, while hierarchical EDI and supplier documents pass through XML to JSON Conversion with xmltodict. Every normalized record is then enforced against a typed contract via Schema Validation Using Pydantic before it is allowed near the matching engine.

The non-negotiable orchestration property is idempotency. Treat delivery as at-least-once: a retried poll, a redelivered webhook, or an orchestrator restart will re-fetch the same chunk. Derive a deterministic idempotency key from supplier_id, feed_window, and a content hash of the chunk, and dedupe at the staging-write boundary so a replay produces zero duplicate inventory rows. If staged chunks are recorded against their batch_id, a failed run can be replayed from its last checkpoint without re-pulling chunks that are already staged. Timestamps inside the payloads frequently arrive in supplier-local time; normalize them to a single UTC anchor at the staging boundary using Timezone Normalization for Global Supply Chains so the feed window and the watermark agree.
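A sketch of the key derivation and an idempotent staging write. SQLite is used only so the example is self-contained; the staging table and its columns are hypothetical, and PostgreSQL supports the same ON CONFLICT ... DO NOTHING form. The feed window is converted to UTC before hashing, so the same window expressed in two supplier offsets yields the same key.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Hypothetical staging table; idem_key must be unique (here, the primary key).
SCHEMA = "CREATE TABLE IF NOT EXISTS staging (idem_key TEXT PRIMARY KEY, batch_id TEXT, payload BLOB)"


def idempotency_key(supplier_id: str, window_start: datetime, window_end: datetime, chunk: bytes) -> str:
    """Deterministic key: supplier_id + UTC-normalized feed window + SHA-256 of the chunk."""
    if window_start.tzinfo is None or window_end.tzinfo is None:
        raise ValueError("feed window bounds must be timezone-aware")
    window = "/".join(t.astimezone(timezone.utc).isoformat() for t in (window_start, window_end))
    content_hash = hashlib.sha256(chunk).hexdigest()
    return hashlib.sha256(f"{supplier_id}|{window}|{content_hash}".encode()).hexdigest()


def stage_chunk(conn: sqlite3.Connection, key: str, batch_id: str, chunk: bytes) -> bool:
    """Write a chunk once per key. Returns True if inserted, False if it was a replay."""
    cur = conn.execute(
        "INSERT INTO staging (idem_key, batch_id, payload) VALUES (?, ?, ?) "
        "ON CONFLICT(idem_key) DO NOTHING",
        (key, batch_id, chunk),
    )
    conn.commit()
    return cur.rowcount == 1
```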

High-volume async services also scale horizontally for seasonal procurement spikes and supplier-onboarding waves. Run multiple replicas behind a queue, scaling on queue depth and event-loop saturation rather than CPU (an I/O-bound loop shows low CPU even when fully loaded). Liveness probes, graceful shutdown that drains in-flight coroutines, and distributed tracing across the acquisition-to-staging hop are mandatory for SLA compliance during rolling updates.
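Event-loop saturation can be measured as scheduling lag: ask the loop to wake you after a fixed interval and record how late it actually does. This is a sketch with a hypothetical helper name; in production the value would be exported as a gauge to whatever metrics system drives autoscaling.

```python
import asyncio
import time


async def measure_loop_lag(interval_s: float = 0.1, samples: int = 5) -> float:
    """Worst observed scheduling lag, in seconds, over `samples` sleeps of `interval_s`.

    On an idle loop the lag stays near zero; on a saturated or blocked loop the
    timer fires late because other work is holding the loop.
    """
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval_s)
        worst = max(worst, time.perf_counter() - start - interval_s)
    return worst
```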

Debugging & Pipeline Recovery

Most async ingestion incidents reduce to three classes: connection-pool exhaustion, unbounded memory growth, and leaked sockets. Each has a specific signal and a specific fix.

Connection-pool exhaustion. Symptom: latency climbs while suppliers report normal response times; tasks pile up acquiring connections. Watch connector._acquired and connector._acquired_per_host (private aiohttp attributes, so pin the aiohttp version you read them from); alert when sustained acquisition approaches the connector limit. Fix: raise the connector limit to match the semaphore, or lower the semaphore to match the smallest downstream pool.
Memory pressure / OOM. Symptom: RSS grows monotonically across a run, then the worker is OOM-killed mid-batch. Cause: large JSON payloads held in memory across asyncio.gather. Fix: stream large bodies with response.content.iter_chunked(8192) directly to disk or a broker, and process in bounded windows; profile with tracemalloc or objgraph in staging before production.
Leaked sockets (CLOSE_WAIT). Symptom: OSError: [Errno 24] Too many open files after hours of runtime. Cause: response bodies never consumed or closed. Fix: always use async with self.session.get(...) as resp: and read or release the body; never return an open response object.
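A sketch of the streaming fix, written against any async iterator of bytes so it runs without a server; with aiohttp you would pass resp.content.iter_chunked(8192) from inside the async with block. The helper name stream_to_file is illustrative. It also returns a SHA-256 of the body computed on the fly, which can supply the content hash for idempotency keys without re-reading the file. The file writes are synchronous; small blocks keep each loop stall short, and loop.run_in_executor would remove the stall entirely.

```python
import hashlib
from collections.abc import AsyncIterator
from pathlib import Path


async def stream_to_file(chunks: AsyncIterator[bytes], dest: Path) -> tuple[int, str]:
    """Write an async byte stream to disk block by block, never holding the full body.

    Returns (bytes written, SHA-256 hex digest of the body).
    """
    digest = hashlib.sha256()
    written = 0
    with dest.open("wb") as fh:
        async for block in chunks:
            fh.write(block)  # synchronous write; small blocks keep each loop stall short
            digest.update(block)
            written += len(block)
    return written, digest.hexdigest()


# With aiohttp (inside the ingestor's semaphore):
#     async with session.get(url) as resp:
#         resp.raise_for_status()
#         size, sha = await stream_to_file(resp.content.iter_chunked(8192), dest)
```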

Route every entry in failed_records to a DLQ keyed by batch_id so a run is fully auditable and replayable. A minimal but sufficient audit record per failed chunk is {batch_id, supplier_id, url, error_type, http_status, attempt, ts_utc}. The error_type and http_status fields are what let you triage at a glance: a wave of ClientConnectorError points at a supplier outage (retry later), a wave of ClientResponseError with http_status 429 points at your own concurrency being too high (lower the tier), and asyncio.TimeoutError clustered on one host points at a slow partner (raise that host’s timeout, not the global one). Records that exhaust retries stay in the DLQ for manual replay rather than silently dropping inventory.
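The audit record and triage rules translate into a small sketch. DLQRecord, to_dlq, triage, summarize, and the action labels are hypothetical names. The error_type strings are whatever type(exc).__name__ reports, and the set of timeout class names varies across aiohttp releases, so treat TIMEOUT_TYPES as an assumption to check against your installed version.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional

# Assumed timeout class names; verify against your Python and aiohttp versions.
TIMEOUT_TYPES = {"TimeoutError", "ServerTimeoutError", "ConnectionTimeoutError", "SocketTimeoutError"}


@dataclass(frozen=True)
class DLQRecord:
    batch_id: str
    supplier_id: str
    url: str
    error_type: str
    http_status: Optional[int]
    attempt: int
    ts_utc: str

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def to_dlq(batch_id: str, supplier_id: str, failure: Dict[str, Any], attempt: int) -> DLQRecord:
    """Build an audit record from one failed_records entry."""
    return DLQRecord(
        batch_id=batch_id,
        supplier_id=supplier_id,
        url=failure["url"],
        error_type=failure["type"],
        http_status=failure.get("http_status"),  # None when the entry has no status
        attempt=attempt,
        ts_utc=datetime.now(timezone.utc).isoformat(),
    )


def triage(record: DLQRecord) -> str:
    """First-pass action for one failed chunk."""
    if record.http_status == 429:
        return "lower_concurrency_tier"
    if record.error_type == "ClientConnectorError":
        return "supplier_outage_retry_later"
    if record.error_type in TIMEOUT_TYPES:
        return "raise_host_timeout"
    return "manual_review"


def summarize(records: Iterable[DLQRecord]) -> Counter:
    """Count actions across a batch so a wave of one failure class stands out."""
    return Counter(triage(r) for r in records)
```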

FAQ

Why does my async pipeline get slower when I raise the concurrency limit?

You have almost certainly pushed the semaphore above a downstream ceiling — usually the TCPConnector.limit or the database connection pool. When the semaphore is larger than the connector limit, coroutines acquire a semaphore slot and then queue waiting for an actual socket, so added “concurrency” turns into added queueing latency. Bind TCPConnector.limit to the same value as the semaphore and size both to the smallest real downstream pool.

Should I use `asyncio.gather` or `asyncio.TaskGroup`?

On Python 3.11+, prefer TaskGroup for its structured-concurrency semantics and cleaner cancellation, but note that when any task in a TaskGroup raises, the group cancels the remaining tasks and re-raises the failures as an ExceptionGroup. Use gather(*tasks, return_exceptions=True) when you specifically want per-task failure isolation, which is the common case for multi-supplier acquisition. The two are not mutually exclusive: you can run a TaskGroup per supplier and catch exceptions inside each task, so no single chunk failure ever reaches the group.

Will async make my CPU-bound parsing faster?

No. Coroutines run on the event loop’s single thread, so heavy regex, large dataframe joins, or decompression do not speed up under asyncio; they block the event loop and stall every other in-flight request. Keep async for the I/O wait (fetching, SFTP, DB round-trips) and push CPU-bound transformation into a ProcessPoolExecutor via loop.run_in_executor.
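A sketch of the offload, assuming a hypothetical SKU-counting regex as the CPU-bound step. The executor is passed in so the same coroutine can be exercised with a thread pool in tests; in production it would be a ProcessPoolExecutor, which is why the worker function lives at module top level where it can be pickled into the worker processes.

```python
import asyncio
import re
from concurrent.futures import Executor
from typing import List

# Hypothetical record format: occurrences of "SKU-<digits>" in a raw payload.
SKU_PATTERN = re.compile(rb"SKU-\d+")


def count_skus(payload: bytes) -> int:
    """CPU-bound step. Defined at top level so a ProcessPoolExecutor can pickle it."""
    return len(SKU_PATTERN.findall(payload))


async def parse_off_loop(payloads: List[bytes], executor: Executor) -> List[int]:
    """Run the CPU-bound parse in the executor so the event loop stays free for I/O."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, count_skus, p) for p in payloads]
    return list(await asyncio.gather(*futures))


# Production usage: create the pool once per worker and reuse it, e.g.
#     pool = ProcessPoolExecutor(max_workers=4)
#     counts = await parse_off_loop(payloads, pool)
```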

How do I keep a redelivered feed from creating duplicate inventory rows?

Treat the transport as at-least-once and make the staging write idempotent. Derive a deterministic key from supplier_id + feed window + a content hash of the chunk, and upsert on that key at the staging boundary. A retried poll or restarted orchestrator then collapses onto the same key, so a replay is a no-op instead of a duplicate.

What concurrency limit should I start with for a new supplier?

Start low — 3 to 5 — and raise it only after you observe clean responses with no 429s, because small portals throttle far more aggressively than tier-1 APIs. For large APIs, compute the target from Little’s Law (L = λ × W) and then cap that at your database pool size. Always store the limit per supplier in config, never as a global constant.
