Implementing Asyncio for Concurrent Batch Ingestion

When a synchronous, thread-per-feed ingestor can no longer pull purchase-order acknowledgments, ASN manifests, and multi-warehouse inventory snapshots from hundreds of supplier endpoints inside a fixed reconciliation window, the fix is not more threads — it is a single event loop multiplexing thousands of in-flight requests. This page is the concrete asyncio build-out that sits beneath the strategy in Async Batch Processing for High-Volume Feeds: the exact event-loop and connection-pool wiring, the semaphore-bounded dispatch loop, retry with jittered backoff, and the debugging protocols that keep a non-deterministic pipeline auditable when it degrades.

Operational Trigger Signals

Reach for a bounded asyncio ingestor — rather than threads or a synchronous loop — only when the workload actually shows these measurable signals across consecutive runs:

Wait-to-compute ratio > 5:1. Profiling shows the ingestor spends the large majority of wall-clock time blocked on HTTP, SFTP, or database round-trips rather than on CPU-bound parsing. If compute dominates instead, coroutines share one core and async buys nothing.
Endpoint fan-out ≥ 100 feeds per run. You are polling 100s–1000s of supplier URLs and the linear sum of per-request latency already breaches the run SLA (e.g. 500 endpoints × 250 ms = 125 s of pure serial wait).
SLA pressure under 15 minutes. Logistics or procurement ops must reconcile stock levels or PO state inside a tight window, and synchronous ingestion misses it during peak procurement hours.
Connection-pool thrash on the thread model. The synchronous ingestor opens a fresh session per thread and you see connection-error spikes, handshake churn, or thread-pool saturation (roughly 8 MB of reserved stack per thread on common Linux defaults) as supplier count grows.
Per-supplier rate limits in play. At least one trading partner returns 429 Too Many Requests above a low concurrency, so you need a hard, tunable cap on simultaneous connections — exactly what a semaphore provides.
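
The fan-out arithmetic above can be checked with a back-of-envelope estimate. This is a rough sketch that assumes every request takes the same time and that no rate limit intervenes; the function names are illustrative and not part of the ingestor.

```python
import math


def serial_wait_seconds(endpoints: int, latency_s: float) -> float:
    """Total wait when requests run strictly one after another."""
    return endpoints * latency_s


def bounded_wait_seconds(endpoints: int, latency_s: float, concurrency: int) -> float:
    """Idealized wait with at most `concurrency` requests in flight.

    Assumes uniform latency, so the work completes in equal-length waves.
    """
    waves = math.ceil(endpoints / concurrency)
    return waves * latency_s


print(serial_wait_seconds(500, 0.25))       # 125.0
print(bounded_wait_seconds(500, 0.25, 50))  # 2.5
```

Real feeds have a long latency tail, so treat the second figure as a lower bound rather than a forecast.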

Step-by-Step Implementation

Build the ingestor in four ordered stages. Each stage is independently testable, and the connection pool is deliberately bound to the same value as the semaphore so the pool can never become a silent secondary bottleneck.

Step 1 — Stand up a controlled event loop and connection pool. Supply chain APIs and legacy EDI gateways behave unpredictably under load, so the session is configured once with a bounded TCPConnector, an explicit total timeout, and enable_cleanup_closed to reclaim half-closed sockets. See the aiohttp client advanced configuration reference for socket lifecycle details.

PYTHON

import asyncio
import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

import aiohttp

logging.basicConfig(level=logging.INFO, format="%(asctime)s | %(levelname)s | %(message)s")
logger = logging.getLogger("supply_chain.ingest.asyncio")


@dataclass
class FeedConfig:
    max_concurrency: int = 50
    timeout_seconds: float = 30.0
    batch_size: int = 100
    retry_attempts: int = 3
    backoff_base: float = 1.5


class AsyncBatchIngestor:
    """Bounded-concurrency asyncio ingestor for high-volume supplier feeds."""

    def __init__(self, config: FeedConfig) -> None:
        self.config = config
        self.semaphore = asyncio.Semaphore(config.max_concurrency)
        self.session: Optional[aiohttp.ClientSession] = None

    async def __aenter__(self) -> "AsyncBatchIngestor":
        timeout = aiohttp.ClientTimeout(total=self.config.timeout_seconds)
        # Bind the connector limit to the semaphore so the pool is never the bottleneck.
        connector = aiohttp.TCPConnector(
            limit=self.config.max_concurrency,
            limit_per_host=10,
            enable_cleanup_closed=True,
            force_close=False,
        )
        self.session = aiohttp.ClientSession(connector=connector, timeout=timeout)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        if self.session:
            await self.session.close()

Step 2 — Chunk the feed and dispatch concurrently with per-task isolation. Raw supplier payloads rarely arrive in optimal sizes, so partition the URL list into fixed-size chunks that bound per-chunk memory, then fan each chunk out with asyncio.gather(..., return_exceptions=True). The return_exceptions=True flag is load-bearing: without it, the first exception propagates out of gather immediately, the sibling coroutines keep running with nobody collecting their results, and you lose the whole chunk over one bad supplier.

PYTHON

class AsyncBatchIngestor:
    # Continued — constructor and __aenter__/__aexit__ omitted for brevity.

    async def fetch_feed_batch(self, urls: List[str]) -> List[Dict[str, Any]]:
        """Chunk the feed and dispatch each chunk concurrently under the semaphore."""
        chunks = [
            urls[i:i + self.config.batch_size]
            for i in range(0, len(urls), self.config.batch_size)
        ]
        all_successful: List[Dict[str, Any]] = []

        for chunk in chunks:
            tasks = [self._fetch_single(url) for url in chunk]
            results = await asyncio.gather(*tasks, return_exceptions=True)
            for url, result in zip(chunk, results):
                if isinstance(result, BaseException):  # also catches CancelledError
                    logger.error("batch_fetch_failed url=%s err=%s", url, result)
                    continue
                if result is not None:
                    all_successful.append(result)

        logger.info("batch_complete ok=%d of=%d", len(all_successful), len(urls))
        return all_successful
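
The isolation behavior of return_exceptions=True can be seen without any network access. The sketch below uses a hypothetical fake_fetch coroutine in place of a real supplier call; only asyncio.gather itself is the real API.

```python
import asyncio
from typing import List, Union


async def fake_fetch(name: str, fail: bool = False) -> str:
    """Stand-in for a network call; `fail` simulates a bad supplier."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return f"{name}:ok"


async def run_chunk() -> List[Union[str, BaseException]]:
    # Failures come back as values, in the same order as the inputs.
    return await asyncio.gather(
        fake_fetch("supplier-a"),
        fake_fetch("supplier-b", fail=True),
        fake_fetch("supplier-c"),
        return_exceptions=True,
    )


results = asyncio.run(run_chunk())
good = [r for r in results if not isinstance(r, BaseException)]
bad = [r for r in results if isinstance(r, BaseException)]
print(good)      # ['supplier-a:ok', 'supplier-c:ok']
print(len(bad))  # 1
```

Because results keep input order, zipping them back against the chunk, as fetch_feed_batch does, attributes each failure to its URL.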

Step 3 — Cap concurrency and retry with jittered exponential backoff. Every fetch acquires a semaphore slot before opening a socket, so simultaneous connections never exceed max_concurrency. Transient 5xx responses and timeouts retry with an exponential delay plus a deterministic per-URL jitter that spreads retries so a recovering supplier is not hammered by a synchronized thundering herd. The wait before retry attempt $n$ (zero-indexed) is

$$t_{\text{wait}} = b^{\,n} + \frac{\operatorname{hash}(\text{url}) \bmod 1000}{1000}$$

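where $b$ is backoff_base. The jitter term stays in [0, 1) seconds and is stable per URL within one process, which is enough to de-synchronize the fleet. Python salts the built-in hash() of strings per process unless PYTHONHASHSEED is fixed, so the value changes between runs; use a stable digest such as zlib.crc32 if the delays must be reproducible in logs across restarts.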

PYTHON

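class AsyncBatchIngestor:
    # Continued: the fetch path used by fetch_feed_batch.

    async def _fetch_single(self, url: str) -> Optional[Dict[str, Any]]:
        """Fetch one feed under the concurrency cap with jittered backoff retries."""
        assert self.session is not None
        last_attempt = self.config.retry_attempts - 1
        async with self.semaphore:
            for attempt in range(self.config.retry_attempts):
                try:
                    # `async with` releases the connection even if reading the body fails.
                    async with self.session.get(url) as response:
                        response.raise_for_status()
                        return await response.json()
                except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
                    status = getattr(exc, "status", None)
                    # A 4xx other than 429 will not succeed on retry, so fail fast.
                    if status is not None and 400 <= status < 500 and status != 429:
                        logger.error("fetch_rejected url=%s status=%d", url, status)
                        return None
                    if attempt == last_attempt:
                        break  # do not sleep after the final attempt
                    wait = self.config.backoff_base ** attempt + (hash(url) % 1000) / 1000
                    logger.warning(
                        "fetch_retry url=%s attempt=%d err=%s wait=%.2fs",
                        url, attempt + 1, exc, wait,
                    )
                    await asyncio.sleep(wait)
            logger.critical("fetch_exhausted url=%s", url)
            return None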
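
If the jitter must be identical across process restarts, one option is to derive it from a stable checksum instead of the built-in hash(). The sketch below swaps in zlib.crc32 for that purpose; stable_jitter and backoff_wait are hypothetical helper names, and the URL is a placeholder.

```python
import zlib


def stable_jitter(url: str) -> float:
    """Jitter in [0, 1) derived from CRC32, identical across runs and processes."""
    return (zlib.crc32(url.encode("utf-8")) % 1000) / 1000


def backoff_wait(url: str, attempt: int, backoff_base: float = 1.5) -> float:
    """Wait before retry `attempt` (zero-indexed): b**n plus the per-URL jitter."""
    return backoff_base ** attempt + stable_jitter(url)


url = "https://supplier.example/feeds/asn"  # hypothetical endpoint
schedule = [round(backoff_wait(url, n), 3) for n in range(3)]
print(schedule)
```

With the default base of 1.5 the exponential part is 1.0, 1.5, and 2.25 seconds for attempts 0 through 2, and the same URL always adds the same fractional offset.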

Step 4 — Hand off normalized payloads, never raw bytes. The ingestor is an acquisition stage only. Route tabular exports through Parsing CSV and Excel Feeds with Pandas, pass hierarchical EDI and supplier documents through XML to JSON Conversion with xmltodict, and enforce every record against a typed contract with Schema Validation Using Pydantic before anything reaches the matching engine. Wrap any blocking synchronous parsing (legacy XML, large dataframe joins) in loop.run_in_executor(None, sync_func) so it never starves the event loop.
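
The executor hand-off can be sketched as follows. parse_payload is a hypothetical stand-in for a heavy synchronous parser (JSON decoding is used only to keep the example self-contained), and the payload is invented for illustration.

```python
import asyncio
import json
from typing import Any, Dict


def parse_payload(raw: bytes) -> Dict[str, Any]:
    """Blocking stand-in for a heavy synchronous parser (hypothetical)."""
    return json.loads(raw.decode("utf-8"))


async def parse_off_loop(raw: bytes) -> Dict[str, Any]:
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, parse_payload, raw)


record = asyncio.run(parse_off_loop(b'{"po_number": "PO-1001", "qty": 12}'))
print(record["po_number"])  # PO-1001
```

The default thread pool keeps the loop responsive, but pure-Python CPU work in threads still contends for the GIL, so a process pool is the likelier fit for genuinely heavy transforms.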

Configuration Reference

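The max_concurrency, timeout_seconds, batch_size, retry_attempts, and backoff_base parameters map directly to the FeedConfig dataclass above; limit_per_host and TCPConnector.limit are set on the connector in __aenter__, and the iter_chunked threshold applies only once streaming is added to the fetch path. Tier them per trading partner from a config table rather than hard-coding constants, because tier-1 distributor APIs and small supplier portals differ by orders of magnitude.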

Parameter	Accepted values	Default	Notes
`max_concurrency`	3–100	50	Cap simultaneous sockets; size to the smallest downstream pool (DB, API quota)
`limit_per_host`	2–20	10	Cap per endpoint (host, port, TLS); too low and requests to a busy supplier queue behind it
`timeout_seconds`	30–60	30	Long enough for large ASN payloads, short enough to fail fast
`batch_size`	100–1000	100	Bound per-chunk memory after JSON/dataframe expansion
`retry_attempts`	2–5	3	Recover transient `5xx`/timeouts without hammering the partner
`backoff_base`	1.5–2.0	1.5	Base $b$ in the exponential backoff formula above
`TCPConnector.limit`	== `max_concurrency`	50	Must be ≥ the semaphore or coroutines block on the pool, not the network
`iter_chunked` threshold	stream if payload > 50 MB	50 MB	Stream to disk instead of buffering multi-year PO histories in memory

Start a brand-new supplier at max_concurrency of 3–5 and raise it only after observing clean responses with no 429s. For tier-1 APIs, compute the target from Little’s Law as described in Async Batch Processing for High-Volume Feeds, then cap that figure at your database connection-pool size.
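
The sizing rule can be sketched with Little's Law in its standard form, L = λW (in-flight requests equal arrival rate times mean latency). The parent page's exact formulation is not reproduced here, so the function name, the inputs, and the sample figures below are all assumptions.

```python
import math


def target_concurrency(
    requests_per_second: float, mean_latency_s: float, db_pool_size: int
) -> int:
    """Little's Law (L = lambda * W), capped at the downstream pool size."""
    in_flight = math.ceil(requests_per_second * mean_latency_s)
    return max(1, min(in_flight, db_pool_size))


# Hypothetical tier-1 partner at 250 ms mean latency, 20 database connections.
print(target_concurrency(120, 0.25, db_pool_size=20))  # 20
print(target_concurrency(40, 0.25, db_pool_size=20))   # 10
```

In the first case the law asks for 30 in-flight requests, so the database pool, not the supplier, sets the cap.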

Debugging & Recovery

Asyncio’s non-deterministic execution ordering complicates traditional debugging, so triage by signal. When throughput degrades or the error rate exceeds 2%, walk this failure-reason taxonomy:

Event-loop blocking. Symptom: cascading timeouts across unrelated feeds while CPU sits low. Cause: a synchronous call (legacy XML parsing, decompression) blocks the single thread and starves the I/O multiplexer. Fix: offload it with loop.run_in_executor(None, sync_func) or a ProcessPoolExecutor.
Connection-pool exhaustion. Symptom: latency climbs while suppliers report normal response times. Monitor connector._acquired and connector._acquired_per_host (private aiohttp attributes, so pin your aiohttp version and expect them to change between releases); alert when sustained acquisition nears the connector limit. Fix: raise limit_per_host for the busy endpoint, or align the semaphore to the smallest pool.
Unclosed response leaks (CLOSE_WAIT). Symptom: OSError: [Errno 24] Too many open files after hours of runtime. Cause: a response body was never consumed or closed. Fix: always use async with self.session.get(...) as response: and read or release the body — never return an open response object.
Memory pressure / OOM. Symptom: RSS grows monotonically across a run, then the worker is OOM-killed mid-batch. Cause: large JSON payloads held in memory across asyncio.gather. Fix: stream bodies with response.content.iter_chunked(8192) and process in bounded windows; profile with tracemalloc or objgraph in staging first.
Trace correlation gaps. Symptom: concurrent coroutine logs are impossible to follow. Fix: inject a request_id header and propagate it via asyncio.current_task().get_name() so logs aggregate deterministically. See the asyncio task documentation for coroutine introspection.
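
One way to wire the trace-correlation fix is a logging filter that stamps each record with the current task name. This is a minimal sketch: TaskNameFilter, the trace_demo logger, and the req- naming scheme are illustrative assumptions, and it relies only on asyncio.current_task() and Task.get_name().

```python
import asyncio
import io
import logging


class TaskNameFilter(logging.Filter):
    """Attach the current asyncio task name to each log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        try:
            task = asyncio.current_task()
        except RuntimeError:  # raised when no event loop is running
            task = None
        record.task_name = task.get_name() if task else "-"
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(task_name)s | %(message)s"))
handler.addFilter(TaskNameFilter())
log = logging.getLogger("trace_demo")  # hypothetical logger name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False


async def fetch(request_id: str) -> None:
    log.info("fetch_start request_id=%s", request_id)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    log.info("fetch_done request_id=%s", request_id)


async def main() -> None:
    # Naming each task after its request_id makes interleaved lines attributable.
    tasks = [
        asyncio.create_task(fetch(rid), name=f"req-{rid}")
        for rid in ("a1", "b2")
    ]
    await asyncio.gather(*tasks)


asyncio.run(main())
print(stream.getvalue())
```

Every line carries its task name as a prefix, so a log aggregator can group by that field even when coroutines interleave.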

Route every URL that exhausts its retries to a dead-letter queue (DLQ) keyed by batch_id so the run stays auditable and replayable. A sufficient audit record per failed feed is {batch_id, supplier_id, url, error_type, http_status, attempt, ts_utc}. The error_type field drives triage at a glance: a wave of ClientConnectorError signals a supplier outage (retry later), a wave of 429/ClientResponseError signals your own concurrency is too high (lower that supplier’s tier), and asyncio.TimeoutError clustered on one host signals a slow partner (raise that host’s timeout, not the global one). Because the ingestor is restartable and writes results keyed by batch_id, a failed run replays from its last checkpoint without re-pulling already-staged chunks.
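
The audit record described above might be modeled as a small frozen dataclass. The field names come from the text; DeadLetterRecord, to_dead_letter, and the sample identifiers are hypothetical, and the queue transport itself is left out.

```python
import asyncio
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DeadLetterRecord:
    batch_id: str
    supplier_id: str
    url: str
    error_type: str
    http_status: Optional[int]
    attempt: int
    ts_utc: str


def to_dead_letter(
    batch_id: str, supplier_id: str, url: str, exc: BaseException, attempt: int
) -> DeadLetterRecord:
    """Build the audit record for a feed that exhausted its retries."""
    return DeadLetterRecord(
        batch_id=batch_id,
        supplier_id=supplier_id,
        url=url,
        error_type=type(exc).__name__,
        # aiohttp's ClientResponseError carries `.status`; most other errors do not.
        http_status=getattr(exc, "status", None),
        attempt=attempt,
        ts_utc=datetime.now(timezone.utc).isoformat(),
    )


record = to_dead_letter(
    "batch-0042", "sup-17", "https://supplier.example/po", asyncio.TimeoutError(), 3
)
print(asdict(record)["error_type"])  # TimeoutError
```

Storing the exception class name rather than the message keeps error_type low-cardinality, which is what makes grouping by it useful for triage.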

FAQ

Why does raising `max_concurrency` sometimes make ingestion slower?

You have almost certainly pushed the semaphore above a downstream ceiling — usually TCPConnector.limit, limit_per_host, or the database pool. Coroutines then acquire a semaphore slot and immediately queue waiting for an actual socket, so the added “concurrency” turns into queueing latency that looks like a remote slowdown but is entirely self-inflicted. Bind TCPConnector.limit to max_concurrency and size both to the smallest real downstream pool.
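
The queueing effect can be reproduced with two nested semaphores, where the inner one is a stand-in for TCPConnector.limit. This is a simulation sketch only: no sockets are opened, and peak_in_flight is a hypothetical helper.

```python
import asyncio


async def peak_in_flight(semaphore_size: int, pool_size: int, requests: int) -> int:
    """Return the peak number of simulated sockets in use at once."""
    semaphore = asyncio.Semaphore(semaphore_size)
    pool = asyncio.Semaphore(pool_size)  # stand-in for TCPConnector.limit
    active = 0
    peak = 0

    async def one_request() -> None:
        nonlocal active, peak
        async with semaphore:  # a slot is acquired here...
            async with pool:   # ...but the coroutine may still queue here
                active += 1
                peak = max(peak, active)
                await asyncio.sleep(0.01)
                active -= 1

    await asyncio.gather(*(one_request() for _ in range(requests)))
    return peak


print(asyncio.run(peak_in_flight(semaphore_size=20, pool_size=5, requests=40)))  # 5
print(asyncio.run(peak_in_flight(semaphore_size=5, pool_size=5, requests=40)))   # 5
```

Raising the semaphore from 5 to 20 leaves the real parallelism at 5; the extra 15 coroutines hold slots while waiting for a connection.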

How do I stop retries from forming a thundering herd against a recovering supplier?

Add jitter to the backoff. The implementation adds a per-URL term, (hash(url) % 1000) / 1000, to the exponential delay $b^{\,n}$, which spreads retries across a one-second band so a fleet of coroutines does not re-hit the same endpoint in lockstep. The term is stable per URL within a process, but Python salts string hashes per process by default, so swap in a stable digest such as zlib.crc32 if the delay must be reproducible in logs across runs.

Should heavy parsing run inside the same coroutines that fetch?

No. Coroutines share a single core, so CPU-bound work — large xmltodict trees, dataframe joins, decompression — blocks the event loop and stalls every other in-flight request. Keep the coroutines for I/O wait only and push transformation into a ProcessPoolExecutor via loop.run_in_executor, then hand the result to Schema Validation Using Pydantic.
