Workflow Context

Collection APIs operate at the intersection of public access, institutional compliance, and digital preservation. Security boundaries here function as deterministic enforcement layers embedded directly into metadata ingestion and asset delivery pipelines. External aggregators and internal DAMs must never query raw database tables. Every request traverses a stateless validation gate that decouples public consumption from internal data models. This architecture anchors the Core Architecture & Collection Taxonomy framework.

Boundary Architecture & Data Flow

The enforcement layer executes a strict four-stage pipeline. First, it authenticates the consumer and resolves institutional rights tiers. Second, it validates structural compliance against predefined schemas. Third, it isolates sensitive provenance and restricted fields before serialization. Fourth, it routes payloads through IIIF-compliant delivery channels. This deterministic flow prevents schema drift and enforces granular access control. Engineers should map these stages to the Designing Museum Object Schemas guidelines to maintain consistency.

```mermaid
flowchart LR
    Req["Consumer request"] --> A["1 · Authenticate<br/>resolve rights tier"]
    A --> V["2 · Validate schema"]
    V --> F["3 · Field redaction<br/>tier intersection"]
    F --> D["4 · IIIF delivery"]
    A -.->|invalid token| X["HTTP 403"]
```
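
The four stages can be sketched as a single function that short-circuits on the first failure. This is a minimal illustration using only the standard library, not the production layer: `TOKENS`, `REQUIRED_KEYS`, `error`, `handle`, and the error-code strings are hypothetical, and the delivery stage is reduced to returning a plain dict.

```python
from typing import Any, Dict, FrozenSet, Tuple

# Hypothetical token table: token -> (tier name, fields that tier may see).
TOKENS: Dict[str, Tuple[str, FrozenSet[str]]] = {
    "tok-public": ("public", frozenset({"title", "date"})),
    "tok-internal": ("internal", frozenset({"title", "date", "provenance"})),
}
REQUIRED_KEYS = {"record_id", "fields", "access_token"}  # assumed request shape


def error(status: int, code: str) -> Dict[str, Any]:
    """Build a machine-readable error envelope (the shape is an assumption)."""
    return {"status": status, "error": {"code": code}}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    # Stage 1: authenticate and resolve the rights tier; unknown tokens fail closed.
    token = request.get("access_token")
    resolved = TOKENS.get(token) if isinstance(token, str) else None
    if resolved is None:
        return error(403, "invalid_token")
    tier, allowed = resolved

    # Stage 2: structural validation. 403 is used here because this document
    # specifies 403 for validation failures.
    if not REQUIRED_KEYS.issubset(request) or not isinstance(request["fields"], list):
        return error(403, "schema_violation")

    # Stage 3: redact by intersecting requested fields with the tier's allowance.
    fields = [f for f in request["fields"] if f in allowed]

    # Stage 4: hand off to delivery (stubbed as a plain response body).
    return {"status": 200, "tier": tier, "record_id": request["record_id"], "fields": fields}
```

Because each stage returns early, a request that fails authentication never reaches schema validation or redaction, which matches the left-to-right order of the diagram.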

Implementation: Async Enforcement Layer

Production environments require non-blocking request handling to manage concurrent metadata fetches. The following Python 3.9+ implementation demonstrates an async boundary using httpx and pydantic v2 (the pattern constraint on Field is v2 syntax). It bounds concurrency across a batch of requests, validates access tokens, and filters the requested fields against the caller's institutional rights tier.

```python
import asyncio
from typing import List, Optional

import httpx
from pydantic import BaseModel, Field


class CollectionRequest(BaseModel):
    """Inbound request as seen at the boundary."""
    record_id: str = Field(min_length=1, max_length=64)
    fields: List[str] = Field(default_factory=list)
    access_token: str
    idempotency_key: str


class RightsTier(BaseModel):
    """Rights tier resolved for a token, with the fields it may read."""
    tier: str = Field(pattern="^(public|restricted|embargoed|internal)$")
    max_resolution: Optional[int] = None
    allowed_fields: List[str]


class APIBoundary:
    def __init__(self, base_url: str, api_key: str,
                 token_tiers: Optional[dict[str, RightsTier]] = None) -> None:
        self.client = httpx.AsyncClient(base_url=base_url, timeout=10.0)
        self.api_key = api_key
        # Seeded from OAuth2 introspection / JWT validation in production.
        self._rights_cache: dict[str, RightsTier] = dict(token_tiers or {})

    async def validate_token(self, token: str) -> RightsTier:
        # Fail closed: only tokens with a resolved rights tier are admitted.
        # Production wiring populates the cache via OAuth2 introspection or
        # JWT verification; an unknown token is rejected, never defaulted.
        tier = self._rights_cache.get(token)
        if tier is None:
            raise PermissionError("Invalid or unrecognized access token")
        return tier

    async def _process_single(self, req: CollectionRequest, tier: RightsTier) -> dict:
        # Intersect the requested fields with the tier's allowance so that
        # disallowed fields are never requested from the internal service.
        allowed = [f for f in req.fields if f in tier.allowed_fields]
        payload = {"ids": [req.record_id], "fields": allowed}
        resp = await self.client.post(
            "/internal/fetch", json=payload,
            headers={"Authorization": f"Bearer {self.api_key}"}
        )
        resp.raise_for_status()
        return resp.json()

    async def batch_process(self, requests: List[CollectionRequest], batch_size: int = 10) -> List[dict]:
        # The semaphore caps in-flight requests at batch_size.
        semaphore = asyncio.Semaphore(batch_size)

        async def _bounded(req: CollectionRequest) -> dict:
            async with semaphore:
                tier = await self.validate_token(req.access_token)
                return await self._process_single(req, tier)

        # Results come back in the same order as the input requests.
        return await asyncio.gather(*[_bounded(r) for r in requests])

    async def aclose(self) -> None:
        # Release the underlying connection pool.
        await self.client.aclose()
```
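
One property of batch_process worth noting: asyncio.gather without return_exceptions=True propagates the first exception to the caller, so a single rejected token fails the whole batch call. If per-record isolation is wanted, one option is sketched below using only the standard library. Here `fetch_one` is a hypothetical stand-in for validate_token plus _process_single, and `gather_isolated` is an illustrative helper, not part of the class above.

```python
import asyncio
from typing import Awaitable, Callable, List, Union


async def fetch_one(token: str) -> dict:
    """Hypothetical stand-in for token validation plus the internal fetch."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    if not token.startswith("tok-"):
        raise PermissionError("Invalid or unrecognized access token")
    return {"token": token, "ok": True}


async def gather_isolated(
    items: List[str],
    worker: Callable[[str], Awaitable[dict]],
    limit: int = 10,
) -> List[Union[dict, BaseException]]:
    """Run worker over items with at most `limit` in flight, keeping failures per item."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(item: str) -> dict:
        async with semaphore:
            return await worker(item)

    # return_exceptions=True places each exception in the result list,
    # in input order, instead of raising the first one.
    return await asyncio.gather(*(bounded(i) for i in items), return_exceptions=True)


results = asyncio.run(gather_isolated(["tok-a", "bad", "tok-b"], fetch_one, limit=2))
for item in results:
    print("rejected" if isinstance(item, BaseException) else item)
```

The caller then decides how to report each failure, for example by mapping a PermissionError entry to a per-record 403 while still returning the records that succeeded.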

Rights Routing & Field Redaction

The boundary layer must enforce strict field-level filtering before data leaves the internal network. Sensitive provenance, donor restrictions, and conservation notes require explicit tier mapping. The implementation applies an intersection filter that strips unauthorized fields at the serialization boundary. This prevents accidental exposure during bulk harvesting or cross-institutional syncs. Mapping these restrictions aligns directly with Mapping LIDO to Internal Databases protocols.
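
As a minimal sketch of that intersection filter, assume each tier is described by a set of permitted field names. The `TIER_FIELDS` table, its field names, and the `redact` helper are hypothetical and not taken from the implementation above; the policy shown for each tier is illustrative only. Applying the filter to the outgoing record itself means a field the backend returns by mistake is still dropped.

```python
from typing import Any, Dict, FrozenSet

PUBLIC: FrozenSet[str] = frozenset({"title", "creator", "date"})

# Illustrative policy only: which fields each tier may receive.
TIER_FIELDS: Dict[str, FrozenSet[str]] = {
    "public": PUBLIC,
    "restricted": PUBLIC | {"provenance"},
    "embargoed": frozenset(),  # assumption: nothing is released under embargo
    "internal": PUBLIC | {"provenance", "donor_restrictions", "conservation_notes"},
}


def redact(record: Dict[str, Any], tier: str) -> Dict[str, Any]:
    """Keep only the keys permitted for the tier; unknown tiers get nothing."""
    allowed = TIER_FIELDS.get(tier, frozenset())
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "title": "Amphora",
    "date": "c. 500 BCE",
    "provenance": "Acquired 1923",
    "donor_restrictions": "No commercial reuse",
}
public_view = redact(record, "public")
internal_view = redact(record, "internal")
```

Defaulting an unknown tier to the empty set keeps the filter fail-closed, in the same spirit as rejecting unknown tokens.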

IIIF/LIDO Serialization Compliance

Output payloads must conform to the IIIF Presentation API Specification and LIDO v1.1 standards. The boundary layer injects @context declarations and normalizes controlled vocabulary URIs. Image delivery routes through IIIF Image API endpoints with resolution caps enforced by the max_resolution parameter. Rights metadata maps to cc:license or rightsStatement URIs per institutional policy. Validation failures trigger deterministic HTTP 403 responses with machine-readable error codes.
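
To make the resolution cap concrete, the sketch below builds an IIIF Image API request URL whose size segment is bounded by max_resolution. It assumes the cap means the longest permitted edge in pixels, which the text does not state; `iiif_image_url` and the example base URL are hypothetical. In the Image API, the `!w,h` size form asks for an image scaled to fit inside the given box while preserving aspect ratio, and `max` asks for the largest size the server makes available.

```python
from typing import Optional
from urllib.parse import quote


def iiif_image_url(
    base_url: str,
    identifier: str,
    max_resolution: Optional[int] = None,
    fmt: str = "jpg",
) -> str:
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    if max_resolution is not None and max_resolution <= 0:
        raise ValueError("max_resolution must be a positive pixel count")
    # No cap: request the server's maximum. Cap: best fit inside a square box.
    size = "max" if max_resolution is None else f"!{max_resolution},{max_resolution}"
    # Identifiers are percent-encoded, including any slashes they contain.
    encoded_id = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded_id}/full/{size}/0/default.{fmt}"


capped = iiif_image_url("https://images.example.org/iiif", "obj 42/recto", 1024)
uncapped = iiif_image_url("https://images.example.org/iiif/", "obj-1")
```

Building the URL at the boundary only constrains links the API hands out. A client can still edit the size segment, so the image server itself would also need to enforce the cap.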

Operational Hardening

Rate limiting and idempotency checks prevent abuse during high-volume research queries. The idempotency_key field ensures safe retries without duplicating asset generation tasks. Cache invalidation must synchronize with DAM ingest pipelines to prevent stale rights states. Monitoring pipelines should track field-stripping ratios and token validation latency. These metrics feed directly into access control audits and compliance reporting. Refer to Python asyncio Documentation for advanced concurrency tuning.
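
One way the idempotency_key could be honored is to remember the in-flight or completed task for each key, so that a retry awaits the original work instead of starting a second job. The sketch below is an in-memory, single-process illustration; `IdempotentRunner` and `generate_asset` are hypothetical names, and a real deployment would more likely need a shared store with expiry.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class IdempotentRunner:
    """Run at most one job per idempotency key and share its result."""

    def __init__(self) -> None:
        self._tasks: Dict[str, "asyncio.Future[dict]"] = {}

    async def run(self, key: str, job: Callable[[], Awaitable[dict]]) -> dict:
        task = self._tasks.get(key)
        if task is None:
            # First request for this key: start the job and remember it.
            task = asyncio.ensure_future(job())
            self._tasks[key] = task
        try:
            return await task
        except Exception:
            # Forget failed jobs so a later retry can start fresh.
            self._tasks.pop(key, None)
            raise


calls = 0


async def generate_asset() -> dict:
    """Hypothetical expensive task, counted to show it runs only once."""
    global calls
    calls += 1
    await asyncio.sleep(0)
    return {"asset": "derivative-1"}


async def main() -> tuple:
    runner = IdempotentRunner()
    # Two concurrent submissions with the same key, as a client retry might produce.
    return await asyncio.gather(
        runner.run("key-123", generate_asset),
        runner.run("key-123", generate_asset),
    )


first, retry = asyncio.run(main())
```

Both callers receive the same result while `generate_asset` executes once. Evicting failed tasks is a design choice: it lets a retry after an error run the job again rather than replaying the failure.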

Conclusion

The security boundary works because it fails closed: an unrecognized token raises a PermissionError before any internal fetch executes, and field-level intersection filtering strips disallowed fields at serialization rather than trusting callers to request only what they are entitled to. The max_resolution cap on the RightsTier model gives IIIF Image API delivery a hard ceiling that enforces donor restrictions without per-request logic.