CLIP™ — System Architecture (Architect's View)

1 · Architecture Philosophy

Four convictions that every design decision derives from — rooted in the sales planner's real-world quarterly workflow.

1.0 The Workflow We Replace

Sales planners repeat this cycle every quarter, for every active contract, across every territory. CLIP automates steps 1–4 completely, adds systematic intelligence to step 5 (Make-It-Eligible), and pushes step 6 bookings to ERP / Salesforce through the integration layer.

#	Manual Step	Dimensions Checked	CLIP Replacement
1	Re-read contract, identify rules, check amendments	Per contract, per quarter	Engine 1 — Agentic RAG extraction (minutes)
2	Match titles against rules (spreadsheets)	Title × category × term year	Engine 2 — deterministic evaluator (<2s for 12K titles)
3	Confirm rights: legal, exclusivity, holdbacks	Title × territory × language × media × license × window	Engine 2 — RMS integration (Rightsline / FilmTrack)
4	Calculate fees, MG, commissions per term	Rate card tiers × escalators × term years	Engine 2 — deterministic rate evaluator
5	Find near-miss titles, explore levers (ad-hoc)	What-if across BO, dates, screens, cast	Engine 2 — Agentic MKE playbook (systematic)
6	Book revenue → ERP / Salesforce	Per title per term	Integration layer — auto-push

Key insight: Eligibility is checked across six dimensions simultaneously — Title × Territory × Language × Media Type (AVOD/SVOD/Pay-TV/Free-TV) × License Type (exclusive/non-exclusive/conditional) × Window (start→end). A title can be eligible for SVOD-Exclusive in Germany but blocked for Pay-TV in the same territory due to a prior sale. The architecture must evaluate every combination.
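The six-dimension check can be sketched in TypeScript as follows. This is a minimal illustration, not CLIP's rights model: the `RightsKey` shape, the `isBlocked` helper, the sample grant, and the conflict rule (same title, territory, language and media, overlapping window, at least one side exclusive) are all assumptions made for the example.

```typescript
type Media = "AVOD" | "SVOD" | "Pay-TV" | "Free-TV";
type License = "exclusive" | "non-exclusive" | "conditional";

// One point in the six-dimensional space (the window contributes start and end).
interface RightsKey {
  titleId: string;
  territory: string;
  language: string;
  media: Media;
  license: License;
  start: string; // ISO yyyy-mm-dd, inclusive
  end: string;   // ISO yyyy-mm-dd, inclusive
}

// Two windows overlap when each starts on or before the other ends.
// ISO date strings of equal length compare correctly as plain strings.
function windowsOverlap(a: RightsKey, b: RightsKey): boolean {
  return a.start <= b.end && b.start <= a.end;
}

// Assumed conflict rule: see the lead-in. Real RMS semantics may differ.
function isBlocked(request: RightsKey, existingGrants: RightsKey[]): boolean {
  return existingGrants.some(g =>
    g.titleId === request.titleId &&
    g.territory === request.territory &&
    g.language === request.language &&
    g.media === request.media &&
    windowsOverlap(g, request) &&
    (g.license === "exclusive" || request.license === "exclusive"));
}

// Hypothetical prior sale: Pay-TV, exclusive, Germany, calendar year 2026.
const priorSale: RightsKey = {
  titleId: "T-0004", territory: "DE", language: "de",
  media: "Pay-TV", license: "exclusive",
  start: "2026-01-01", end: "2026-12-31",
};

// Evaluate every media type for the same title, territory and language.
const mediaTypes: Media[] = ["AVOD", "SVOD", "Pay-TV", "Free-TV"];
const verdicts = mediaTypes.map(media => {
  const request: RightsKey = {
    ...priorSale, media, license: "exclusive",
    start: "2026-06-17", end: "2027-03-17",
  };
  return { media, blocked: isBlocked(request, [priorSale]) };
});
```

With this sample data only the Pay-TV request collides with the prior sale; the SVOD-Exclusive request for the same territory stays open, mirroring the Germany example above.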

1.1 Core Convictions

Conviction 1

Two Engines, One Intelligence Loop

Engine 1 (contract extraction) and Engine 2 (catalog aggregation) are separate bounded contexts joined by a well-defined data contract. The moat is the loop: Engine 2 grounds Engine 1; Engine 1 feeds Engine 2. Neither engine alone delivers the business value.

Conviction 2

Rules Are Data, Not Code

Every contract rule is stored as declarative JSON — nested AND/OR trees for qualifiers, date formulas for windows, typed caps, tiered rate cards. A tiny generic evaluator processes them all. Adding a new attribute or operator never requires a code deployment.
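As a rough illustration of "rules as data", the sketch below stores a qualifier as a nested AND/OR tree and walks it with one generic function. The node shape (`op`, `children`, `attr`, `cmp`, `value`), the comparator table, and the `usScreens` and `genre` attributes are assumptions for the example; only `usBoxOffice` appears elsewhere in this document.

```typescript
type Scalar = string | number;
type Title = Record<string, Scalar>;

// A rule is either a group (AND/OR over children) or a leaf comparison.
type RuleNode =
  | { op: "AND" | "OR"; children: RuleNode[] }
  | { attr: string; cmp: string; value: Scalar };

// Leaf operators are table entries, so the walker has no per-operator branches.
const comparators: Record<string, (actual: Scalar, expected: Scalar) => boolean> = {
  ">=": (a, b) => a >= b,
  "<=": (a, b) => a <= b,
  "==": (a, b) => a === b,
};

function evaluate(node: RuleNode, title: Title): boolean {
  if ("children" in node) {
    return node.op === "AND"
      ? node.children.every(child => evaluate(child, title))
      : node.children.some(child => evaluate(child, title));
  }
  const compare = comparators[node.cmp];
  const actual = title[node.attr];
  // Unknown operator or missing attribute: fail closed rather than guess.
  if (compare === undefined || actual === undefined) return false;
  return compare(actual, node.value);
}

// Hypothetical rule: (usBoxOffice >= 10M AND usScreens >= 800) OR genre == "animation"
const rule: RuleNode = {
  op: "OR",
  children: [
    { op: "AND", children: [
      { attr: "usBoxOffice", cmp: ">=", value: 10_000_000 },
      { attr: "usScreens", cmp: ">=", value: 800 },
    ] },
    { attr: "genre", cmp: "==", value: "animation" },
  ],
};

const wideRelease = evaluate(rule, { usBoxOffice: 12_500_000, usScreens: 950, genre: "drama" });
const smallRelease = evaluate(rule, { usBoxOffice: 4_000_000, usScreens: 300, genre: "drama" });
```

The rule object is plain JSON, so it can be stored, diffed, and audited like any other record while the walker stays unchanged.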

Conviction 3

The Catalog Is Alive

Contract rules are fixed at extraction time. The title catalog grows daily — new releases, updated box office, cast changes. Engine 2 must treat the catalog as a continuously-changing dataset and re-evaluate eligibility on every change.

Conviction 4

Deterministic Core, Agentic Shell

The rule evaluator is deterministic and testable — same inputs, same outputs, always. The LLM/agentic layers sit around the core: upstream (extraction), downstream (pattern analysis, lever generation), never inside the evaluation loop itself.

2 · Overall System Architecture

Three domain boundaries. Four data contracts. One shared platform.

Domain Ownership Matrix

Domain	Owns	Publishes	Consumes
Contract Extraction	Contracts, terms, categories, rules, audit log, PDFs	`rules.json` · `contract:confirmed` event	Attribute vocabulary from Catalog domain
Title Catalog	Canonical title records, multi-territory attributes, release dates	`catalog.json` · attribute vocabulary · `catalog:changed` event	External metadata (IMDb, TMDb, BO Mojo)
Eligibility & Insights	Per-title/term verdicts, revenue projections, MKE levers	`engine:results` event · dashboard data · API	Rules from Contract Extraction, catalog from Title Catalog, rights from RMS

3 · Module 1 — Deep Dive

Contract Rule Extraction — from PDF to executable JSON in minutes.

3.1 Service Decomposition

Service

Contract API

REST API for the 5-step wizard. Accepts uploads, serves extracted data, persists confirmations. Stateless; backed by PostgreSQL.

Service

Ingestion Worker

Pulls jobs from SQS, then runs OCR (Textract) → semantic chunking → embedding → vector store. Writes extraction status. Auto-scales between 0 and 8 workers based on queue depth.

Service

Extraction Agent

Agentic RAG (LangGraph). Plans retrieval across term years. Dispatches to 5 specialist LLMs in parallel. Verifies against catalog vocabulary. Writes structured rule JSON + confidence scores.

3.2 The Specialist LLM Cluster

"A single LLM cannot hold a 150-page contract coherently while extracting cross-segment dependencies. Five specialists — each fine-tuned for one segment — run in parallel and are assembled by the segment router."
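The parallel fan-out can be pictured with a small TypeScript sketch. It assumes each specialist is an async function over its segment's chunks and that one failed specialist should not discard the other four results; the `Specialist`, `Outcome`, `extractAll`, and `stub` names are hypothetical, and the stubs stand in for real model calls.

```typescript
type Segment = "Q" | "W" | "C" | "R" | "Amendment";

interface SegmentResult { segment: Segment; ruleJson: unknown; confidence: number }
type Specialist = (chunks: string[]) => Promise<SegmentResult>;
type Outcome =
  | { ok: true; result: SegmentResult }
  | { ok: false; segment: Segment; error: string };

const SEGMENTS: Segment[] = ["Q", "W", "C", "R", "Amendment"];

// Start all five specialists at once and wait for every one to settle.
// Each rejection is converted into an Outcome, so Promise.all never short-circuits.
function extractAll(
  specialists: Record<Segment, Specialist>,
  chunksBySegment: Record<Segment, string[]>,
): Promise<Outcome[]> {
  return Promise.all(SEGMENTS.map(segment =>
    specialists[segment](chunksBySegment[segment]).then(
      (result): Outcome => ({ ok: true, result }),
      (err): Outcome => ({ ok: false, segment, error: String(err) }),
    )));
}

// Stand-in for a model call; a real specialist would invoke its fine-tuned model.
function stub(segment: Segment): Specialist {
  return chunks => Promise.resolve({
    segment,
    ruleJson: { chunkCount: chunks.length },
    confidence: 0.9, // placeholder value, not a measured score
  });
}

const stubs: Record<Segment, Specialist> = {
  Q: stub("Q"), W: stub("W"), C: stub("C"), R: stub("R"), Amendment: stub("Amendment"),
};

// Hypothetical retrieved chunks per segment.
const sampleChunks: Record<Segment, string[]> = {
  Q: ["qualifier clause"], W: ["window clause"], C: ["cap clause"],
  R: ["rate card table"], Amendment: ["amendment text"],
};

const pendingOutcomes = extractAll(stubs, sampleChunks);
```

Outcomes come back in the fixed `SEGMENTS` order, so an assembling step can pair each result with its segment without extra bookkeeping.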

3.3 Data Model — Key Entities

Entity	Primary Key	Key Columns	Notes
`contract`	contractId (UUID)	name, number, effectiveDate, licensee, licensor, territory, status	Status lifecycle: PENDING → PROCESSING → EXTRACTED → CONFIRMED → SUBMITTED
`contract_term_years`	termId	contractId FK, start, end, type, number	No overlap validation. No window-type concept on term.
`contract_title_categories`	categoryId	contractId FK, name, pageRef, sectionRef, confidence	Stable ID — rename doesn't break rule associations.
`contract_qualifier_rules`	ruleId	categoryId FK, sectionRuleJson (JSONB)	Recursive AND/OR tree. Unlimited nesting.
`contract_startdate_rules`	ruleId	categoryId FK, windowRuleJson (JSONB)	Earlier-of, Later-of, fallback chains, $HOME_OFFICE.
`contract_cap_rules`	capId	categoryId FK, type, value, scope, termRef	PER_TITLE / PER_TERM / PER_CATEGORY
`contract_ratecard_rules`	rateId	categoryId FK, type, value, currency, tiers (JSONB)	Tiered breakpoints for graduated pricing.
`contract_audit_log`	auditId	contractId, entityType, field, aiValue, userValue, changedBy	Append-only. Every edit tracked.
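Two notes in the table lend themselves to a short TypeScript sketch: the status lifecycle on `contract`, and the missing overlap validation on `contract_term_years`. The sketch assumes a contract may only advance one lifecycle step at a time and that term dates are inclusive ISO strings; `canTransition` and `findOverlaps` are hypothetical helper names, not part of the described schema.

```typescript
type ContractStatus = "PENDING" | "PROCESSING" | "EXTRACTED" | "CONFIRMED" | "SUBMITTED";

// Order taken from the status lifecycle in the table above.
const LIFECYCLE: ContractStatus[] = ["PENDING", "PROCESSING", "EXTRACTED", "CONFIRMED", "SUBMITTED"];

// Assumption: only single forward steps are legal (no skipping, no going back).
function canTransition(from: ContractStatus, to: ContractStatus): boolean {
  return LIFECYCLE.indexOf(to) === LIFECYCLE.indexOf(from) + 1;
}

interface TermYear { termId: string; start: string; end: string } // ISO dates, inclusive

// Reports [earlierTermId, laterTermId] for each term that starts
// on or before the end of a term that began earlier.
function findOverlaps(terms: TermYear[]): Array<[string, string]> {
  const sorted = terms.slice().sort((a, b) => a.start.localeCompare(b.start));
  const clashes: Array<[string, string]> = [];
  let latest = sorted[0]; // term with the furthest end date seen so far
  for (let i = 1; i < sorted.length; i++) {
    const term = sorted[i];
    if (term.start <= latest.end) clashes.push([latest.termId, term.termId]);
    if (term.end > latest.end) latest = term;
  }
  return clashes;
}

// Hypothetical term years: T3 starts a month before T2 ends.
const overlaps = findOverlaps([
  { termId: "T1", start: "2026-01-01", end: "2026-12-31" },
  { termId: "T2", start: "2027-01-01", end: "2027-12-31" },
  { termId: "T3", start: "2027-12-01", end: "2028-12-31" },
]);
```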

4 · Module 2 — Deep Dive

Catalog Aggregator & Insights — from rules × titles to revenue.

4.1 The Two-Layer Design

Decision: Separate the deterministic core (rule evaluation: pure, testable, O(T×C×N×L)) from the agentic shell (pattern analysis and lever generation: LLM-powered, probabilistic). The core targets ≤2s for 12K titles, matching the performance requirement in Section 7. The shell runs on demand, for near-miss titles only.

Layer 1

Deterministic Evaluator

evalQualifier(tree, title, term) — recursive AND/OR walker with leaf-level trace
evalWindow(formula, title) — EARLIEST_OF / LATEST_OF / fixed offset + fallback cascade
evalCaps(state, category, term) — stateful counter per (category × termYear)
evalRate(tiers, title, term) — tiered lookup with per-term escalator
Per-term sweep: for every title × category × term → {Q, W, C, R, passes, fee, eligible}

Node.js / TypeScript (standalone or Next.js API routes). No LLM. Same inputs → same outputs, always. Headless smoke-testable.
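A minimal sketch of two of the evaluators listed above, under stated assumptions. The signatures are simplified for illustration (box office is passed directly rather than a whole title), tiers are assumed to be keyed on a box-office breakpoint, and the escalator is assumed to compound once per term. The rate card is hypothetical: tier B's base fee and the 8% escalator were picked so the output reproduces the 2,750,000 and 2,970,000 fees in the Section 4.2 sample; the source does not state them.

```typescript
interface RateTier { tier: string; minBoxOffice: number; baseFee: number }
interface RateResult { pass: boolean; tier?: string; fee?: number }

// Tiered lookup: take the tier with the highest breakpoint the title clears,
// then compound the escalator once for every term after the first.
function evalRate(tiers: RateTier[], boxOffice: number, term: number, escalator: number): RateResult {
  const cleared = tiers
    .filter(t => boxOffice >= t.minBoxOffice)
    .sort((a, b) => b.minBoxOffice - a.minBoxOffice);
  if (cleared.length === 0) return { pass: false };
  const fee = Math.round(cleared[0].baseFee * Math.pow(1 + escalator, term - 1));
  return { pass: true, tier: cleared[0].tier, fee };
}

interface CapResult { pass: boolean; used: number; cap: number }

// Stateful counter keyed by (category, term): each passing call consumes one slot.
function makeCapCounter(cap: number): (category: string, term: number) => CapResult {
  const used = new Map<string, number>();
  return (category, term) => {
    const key = category + "#" + term;
    const current = used.get(key) || 0;
    if (current >= cap) return { pass: false, used: current, cap };
    used.set(key, current + 1);
    return { pass: true, used: current + 1, cap };
  };
}

// Hypothetical rate card (see the assumptions in the lead-in).
const tiers: RateTier[] = [
  { tier: "A", minBoxOffice: 100_000_000, baseFee: 4_000_000 },
  { tier: "B", minBoxOffice: 25_000_000, baseFee: 2_750_000 },
  { tier: "C", minBoxOffice: 0, baseFee: 1_000_000 },
];
const term1 = evalRate(tiers, 40_000_000, 1, 0.08); // tier B, fee 2750000
const term2 = evalRate(tiers, 40_000_000, 2, 0.08); // tier B, fee 2970000

const evalCaps = makeCapCounter(25);
const firstSlot = evalCaps("feature-films", 1); // { pass: true, used: 1, cap: 25 }
```

Because the cap counter is order-dependent, a deterministic run also needs a stable title ordering; how CLIP orders titles is not stated in the source.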

Layer 2

Agentic Insight Layer

Pattern Analyzer — clusters near-miss titles, spots recurring disqualifiers ("12% of catalog fails only on US screens")
Eligibility Reasoner — what-if simulation: "if BO +$6M → eligible", "if screens +200 → eligible"
MKE Lever Generator — ranked, per-title, per-term levers with expected revenue impact
Insight Narrator — LLM-powered plain-English explanations for each verdict, suitable for planner dashboards

Python (LangGraph) + GPT-4o. Runs on-demand. Results cached in Redis.
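The what-if statements quoted above ("if BO +$6M → eligible") rest on simple arithmetic that can be sketched without any LLM. The TypeScript below is a simplified illustration, assuming a qualifier that is a flat AND of minimum thresholds; the `Threshold` and `Lever` shapes, the helper names, and the sample numbers are hypothetical, and the clustering, ranking, and narration steps of the agentic layer are not modelled.

```typescript
interface Threshold { attr: string; min: number }
interface Lever { attr: string; increase: number }

// For an AND of minimum thresholds, list how far each failing attribute falls short.
function shortfalls(thresholds: Threshold[], title: Record<string, number>): Lever[] {
  return thresholds
    .map(t => ({ attr: t.attr, increase: t.min - (title[t.attr] || 0) }))
    .filter(lever => lever.increase > 0);
}

// A near miss in this sketch is a title with exactly one shortfall:
// closing that single gap would make the qualifier pass.
function singleLever(thresholds: Threshold[], title: Record<string, number>): Lever | null {
  const gaps = shortfalls(thresholds, title);
  return gaps.length === 1 ? gaps[0] : null;
}

// Hypothetical thresholds and titles.
const thresholds: Threshold[] = [
  { attr: "usBoxOffice", min: 10_000_000 },
  { attr: "usScreens", min: 800 },
];
const nearMiss = singleLever(thresholds, { usBoxOffice: 4_000_000, usScreens: 950 });
// nearMiss: { attr: "usBoxOffice", increase: 6000000 }
const tooFar = singleLever(thresholds, { usBoxOffice: 4_000_000, usScreens: 600 });
// tooFar: null (two gaps, so no single lever closes it)
```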

4.1b Engine 2 — End-to-End Flow

4.2 Output Schema (Canonical)

{
  "titleId": "T-0004",
  "title": "FRONTIER QUEST",
  "bestCategory": "feature-films",
  "bucket": "eligible",
  "perTerm": [
    { "term":1, "year":2026, "passes":4, "eligible":true,
      "Q":{"pass":true}, "W":{"pass":true,"start":"2026-06-17","end":"2027-03-17"},
      "C":{"pass":true,"used":1,"cap":25},
      "R":{"pass":true,"tier":"B","fee":2750000,"mg":900000} },
    { "term":2, "year":2027, "passes":4, "eligible":true,
      "R":{"pass":true,"tier":"B","fee":2970000} },
    { "term":3, "year":2028, "passes":3, "eligible":false,
      "W":{"pass":false,"reason":"window doesn't overlap T3"} }
  ],
  "totalRevenue": 5720000,
  "levers": []
}
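For consumers of this payload, a partial TypeScript typing and a consistency check on `totalRevenue` might look like the sketch below. It assumes `totalRevenue` is the sum of `R.fee` over eligible terms, excluding `mg`, which is consistent with the sample (2,750,000 + 2,970,000 = 5,720,000) but is not stated explicitly; the interface and function names are hypothetical, and only the fields needed for the sum are typed.

```typescript
interface RateVerdict { pass: boolean; tier?: string; fee?: number; mg?: number }

interface TermVerdict {
  term: number;
  year: number;
  passes: number;
  eligible: boolean;
  R?: RateVerdict; // omitted in the sample when a term fails before rating
}

// Assumption: revenue counts only terms marked eligible, using the rated fee.
function sumEligibleFees(perTerm: TermVerdict[]): number {
  return perTerm
    .filter(t => t.eligible)
    .reduce((sum, t) => sum + ((t.R && t.R.fee) || 0), 0);
}

// The three term entries from the sample above, reduced to the typed fields.
const samplePerTerm: TermVerdict[] = [
  { term: 1, year: 2026, passes: 4, eligible: true, R: { pass: true, tier: "B", fee: 2750000, mg: 900000 } },
  { term: 2, year: 2027, passes: 4, eligible: true, R: { pass: true, tier: "B", fee: 2970000 } },
  { term: 3, year: 2028, passes: 3, eligible: false },
];

const recomputedTotal = sumEligibleFees(samplePerTerm); // 5720000, matching "totalRevenue"
```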
  

4.3 The Four Buckets

Bucket	Condition	Dashboard Colour	Business Action
Eligible	4/4 passes in ≥1 term	● Teal	Book revenue now
Conditional	2/4 with Q+R pass, blocked by cap or rights	● Cyan	Negotiate cap amendment or wait for rights release
Forecast	≥2/4 with W as only fail	● Blue	Future term-year revenue — pipeline visibility
Make-It-Eligible	3/4 — one lever away	● Gold	Execute the recommended lever (boost BO, add cast, shift date)

5 · Integration Architecture

CLIP™ is not a walled garden. Every insight flows into the tools sales and finance teams already use.

Inbound

Catalog Feeds

IMDb, TMDb (REST) + Aurora internal theatrical / HE / SVOD / Linear systems. EventBridge catalog:changed triggers Engine 2 re-run for affected titles only.
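The "affected titles only" re-run can be sketched as a handler that re-evaluates just the title IDs carried by a `catalog:changed` event and leaves every other cached verdict untouched. The payload shape (a list of title IDs), the in-memory `Map` cache, and the `applyCatalogChange` name are assumptions for illustration.

```typescript
interface Verdict { titleId: string; run: number }

// Re-evaluate only the titles named in the change event; returns how many were re-run.
function applyCatalogChange(
  changedTitleIds: string[],
  verdicts: Map<string, Verdict>,
  evaluateTitle: (titleId: string) => Verdict,
): number {
  const unique = new Set(changedTitleIds); // a feed may report the same title twice
  unique.forEach(id => verdicts.set(id, evaluateTitle(id)));
  return unique.size;
}

// Stand-in evaluator that just counts how often it is called.
let evaluations = 0;
const evaluateTitle = (titleId: string): Verdict => ({ titleId, run: ++evaluations });

// Initial full run over a hypothetical four-title catalog.
const verdicts = new Map<string, Verdict>();
["T-0001", "T-0002", "T-0003", "T-0004"].forEach(id => verdicts.set(id, evaluateTitle(id)));

// A change event touching two titles (one reported twice) triggers two evaluations, not four.
const rerunCount = applyCatalogChange(["T-0004", "T-0004", "T-0002"], verdicts, evaluateTitle);
```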

Inbound

Rights Management

Two-way adapter to Rightsline / FilmTrack / ERP RMS. Reads active licence grants; writes back new commitments. Detects "already-sold" conflicts before surfacing eligibility.

Outbound

Salesforce

Eligible titles auto-create Opportunities. MKE levers sync as Tasks on the Opportunity owner. Win/loss signals flow back to retrain the Pattern Analyzer.

Outbound

Finance (ERP)

Bookable revenue posts to AR/forecasting. Rate-card values populate contract line items. Cap consumption reported to deal controllers.

6 · Key Architecture Decisions

The decisions that shape the system — each with the trade-off considered.

ADR-1

PostgreSQL JSONB for Rule Storage

Decision: Store rule trees as JSONB in PostgreSQL instead of a normalised relational schema.

Rationale: Rule trees are recursive and variably-shaped. JSONB preserves the tree structure natively; GIN indexes enable clause-level queries. Normalising would require a complex adjacency-list schema for unlimited nesting.

Trade-off: Schema enforcement is application-side, not DB-side. Mitigated by JSON Schema validation on write.

ADR-2

Separate LLMs per Segment

Decision: Five specialist LLMs (Q, W, C, R, Amendment) instead of one generalist model.

Rationale: Each segment has distinct input patterns (tables for rate cards, date formulas for windows, AND/OR trees for qualifiers). Fine-tuned specialists achieve ≥94% accuracy vs ~68% for a single-LLM baseline.

Trade-off: Higher operational complexity (5 model versions to manage). Mitigated by a shared agent framework (LangGraph) and unified prompt templates.

ADR-3

Deterministic Evaluator Separate from LLM

Decision: The rule evaluator is pure deterministic code — no LLM in the evaluation loop.

Rationale: Eligibility verdicts must be reproducible and auditable. An LLM in the loop would make results non-deterministic, breaking audit and financial reconciliation.

Trade-off: LLM intelligence is limited to extraction (upstream) and insight narration (downstream).

ADR-4

Event-Driven Engine-to-UI Communication

Decision: Engine emits CustomEvent('engine:results') (browser) / SNS engine:results (server). UI subscribes. No callbacks.

Rationale: Adding a new chart, integration, or downstream consumer never changes the engine signature. Loose coupling = safe extension.

Trade-off: Event ordering requires idempotent consumers. Mitigated by including a monotonic version in each event payload.
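One way to read the mitigation: a consumer remembers the highest version it has applied and ignores anything at or below it, so duplicates and late arrivals become no-ops. The TypeScript below is a sketch under that assumption; the event shape and the `makeIdempotentConsumer` name are hypothetical, and transport (CustomEvent or SNS) is left out.

```typescript
interface EngineResultsEvent { version: number; results: unknown }

// Wraps a handler so that each version is applied at most once and never out of order.
// Returns true when the event was applied, false when it was skipped.
function makeIdempotentConsumer(
  apply: (event: EngineResultsEvent) => void,
): (event: EngineResultsEvent) => boolean {
  let lastApplied = -Infinity;
  return event => {
    if (event.version <= lastApplied) return false; // duplicate or stale delivery
    lastApplied = event.version;
    apply(event);
    return true;
  };
}

// Simulated delivery with a duplicate (2) and a late arrival (1).
const appliedVersions: number[] = [];
const consume = makeIdempotentConsumer(e => { appliedVersions.push(e.version); });
const outcomes = [1, 2, 2, 1, 3].map(version => consume({ version, results: [] }));
// outcomes: [true, true, false, false, true]; appliedVersions: [1, 2, 3]
```

This variant drops stale events rather than buffering them, which suits a dashboard that only needs the latest run; a consumer that must see every version would need a different strategy.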

ADR-5

Catalog Grounds the LLM

Decision: Engine 2 publishes a list of valid attribute names; Engine 1's LLM must map into that vocabulary.

Rationale: Prevents hallucinated field names (e.g., the LLM inventing grossRevenue when the catalog only has usBoxOffice). The feedback loop is the core of CLIP's accuracy claim.

Trade-off: New attributes require master-data registration before the LLM can use them.
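A grounding check of this kind can be sketched as a walk over the extracted rule JSON that collects every attribute name and reports those missing from the published vocabulary. The TypeScript below assumes attribute names live in a string field called `attr`; that field name, the helper names, and the `usScreens` attribute are illustrative, while `grossRevenue` and `usBoxOffice` come from the example above.

```typescript
// Recursively collect every string stored under an "attr" key, at any depth.
function collectAttrs(node: unknown, found: Set<string> = new Set<string>()): Set<string> {
  if (Array.isArray(node)) {
    node.forEach(item => collectAttrs(item, found));
  } else if (node !== null && typeof node === "object") {
    const obj = node as Record<string, unknown>;
    const attr = obj["attr"];
    if (typeof attr === "string") found.add(attr);
    Object.keys(obj).forEach(key => collectAttrs(obj[key], found));
  }
  return found;
}

// Attribute names used by the rule but absent from the catalog vocabulary, sorted.
function unknownAttrs(ruleJson: unknown, vocabulary: string[]): string[] {
  const known = new Set(vocabulary);
  return Array.from(collectAttrs(ruleJson)).filter(a => !known.has(a)).sort();
}

// Hypothetical extraction in which the model invented "grossRevenue".
const extractedRule = {
  op: "AND",
  children: [
    { attr: "grossRevenue", cmp: ">=", value: 10_000_000 },
    { attr: "usScreens", cmp: ">=", value: 800 },
  ],
};
const rejected = unknownAttrs(extractedRule, ["usBoxOffice", "usScreens"]);
// rejected: ["grossRevenue"]
```

A non-empty result could send the rule back for re-mapping or flag it for human review; which of these CLIP does is not specified in the source.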

ADR-6

Monorepo SPA, Feature-Flagged Modules

Decision: One Angular 19 or Next.js 15 application for Module 1, Module 2, and admin. Modules are lazy-loaded routes behind feature flags.

Rationale: Shared design system (light theme, Fraunces + Plus Jakarta Sans + Unbounded), shared auth, shared state patterns. Separate SPAs would diverge visually and operationally. Angular offers SPE-standard tooling; Next.js offers SSR, React Server Components, and a broader ecosystem. Either framework supports lazy loading and tree-shaking.

Trade-off: Larger initial bundle if not code-split. Mitigated by Angular lazy-loaded routes + build-time tree-shaking, or Next.js automatic code-splitting + RSC streaming.

7 · Non-Functional Requirements

Category	Requirement	Target
Performance	Contract extraction (80-page PDF)	≤ 120 seconds end-to-end
	Eligibility engine run (12K titles × 4 cats × 4 terms)	≤ 2 seconds
	Dashboard page load (cold)	≤ 1.5 seconds
Scalability	Concurrent extraction jobs	50 simultaneous contracts
	Title catalog size	100K titles without engine redesign
Availability	Uptime SLA (production)	99.9% (≈8.76 hrs/yr downtime)
	RTO / RPO	RTO 30 min / RPO 1 min (Multi-AZ RDS)
Security	Auth	Okta OIDC SSO, RBAC, MFA
	Encryption	TLS 1.3 in transit, AES-256 at rest (KMS)
	Data residency	US-West-2 primary (SPE standard)
Auditability	Rule edit trail	100% — every AI vs human value logged
	Eligibility verdicts	Full trace per leaf — rule ID, attribute value, operator truth
Accuracy	Engine 1 rule-level extraction	≥ 94% (vs 68% single-LLM baseline)
	Engine 2 eligibility verdicts	100% deterministic (same inputs → same output)

