System Architecture

The architect's blueprint — domain boundaries, service topology, data ownership, and the design decisions that shape CLIP™ at enterprise scale.

Principal Architect's Perspective

1 · Architecture Philosophy

Four convictions that every design decision derives from — rooted in the sales planner's real-world quarterly workflow.

1.0 The Workflow We Replace

Sales planners repeat this cycle every quarter, for every active contract, across every territory. CLIP automates steps 1–4 completely, adds systematic intelligence to step 5 (Make-It-Eligible), and auto-pushes step 6 (booking) through the integration layer.

| # | Manual Step | Dimensions Checked | CLIP Replacement |
|---|---|---|---|
| 1 | Re-read contract, identify rules, check amendments | Per contract, per quarter | Engine 1: Agentic RAG extraction (minutes) |
| 2 | Match titles against rules (spreadsheets) | Title × category × term year | Engine 2: deterministic evaluator (<2s for 12K titles) |
| 3 | Confirm rights: legal, exclusivity, holdbacks | Title × territory × language × media × license × window | Engine 2: RMS integration (Rightsline / FilmTrack) |
| 4 | Calculate fees, MG, commissions per term | Rate card tiers × escalators × term years | Engine 2: deterministic rate evaluator |
| 5 | Find near-miss titles, explore levers (ad-hoc) | What-if across BO, dates, screens, cast | Engine 2: Agentic MKE playbook (systematic) |
| 6 | Book revenue → ERP / Salesforce | Per title per term | Integration layer: auto-push |

Figure: the manual workflow today versus the CLIP™ automated workflow.

MANUAL (today): every quarter × every contract × every territory

  • ① Read Contract: re-read 80+ pages, check amendments; 2–3 weeks
  • ② Match Titles: Excel cross-reference on genre, BO, year; titles missed
  • ③ Confirm Rights: six-dimension check (Title × Terr × Lang × Media); multiple systems
  • ④ Calc Fees: tiered rate cards, MG + commissions; error-prone
  • ⑤ MKE (ad-hoc): near-miss hunt, tweak BO / dates; intuition only
  • ⑥ Book: ERP / Salesforce; lag = lost $

CLIP™ (automated): continuous, real-time, every combination evaluated

  • Engine 1: Contract → Rules in minutes; replaces step ①
  • Engine 2 · Core: Rules × Catalog × Rights; steps ②③④ in <2s
  • Engine 2 · MKE: systematic near-miss levers; step ⑤ (intelligent)
  • Integrations: ERP · SF · BI auto-push; step ⑥ (zero lag)

Headline outcomes: 94%+ extraction accuracy · 15% revenue recovered · minutes, not weeks · new revenue via MKE

Key insight: Eligibility is checked across six dimensions simultaneously — Title × Territory × Language × Media Type (AVOD/SVOD/Pay-TV/Free-TV) × License Type (exclusive/non-exclusive/conditional) × Window (start→end). A title can be eligible for SVOD-Exclusive in Germany but blocked for Pay-TV in the same territory due to a prior sale. The architecture must evaluate every combination.
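
The six-dimension check can be illustrated with a minimal sketch. It assumes a simplified grant record and a simple conflict rule; `RightsKey`, `overlaps`, and `isBlocked` are hypothetical names, and real RMS payloads (Rightsline / FilmTrack) will look different.

```typescript
// Hypothetical shapes; the real RMS payloads will differ.
type Media = "AVOD" | "SVOD" | "Pay-TV" | "Free-TV";
type License = "exclusive" | "non-exclusive" | "conditional";

interface RightsKey {
  titleId: string;
  territory: string;
  language: string;
  media: Media;
  license: License;
  start: string; // ISO yyyy-mm-dd, inclusive
  end: string;   // ISO yyyy-mm-dd, inclusive
}

// Two windows overlap when each starts on or before the other ends.
// ISO yyyy-mm-dd strings order correctly under plain string comparison.
function overlaps(a: RightsKey, b: RightsKey): boolean {
  return a.start <= b.end && b.start <= a.end;
}

// Assumed conflict rule: a prior grant blocks a candidate when title, territory,
// language and media match, the windows overlap, and either side is exclusive.
function isBlocked(candidate: RightsKey, priorGrants: RightsKey[]): boolean {
  return priorGrants.some(
    (g) =>
      g.titleId === candidate.titleId &&
      g.territory === candidate.territory &&
      g.language === candidate.language &&
      g.media === candidate.media &&
      overlaps(g, candidate) &&
      (g.license === "exclusive" || candidate.license === "exclusive")
  );
}

// Illustrative data: one prior exclusive Pay-TV sale in Germany.
const priorGrants: RightsKey[] = [
  { titleId: "T-0004", territory: "DE", language: "de", media: "Pay-TV",
    license: "exclusive", start: "2026-01-01", end: "2026-12-31" },
];

const base = {
  titleId: "T-0004", territory: "DE", language: "de",
  license: "exclusive" as License, start: "2026-06-17", end: "2027-03-17",
};

// One verdict per media type for the same title, territory, language and window.
const mediaTypes: Media[] = ["AVOD", "SVOD", "Pay-TV", "Free-TV"];
const verdicts = mediaTypes.map((media) => ({
  media,
  blocked: isBlocked({ ...base, media }, priorGrants),
}));
```

With this data the Pay-TV combination is blocked by the prior sale while SVOD-Exclusive stays open, which mirrors the Germany example in the key insight.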

1.1 Core Convictions

Conviction 1

Two Engines, One Intelligence Loop

Engine 1 (contract extraction) and Engine 2 (catalog aggregation) are separate bounded contexts joined by a well-defined data contract. The moat is the loop: Engine 2 grounds Engine 1; Engine 1 feeds Engine 2. Neither engine alone delivers the business value.

Conviction 2

Rules Are Data, Not Code

Every contract rule is stored as declarative JSON — nested AND/OR trees for qualifiers, date formulas for windows, typed caps, tiered rate cards. A tiny generic evaluator processes them all. Adding a new attribute or operator never requires a code deployment.
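
To make "rules are data" concrete, here is a minimal sketch of a qualifier tree stored as JSON and walked by a small generic evaluator. The rule shape, the operator names (`GTE`, `LTE`, `EQ`), and the attributes other than `usBoxOffice` are assumptions for illustration; the actual CLIP™ rule schema is not shown in this document.

```typescript
// Hypothetical rule and title shapes. usBoxOffice appears in the document;
// usScreens, genre and the operator names are illustrative.
type Scalar = string | number;
type Title = Record<string, Scalar>;
type LeafOp = "GTE" | "LTE" | "EQ";

type Rule =
  | { op: "AND" | "OR"; children: Rule[] }
  | { op: LeafOp; attr: string; value: Scalar };

// Leaf operators live in a lookup table, so the tree walker itself stays fixed.
const leafOps: Record<LeafOp, (actual: Scalar, expected: Scalar) => boolean> = {
  GTE: (a, b) => typeof a === "number" && typeof b === "number" && a >= b,
  LTE: (a, b) => typeof a === "number" && typeof b === "number" && a <= b,
  EQ: (a, b) => a === b,
};

// Recursive walker: group nodes combine their children, leaf nodes compare
// one title attribute against the rule value.
function evalRule(rule: Rule, title: Title): boolean {
  if ("children" in rule) {
    const results = rule.children.map((child) => evalRule(child, title));
    return rule.op === "AND" ? results.every(Boolean) : results.some(Boolean);
  }
  const actual = title[rule.attr];
  // A missing attribute fails the leaf instead of throwing.
  return actual !== undefined && leafOps[rule.op](actual, rule.value);
}

// The rule arrives as JSON (for example from a JSONB column), not as code.
const ruleJson = `{
  "op": "AND",
  "children": [
    { "op": "GTE", "attr": "usBoxOffice", "value": 10000000 },
    { "op": "OR", "children": [
      { "op": "GTE", "attr": "usScreens", "value": 800 },
      { "op": "EQ", "attr": "genre", "value": "Action" }
    ] }
  ]
}`;
const rule = JSON.parse(ruleJson) as Rule;

const passes = evalRule(rule, { usBoxOffice: 12500000, usScreens: 600, genre: "Action" });
const fails = evalRule(rule, { usBoxOffice: 4000000, usScreens: 900, genre: "Action" });
```

In this sketch a new attribute needs no change at all, because leaves look attributes up by name. A new operator would be one more entry in `leafOps`; how CLIP™ registers operators without a deployment is not specified here.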

Conviction 3

The Catalog Is Alive

Contract rules are fixed at extraction time. The title catalog grows daily — new releases, updated box office, cast changes. Engine 2 must treat the catalog as a continuously-changing dataset and re-evaluate eligibility on every change.

Conviction 4

Deterministic Core, Agentic Shell

The rule evaluator is deterministic and testable — same inputs, same outputs, always. The LLM/agentic layers sit around the core: upstream (extraction), downstream (pattern analysis, lever generation), never inside the evaluation loop itself.


2 · Overall System Architecture

Three domain boundaries. Four data contracts. One shared platform.

Figure: overall system architecture.

Domain 1 · Contract Extraction (bounded context: Contract → Structured Rules)

  • Ingestion Service: Upload · OCR · Chunk · Embed
  • Extraction Service: Agentic RAG · LLM Cluster
  • Verification Service: Catalog-grounded · Confidence
  • Review & Confirm API: 5-step wizard · Audit trail
  • Stores: Contract Rules DB (PostgreSQL · JSONB), which owns contracts, terms, categories, rules, audit; S3 · PDF Object Store; Vector Store (PGVector)

Domain 2 · Title Catalog (bounded context: External + Internal → Canonical Catalog)

  • Catalog Aggregator: IMDb · TMDb · Internal feeds
  • Attribute Normaliser: CLIP Start-Date Reference
  • Store: Catalog DB (canonical title records, multi-territory attributes)

Domain 3 · Eligibility & Insights (bounded context: Rules × Catalog → Verdicts + Revenue + Playbooks)

  • Deterministic Evaluator: Q · W · C · R per (T, Cat, Term)
  • Agentic Insight Layer: Pattern · Reasoner · Narrator
  • Rights Check: Rightsline · FilmTrack
  • Dashboard & Export: Timeline · Gantt · CSV
  • Stores: Results DB (per-title, per-term eligibility + revenue + levers); Redis Cache + Pub/Sub; SNS · engine:results events

Flows between domains: rules.json · catalog · attribute vocabulary

Integration Zone · Outbound

  • Salesforce: Opportunity sync
  • ERP: AR / financial
  • Tableau / BI: Dashboards
  • RMS: Rights check
  • REST / GraphQL: API consumers
  • Webhooks: State-change push

Unified SPA (Angular 19 / Next.js 15): Module 1 wizard · Module 2 dashboard · Admin console; one codebase, feature-flagged

Okta SSO · OIDC · RBAC (Planner · Manager · Admin · API-Consumer)

Domain Ownership Matrix

| Domain | Owns | Publishes | Consumes |
|---|---|---|---|
| Contract Extraction | Contracts, terms, categories, rules, audit log, PDFs | rules.json · contract:confirmed event | Attribute vocabulary from Catalog domain |
| Title Catalog | Canonical title records, multi-territory attributes, release dates | catalog.json · attribute vocabulary · catalog:changed event | External metadata (IMDb, TMDb, BO Mojo) |
| Eligibility & Insights | Per-title/term verdicts, revenue projections, MKE levers | engine:results event · dashboard data · API | Rules from D1, catalog from D2, rights from RMS |

3 · Module 1 — Deep Dive

Contract Rule Extraction — from PDF to executable JSON in minutes.

3.1 Service Decomposition

Service

Contract API

REST API for the 5-step wizard. Accepts uploads, serves extracted data, persists confirmations. Stateless; backed by PostgreSQL.

Service

Ingestion Worker

Pulls from SQS. OCR (Textract) → semantic chunking → embedding → vector store. Writes extraction status. Auto-scales from 0 to 8 instances based on queue depth.

Service

Extraction Agent

Agentic RAG (LangGraph). Plans retrieval across term years. Dispatches to 5 specialist LLMs in parallel. Verifies against catalog vocabulary. Writes structured rule JSON + confidence scores.

3.2 The Specialist LLM Cluster

"A single LLM cannot hold a 150-page contract coherently while extracting cross-segment dependencies. Five specialists — each fine-tuned for one segment — run in parallel and are assembled by the segment router."

Figure: specialist LLM cluster.

  • Planner: identifies segments per product category
  • Specialists: Qualifier LLM · Start Date LLM · Caps LLM · Rate Card LLM · Amendment Diff LLM
  • Segment Router → Product Category
  • Verifier: catalog-grounded

3.3 Data Model — Key Entities

| Entity | Primary Key | Key Columns | Notes |
|---|---|---|---|
| contract | contractId (UUID) | name, number, effectiveDate, licensee, licensor, territory, status | Status lifecycle: PENDING → PROCESSING → EXTRACTED → CONFIRMED → SUBMITTED |
| contract_term_years | termId | contractId FK, start, end, type, number | No overlap validation. No window-type concept on term. |
| contract_title_categories | categoryId | contractId FK, name, pageRef, sectionRef, confidence | Stable ID: a rename doesn't break rule associations. |
| contract_qualifier_rules | ruleId | categoryId FK, sectionRuleJson (JSONB) | Recursive AND/OR tree. Unlimited nesting. |
| contract_startdate_rules | ruleId | categoryId FK, windowRuleJson (JSONB) | Earlier-of, Later-of, fallback chains, $HOME_OFFICE. |
| contract_cap_rules | capId | categoryId FK, type, value, scope, termRef | PER_TITLE / PER_TERM / PER_CATEGORY |
| contract_ratecard_rules | rateId | categoryId FK, type, value, currency, tiers (JSONB) | Tiered breakpoints for graduated pricing. |
| contract_audit_log | auditId | contractId, entityType, field, aiValue, userValue, changedBy | Append-only. Every edit tracked. |
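
The start-date rules are described only as earlier-of, later-of, and fallback chains. The sketch below shows one way such a formula could be represented and resolved. The `DateExpr` shape, the attribute names, and the choice to skip unresolved operands are all assumptions; the stored `windowRuleJson` schema is not given in this document.

```typescript
// Hypothetical formula shape for windowRuleJson.
type DateExpr =
  | { kind: "attr"; attr: string; offsetDays?: number } // title date plus optional offset
  | { kind: "earliestOf"; of: DateExpr[] }
  | { kind: "latestOf"; of: DateExpr[] }
  | { kind: "fallback"; chain: DateExpr[] };            // first expression that resolves wins

type TitleDates = Record<string, string | undefined>;   // ISO yyyy-mm-dd values

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns epoch milliseconds (UTC), or null when nothing can be resolved.
function resolveDate(expr: DateExpr, title: TitleDates): number | null {
  if (expr.kind === "attr") {
    const raw = title[expr.attr];
    if (raw === undefined) return null;
    const parsed = Date.parse(raw + "T00:00:00Z");
    return Number.isNaN(parsed) ? null : parsed + (expr.offsetDays ?? 0) * DAY_MS;
  }
  if (expr.kind === "fallback") {
    for (const candidate of expr.chain) {
      const value = resolveDate(candidate, title);
      if (value !== null) return value;
    }
    return null;
  }
  // earliestOf / latestOf: assumed to ignore operands that do not resolve.
  const values = expr.of
    .map((e) => resolveDate(e, title))
    .filter((v): v is number => v !== null);
  if (values.length === 0) return null;
  return expr.kind === "earliestOf" ? Math.min(...values) : Math.max(...values);
}

function toIsoDate(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Illustrative rule: later of (theatrical release + 90 days) and home video
// release, falling back to the contract effective date.
const startFormula: DateExpr = {
  kind: "fallback",
  chain: [
    { kind: "latestOf", of: [
      { kind: "attr", attr: "theatricalRelease", offsetDays: 90 },
      { kind: "attr", attr: "homeVideoRelease" },
    ] },
    { kind: "attr", attr: "contractEffectiveDate" },
  ],
};

const withReleases = resolveDate(startFormula, {
  theatricalRelease: "2026-03-19",
  homeVideoRelease: "2026-05-01",
});
const withoutReleases = resolveDate(startFormula, { contractEffectiveDate: "2026-01-01" });

const startA = withReleases === null ? null : toIsoDate(withReleases);       // "2026-06-17"
const startB = withoutReleases === null ? null : toIsoDate(withoutReleases); // "2026-01-01"
```

All arithmetic is done in UTC so that a day offset never shifts across a daylight-saving boundary.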

4 · Module 2 — Deep Dive

Catalog Aggregator & Insights — from rules × titles to revenue.

4.1 The Two-Layer Design

Decision: Separate the deterministic core (rule evaluation: pure, testable, O(T×C×N×L)) from the agentic shell (pattern analysis, lever generation: LLM-powered, probabilistic). The core runs in <2s for 12K titles, in line with the ≤ 2 second performance target. The shell runs on-demand for near-miss titles only.

Layer 1

Deterministic Evaluator

  • evalQualifier(tree, title, term) — recursive AND/OR walker with leaf-level trace
  • evalWindow(formula, title) — EARLIEST_OF / LATEST_OF / fixed offset + fallback cascade
  • evalCaps(state, category, term) — stateful counter per (category × termYear)
  • evalRate(tiers, title, term) — tiered lookup with per-term escalator
  • Per-term sweep: for every title × category × term → {Q, W, C, R, passes, fee, eligible}

Node.js / TypeScript (standalone or Next.js API routes). No LLM. Same inputs → same outputs, always. Headless smoke-testable.
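
As a rough illustration of the last two evaluators listed above, here is a sketch of a tiered rate lookup with a per-term escalator and a stateful cap counter. The tier thresholds, the 8% compounding escalator, and the names `Tier`, `makeCapCounter`, and `takeSlot` are assumptions chosen for the example, not the production signatures.

```typescript
// Hypothetical shapes: tier names, thresholds and the escalator are illustrative.
interface Tier { name: string; minBoxOffice: number; baseFee: number }
type RateResult = { pass: false } | { pass: true; tier: string; fee: number };

// Tiered lookup: choose the highest tier whose threshold the title meets, then
// apply a compounding per-term escalator (term 1 pays the base fee).
function evalRate(tiers: Tier[], boxOffice: number, term: number, escalator: number): RateResult {
  const best = tiers
    .filter((t) => boxOffice >= t.minBoxOffice)
    .sort((a, b) => b.minBoxOffice - a.minBoxOffice)[0];
  if (best === undefined) return { pass: false };
  const fee = Math.round(best.baseFee * Math.pow(1 + escalator, term - 1));
  return { pass: true, tier: best.name, fee };
}

interface CapResult { pass: boolean; used: number; cap: number }

// Stateful counter keyed by category and term. Titles must be fed in a fixed
// order (for example sorted by titleId) so that results stay reproducible.
function makeCapCounter(cap: number) {
  const used = new Map<string, number>();
  return (category: string, term: number): CapResult => {
    const key = `${category}#${term}`;
    const current = used.get(key) ?? 0;
    if (current >= cap) return { pass: false, used: current, cap };
    used.set(key, current + 1);
    return { pass: true, used: current + 1, cap };
  };
}

const tiers: Tier[] = [
  { name: "A", minBoxOffice: 100000000, baseFee: 4000000 },
  { name: "B", minBoxOffice: 50000000, baseFee: 2750000 },
  { name: "C", minBoxOffice: 0, baseFee: 1500000 },
];

const term1 = evalRate(tiers, 62000000, 1, 0.08); // tier B, fee 2750000
const term2 = evalRate(tiers, 62000000, 2, 0.08); // tier B, fee 2970000

const takeSlot = makeCapCounter(2);
const slots = [
  takeSlot("feature-films", 1), // pass, used 1
  takeSlot("feature-films", 1), // pass, used 2
  takeSlot("feature-films", 1), // cap reached
  takeSlot("feature-films", 2), // new term, counter starts again
];
```

The cap counter is the only stateful piece, which is why evaluation order matters for determinism: the same titles in a different order could consume the cap differently.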

Layer 2

Agentic Insight Layer

  • Pattern Analyzer — clusters near-miss titles, spots recurring disqualifiers ("12% of catalog fails only on US screens")
  • Eligibility Reasoner — what-if simulation: "if BO +$6M → eligible", "if screens +200 → eligible"
  • MKE Lever Generator — ranked, per-title, per-term levers with expected revenue impact
  • Insight Narrator — LLM-powered plain-English explanations for each verdict, suitable for planner dashboards

Python (LangGraph) + GPT-4o. Runs on-demand. Results cached in Redis.
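
The what-if idea behind the Eligibility Reasoner ("if BO +$6M → eligible") can be shown with a small deterministic sketch. It is written in TypeScript to match the other examples here, and it assumes every disqualifier is a simple numeric minimum. `Threshold`, `singleLever`, and `applyLever` are hypothetical names; the real layer is described as LLM-assisted and covers non-numeric levers such as cast and dates.

```typescript
// Hypothetical shapes: one numeric "at least" threshold per attribute.
interface Threshold { attr: string; min: number }
type Title = Record<string, number>;
interface Lever { attr: string; delta: number }

// Thresholds the title currently misses (a missing attribute counts as 0).
function failing(thresholds: Threshold[], title: Title): Threshold[] {
  return thresholds.filter((t) => (title[t.attr] ?? 0) < t.min);
}

// A near-miss title fails exactly one threshold; its lever is the shortfall.
// Titles failing zero or several thresholds yield no lever in this sketch.
function singleLever(thresholds: Threshold[], title: Title): Lever | null {
  const misses = failing(thresholds, title);
  const miss = misses.length === 1 ? misses[0] : undefined;
  if (miss === undefined) return null;
  return { attr: miss.attr, delta: miss.min - (title[miss.attr] ?? 0) };
}

// What-if: returns a copy of the title with the lever applied.
function applyLever(title: Title, lever: Lever): Title {
  return { ...title, [lever.attr]: (title[lever.attr] ?? 0) + lever.delta };
}

const thresholds: Threshold[] = [
  { attr: "usBoxOffice", min: 25000000 },
  { attr: "usScreens", min: 800 },
];
const nearMiss: Title = { usBoxOffice: 19000000, usScreens: 1200 };

const lever = singleLever(thresholds, nearMiss); // usBoxOffice +6000000
const stillFailing = lever === null
  ? failing(thresholds, nearMiss)
  : failing(thresholds, applyLever(nearMiss, lever));
```

Re-running the unchanged check on the modified copy keeps the simulation honest: the lever is only reported as sufficient if the same evaluator then passes.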

4.1b Engine 2 — End-to-End Flow

Figure: Engine 2 end-to-end flow.

INPUTS

  • rules.json: from Engine 1; Q · W · C · R per category
  • Title Catalog: AtlasMock · IMDb · TMDb; 12K+ titles × attributes
  • Rights Status: Rightsline / FilmTrack

Layer 1 · Deterministic Core (Node.js / TypeScript)

  • evalQualifier(): AND/OR tree walk · leaf trace
  • evalWindow(): EARLIEST_OF · fallback cascade
  • evalCaps(): stateful counter per cat × term
  • evalRate(): tiered lookup · escalators
  • Per-Term Sweep: ∀ title × category × term → {Q, W, C, R, passes, fee, eligible}

Segmentation (4 Buckets): ✅ Eligible · ⏳ Conditional · 🔮 Forecast · 🚀 Make-It-Eligible

Layer 2 · Agentic Insight Layer (Python + GPT-4o), fed with near-miss titles

  • Pattern Analyzer: cluster near-misses
  • Eligibility Reasoner: what-if · booster sims
  • MKE Lever Generator: ranked levers + $ impact
  • Insight Narrator: plain-English verdicts

OUTPUTS

  • 📊 Dashboard + Timeline: multi-term-year visual
  • 📋 Per-Title Verdicts: 6-D eligibility trace
  • 💰 Revenue Projections: fee calc per term year
  • 🚀 MKE Playbooks: ranked levers + $ impact
  • 🔗 SF · ERP · BI · RMS: auto-push integrations

4.2 Output Schema (Canonical)

{
  "titleId": "T-0004",
  "title": "FRONTIER QUEST",
  "bestCategory": "feature-films",
  "bucket": "eligible",
  "perTerm": [
    {
      "term": 1, "year": 2026, "passes": 4, "eligible": true,
      "Q": { "pass": true },
      "W": { "pass": true, "start": "2026-06-17", "end": "2027-03-17" },
      "C": { "pass": true, "used": 1, "cap": 25 },
      "R": { "pass": true, "tier": "B", "fee": 2750000, "mg": 900000 }
    },
    {
      "term": 2, "year": 2027, "passes": 4, "eligible": true,
      "R": { "pass": true, "tier": "B", "fee": 2970000 }
    },
    {
      "term": 3, "year": 2028, "passes": 3, "eligible": false,
      "W": { "pass": false, "reason": "window doesn't overlap T3" }
    }
  ],
  "totalRevenue": 5720000,
  "levers": []
}
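
In the sample record, `totalRevenue` (5,720,000) equals the sum of the fees in the two eligible terms (2,750,000 + 2,970,000). The sketch below encodes that reading as an assumption; the document does not state the aggregation rule, and the `TermResult` interface covers only the fields needed here.

```typescript
// Partial, hypothetical typing of one perTerm entry from the sample record.
interface TermResult {
  term: number;
  year: number;
  passes: number;
  eligible: boolean;
  R?: { pass: boolean; tier?: string; fee?: number; mg?: number };
}

// Assumption: totalRevenue is the sum of R.fee over eligible terms only.
function totalRevenue(perTerm: TermResult[]): number {
  return perTerm
    .filter((t) => t.eligible)
    .reduce((sum, t) => sum + (t.R?.fee ?? 0), 0);
}

// Values copied from the sample record.
const perTerm: TermResult[] = [
  { term: 1, year: 2026, passes: 4, eligible: true, R: { pass: true, tier: "B", fee: 2750000, mg: 900000 } },
  { term: 2, year: 2027, passes: 4, eligible: true, R: { pass: true, tier: "B", fee: 2970000 } },
  { term: 3, year: 2028, passes: 3, eligible: false },
];

const total = totalRevenue(perTerm); // 5720000
```

The term 2 fee is 8% above the term 1 fee (2,970,000 / 2,750,000 = 1.08), which is consistent with a per-term escalator, although the sample does not say so explicitly.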

4.3 The Four Buckets

| Bucket | Condition | Dashboard Colour | Business Action |
|---|---|---|---|
| Eligible | 4/4 passes in ≥1 term | ● Teal | Book revenue now |
| Conditional | 2/4 with Q+R pass, blocked by cap or rights | ● Cyan | Negotiate cap amendment or wait for rights release |
| Forecast | ≥2/4 with W as only fail | ● Blue | Future term-year revenue: pipeline visibility |
| Make-It-Eligible | 3/4 (one lever away) | ● Gold | Execute the recommended lever (boost BO, add cast, shift date) |

5 · Integration Architecture

CLIP™ is not a walled garden. Every insight flows into the tools sales and finance teams already use.

Figure: integration hub. CLIP™ Engine 2 (Results API + Event Bus) connects to:

  • Salesforce: Opportunities · Tasks
  • ERP: AR · Contract lines
  • Tableau / Power BI: Direct connector
  • RMS: Rightsline · FilmTrack
  • REST / GraphQL API: Versioned JSON
  • Webhooks / Export: CSV · Excel · PDF

Inbound

Catalog Feeds

IMDb, TMDb (REST) + Aurora internal theatrical / HE / SVOD / Linear systems. EventBridge catalog:changed triggers Engine 2 re-run for affected titles only.
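
A minimal sketch of the "affected titles only" re-run, assuming the `catalog:changed` event carries a list of title IDs. The payload shape and the names `CatalogChanged`, `Verdict`, and `applyCatalogChange` are hypothetical; the real EventBridge detail format is not given here.

```typescript
// Hypothetical event payload and verdict shape.
interface CatalogChanged { titleIds: string[] }
interface Verdict { titleId: string; eligible: boolean }

// Re-evaluates only the titles named in the event and merges them into a copy
// of the previous result set; untouched titles keep their existing verdicts.
function applyCatalogChange(
  previous: Map<string, Verdict>,
  event: CatalogChanged,
  evaluate: (titleId: string) => Verdict
): Map<string, Verdict> {
  const next = new Map(previous);
  event.titleIds.forEach((id) => {
    next.set(id, evaluate(id));
  });
  return next;
}

const previous = new Map<string, Verdict>([
  ["T-0001", { titleId: "T-0001", eligible: true }],
  ["T-0002", { titleId: "T-0002", eligible: false }],
  ["T-0003", { titleId: "T-0003", eligible: false }],
]);

let evaluations = 0;
const next = applyCatalogChange(previous, { titleIds: ["T-0002"] }, (titleId) => {
  evaluations += 1;
  return { titleId, eligible: true }; // stand-in for the real evaluator
});
```

Only one evaluation runs for a three-title result set, and the previous map is left untouched so a failed re-run cannot corrupt the last good results.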

Inbound

Rights Management

Two-way adapter to Rightsline / FilmTrack / ERP RMS. Reads active licence grants; writes back new commitments. Detects "already-sold" conflicts before surfacing eligibility.

Outbound

Salesforce

Eligible titles auto-create Opportunities. MKE levers sync as Tasks on the Opportunity owner. Win/loss signals flow back to retrain the Pattern Analyzer.

Outbound

Finance (ERP)

Bookable revenue posts to AR/forecasting. Rate-card values populate contract line items. Cap consumption reported to deal controllers.


6 · Key Architecture Decisions

The decisions that shape the system — each with the trade-off considered.

ADR-1

PostgreSQL JSONB for Rule Storage

Decision: Store rule trees as JSONB in PostgreSQL instead of a normalised relational schema.

Rationale: Rule trees are recursive and variably-shaped. JSONB preserves the tree structure natively; GIN indexes enable clause-level queries. Normalising would require a complex adjacency-list schema for unlimited nesting.

Trade-off: Schema enforcement is application-side, not DB-side. Mitigated by JSON Schema validation on write.

ADR-2

Separate LLMs per Segment

Decision: Five specialist LLMs (Q, W, C, R, Amendment) instead of one generalist model.

Rationale: Each segment has distinct input patterns (tables for rate cards, date formulas for windows, AND/OR trees for qualifiers). Fine-tuned specialists achieve ≥94% accuracy vs ~68% for a single-LLM baseline.

Trade-off: Higher operational complexity (5 model versions to manage). Mitigated by a shared agent framework (LangGraph) and unified prompt templates.

ADR-3

Deterministic Evaluator Separate from LLM

Decision: The rule evaluator is pure deterministic code — no LLM in the evaluation loop.

Rationale: Eligibility verdicts must be reproducible and auditable. An LLM in the loop would make results non-deterministic, breaking audit and financial reconciliation.

Trade-off: LLM intelligence is limited to extraction (upstream) and insight narration (downstream).

ADR-4

Event-Driven Engine-to-UI Communication

Decision: Engine emits CustomEvent('engine:results') (browser) / SNS engine:results (server). UI subscribes. No callbacks.

Rationale: Adding a new chart, integration, or downstream consumer never changes the engine signature. Loose coupling = safe extension.

Trade-off: Event ordering requires idempotent consumers. Mitigated by including a monotonic version in each event payload.
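
One way to read the mitigation is a consumer that remembers the highest version it has applied and ignores anything at or below it. The sketch assumes the version is tracked per contract; the payload fields and the name `makeConsumer` are hypothetical, since the document only states that each event carries a monotonic version.

```typescript
// Hypothetical payload for the engine:results event.
interface EngineResultsEvent { contractId: string; version: number; eligibleCount: number }

// Keeps the highest version applied per contract. Duplicates and stale,
// out-of-order deliveries are dropped, so redelivery has no extra effect.
function makeConsumer(onFresh: (e: EngineResultsEvent) => void) {
  const lastApplied = new Map<string, number>();
  return (e: EngineResultsEvent): boolean => {
    const seen = lastApplied.get(e.contractId);
    if (seen !== undefined && e.version <= seen) return false; // duplicate or stale
    lastApplied.set(e.contractId, e.version);
    onFresh(e);
    return true;
  };
}

const appliedVersions: number[] = [];
const consume = makeConsumer((e) => {
  appliedVersions.push(e.version);
});

// Deliveries arrive as v1, v2, v2 (duplicate), v1 (stale).
const outcomes = [1, 2, 2, 1].map((version) =>
  consume({ contractId: "C-1", version, eligibleCount: 10 })
);
```

Here only versions 1 and 2 reach the handler; the duplicate and the late v1 are ignored.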

ADR-5

Catalog Grounds the LLM

Decision: Engine 2 publishes a list of valid attribute names; Engine 1's LLM must map into that vocabulary.

Rationale: Prevents hallucinated field names (e.g., the LLM inventing grossRevenue when the catalog only has usBoxOffice). The feedback loop is the core of CLIP's accuracy claim.

Trade-off: New attributes require master-data registration before the LLM can use them.
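
The grounding check itself can be very small. The sketch below splits extracted attribute names into those present in the published vocabulary and those that need review; `groundAttributes` is a hypothetical name, and the vocabulary entries other than `usBoxOffice` are illustrative.

```typescript
// Splits extracted attribute names into those found in the published
// vocabulary and those that are not (possible hallucinations).
function groundAttributes(extracted: string[], vocabulary: string[]) {
  const known = new Set(vocabulary);
  return {
    accepted: extracted.filter((name) => known.has(name)),
    rejected: extracted.filter((name) => !known.has(name)),
  };
}

const vocabulary = ["usBoxOffice", "usScreens", "releaseDate"];
const grounded = groundAttributes(["usBoxOffice", "grossRevenue"], vocabulary);
// grounded.accepted → ["usBoxOffice"], grounded.rejected → ["grossRevenue"]
```

Rejected names could be sent back to the extraction step for remapping or surfaced to the reviewer; which of the two CLIP™ does is not stated here.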

ADR-6

Monorepo SPA, Feature-Flagged Modules

Decision: One Angular 19 or Next.js 15 application for Module 1, Module 2, and admin. Modules are lazy-loaded routes behind feature flags.

Rationale: Shared design system (light theme, Fraunces + Plus Jakarta Sans + Unbounded), shared auth, shared state patterns. Separate SPAs would diverge visually and operationally. Angular offers SPE-standard tooling; Next.js offers SSR, React Server Components, and a broader ecosystem. Either framework supports lazy loading and tree-shaking.

Trade-off: Larger initial bundle if not code-split. Mitigated by Angular lazy loading + Vite tree-shaking, or Next.js automatic code-splitting + RSC streaming.


7 · Non-Functional Requirements

| Category | Requirement | Target |
|---|---|---|
| Performance | Contract extraction (80-page PDF) | ≤ 120 seconds end-to-end |
| Performance | Eligibility engine run (12K titles × 4 cats × 4 terms) | ≤ 2 seconds |
| Performance | Dashboard page load (cold) | ≤ 1.5 seconds |
| Scalability | Concurrent extraction jobs | 50 simultaneous contracts |
| Scalability | Title catalog size | 100K titles without engine redesign |
| Availability | Uptime SLA (production) | 99.9% (≈ 8.8 hrs/yr downtime) |
| Availability | RTO / RPO | RTO 30 min / RPO 1 min (Multi-AZ RDS) |
| Security | Auth | Okta OIDC SSO, RBAC, MFA |
| Security | Encryption | TLS 1.3 in transit, AES-256 at rest (KMS) |
| Security | Data residency | US-West-2 primary (SPE standard) |
| Auditability | Rule edit trail | 100%: every AI vs human value logged |
| Auditability | Eligibility verdicts | Full trace per leaf: rule ID, attribute value, operator truth |
| Accuracy | Engine 1 rule-level extraction | ≥ 94% (vs 68% single-LLM baseline) |
| Accuracy | Engine 2 eligibility verdicts | 100% deterministic (same inputs → same output) |

8 · Platform Evolution Roadmap

Where the architecture goes next — from Module 1 + 2 to a full licensing intelligence platform.

Figure: platform roadmap.

  • Phase 1 · CLIP Core: Module 1 + Module 2; 6 months
  • Phase 2 · Amendments + RMS: version control, diff engine; rights integration
  • Phase 3 · CLIP Forecast: 12-mo pipeline, revenue sim; SF + ERP connectors
  • Phase 4 · CLIP Optimize: AI deal recommendation; negotiation assistant
  • Phase 5 · CLIP Intelligence Cloud: multi-studio, cross-territory; enterprise data layer

The figure marks the current position with a "WE ARE HERE" label.

Architecture Extensibility Points

Phase 2

Amendment Engine

Published/Draft pattern versioning. Diff engine compares old vs new rule JSON. Effective-date activation per title.
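
A diff over rule JSON can be sketched as a recursive walk that reports the path of every changed value. This is an illustrative structural diff under simple assumptions (array items are compared by position), not the planned engine. `diffRules`, the `Change` shape, and the `$.children.0.value` path notation are all hypothetical.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Container = Json[] | { [key: string]: Json };
interface Change { path: string; before: Json | undefined; after: Json | undefined }

function isContainer(v: Json | undefined): v is Container {
  return typeof v === "object" && v !== null;
}

// Reads one child; arrays are addressed by their index written as a string.
function child(v: Container, key: string): Json | undefined {
  return Array.isArray(v) ? v[Number(key)] : v[key];
}

// Walks both trees together and records every path whose value differs.
// Array items are compared by position, so an insertion shifts later items.
function diffRules(before: Json | undefined, after: Json | undefined, path = "$"): Change[] {
  if (isContainer(before) && isContainer(after) && Array.isArray(before) === Array.isArray(after)) {
    const b: Container = before;
    const a: Container = after;
    const keys = Object.keys(b)
      .concat(Object.keys(a))
      .filter((k, i, all) => all.indexOf(k) === i); // union of keys, de-duplicated
    return keys.reduce<Change[]>(
      (acc, k) => acc.concat(diffRules(child(b, k), child(a, k), `${path}.${k}`)),
      []
    );
  }
  return before === after ? [] : [{ path, before, after }];
}

const published: Json = {
  op: "AND",
  children: [{ op: "GTE", attr: "usBoxOffice", value: 10000000 }],
};
const draft: Json = {
  op: "AND",
  children: [
    { op: "GTE", attr: "usBoxOffice", value: 8000000 },
    { op: "GTE", attr: "usScreens", value: 800 },
  ],
};

const changes = diffRules(published, draft);
// Two changes: $.children.0.value (10000000 → 8000000) and $.children.1 (added)
```

A production diff would likely match rule nodes by a stable identifier instead of array position, so that reordering clauses does not show up as a change.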

Phase 3

Forecast Service

New service consuming Engine 2 results + historical win/loss. Produces 12-month revenue pipeline with confidence bands.

Phase 4

Negotiate Agent

LLM-powered agent that simulates deal terms. "If we offer 15 more screens, we unlock 3 Tier-A titles worth $7.5M."