The architect's blueprint — domain boundaries, service topology, data ownership, and the design decisions that shape CLIP™ at enterprise scale.
Principal Architect's Perspective
Four convictions that every design decision derives from, rooted in the sales planner's real-world quarterly workflow.
Sales planners repeat this cycle every quarter, for every active contract, across every territory. CLIP automates steps 1–4 completely, adds systematic intelligence to step 5 (Make-It-Eligible), and auto-pushes step 6 through the integration layer.
| # | Manual Step | Dimensions Checked | CLIP Replacement |
|---|---|---|---|
| 1 | Re-read contract, identify rules, check amendments | Per contract, per quarter | Engine 1 — Agentic RAG extraction (minutes) |
| 2 | Match titles against rules (spreadsheets) | Title × category × term year | Engine 2 — deterministic evaluator (<2s for 12K titles) |
| 3 | Confirm rights: legal, exclusivity, holdbacks | Title × territory × language × media × license × window | Engine 2 — RMS integration (Rightsline / FilmTrack) |
| 4 | Calculate fees, MG, commissions per term | Rate card tiers × escalators × term years | Engine 2 — deterministic rate evaluator |
| 5 | Find near-miss titles, explore levers (ad-hoc) | What-if across BO, dates, screens, cast | Engine 2 — Agentic MKE playbook (systematic) |
| 6 | Book revenue → ERP / Salesforce | Per title per term | Integration layer — auto-push |
Engine 1 (contract extraction) and Engine 2 (catalog aggregation) are separate bounded contexts joined by a well-defined data contract. The moat is the loop: Engine 2 grounds Engine 1; Engine 1 feeds Engine 2. Neither engine alone delivers the business value.
Every contract rule is stored as declarative JSON — nested AND/OR trees for qualifiers, date formulas for windows, typed caps, tiered rate cards. A tiny generic evaluator processes them all. Adding a new attribute or operator never requires a code deployment.
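The sketch below shows what a "tiny generic evaluator" could look like: it evaluates a nested AND/OR qualifier tree against a title record. Several details are assumptions for illustration and do not come from CLIP's actual rules.json schema:

- the node shape (`and` / `or` / `attr`-`op`-`value`);
- the operator names;
- the `usBoxOffice` / `genre` / `screens` attributes.

The function is pure, which is what makes the determinism conviction testable.

```typescript
// Hypothetical node shape; the real rules.json schema is not given here.
type Operator = "eq" | "neq" | "gt" | "gte" | "lt" | "lte" | "in";
type Value = string | number | boolean;
type Leaf = { attr: string; op: Operator; value: Value | Value[] };
type RuleNode = { and: RuleNode[] } | { or: RuleNode[] } | Leaf;
type Title = Record<string, Value | undefined>;

// Operator semantics live in one table; rules only reference operators by name.
const OPS: Record<Operator, (actual: Value, expected: Value | Value[]) => boolean> = {
  eq: (a, b) => a === b,
  neq: (a, b) => a !== b,
  gt: (a, b) => typeof a === "number" && typeof b === "number" && a > b,
  gte: (a, b) => typeof a === "number" && typeof b === "number" && a >= b,
  lt: (a, b) => typeof a === "number" && typeof b === "number" && a < b,
  lte: (a, b) => typeof a === "number" && typeof b === "number" && a <= b,
  in: (a, b) => Array.isArray(b) && b.includes(a),
};

// Pure recursion: no I/O, clock, or randomness, so identical rule + title
// inputs always produce the identical verdict.
function evaluate(node: RuleNode, title: Title): boolean {
  if ("and" in node) return node.and.every((child) => evaluate(child, title));
  if ("or" in node) return node.or.some((child) => evaluate(child, title));
  const actual = title[node.attr];
  if (actual === undefined) return false; // a missing attribute never satisfies a leaf
  return OPS[node.op](actual, node.value);
}

// Example rule: US box office of at least $50M AND (qualifying genre OR wide release).
const rule: RuleNode = {
  and: [
    { attr: "usBoxOffice", op: "gte", value: 50_000_000 },
    {
      or: [
        { attr: "genre", op: "in", value: ["Action", "Drama"] },
        { attr: "screens", op: "gt", value: 2000 },
      ],
    },
  ],
};

console.log(evaluate(rule, { usBoxOffice: 62_000_000, genre: "Comedy", screens: 3100 })); // true
```

In this sketch a new attribute is pure data, and a new operator is one more entry in `OPS`.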
Contract rules are fixed at extraction time. The title catalog grows daily: new releases, updated box office, cast changes. Engine 2 must treat the catalog as a continuously changing dataset and re-evaluate eligibility for the affected titles on every change.
The rule evaluator is deterministic and testable — same inputs, same outputs, always. The LLM/agentic layers sit around the core: upstream (extraction), downstream (pattern analysis, lever generation), never inside the evaluation loop itself.
Three domain boundaries. Four data contracts. One shared platform.
| Domain | Owns | Publishes | Consumes |
|---|---|---|---|
| Contract Extraction | Contracts, terms, categories, rules, audit log, PDFs | rules.json · contract:confirmed event | Attribute vocabulary from Catalog domain |
| Title Catalog | Canonical title records, multi-territory attributes, release dates | catalog.json · attribute vocabulary · catalog:changed event | External metadata (IMDb, TMDb, BO Mojo) |
| Eligibility & Insights | Per-title/term verdicts, revenue projections, MKE levers | engine:results event · dashboard data · API | Rules from Contract Extraction, catalog from Title Catalog, rights from RMS |
Contract Rule Extraction — from PDF to executable JSON in minutes.
REST API for the 5-step wizard. Accepts uploads, serves extracted data, persists confirmations. Stateless; backed by PostgreSQL.
Pulls from SQS. OCR (Textract) → semantic chunking → embedding → vector store. Writes extraction status. Auto-scales 0–8 on queue depth.
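The 0–8 scaling rule can be read as a simple function of queue depth. The sketch below assumes a hypothetical target of five queued messages per worker. In AWS the depth would typically come from the SQS `ApproximateNumberOfMessagesVisible` metric, but the actual scaling mechanism is not specified here.

```typescript
// Hypothetical scaling policy: one worker per `messagesPerWorker` queued
// messages, clamped to the 0–8 range described above.
function desiredWorkers(
  queueDepth: number,
  messagesPerWorker = 5, // assumed tuning knob, not a documented CLIP value
  min = 0,
  max = 8,
): number {
  if (queueDepth <= 0) return min; // scale to zero when the queue is empty
  const wanted = Math.ceil(queueDepth / messagesPerWorker);
  return Math.min(max, Math.max(min, wanted));
}
```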
Agentic RAG (LangGraph). Plans retrieval across term years. Dispatches to 5 specialist LLMs in parallel. Verifies against catalog vocabulary. Writes structured rule JSON + confidence scores.
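The parallel dispatch and segment-router assembly follow a fan-out/fan-in pattern, and the TypeScript sketch below illustrates only that pattern. `Specialist`, `extractAll`, and the result shape are hypothetical. The real agent runs in Python on LangGraph, whose API is not modelled here.

```typescript
// The five segments from the design decisions: qualifier, window, cap, rate card, amendment.
type Segment = "Q" | "W" | "C" | "R" | "Amendment";

interface SegmentResult {
  segment: Segment;
  rules: unknown[]; // structured rule JSON produced for this segment
  confidence: number; // 0..1
}

// Hypothetical specialist signature. In CLIP these are fine-tuned LLM calls
// orchestrated by LangGraph in Python; here each one is a plain async function.
type Specialist = (chunks: string[]) => Promise<SegmentResult>;

// Fan out to every specialist concurrently, then fan in by segment. A failed
// specialist is reported in `failed` rather than silently dropping its segment.
async function extractAll(
  specialists: Record<Segment, Specialist>,
  chunks: string[],
): Promise<{ results: Partial<Record<Segment, SegmentResult>>; failed: Segment[] }> {
  const segments = Object.keys(specialists) as Segment[];
  const outcomes = await Promise.all(
    segments.map((segment) =>
      specialists[segment](chunks).then(
        (result): SegmentResult | null => result,
        (): SegmentResult | null => null,
      ),
    ),
  );
  const results: Partial<Record<Segment, SegmentResult>> = {};
  const failed: Segment[] = [];
  outcomes.forEach((outcome, i) => {
    if (outcome === null) failed.push(segments[i]);
    else results[segments[i]] = outcome;
  });
  return { results, failed };
}
```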
"A single LLM cannot hold a 150-page contract coherently while extracting cross-segment dependencies. Five specialists — each fine-tuned for one segment — run in parallel and are assembled by the segment router."
| Entity | Primary Key | Key Columns | Notes |
|---|---|---|---|
| contract | contractId (UUID) | name, number, effectiveDate, licensee, licensor, territory, status | Status lifecycle: PENDING → PROCESSING → EXTRACTED → CONFIRMED → SUBMITTED |
| contract_term_years | termId | contractId FK, start, end, type, number | No overlap validation. No window-type concept on term. |
| contract_title_categories | categoryId | contractId FK, name, pageRef, sectionRef, confidence | Stable ID: renaming doesn't break rule associations. |
| contract_qualifier_rules | ruleId | categoryId FK, sectionRuleJson (JSONB) | Recursive AND/OR tree. Unlimited nesting. |
| contract_startdate_rules | ruleId | categoryId FK, windowRuleJson (JSONB) | Earlier-of, Later-of, fallback chains, $HOME_OFFICE. |
| contract_cap_rules | capId | categoryId FK, type, value, scope, termRef | PER_TITLE / PER_TERM / PER_CATEGORY |
| contract_ratecard_rules | rateId | categoryId FK, type, value, currency, tiers (JSONB) | Tiered breakpoints for graduated pricing. |
| contract_audit_log | auditId | contractId, entityType, field, aiValue, userValue, changedBy | Append-only. Every edit tracked. |
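The start-date rules in `windowRuleJson` combine dates with earlier-of, later-of, and fallback operators. The sketch below resolves such an expression under these assumptions:

- a hypothetical node shape (`attr`, `earlierOf`, `laterOf`, `firstAvailable`);
- ISO date strings;
- missing dates are skipped rather than treated as blocking.

It does not model the `$HOME_OFFICE` token.

```typescript
// Hypothetical encoding of a start-date formula.
type DateExpr =
  | { attr: string } // a date attribute on the title record
  | { earlierOf: DateExpr[] }
  | { laterOf: DateExpr[] }
  | { firstAvailable: DateExpr[] }; // fallback chain: first expression that resolves

type TitleDates = Record<string, string | undefined>;

function resolveDate(expr: DateExpr, title: TitleDates): string | undefined {
  if ("attr" in expr) return title[expr.attr];
  if ("firstAvailable" in expr) {
    for (const candidate of expr.firstAvailable) {
      const resolved = resolveDate(candidate, title);
      if (resolved !== undefined) return resolved;
    }
    return undefined;
  }
  // earlierOf / laterOf: resolve every operand, skipping dates that are missing.
  const operands = "earlierOf" in expr ? expr.earlierOf : expr.laterOf;
  const dates = operands
    .map((operand) => resolveDate(operand, title))
    .filter((d): d is string => d !== undefined)
    .sort(); // ISO YYYY-MM-DD strings sort chronologically
  if (dates.length === 0) return undefined;
  return "earlierOf" in expr ? dates[0] : dates[dates.length - 1];
}
```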
Catalog Aggregator & Insights — from rules × titles to revenue.
Node.js / TypeScript (standalone or Next.js API routes). No LLM. Same inputs → same outputs, always. Headless smoke-testable.
Python (LangGraph) + GPT-4o. Runs on-demand. Results cached in Redis.
| Bucket | Condition | Dashboard Colour | Business Action |
|---|---|---|---|
| Eligible | 4/4 passes in ≥1 term | ● Teal | Book revenue now |
| Conditional | 2/4 with Q+R pass, blocked by cap or rights | ● Cyan | Negotiate cap amendment or wait for rights release |
| Forecast | ≥2/4 with W as only fail | ● Blue | Future term-year revenue — pipeline visibility |
| Make-It-Eligible | 3/4 — one lever away | ● Gold | Execute the recommended lever (boost BO, add cast, shift date) |
CLIP™ is not a walled garden. Every insight flows into the tools sales and finance teams already use.
IMDb, TMDb (REST) + Aurora internal theatrical / HE / SVOD / Linear systems. EventBridge catalog:changed triggers Engine 2 re-run for affected titles only.
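Re-running only the affected titles amounts to recomputing one slice of the title × category × term verdict grid. The sketch assumes a hypothetical `catalog:changed` payload that lists title IDs, plus an in-memory verdict cache. `reevaluate` and `evaluateOne` are illustrative names, not CLIP APIs.

```typescript
// Hypothetical catalog:changed payload: only the IDs of titles whose attributes changed.
interface CatalogChanged {
  titleIds: string[];
}

// Cached verdicts keyed by `${titleId}|${categoryId}|${termId}`.
type Verdicts = Map<string, boolean>;

// Recompute verdicts for the changed titles only; every other cached verdict is
// left untouched. `evaluateOne` stands in for the deterministic rule evaluator.
function reevaluate(
  event: CatalogChanged,
  categoryIds: string[],
  termIds: string[],
  cache: Verdicts,
  evaluateOne: (titleId: string, categoryId: string, termId: string) => boolean,
): number {
  let recomputed = 0;
  for (const titleId of Array.from(new Set(event.titleIds))) { // de-duplicate IDs
    for (const categoryId of categoryIds) {
      for (const termId of termIds) {
        cache.set(`${titleId}|${categoryId}|${termId}`, evaluateOne(titleId, categoryId, termId));
        recomputed++;
      }
    }
  }
  return recomputed;
}
```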
Two-way adapter to Rightsline / FilmTrack / ERP RMS. Reads active licence grants; writes back new commitments. Detects "already-sold" conflicts before surfacing eligibility.
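Already-sold detection reduces to an interval-overlap check on matching title, territory, and media. The `Grant` shape below is hypothetical and is not the Rightsline or FilmTrack data model. The sketch assumes a conflict arises only when at least one side is exclusive, and it leaves out holdbacks and language splits.

```typescript
// Hypothetical shape of a licence grant as read through the RMS adapter.
interface Grant {
  titleId: string;
  territory: string;
  media: string; // e.g. "SVOD" or "Linear"
  exclusive: boolean;
  start: string; // ISO date (YYYY-MM-DD), inclusive
  end: string; // ISO date, inclusive
}

// Inclusive ranges overlap when each one starts on or before the other ends.
// ISO date strings compare correctly as plain strings.
function overlaps(a: Grant, b: Grant): boolean {
  return a.start <= b.end && b.start <= a.end;
}

// A proposed licence is "already sold" when an existing grant for the same
// title, territory, and media overlaps its window and either side is exclusive.
function findConflicts(proposed: Grant, existing: Grant[]): Grant[] {
  return existing.filter(
    (g) =>
      g.titleId === proposed.titleId &&
      g.territory === proposed.territory &&
      g.media === proposed.media &&
      (g.exclusive || proposed.exclusive) &&
      overlaps(g, proposed),
  );
}
```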
Eligible titles auto-create Opportunities. MKE levers sync as Tasks assigned to the Opportunity owner. Win/loss signals flow back to retrain the Pattern Analyzer.
Bookable revenue posts to AR/forecasting. Rate-card values populate contract line items. Cap consumption reported to deal controllers.
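The tiered rate cards stored in `contract_ratecard_rules.tiers` imply graduated pricing: as with tax brackets, each slice of the base is charged at its own tier's rate. The sketch below assumes a hypothetical `{ from, rate }` tier shape and a generic numeric base, for example a box-office figure. It does not model escalators across term years.

```typescript
// Hypothetical tier shape: `rate` applies to the part of the base between this
// tier's `from` and the next tier's `from`. Rate is a fraction (0.05 = 5%).
interface Tier {
  from: number;
  rate: number;
}

// Graduated (marginal) pricing: each slice is charged at its own tier's rate.
function graduatedFee(base: number, tiers: Tier[]): number {
  const sorted = [...tiers].sort((a, b) => a.from - b.from);
  let fee = 0;
  for (let i = 0; i < sorted.length; i++) {
    const lo = sorted[i].from;
    const hi = i + 1 < sorted.length ? sorted[i + 1].from : Infinity;
    if (base <= lo) break; // nothing left to charge in this or higher tiers
    fee += (Math.min(base, hi) - lo) * sorted[i].rate;
  }
  return fee;
}
```

With tiers of 10% up to 1M and 5% above it, a base of 3M yields 100K + 100K = 200K. A flat tier lookup would instead charge 5% on the whole 3M.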
The decisions that shape the system — each with the trade-off considered.
Decision: Store rule trees as JSONB in PostgreSQL instead of a normalised relational schema.
Rationale: Rule trees are recursive and variably-shaped. JSONB preserves the tree structure natively; GIN indexes enable clause-level queries. Normalising would require a complex adjacency-list schema for unlimited nesting.
Trade-off: Schema enforcement is application-side, not DB-side. Mitigated by JSON Schema validation on write.
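The write-side check can be pictured as a walk over the incoming tree that reports every malformed node by its path. The sketch below is a hand-written stand-in for JSON Schema validation; in practice a JSON Schema validator such as Ajv would play this role. The node shape and the operator list are assumptions for illustration, not CLIP's schema.

```typescript
// Assumed operator vocabulary for leaf nodes.
const OPERATORS = new Set(["eq", "neq", "gt", "gte", "lt", "lte", "in"]);

// Returns one message per invalid node, keyed by JSON path; [] means valid.
function validateRule(node: unknown, path = "$"): string[] {
  if (typeof node !== "object" || node === null || Array.isArray(node)) {
    return [`${path}: expected an object`];
  }
  const obj = node as Record<string, unknown>;
  // Branch nodes: a non-empty array of child rules under "and" or "or".
  for (const key of ["and", "or"]) {
    if (key in obj) {
      const children = obj[key];
      if (!Array.isArray(children) || children.length === 0) {
        return [`${path}.${key}: expected a non-empty array`];
      }
      return children.flatMap((child, i) => validateRule(child, `${path}.${key}[${i}]`));
    }
  }
  // Leaf nodes: attribute name, known operator, and a value.
  const errors: string[] = [];
  if (typeof obj.attr !== "string") errors.push(`${path}.attr: expected a string`);
  const op = obj.op;
  if (typeof op !== "string" || !OPERATORS.has(op)) errors.push(`${path}.op: unknown operator`);
  if (!("value" in obj)) errors.push(`${path}.value: missing`);
  return errors;
}
```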
Decision: Five specialist LLMs (Q, W, C, R, Amendment) instead of one generalist model.
Rationale: Each segment has distinct input patterns (tables for rate cards, date formulas for windows, AND/OR trees for qualifiers). Fine-tuned specialists achieve ≥94% accuracy vs ~68% for a single-LLM baseline.
Trade-off: Higher operational complexity (5 model versions to manage). Mitigated by a shared agent framework (LangGraph) and unified prompt templates.
Decision: The rule evaluator is pure deterministic code — no LLM in the evaluation loop.
Rationale: Eligibility verdicts must be reproducible and auditable. An LLM in the loop would make results non-deterministic, breaking audit and financial reconciliation.
Trade-off: LLM intelligence is limited to extraction (upstream) and insight narration (downstream).
Decision: Engine emits CustomEvent('engine:results') (browser) / SNS engine:results (server). UI subscribes. No callbacks.
Rationale: Adding a new chart, integration, or downstream consumer never changes the engine signature. Loose coupling = safe extension.
Trade-off: Events can arrive out of order or more than once, so consumers must be idempotent. Mitigated by including a monotonic version in each event payload, which lets consumers discard stale or duplicate events.
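A consumer that honours the monotonic version can simply keep the newest payload per contract and drop anything older or equal. The sketch assumes a hypothetical `EngineResults` payload with a per-contract `version`. In the browser, `handle` would be called from a listener registered with `addEventListener('engine:results', listener)` that reads `CustomEvent.detail`.

```typescript
// Hypothetical engine:results payload; `version` is the monotonic counter,
// assumed here to be scoped per contract.
interface EngineResults {
  contractId: string;
  version: number;
  eligibleCount: number;
}

// Keeps only the newest snapshot per contract. Duplicates and late, older
// events are ignored, so redelivery or reordering cannot roll the UI back.
class ResultsConsumer {
  private latest = new Map<string, EngineResults>();

  handle(event: EngineResults): boolean {
    const current = this.latest.get(event.contractId);
    if (current !== undefined && event.version <= current.version) return false; // stale or duplicate
    this.latest.set(event.contractId, event);
    return true;
  }

  snapshot(contractId: string): EngineResults | undefined {
    return this.latest.get(contractId);
  }
}
```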
Decision: Engine 2 publishes a list of valid attribute names; Engine 1's LLM must map into that vocabulary.
Rationale: Prevents hallucinated field names (e.g., the LLM inventing grossRevenue when the catalog only has usBoxOffice). The feedback loop is the core of CLIP's accuracy claim.
Trade-off: New attributes require master-data registration before the LLM can use them.
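The vocabulary contract can be enforced mechanically after extraction: collect every attribute that an extracted rule references and reject any name Engine 2 has not published. The rule shape and the `unknownAttributes` helper are hypothetical. The `grossRevenue` / `usBoxOffice` pair mirrors the example above.

```typescript
// Hypothetical rule shape: AND/OR branches with attribute-based leaves.
type RuleNode =
  | { and: RuleNode[] }
  | { or: RuleNode[] }
  | { attr: string; op: string; value: unknown };

// Walks the tree and gathers every attribute name used by a leaf.
function collectAttrs(node: RuleNode, out: Set<string> = new Set()): Set<string> {
  if ("and" in node) node.and.forEach((c) => collectAttrs(c, out));
  else if ("or" in node) node.or.forEach((c) => collectAttrs(c, out));
  else out.add(node.attr);
  return out;
}

// Attribute names the extraction produced that are not in the published vocabulary.
function unknownAttributes(rule: RuleNode, vocabulary: ReadonlySet<string>): string[] {
  return Array.from(collectAttrs(rule)).filter((a) => !vocabulary.has(a)).sort();
}
```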
Decision: One Angular 19 or Next.js 15 application for Module 1, Module 2, and admin. Modules are lazy-loaded routes behind feature flags.
Rationale: Shared design system (light theme, Fraunces + Plus Jakarta Sans + Unbounded), shared auth, shared state patterns. Separate SPAs would diverge visually and operationally. Angular offers SPE-standard tooling; Next.js offers SSR, React Server Components, and a broader ecosystem. Either framework supports lazy loading and tree-shaking.
Trade-off: Larger initial bundle if not code-split. Mitigated by Angular lazy loading + esbuild tree-shaking (Angular CLI application builder), or Next.js automatic code-splitting + RSC streaming.
| Category | Requirement | Target |
|---|---|---|
| Performance | Contract extraction (80-page PDF) | ≤ 120 seconds end-to-end |
| | Eligibility engine run (12K titles × 4 categories × 4 terms) | ≤ 2 seconds |
| | Dashboard page load (cold) | ≤ 1.5 seconds |
| Scalability | Concurrent extraction jobs | 50 simultaneous contracts |
| | Title catalog size | 100K titles without engine redesign |
| Availability | Uptime SLA (production) | 99.9% (≈ 8.76 hrs/yr downtime) |
| | RTO / RPO | RTO 30 min / RPO 1 min (Multi-AZ RDS) |
| Security | Auth | Okta OIDC SSO, RBAC, MFA |
| | Encryption | TLS 1.3 in transit, AES-256 at rest (KMS) |
| | Data residency | us-west-2 primary (SPE standard) |
| Auditability | Rule edit trail | 100%: every AI vs human value logged |
| | Eligibility verdicts | Full trace per leaf: rule ID, attribute value, operator truth |
| Accuracy | Engine 1 rule-level extraction | ≥ 94% (vs 68% single-LLM baseline) |
| | Engine 2 eligibility verdicts | 100% deterministic (same inputs → same output) |
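Two targets in the table can be sanity-checked with simple arithmetic:

- the per-evaluation time budget implied by the engine-run row, assuming every title is evaluated against every category and term;
- the annual downtime allowed by a 99.9% SLA over a 365-day year.

```latex
\begin{aligned}
N &= 12{,}000 \times 4 \times 4 = 192{,}000 \ \text{title-category-term evaluations per run} \\
t_{\text{eval}} &\le \frac{2\ \text{s}}{192{,}000} \approx 1.04 \times 10^{-5}\ \text{s} \approx 10.4\ \mu\text{s} \\
D_{\text{year}} &= (1 - 0.999) \times 365 \times 24\ \text{h} = 0.001 \times 8{,}760\ \text{h} = 8.76\ \text{h}
\end{aligned}
```

Put differently, the engine must sustain roughly 96,000 evaluations per second.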
Where the architecture goes next — from Module 1 + 2 to a full licensing intelligence platform.
Published/Draft pattern versioning. Diff engine compares old vs new rule JSON. Effective-date activation per title.
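The diff engine could be as simple as a recursive structural comparison of the Published and Draft rule JSON. The sketch below shows one hypothetical way to do this, reporting each changed leaf by its JSON path. It is not CLIP's implementation and it ignores effective-date activation.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface Change {
  path: string;
  before?: Json; // undefined when the value was added
  after?: Json; // undefined when the value was removed
}

// Reads a child by key from either an object or an array.
function child(container: object, key: string): Json | undefined {
  return (container as Record<string, Json | undefined>)[key];
}

// Structural diff: recurses while both sides are the same kind of container,
// and compares serialised values at leaves or on a type change.
function diffJson(before: Json | undefined, after: Json | undefined, path = "$"): Change[] {
  if (
    typeof before === "object" && before !== null &&
    typeof after === "object" && after !== null &&
    Array.isArray(before) === Array.isArray(after)
  ) {
    const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
    const changes: Change[] = [];
    for (const key of Array.from(keys)) {
      const childPath = Array.isArray(before) ? `${path}[${key}]` : `${path}.${key}`;
      changes.push(...diffJson(child(before, key), child(after, key), childPath));
    }
    return changes;
  }
  return JSON.stringify(before) === JSON.stringify(after) ? [] : [{ path, before, after }];
}
```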
New service consuming Engine 2 results + historical win/loss. Produces 12-month revenue pipeline with confidence bands.
LLM-powered agent that simulates deal terms. "If we offer 15 more screens, we unlock 3 Tier-A titles worth $7.5M."