Line Item	Per Call	Calls/Mo	Monthly	% of Total
AssemblyAI Transcription	$0.08	55,000	$4,400	92.2%
GPT-4o-mini (Notos)	$0.003	55,000	$165	3.5%
GPT-4o (Call Analyzer)	$0.08	2,000	$160	3.4%
Infrastructure (Vercel + Supabase)	—	—	$45	0.9%
Total	$0.087		$4,770	100%
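The monthly figures above are per-call price times call volume, plus flat infrastructure. A minimal Python sketch of that arithmetic, using only the numbers in the table (the variable names are illustrative, not part of any existing codebase):

```python
# Per-call price and monthly call volume, copied from the table above.
line_items = {
    "AssemblyAI Transcription": (0.08, 55_000),
    "GPT-4o-mini (Notos)": (0.003, 55_000),
    "GPT-4o (Call Analyzer)": (0.08, 2_000),
}
FLAT_INFRA = 45.0  # Vercel + Supabase, billed monthly rather than per call

monthly = {name: round(price * calls, 2) for name, (price, calls) in line_items.items()}
monthly["Infrastructure (Vercel + Supabase)"] = FLAT_INFRA
total = sum(monthly.values())

for name, cost in monthly.items():
    print(f"{name:36s} ${cost:>8,.0f}  {cost / total:6.1%}")
print(f"{'Total':36s} ${total:>8,.0f}  blended ${total / 55_000:.3f}/call")
```

The blended per-call figure divides the full monthly total by the 55,000 transcribed calls, which is how the $0.087 in the Total row is reached.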

Self-Hosted Architecture

Lexington Capital — faster-whisper + pyannote on GPU cloud, everything else unchanged

Metric	Value
Monthly Revenue	$3,500
Monthly Cost	$870
Monthly Net	+$2,630
Gross Margin	75%

Pipeline Flow — Self-Hosted Transcription

Flow: SOURCE ▶ INGEST ▶ TRANSCRIBE ▶ ANALYZE ▶ DELIVER

Stage	Component	Role
SOURCE	Kixie	endcall webhook
INGEST	Vercel App	Accept + queue to DB
TRANSCRIBE	GPU Worker	faster-whisper + pyannote
ANALYZE	GPT-4o / 4o-mini	Notos + Call Analyzer
DELIVER	Salesforce	Tasks + Notes + PDF

GPU Transcription Service

Worker Architecture

┌────────────────────────────────────────┐
│ RunPod Serverless / GPU Cloud          │
│                                        │
│ Supabase Queue                         │
│   │ poll for pending jobs              │
│   ▼                                    │
│ ■ Download MP3 from recording URL      │
│   │                                    │
│   ▼                                    │
│ ■ faster-whisper Large V3              │
│   ~18s for 12-min call (40x RT)        │
│   │                                    │
│   ▼                                    │
│ ■ pyannote diarization                 │
│   (Call Analyzer only: who said what)  │
│   │                                    │
│   ▼                                    │
│ ■ POST result back to Vercel app       │
└────────────────────────────────────────┘
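The worker steps in the diagram can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the production worker: the job fields (`id`, `recording_url`, `needs_diarization`) and the injected `download`, `diarize`, and `post_result` callables are hypothetical. Only the faster-whisper calls (`WhisperModel(...)` and `transcribe(..., vad_filter=True)`) come from that library's documented API.

```python
import os
import tempfile
from typing import Callable


def load_transcriber(model_size: str = "large-v3") -> Callable[[str], str]:
    """Return a transcribe(path) -> text function backed by faster-whisper.

    The import is lazy so the rest of this sketch runs without a GPU.
    """
    from faster_whisper import WhisperModel  # pip install faster-whisper

    model = WhisperModel(model_size, device="cuda", compute_type="float16")

    def transcribe(path: str) -> str:
        # transcribe() returns a lazy generator of segments plus audio info.
        segments, _info = model.transcribe(path, vad_filter=True)
        return " ".join(seg.text.strip() for seg in segments)

    return transcribe


def process_job(job: dict, download, transcribe, diarize, post_result) -> dict:
    """Run one queued job: download, transcribe, optionally diarize, post back."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        download(job["recording_url"], path)
        result = {"job_id": job["id"], "text": transcribe(path)}
        if job.get("needs_diarization"):  # Call Analyzer jobs only
            result["speakers"] = diarize(path)
        post_result(result)
        return result
    finally:
        os.remove(path)  # the audio file is deleted even if a step fails
```

Passing the download, diarization, and callback steps in as functions keeps the loop testable on a laptop; on the GPU worker, `transcribe` would be the function returned by `load_transcriber()`.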

GPU Specs

Spec	Value
Model	faster-whisper Large V3
Diarization	pyannote 3.1
GPU	RTX 4090 or L40S
Speed	40x real-time
12-min call	~18 seconds
Capacity (1 GPU)	~130K calls/mo
Lexington needs	55K calls/mo (42% utilization)
Cost model	Serverless (pay per second)
Monthly compute	~$130
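The spec values can be cross-checked with a few lines of arithmetic. In this sketch the 2-second per-call overhead is an assumption chosen to reconcile the raw throughput with the "~130K calls/mo" figure; the hourly GPU rate is not stated in the table, so it is derived from the ~$130 monthly compute rather than assumed.

```python
CALL_SECONDS = 12 * 60        # 12-minute call
REALTIME_FACTOR = 40          # 40x real-time, from the spec table
CALLS_PER_MONTH = 55_000      # Lexington volume
MONTHLY_COMPUTE_USD = 130     # "~$130" from the spec table
OVERHEAD_SECONDS = 2          # ASSUMPTION: download + startup time per call

gpu_seconds_per_call = CALL_SECONDS / REALTIME_FACTOR
seconds_per_month = 30 * 24 * 3600
capacity = seconds_per_month / (gpu_seconds_per_call + OVERHEAD_SECONDS)
utilization = CALLS_PER_MONTH / capacity

# Serverless billing: only the seconds spent transcribing are paid for.
gpu_hours = CALLS_PER_MONTH * gpu_seconds_per_call / 3600
implied_rate = MONTHLY_COMPUTE_USD / gpu_hours

print(f"GPU time per call:   {gpu_seconds_per_call:.0f} s")
print(f"Capacity (1 GPU):    {capacity:,.0f} calls/mo")
print(f"Utilization:         {utilization:.0%}")
print(f"Billed GPU hours:    {gpu_hours:.0f} h/mo")
print(f"Implied hourly rate: ${implied_rate:.2f}/h")
```

If the real provider rate is higher than the implied one, the monthly compute line scales linearly with it.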

Cost Comparison: Current vs Self-Hosted

Line Item	Current	Self-Hosted	Savings
Transcription	$4,400	$130	$4,270 (97%)
GPT-4o-mini (Notos)	$165	$165	$0
GPT-4o (Call Analyzer)	$160	$160	$0
Infrastructure	$45	$45	$0
Total	$4,770	$500	$4,270
Net (vs $3,500 revenue)	-$1,270	+$3,000

Cost distribution (self-hosted):

Transcription 26%

AI (GPT) 65%

Infra 9%

Scaled Platform — 10 Clients

Multi-tenant SaaS with CRM-agnostic delivery, 24/7 support, fully staffed

Metric	Value
Monthly Revenue	$71K
Monthly Cost (staffed)	$17.3K
Monthly Net	$53.7K
Gross Margin	75.6%

Multi-Tenant Platform Architecture

CALL SOURCES

Type	Source	Integration
Dialer	Kixie	endcall webhook
Dialer	RingCentral	call recording API
Dialer	Dialpad	webhook
Dialer	Five9 / Other	API / webhook
Manual	File Upload	Web UI / API

▼ ▼ ▼

INGESTION LAYER

Role	Component	Details
API Gateway	Vercel Edge	Webhook verification, org routing, token debit, job creation. Returns 200 immediately.
Queue	Supabase (Postgres)	Job queue with status tracking. Multi-tenant: org_id on every row. RLS policies enforce isolation.
Auth	Multi-Tenant Auth	API keys (webhooks), JWT (UI), org membership + roles (admin/manager/rep). Per-org token metering.
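The gateway's verify-then-enqueue step might look like the sketch below. It assumes an HMAC-SHA256 signature over the raw request body with a per-org secret; the source does not say how Kixie or the other dialers sign webhooks, so the scheme, the `recording_url` payload field, and the in-memory `queue` list (standing in for the Supabase table) are all hypothetical.

```python
import hashlib
import hmac
import json
import uuid


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an HMAC-SHA256 signature over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


def make_job(org_id: str, body: bytes) -> dict:
    """Build a queue row; org_id is stamped on every row for tenant isolation."""
    payload = json.loads(body)
    return {
        "id": str(uuid.uuid4()),
        "org_id": org_id,
        "status": "pending",
        "recording_url": payload["recording_url"],
    }


def handle_webhook(org_secrets: dict, org_id: str, body: bytes,
                   signature_hex: str, queue: list) -> int:
    """Return an HTTP status: 401 on a bad signature, 200 once the job is queued."""
    secret = org_secrets.get(org_id)
    if secret is None or not verify_signature(secret, body, signature_hex):
        return 401
    queue.append(make_job(org_id, body))
    return 200  # respond immediately; transcription happens asynchronously
```

Returning 200 as soon as the row is written matches the "Returns 200 immediately" behaviour in the table, since the GPU worker picks the job up later.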

▼

TRANSCRIPTION LAYER (Self-Hosted)

Tier	Provider	Details	Cost
Primary	RunPod Serverless	faster-whisper Large V3 + pyannote 3.1. Auto-scales 0-10 GPUs. Pay per second.	~$650/mo at 280K calls
Failover	vast.ai / Modal	Secondary GPU pool. Auto-failover if primary is down. Same Docker container, different provider.	~$200/mo (standby + overflow)
Fallback	Deepgram API	If both GPU providers fail, route to Deepgram Nova-3 ($0.26/hr). Auto-switch, no manual intervention.	$0 unless activated
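The three-tier routing above amounts to trying providers in priority order and moving on when one fails. A minimal sketch, assuming each provider is wrapped in a callable that takes an audio URL and either returns a transcript or raises; the provider wrappers themselves are hypothetical and not shown:

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


def transcribe_with_failover(
    audio_url: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, transcribe_fn) in order; return (provider_name, transcript)."""
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio_url)
        except Exception as exc:  # any failure moves on to the next tier
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Usage would pass the chain in priority order, for example `[("runpod", ...), ("vast", ...), ("deepgram", ...)]`. Returning the provider name alongside the transcript makes it possible to track how often the paid fallback is actually used.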

▼

AI ANALYSIS LAYER

Mode	Product	Details	Cost
Always-On	Notos (GPT-4o-mini)	Every call: CRM-ready notes, follow-up extraction, key points, objections, next steps.	$0.003/call
Token-Gated	Call Analyzer (GPT-4o)	Selective: scorecard (0-100), coaching notes, compliance flags, department scoring.	$0.08/call (2 GPT calls)
Future	Custom Products	Lex Brain (RAG), compliance engine, sentiment tracking, custom scoring models per client.	Priced per product
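The always-on versus token-gated split can be expressed as a small routing function. This sketch assumes one token is debited per Call Analyzer run; the source only says the product is token-gated, so the debit size and the function name are illustrative.

```python
NOTOS_COST = 0.003      # USD per call, from the table above
ANALYZER_COST = 0.08    # USD per call (2 GPT calls), from the table above


def plan_analysis(wants_scorecard: bool, org_tokens: int):
    """Decide which analysis steps run for one call.

    Returns (steps, model_cost_usd, remaining_tokens).
    """
    steps = ["notos"]          # Notos runs on every call
    cost = NOTOS_COST
    if wants_scorecard and org_tokens > 0:
        steps.append("call_analyzer")
        cost += ANALYZER_COST
        org_tokens -= 1        # ASSUMPTION: one token per analyzed call
    return steps, round(cost, 3), org_tokens
```

A call that requests a scorecard with no tokens left still gets Notos output, so the baseline notes are never blocked by metering.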

▼

DELIVERY LAYER (CRM-Agnostic)

Type	Destination	Output
CRM	Salesforce	Task + ContentNote + PDF
CRM	HubSpot	Engagement + Note
CRM	Zoho CRM	Task + Attachment
CRM	Close / Pipedrive	Activity + Note
Other	Google Docs	Shared folder per rep
Other	Email Digest	Daily follow-up summary
Other	Webhook / API	Push to any system
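One way to keep delivery CRM-agnostic is a thin adapter per destination over a common result dictionary. The sketch below is illustrative only: the payload shapes are simplified placeholders and not the real Salesforce or HubSpot API schemas, and the result fields (`call_id`, `notes`, `next_step`) are hypothetical.

```python
from typing import Protocol


class DeliveryAdapter(Protocol):
    def build_payloads(self, result: dict) -> list: ...


class SalesforceAdapter:
    def build_payloads(self, result: dict) -> list:
        # Salesforce delivery is Task + ContentNote + PDF in the table above.
        return [
            {"kind": "Task", "subject": result["next_step"]},
            {"kind": "ContentNote", "body": result["notes"]},
            {"kind": "PDF", "title": f"Call {result['call_id']}"},
        ]


class HubSpotAdapter:
    def build_payloads(self, result: dict) -> list:
        # HubSpot delivery is Engagement + Note in the table above.
        return [
            {"kind": "Engagement", "summary": result["next_step"]},
            {"kind": "Note", "body": result["notes"]},
        ]


ADAPTERS = {"salesforce": SalesforceAdapter(), "hubspot": HubSpotAdapter()}


def deliver(crm: str, result: dict) -> list:
    """Translate one analysis result into destination-specific payloads."""
    try:
        adapter = ADAPTERS[crm]
    except KeyError:
        raise ValueError(f"no adapter registered for {crm!r}") from None
    return adapter.build_payloads(result)
```

Adding a destination then means registering one more adapter, leaving the pipeline that produces `result` untouched.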

Team Structure — 24/7 Coverage

Function	Role	Responsibilities	Cost
Engineering	Senior Dev	Pipeline, integrations, GPU worker, new product development. US timezone.	$6,000/mo
Engineering	Backend Dev	CRM integrations, API routes, bug fixes, client onboarding. Offshore (overlap hours).	$3,000/mo
Support	Support Lead	Client comms, onboarding, escalations, monitoring dashboards. US timezone.	$3,500/mo
Support	Support Eng	After-hours monitoring, alert response, log review, basic troubleshooting. Offshore.	$1,500/mo

Risk Mitigation

HIGH

GPU Provider Outage

RunPod or vast.ai goes down, blocking all transcription for every client.

✓ Mitigation: Tri-layer failover: RunPod (primary) → vast.ai/Modal (secondary) → Deepgram API (tertiary), with auto-switch in <60s. Deepgram costs more but keeps transcription running if both GPU pools are down. Each layer is independent infrastructure.

HIGH

Single Engineer Dependency

If the primary dev is unavailable, no one can fix critical issues.

✓ Mitigation: Two devs with overlapping knowledge. Full runbooks and incident playbooks. Infrastructure as code (Docker + Terraform). Offshore dev covers after-hours. Support eng trained on common operational tasks.

MED

CRM Integration Variance

Every client's CRM is configured differently — custom objects, triggers, field names, permissions.

✓ Mitigation: Standardized delivery adapter pattern. Core output is JSON. Each CRM has a thin adapter (Salesforce, HubSpot, Zoho, etc.). Client onboarding includes a CRM mapping session. Backend dev specializes in CRM integrations.

MED

Whisper Quality vs. Managed API

Self-hosted Whisper may have lower accuracy than AssemblyAI/Deepgram on noisy phone audio.

✓ Mitigation: VAD preprocessing (WhisperX) removes silence/noise before transcription. A/B test accuracy on first 1,000 Lexington calls before full cutover. Keep Deepgram as quality fallback for clients who demand premium accuracy (upsell opportunity).

MED

Data Security / Compliance

Processing sales call audio through third-party GPU clouds. Client data on external infrastructure.

✓ Mitigation: Audio is transient — downloaded, processed, deleted within minutes. No persistent storage on GPU workers. All data in transit encrypted (TLS). Results stored only in Supabase (SOC 2 Type II via AWS). DPA available per client. Can add dedicated GPU instances for enterprise clients who require it.

LOW

Scaling Beyond 10 Clients

At 20+ clients, queue depth and GPU utilization could spike.

✓ Mitigation: RunPod Serverless auto-scales to 10+ GPUs. Supabase scales to millions of rows. Vercel handles concurrent webhooks natively. The architecture is stateless — adding capacity is adding GPU workers, not redesigning anything.

LOW

Client Churn

Losing a $7,500/mo client hurts revenue.

✓ Mitigation: 12-month contracts (matching Lexington structure). High switching cost — once CRM integration and scoring models are configured, migration is painful. Case studies and ROI reports reinforce value. Support lead maintains quarterly business reviews.

LOW

pyannote Commercial License

pyannote's pretrained diarization models are gated on Hugging Face, and their license terms must be confirmed before business use.

✓ Mitigation: Access is granted via Hugging Face once the model conditions are accepted. Alternative: use Deepgram exclusively for Call Analyzer calls that need diarization (~20K/mo) at $0.26/hr, keeping self-hosted Whisper for Notos (no diarization needed).
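The A/B accuracy test proposed under "Whisper Quality vs. Managed API" needs a metric. Word error rate (WER) against a human-checked reference transcript is a common choice. The source does not name a metric, so treating WER as the comparison measure is an assumption, and the function below is a plain-Python sketch rather than a tuned evaluation harness.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs) -> float:
    """Average WER over (reference, hypothesis) pairs for one engine."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)
```

Running `mean_wer` once per engine over the same set of reference calls gives two directly comparable numbers for the cutover decision.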

Client Portfolio Model

#	Client	Reps	Calls/Mo	CRM	Products	Monthly
1	Lexington Capital	60	55,000	Salesforce	Notos + Analyzer	$3,500
2	Client B	35	28,000	Salesforce	Notos + Analyzer	$7,500
3	Client C	25	20,000	HubSpot	Notos + Analyzer	$7,500
4	Client D	40	32,000	Salesforce	Notos only	$7,500
5	Client E	20	16,000	Zoho	Notos + Analyzer	$7,500
6	Client F	30	24,000	HubSpot	Notos + Analyzer	$7,500
7	Client G	25	20,000	Salesforce	Notos + Analyzer	$7,500
8	Client H	35	28,000	Close	Notos only	$7,500
9	Client I	30	24,000	Pipedrive	Notos + Analyzer	$7,500
10	Client J	30	24,000	HubSpot	Notos + Analyzer	$7,500
	Total	330	271,000			$71,000

Financial Model

Full P&L across all three scenarios — conservative estimates, fully staffed

Monthly P&L Comparison

	Current (1 client, managed APIs)	Self-Hosted (1 client, own GPUs)	Scaled (10 clients, fully staffed)
Revenue
Client subscriptions	$3,500	$3,500	$71,000
Rep self-purchase (est.)	$0	$500	$4,000
Total Revenue	$3,500	$4,000	$75,000
Infrastructure
Transcription	$4,400	$130	$850
GPT-4o-mini (Notos)	$165	$165	$840
GPT-4o (Call Analyzer)	$160	$160	$1,600
Deepgram fallback (standby)	—	$0	$200
Vercel Pro	$20	$20	$20
Supabase Pro	$25	$25	$75
Monitoring (Betterstack/etc)	—	$0	$50
Total Infrastructure	$4,770	$500	$3,635
Team (Scaled Only)
Senior Dev (US)	—	—	$6,000
Backend Dev (Offshore)	—	—	$3,000
Support Lead (US)	—	—	$3,500
Support Eng (Offshore)	—	—	$1,500
Total Team	—	—	$14,000
Total Cost	$4,770	$500	$17,635
Net Profit	-$1,270	+$3,500	+$57,365
Margin	-36%	87.5%	76.5%
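The bottom rows of the P&L follow directly from the revenue, infrastructure, and team totals. A short sketch that recomputes them from the table's own numbers (the dictionary keys are illustrative):

```python
# Totals copied from the Monthly P&L table above (USD per month).
scenarios = {
    "Current": {"revenue": 3_500, "infra": 4_770, "team": 0},
    "Self-Hosted": {"revenue": 4_000, "infra": 500, "team": 0},
    "Scaled": {"revenue": 75_000, "infra": 3_635, "team": 14_000},
}


def pnl(s: dict) -> dict:
    """Derive total cost, net profit, margin, and annualized net."""
    cost = s["infra"] + s["team"]
    net = s["revenue"] - cost
    return {
        "cost": cost,
        "net": net,
        "margin": net / s["revenue"],
        "annual_net": net * 12,
    }


for name, s in scenarios.items():
    r = pnl(s)
    print(f"{name:12s} cost ${r['cost']:>7,}  net ${r['net']:>+8,}  "
          f"margin {r['margin']:6.1%}  annual ${r['annual_net']:>+9,}")
```

The annualized values are the monthly net multiplied by 12, with no growth or churn assumed.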

Annual Projection

Current (Losing Money)

-$15,240

Annual Loss

Revenue: $42K — Cost: $57.2K

Self-Hosted (Lexington Only)

+$42,000

Annual Profit

Revenue: $48K — Cost: $6K

Scaled (10 Clients, Staffed)

+$688K

Annual Profit

Revenue: $900K — Cost: $212K

Scaling Economics — Cost vs Revenue per Client Added

Clients	Revenue/Mo	Infra/Mo	Team/Mo	Total Cost	Net Profit	Margin	Annual Profit
1 (Lex only)	$4,000	$500	$0	$500	$3,500	87.5%	$42K
3	$19,000	$1,200	$0	$1,200	$17,800	93.7%	$214K
5	$34,000	$1,800	$9,500	$11,300	$22,700	66.8%	$272K
10	$75,000	$3,635	$14,000	$17,635	$57,365	76.5%	$688K
15	$112,500	$5,400	$18,000	$23,400	$89,100	79.2%	$1.07M
20	$150,000	$7,200	$24,000	$31,200	$118,800	79.2%	$1.43M

Key inflection points:

1-3 clients: Solo operation. No team needed. $42K-$214K/yr profit.
5 clients: Hire first team. Margin dips to 67% but net increases. $272K/yr.
10 clients: Full team (2 dev + 2 support). Sweet spot. $688K/yr at 76.5% margin.
15-20 clients: Add 1-2 more team members. Margin stabilizes at ~79%. $1M+ annual profit.
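The effect of each hiring step is easier to see as profit per client added between adjacent tiers. The sketch below derives that from the Net Profit column of the scaling table; it is a reading of the table's numbers, not an additional forecast.

```python
# (clients, monthly net profit in USD) rows copied from the scaling table.
tiers = [(1, 3_500), (3, 17_800), (5, 22_700), (10, 57_365), (15, 89_100), (20, 118_800)]


def incremental_profit(rows):
    """Average extra monthly profit per client added between adjacent tiers."""
    out = {}
    for (c0, n0), (c1, n1) in zip(rows, rows[1:]):
        out[(c0, c1)] = (n1 - n0) / (c1 - c0)
    return out


steps = incremental_profit(tiers)
for (c0, c1), per_client in steps.items():
    print(f"{c0:>2} -> {c1:<2} clients: ${per_client:,.0f}/mo per added client")
```

On these figures the 3-to-5 step yields the least per added client, which is where the first team hire lands.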

Unit Economics — Per Client

Revenue per Client

Base subscription	$7,500/mo
Rep self-purchase (est.)	$400/mo
Custom product add-ons	TBD
Total per client	$7,900/mo
Annual contract value	$94,800

Marginal Cost per Client

Transcription (GPU, ~25K calls)	$60
GPT-4o-mini Notos	$75
GPT-4o Call Analyzer	$160
CRM integration maintenance	$50
Marginal cost/client	$345/mo
Marginal margin	95.6%

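Each additional client adds ~$7,555/mo in profit ($7,900 revenue minus $345 marginal cost) while the existing team and base infra can absorb it. Team cost steps up at the 5-, 10-, 15-, and 20-client tiers in the scaling table, so the realized gain per client is lower across those steps.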
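The per-client figures above reduce to a few sums. A minimal sketch using only the listed values (names are illustrative):

```python
# Revenue per client (USD per month): base subscription + rep self-purchase.
revenue_per_client = 7_500 + 400

# Marginal cost lines copied from the table above (USD per month).
marginal_cost = {
    "Transcription (GPU, ~25K calls)": 60,
    "GPT-4o-mini Notos": 75,
    "GPT-4o Call Analyzer": 160,
    "CRM integration maintenance": 50,
}

cost = sum(marginal_cost.values())
contribution = revenue_per_client - cost          # profit added per client
marginal_margin = contribution / revenue_per_client
annual_contract_value = revenue_per_client * 12

print(f"Marginal cost/client:  ${cost}/mo")
print(f"Contribution/client:   ${contribution:,}/mo")
print(f"Marginal margin:       {marginal_margin:.1%}")
print(f"Annual contract value: ${annual_contract_value:,}")
```

Custom product add-ons are listed as TBD in the source and are therefore left out of the revenue side here.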
