Current Setup
Self-Hosted (Lexington)
Scaled (10 Clients)
Financials

Current Architecture

AssemblyAI + GPT-4o — Managed APIs, simple but expensive at scale

$3,500
Monthly Revenue
$4,770
Monthly Cost
-$1,270
Monthly Net
-36%
Gross Margin
Pipeline Flow — Current
SOURCE
Kixie
endcall webhook
INGEST
Vercel App
Accept + queue
TRANSCRIBE
AssemblyAI
$0.08/call — 92% of cost
ANALYZE
GPT-4o / 4o-mini
Notos + Call Analyzer
DELIVER
Salesforce
Tasks + Notes + PDF

Monthly Cost Breakdown

Line ItemPer CallCalls/MoMonthly% of Total
AssemblyAI Transcription $0.0855,000 $4,400 92.2%
GPT-4o-mini (Notos) $0.00355,000 $165 3.5%
GPT-4o (Call Analyzer) $0.082,000 $160 3.4%
Infrastructure (Vercel + Supabase) $45 0.9%
Total $0.087 $4,770 100%
Cost distribution:
Transcription 92%
AI + Infra 8%
The Problem
Revenue: $3,500/mo — Cost: $4,770/mo = -$1,270/mo loss
Managed transcription APIs charge 10-40x the actual compute cost. At 55,000 calls/month, this markup makes the deal unprofitable. The AI analysis (GPT) is cheap. The bottleneck is paying someone else to run Whisper for us.

Self-Hosted Architecture

Lexington Capital — faster-whisper + pyannote on GPU cloud, everything else unchanged

$3,500
Monthly Revenue
$870
Monthly Cost
+$2,630
Monthly Net
75%
Gross Margin
Pipeline Flow — Self-Hosted Transcription
SOURCE
Kixie
endcall webhook
INGEST
Vercel App
Accept + queue to DB
TRANSCRIBE
GPU Worker
faster-whisper + pyannote
ANALYZE
GPT-4o / 4o-mini
Notos + Call Analyzer
DELIVER
Salesforce
Tasks + Notes + PDF

GPU Transcription Service

Worker Architecture
┌──────────────────────────────────────┐
RunPod Serverless / GPU Cloud
│ │
Supabase Queue
│ │ poll for pending jobs │
│ ▼ │
■ Download MP3 from recording URL │
│ │ │
│ ▼ │
■ faster-whisper Large V3 │
~18s for 12-min call (40x RT)
│ │ │
│ ▼ │
■ pyannote diarization │
(Call Analyzer only — who said what)
│ │ │
│ ▼ │
■ POST result back to Vercel app │
└──────────────────────────────────────┘
GPU Specs
SpecValue
Modelfaster-whisper Large V3
Diarizationpyannote 3.1
GPURTX 4090 or L40S
Speed40x real-time
12-min call~18 seconds
Capacity (1 GPU)~130K calls/mo
Lexington needs55K calls/mo (42% utilization)
Cost modelServerless (pay per second)
Monthly compute~$130

Cost Comparison: Current vs Self-Hosted

Line ItemCurrentSelf-HostedSavings
Transcription $4,400 $130 $4,270 (97%)
GPT-4o-mini (Notos) $165 $165 $0
GPT-4o (Call Analyzer) $160 $160 $0
Infrastructure $45 $45 $0
Total $4,770 $500 $4,270
Net (vs $3,500 revenue) -$1,270 +$3,000
Cost distribution (self-hosted):
Transcription 26%
AI (GPT) 65%
Infra 9%

Scaled Platform — 10 Clients

Multi-tenant SaaS with CRM-agnostic delivery, 24/7 support, fully staffed

$71K
Monthly Revenue
$17.3K
Monthly Cost (staffed)
$53.7K
Monthly Net
75.6%
Gross Margin
Multi-Tenant Platform Architecture

CALL SOURCES

Dialer
Kixie
endcall webhook
Dialer
RingCentral
call recording API
Dialer
Dialpad
webhook
Dialer
Five9 / Other
API / webhook
Manual
File Upload
Web UI / API
▼ ▼ ▼

INGESTION LAYER

API Gateway
Vercel Edge
Webhook verification, org routing, token debit, job creation. Returns 200 immediately.
Queue
Supabase (Postgres)
Job queue with status tracking. Multi-tenant: org_id on every row. RLS policies enforce isolation.
Auth
Multi-Tenant Auth
API keys (webhooks), JWT (UI), org membership + roles (admin/manager/rep). Per-org token metering.

TRANSCRIPTION LAYER (Self-Hosted)

Primary
RunPod Serverless
faster-whisper Large V3 + pyannote 3.1. Auto-scales 0-10 GPUs. Pay per second.
~$650/mo at 280K calls
Failover
vast.ai / Modal
Secondary GPU pool. Auto-failover if primary is down. Same Docker container, different provider.
~$200/mo (standby + overflow)
Fallback
Deepgram API
If both GPU providers fail, route to Deepgram Nova-3 ($0.26/hr). Auto-switch, no manual intervention.
$0 unless activated

AI ANALYSIS LAYER

Always-On
Notos (GPT-4o-mini)
Every call: CRM-ready notes, follow-up extraction, key points, objections, next steps.
$0.003/call
Token-Gated
Call Analyzer (GPT-4o)
Selective: scorecard (0-100), coaching notes, compliance flags, department scoring.
$0.08/call (2 GPT calls)
Future
Custom Products
Lex Brain (RAG), compliance engine, sentiment tracking, custom scoring models per client.
Priced per product

DELIVERY LAYER (CRM-Agnostic)

CRM
Salesforce
Task + ContentNote + PDF
CRM
HubSpot
Engagement + Note
CRM
Zoho CRM
Task + Attachment
CRM
Close / Pipedrive
Activity + Note
Other
Google Docs
Shared folder per rep
Other
Email Digest
Daily follow-up summary
Other
Webhook / API
Push to any system

Team Structure — 24/7 Coverage

Engineering
Senior Dev
Pipeline, integrations, GPU worker, new product development. US timezone.
$6,000/mo
Engineering
Backend Dev
CRM integrations, API routes, bug fixes, client onboarding. Offshore (overlap hours).
$3,000/mo
Support
Support Lead
Client comms, onboarding, escalations, monitoring dashboards. US timezone.
$3,500/mo
Support
Support Eng
After-hours monitoring, alert response, log review, basic troubleshooting. Offshore.
$1,500/mo

Risk Mitigation

HIGH
GPU Provider Outage
RunPod or vast.ai goes down, blocking all transcription for every client.
✓ Mitigation: Tri-layer failover — RunPod (primary) → vast.ai/Modal (secondary) → Deepgram API (tertiary). Auto-switch in <60s. Deepgram costs more but guarantees zero downtime. Each layer is independent infrastructure.
HIGH
Single Engineer Dependency
If the primary dev is unavailable, no one can fix critical issues.
✓ Mitigation: Two devs with overlapping knowledge. Full runbooks and incident playbooks. Infrastructure as code (Docker + Terraform). Offshore dev covers after-hours. Support eng trained on common operational tasks.
MED
CRM Integration Variance
Every client's CRM is configured differently — custom objects, triggers, field names, permissions.
✓ Mitigation: Standardized delivery adapter pattern. Core output is JSON. Each CRM has a thin adapter (Salesforce, HubSpot, Zoho, etc.). Client onboarding includes a CRM mapping session. Backend dev specializes in CRM integrations.
MED
Whisper Quality vs. Managed API
Self-hosted Whisper may have lower accuracy than AssemblyAI/Deepgram on noisy phone audio.
✓ Mitigation: VAD preprocessing (WhisperX) removes silence/noise before transcription. A/B test accuracy on first 1,000 Lexington calls before full cutover. Keep Deepgram as quality fallback for clients who demand premium accuracy (upsell opportunity).
MED
Data Security / Compliance
Processing sales call audio through third-party GPU clouds. Client data on external infrastructure.
✓ Mitigation: Audio is transient — downloaded, processed, deleted within minutes. No persistent storage on GPU workers. All data in transit encrypted (TLS). Results stored only in Supabase (SOC 2 Type II via AWS). DPA available per client. Can add dedicated GPU instances for enterprise clients who require it.
LOW
Scaling Beyond 10 Clients
At 20+ clients, queue depth and GPU utilization could spike.
✓ Mitigation: RunPod Serverless auto-scales to 10+ GPUs. Supabase scales to millions of rows. Vercel handles concurrent webhooks natively. The architecture is stateless — adding capacity is adding GPU workers, not redesigning anything.
LOW
Client Churn
Losing a $7,500/mo client hurts revenue.
✓ Mitigation: 12-month contracts (matching Lexington structure). High switching cost — once CRM integration and scoring models are configured, migration is painful. Case studies and ROI reports reinforce value. Support lead maintains quarterly business reviews.
LOW
pyannote Commercial License
pyannote speaker diarization requires a commercial license for business use.
✓ Mitigation: License is available via HuggingFace. Alternative: use Deepgram exclusively for Call Analyzer calls that need diarization (~20K/mo) at $0.26/hr, keeping self-hosted Whisper for Notos (no diarization needed).

Client Portfolio Model

#ClientRepsCalls/MoCRMProductsMonthly
1Lexington Capital6055,000SalesforceNotos + Analyzer$3,500
2Client B3528,000SalesforceNotos + Analyzer$7,500
3Client C2520,000HubSpotNotos + Analyzer$7,500
4Client D4032,000SalesforceNotos only$7,500
5Client E2016,000ZohoNotos + Analyzer$7,500
6Client F3024,000HubSpotNotos + Analyzer$7,500
7Client G2520,000SalesforceNotos + Analyzer$7,500
8Client H3528,000CloseNotos only$7,500
9Client I3024,000PipedriveNotos + Analyzer$7,500
10Client J3024,000HubSpotNotos + Analyzer$7,500
Total 330 271,000 $71,000

Financial Model

Full P&L across all three scenarios — conservative estimates, fully staffed

Monthly P&L Comparison

Current
(1 client, managed APIs)
Self-Hosted
(1 client, own GPUs)
Scaled
(10 clients, fully staffed)
Revenue
Client subscriptions $3,500 $3,500 $71,000
Rep self-purchase (est.) $0 $500 $4,000
Total Revenue $3,500 $4,000 $75,000
Infrastructure
Transcription $4,400 $130 $850
GPT-4o-mini (Notos) $165 $165 $840
GPT-4o (Call Analyzer) $160 $160 $1,600
Deepgram fallback (standby) $0 $200
Vercel Pro $20 $20 $20
Supabase Pro $25 $25 $75
Monitoring (Betterstack/etc) $0 $50
Total Infrastructure $4,770 $500 $3,635
Team (Scaled Only)
Senior Dev (US) $6,000
Backend Dev (Offshore) $3,000
Support Lead (US) $3,500
Support Eng (Offshore) $1,500
Total Team $14,000
Total Cost $4,770 $500 $17,635
Net Profit -$1,270 +$3,500 +$57,365
Margin -36% 87.5% 76.5%

Annual Projection

Current (Losing Money)
-$15,240
Annual Loss
Revenue: $42K — Cost: $57.2K
Self-Hosted (Lexington Only)
+$42,000
Annual Profit
Revenue: $48K — Cost: $6K
Scaled (10 Clients, Staffed)
+$688K
Annual Profit
Revenue: $900K — Cost: $212K

Scaling Economics — Cost vs Revenue per Client Added

Clients Revenue/Mo Infra/Mo Team/Mo Total Cost Net Profit Margin Annual Profit
1 (Lex only) $4,000 $500 $0 $500 $3,500 87.5% $42K
3 $19,000 $1,200 $0 $1,200 $17,800 93.7% $214K
5 $34,000 $1,800 $9,500 $11,300 $22,700 66.8% $272K
10 $75,000 $3,635 $14,000 $17,635 $57,365 76.5% $688K
15 $112,500 $5,400 $18,000 $23,400 $89,100 79.2% $1.07M
20 $150,000 $7,200 $24,000 $31,200 $118,800 79.2% $1.43M
Key inflection points:
  • 1-3 clients: Solo operation. No team needed. $42K-$214K/yr profit.
  • 5 clients: Hire first team. Margin dips to 67% but net increases. $272K/yr.
  • 10 clients: Full team (2 dev + 2 support). Sweet spot. $688K/yr at 76% margin.
  • 15-20 clients: Add 1-2 more team members. Margin stabilizes at ~79%. $1M+ annual profit.

Unit Economics — Per Client

Revenue per Client

Base subscription$7,500/mo
Rep self-purchase (est.)$400/mo
Custom product add-onsTBD
Total per client$7,900/mo
Annual contract value$94,800

Marginal Cost per Client

Transcription (GPU, ~25K calls)$60
GPT-4o-mini Notos$75
GPT-4o Call Analyzer$160
CRM integration maintenance$50
Marginal cost/client$345/mo
Marginal margin95.6%
Every new client after the first adds ~$7,555/mo in profit. The fixed costs (team, base infra) are already covered.