Cost Measurement Methodology¶

Continuous Architecture Platform — Phase 1 AI Tool Comparison

Last Updated: 2026-03-04

Incorporates deep research findings on agentic token economics, the ReAct re-transmission tax, and Copilot's semantic retrieval architecture. See DEEP-RESEARCH-1.md and DEEP-RESEARCH-2.md.

Updated for OpenRouter (replacing Kong AI Gateway) — OpenRouter provides exact per-request token counts and costs.

REVISED 2026-03-04: Updated with actual billing data from run 002 execution on both platforms and deep research findings on Copilot billing mechanics. Previous estimates were significantly wrong — OpenRouter actual cost was ~7.5x higher than projected; Copilot bills per user prompt (not per model turn), making the original formula irrelevant. See DEEP-RESEARCH-RESULTS-COPILOT-BILLING.md for the definitive billing analysis with 39 cited sources.

Purpose¶

This document describes how we measure the exact cost of running architecture scenarios through each AI toolchain. It covers what we can measure, what we cannot, the methodology behind our estimates, and the full cost analysis.

Key finding: The two toolchains have fundamentally different cost visibility. OpenRouter provides exact per-request token counts and costs, while GitHub Copilot provides zero token-level data. This creates an asymmetric measurement challenge that we address through a combination of direct measurement (OpenRouter) and content-based estimation (Copilot).

The Fundamental Asymmetry¶

The two toolchains have incompatible cost models and incompatible context management architectures:

Dimension	OpenRouter (Roo Code)	GitHub Copilot
Cost model	Variable — pay per token	Fixed — flat monthly subscription
Context management	Client-side — entire conversation history re-sent every turn	Server-side — @workspace semantic retrieval + sliding window compaction
Input tokens per turn	50K-180K (full history payload, growing each turn)	<5K (only top-k relevant code chunks via RAG)
Token visibility	Full — exact counts in API response and activity dashboard	None — no per-request token API
Billing API	OpenRouter Activity page + API response `usage` object	Not accessible for individual accounts*
Cost per scenario	Directly measurable with exact precision	Premium requests x $0.04 (actual billing rate)
Cost sensitivity	Scales quadratically with session length (re-transmission)	Scales linearly with user prompts only; autonomous tool calls are free; absorbed by flat subscription up to 1,500 premium requests/month
Infrastructure required	None (fully managed SaaS)	None (fully managed SaaS)

* We tested all known GitHub APIs — see API Availability below.

Measurement Approach¶

OpenRouter: Exact Measurement¶

OpenRouter provides exact per-request token counts and costs through multiple channels:

Source	Data Available	Collection Method
API response `usage` object	`prompt_tokens`, `completion_tokens`, `total_tokens`	Logged by Roo Code in request/response cycle
OpenRouter Activity page	Per-request cost breakdown, model used, timestamps	Manual export from https://openrouter.ai/activity
OpenRouter API	Programmatic access to usage history	`GET https://openrouter.ai/api/v1/auth/key` for credit balance

For each Roo Code run, we collect:

Metric	Source	Precision
Input tokens (cumulative)	OpenRouter Activity page	Exact
Output tokens	OpenRouter Activity page	Exact
Cost per request	OpenRouter Activity page	Exact (to $0.0001)
Model used	OpenRouter Activity page	Exact
Request count	OpenRouter Activity page	Exact
Total run cost	Sum of per-request costs	Exact

OpenRouter Pricing (Claude Opus 4.6)¶

OpenRouter pricing varies by model. For Claude Opus 4.6 (the model used in this comparison):

Parameter	Value
Input price	Check https://openrouter.ai/models for current pricing
Output price	Check https://openrouter.ai/models for current pricing
Context window	200K tokens

Pricing should be captured at the time of each run from the OpenRouter Activity page, which shows the exact dollar amount charged.

Measuring the Re-transmission Tax¶

Because OpenRouter reports per-request token counts, we can directly observe the re-transmission tax — the growing input token count across successive turns in an agentic session:

Turn 1:  prompt_tokens = 12,000   (system prompt + tools + initial context)
Turn 5:  prompt_tokens = 45,000   (+ file reads + previous outputs)
Turn 10: prompt_tokens = 95,000   (cumulative growth)
Turn 15: prompt_tokens = 140,000  (approaching context limit)

The total billed input is the sum across all turns, not the final context size. This is the dominant cost driver.

GitHub Copilot: Content-Based Estimation¶

Since GitHub Copilot provides no token-level billing data, we use content-based estimation from git history as a secondary metric:

Metric	Source	Purpose
Output content	`git diff` — added lines (bytes)	Proxy for output tokens generated by the AI
Input context	Workspace file inventory (bytes)	Proxy for input tokens (files read as context)
Files changed	`git diff --stat`	Scope of work performed
Per-scenario breakdown	`git diff` filtered by ticket ID	Cost attribution per scenario
Token estimate	Character count ÷ 4	Industry-standard approximation for English/code mix

What We Cannot Measure (Copilot)¶

Metric	Why Unavailable
Exact input/output token counts	Copilot does not expose per-request token data
Model selection per request	Copilot routes requests internally; user sees only the response
Rejected/retry attempts	Failed completions and retries are invisible
Context window packing	Internal prompt engineering overhead is unknown
Premium request count	API endpoint returns 404 for personal accounts

Token Estimation Method¶

We use the 4 characters per token heuristic:

$$\text{Estimated Tokens} = \left\lfloor \frac{\text{Character Count}}{4} \right\rfloor$$

This is conservative for architecture prose (which tends to use longer words and structured markdown, averaging closer to 4.5-5 chars/token). Our estimates therefore represent a slight overcount, making the variable-cost projection a ceiling rather than a floor.

GitHub API Availability¶

We systematically tested every known GitHub API endpoint that could provide Copilot usage or billing data. All returned 404 Not Found.

Endpoint	Result	Notes
`GET /user/copilot/billing`	404	Requires org admin scope
`GET /copilot/usage`	404	Org-level API (GA late 2024)
`GET /user/copilot`	404	Not available for individual accounts
`GET /user/settings/billing/actions`	404	Actions billing, not Copilot
`GET /user/settings/billing/packages`	404	Packages billing
`GET /user/settings/billing/shared-storage`	404	Storage billing
`gh copilot --help`	"Cannot find GitHub Copilot CLI"	CLI extension not installed
GraphQL `viewer` query	✅ Works	No Copilot-specific fields available

Conclusion: GitHub's Copilot Metrics API (/orgs/{org}/copilot/metrics) requires organization-level admin access with a manage_billing:copilot scope. Individual/personal accounts have no programmatic access to their own Copilot usage data. This is a documented limitation of the GitHub API as of March 2026.

The Agentic Re-transmission Tax¶

How Agentic Loops Drive Cost¶

Deep research (DEEP-RESEARCH-1.md, DEEP-RESEARCH-2.md) reveals that the dominant cost driver in usage-based agentic tools is cumulative re-transmission of the conversation history:

LLMs are stateless. They have no memory of previous turns.
To maintain continuity, the orchestration layer (Roo Code) must bundle the entire conversation history — system prompt, tool definitions, every previous file read, every tool output, every assistant response — and re-transmit it to the LLM at every single turn.
Context grows monotonically: turn 1 sends ~10K tokens, turn 10 sends ~80K tokens, turn 20 sends ~150K+ tokens.
The total billed input tokens are the sum across all turns, not the final context size.

This creates a quadratic cost curve: doubling the number of turns more than doubles the cost, because each additional turn sends a larger payload.

The Two Architectures¶

	Roo Code + OpenRouter	GitHub Copilot
Context model	Client-side state machine — full history re-serialized and transmitted every turn	Server-side @workspace RAG — semantic search retrieves only top-k relevant chunks (<5K tokens/turn)
Input per turn	50K-180K tokens (cumulative, growing)	<5K tokens (bounded, stable)
Re-transmission	Entire history repeated at every turn	Backend manages state; only deltas sent
Context limit handling	Client-side "Intelligent Context Condensing" — halts loop, sends secondary API call to summarize (itself billable)	Server-side sliding window + auto-compaction — invisible to user, no additional API cost
Failure mode	Context-length errors may cause retry loops	Aggressive truncation → precision loss on early instructions (mitigable with /compact)

Copilot's @workspace Semantic Retrieval¶

GitHub Copilot does not dump raw files into the context window. Instead:

A background process parses the codebase and generates dense embeddings using proprietary code-optimized models.
When the agent needs context, it performs a semantic similarity search against this index.
Only the top-k most relevant code chunks are bound to the prompt — typically keeping context overhead to <5K tokens per turn.
This is augmented by persistent "Agentic Memory" — cross-session knowledge of coding conventions and architectural patterns.
When the session approaches 95% of the context limit, background auto-compaction summarizes history transparently.

This means Copilot's internal token consumption, while potentially large, is entirely absorbed by the flat subscription fee. The enterprise bears zero variable cost regardless of how many tokens are processed internally.

Revised Cost Analysis: GitHub Copilot Execution¶

Execution Summary¶

All 5 scenarios were executed in a single Copilot Agent session on 2026-03-01, committed as 34150d9.

Metric	Value
Commit range	`e83f83e`..`34150d9`
Files changed	23
Lines added	1,754
Lines removed	165
Net content added	80,584 bytes
Total tool calls (observed)	~85
Files read	40
Files created	16
Files modified	5
Wall-clock time	~100 minutes
Copilot cost	4 user prompts x 3x multiplier x $0.04 = $0.48 (see Copilot Billing below)

What Would This Cost via OpenRouter + Roo Code?¶

Using the agentic re-transmission model from the deep research, we estimate the true variable cost for each scenario if executed through the Roo Code + OpenRouter stack. These estimates will be validated against actual OpenRouter Activity data once the Roo Code execution completes.

Methodology: For each scenario, model the context window growing from an initial ~10K tokens (system prompt + tools) through N turns, with each file read and tool output adding to the cumulative payload. Total input = sum of context size at each turn. Pricing: Claude Opus 4.6 via OpenRouter (see OpenRouter pricing page for current rates).

NOTE: The estimates below use Claude Sonnet pricing ($3.00/1M input, $15.00/1M output) as a baseline. Actual costs will differ based on the model and OpenRouter's current pricing. After each Roo Code run, replace these estimates with exact costs from the OpenRouter Activity page.

Scenario	Ticket	Tool Calls	Files Read	Avg Context/Turn	Cumulative Input	Output Est.	Variable Cost
SC-01	NTK-10005	12	3	~25K	~300K	~10K	$1.05
SC-02	NTK-10002	18	12	~45K	~810K	~15K	$2.66
SC-03	NTK-10004	25	8	~65K	~1,625K	~30K	$5.33
SC-04	NTK-10001	10	3	~22K	~220K	~8K	$0.78
TOTAL		85	40		4,055K	83K	$13.42

CORRECTION (2026-03-04): The estimates above used Claude Sonnet pricing ($3.00/1M input, $15.00/1M output). Actual run 002 used Claude Opus 4.6 via OpenRouter, which is substantially more expensive. Actual OpenRouter billing for the run 002 execution window (March 4, 10:11-10:37 AM) showed $100 in auto-top-up charges (4 x $25). This means the actual per-run cost is approximately $100 — roughly 7.5x higher than the Sonnet-based estimate. The re-transmission tax model was directionally correct but the pricing input was wrong.

Monthly Cost Projection¶

Using the measurement protocol's monthly frequency (26 base runs + 12 PROMOTE runs = 38 runs/month):

REVISED (2026-03-04): Original estimates used Claude Sonnet pricing. Actual Claude Opus 4.6 costs are ~7.5x higher. Tables below show both the original estimates and the revised actuals.

Original Estimates (Claude Sonnet pricing — SUPERSEDED)¶

Scenario	Per-Run (est.)	Monthly Freq (+PROMOTE)	Monthly Cost (est.)
SC-01	$1.05	10	$10.50
SC-02	$2.66	6	$15.96
SC-03	$5.33	4	$21.32
SC-04	$0.78	4	$3.12
SC-05	$3.60	2	$7.20
PROMOTE (SC-04-like)	$0.78	12	$9.36
TOTAL		38	$67.46

Revised Actuals (Claude Opus 4.6 via OpenRouter)¶

Metric	Value
Actual cost for 1 run (5 scenarios)	~$100 (based on auto-top-up charges)
Average cost per scenario	~$20
Estimated monthly (38 runs)	~$507 (using proportional $13.35/scenario avg)
Estimated monthly (adjusted)	~$507

NOTE: The $100/run figure includes some overhead from other concurrent usage and the Claude Opus 4.6 model premium. Exact per-generation costs should be retrieved from the OpenRouter Activity dashboard.

Revised Platform Comparison¶

Cost Model	Monthly (38 runs, Sonnet est.)	Monthly (38 runs, Opus actuals)
OpenRouter (variable)	$67.46 (est.)	~$507 (actual-based)
GitHub Copilot Pro+ (base)	$39.00	$39.00
Ratio	OpenRouter 1.7x more	OpenRouter ~13x more

Break-Even Analysis¶

The break-even question: at what usage volume would OpenRouter become cheaper than Copilot?

$$\text{Break-even runs} = \frac{\text{Copilot Monthly Cost}}{\text{Average Variable Cost per Run}}$$

Average variable cost per run (actual): ~$100 / 5 scenarios full run = ~$100/run

Tier	Break-Even Point	Current Volume	Verdict
Copilot Pro+ ($39/month)	<1 run/month	38 runs/month	Copilot wins by ~13x
Copilot Pro+ with full overage	~5 runs/month (at $8/run est.)	38 runs/month	Copilot still wins dramatically

REVISED (2026-03-04): With actual Opus 4.6 pricing, OpenRouter never breaks even against Copilot at any reasonable volume. A single OpenRouter run (~$100) costs more than an entire month of Copilot Pro+ ($39). Deep research confirmed that Copilot's per-session cost is $0.48 (4 user prompts x 3x x $0.04), making the gap even wider: ~208x cheaper per session.

Cost Per Quality Point¶

Metric	OpenRouter (variable, est.)	Copilot Pro+
Monthly cost (38 runs)	~$67.46 (estimated)	$39.00 + overage
Quality score	TBD	TBD
Cost per quality point	TBD	TBD

Total Cost of Ownership (Beyond Token Costs)¶

Both tools now operate as fully managed SaaS — OpenRouter replaces the self-hosted Kong AI Gateway, eliminating most infrastructure overhead. Remaining TCO differences:

Factor	OpenRouter + Roo Code	GitHub Copilot
Infrastructure	None (SaaS)	None (SaaS)
API key management	Single OpenRouter API key	GitHub OAuth (managed)
Token cost visibility	Full — exact per-request costs	None — fixed subscription
Budget predictability	Variable — depends on usage volume and model	Fixed — known monthly cost
Context management	Client-side (Roo Code manages history)	Server-side (Copilot manages internally)
Model flexibility	Any model on OpenRouter	Limited to Copilot-supported models
Rate limiting	OpenRouter rate limits apply	Copilot premium request limits apply

Important Caveats¶

1. OpenRouter Provides Exact Costs — Estimates Will Be Replaced¶

The variable cost estimates in this document are preliminary based on the agentic re-transmission model. After each Roo Code execution, the estimates will be replaced with exact costs from the OpenRouter Activity page. This is a significant advantage over the previous Kong AI setup, which required infrastructure-level monitoring.

2. Copilot Has Its Own Weakness: Precision Loss¶

The deep research identifies that Copilot's aggressive sliding window truncation can cause the agent to "forget" instructions from early in a long session. This is a quality risk, not a cost risk. It is mitigable by: - Using the /compact command to manually anchor critical instructions - Periodically summarizing progress into checkpoint files - Breaking very long sessions into discrete sub-tasks

This precision loss was observable in our execution: later scenarios had less access to early scenario context. However, quality scores remained >92% across all scenarios.

3. Our Variable Cost Estimates Are Conservative¶

The per-scenario variable cost estimates above assume: - Each scenario runs as a separate session (context resets between scenarios) - No error correction loops or self-correction retries - No context condensing overhead (secondary API calls to summarize)

In practice, all of these add 20-50% overhead. The deep research documents a 5-9× iteration tax for agentic systems vs. standard chat, driven by multi-step planning and self-correction loops. Our estimates do not apply this multiplier, making them a floor, not a ceiling.

4. Model Pricing Differences Matter¶

Claude Opus 4.6 via OpenRouter has different pricing than Claude Sonnet. The estimates in the Monthly Cost Projection section use Sonnet pricing as a baseline — actual Opus 4.6 costs will be higher. Always use the measured OpenRouter Activity data rather than these estimates.

5. Copilot Pro+ Billing: Resolved via Deep Research¶

GitHub Copilot Pro+ ($39/month) includes 1,500 premium requests/month. Deep research (DEEP-RESEARCH-RESULTS-COPILOT-BILLING.md) definitively resolved the billing mechanics:

Billing unit = user prompt, NOT model invocation. In Agent Mode, the autonomous loop (tool calls, file reads, terminal commands, sub-agents, context summarization) is entirely free — absorbed by GitHub's infrastructure. Only explicit human-typed prompts consume premium requests.

Parameter	Original (WRONG)	Corrected (Deep Research)
Billing unit	Per model turn/invocation	Per user prompt
Rate per premium request	$0.028 ("Pro+ discount")	$0.04 (actual, no discount)
Model multiplier	x30 ("fast preview")	x3 (standard Opus 4.6)
Formula	turns x $0.028 x 30	User Prompts x Model Multiplier x $0.04
Run 002 session cost	$46.20 (estimate)	$0.48 (4 prompts x 3 x $0.04)
Autonomous tool calls	Assumed billed	Free

Origin of the $0.028 error: The $0.028 rate was a per-million-token cache-hit rate from DeepSeek/Azure OpenAI API pricing — a completely different billing model and unit. It was never a valid Copilot rate.

Model multipliers (applied per user prompt):

Model	Multiplier	Cost per User Prompt
GPT-4.1, GPT-4o	x0	$0 (included, unlimited)
Claude Opus 4.6 (standard)	x3	$0.12
Claude Opus 4.6 fast (preview)	x30	$1.20

Run 002 verification: 4 user prompts x 3x (standard Opus) = 12 premium requests = $0.48. The daily total of 120 premium requests ($4.80) included all other Copilot usage across projects. At 3x multiplier, 120 requests = ~40 user prompts across all VS Code instances for the day.

Additional findings: - Sub-agents: Intended to be free, but a known VS Code bug in early 2026 caused some to be billed. Frequently fall back to 0x models. - Context summarization: Free — uses cheaper/free models. - 1,500 allowance resets on calendar month at 00:00 UTC (not billing cycle). - Quota exhaustion: Silent fallback to 0x models (GPT-4.1). - Auto-model selection: 10% multiplier discount when enabled.

6. OpenRouter Cost Retrieval Script¶

The scripts/openrouter-cost.py tool automates cost data collection from the OpenRouter API. It supports:

Balance check: python3 scripts/openrouter-cost.py balance -- shows current credit usage
Single generation: python3 scripts/openrouter-cost.py generation <id> -- detailed cost for one API call
Multiple generations: python3 scripts/openrouter-cost.py generations <id1> <id2> -- batch lookup
Summary from file: python3 scripts/openrouter-cost.py summary --file ids.txt --format json -- bulk cost report

Set OPENROUTER_API_KEY environment variable before use. Generation IDs are returned in each OpenRouter API response (id field, format: gen-xxxxxxxxxxxxxxxx).

Reproducing This Analysis¶

# Git-diff-based content measurement (captures output, not process cost):
cd /path/to/continuous-architecture-platform-poc
python3 scripts/cost-measurement.py analyze e83f83e 34150d9

# Note: The script measures content delta only. The true variable cost
# requires modeling the agentic re-transmission tax as described above.

Summary¶

Finding	Estimated (pre-run)	Actual (post-run 002)
OpenRouter cost (5 scenarios)	~$13.42 (Sonnet pricing)	~$100 (Opus 4.6 actuals)
OpenRouter monthly (38 runs)	~$67.46 (Sonnet pricing)	~$507 (extrapolated)
Copilot Pro+ monthly (base)	$39.00	$39.00 (confirmed)
Copilot Pro+ full-day cost	$0.084/turn x ~55 turns = $4.62	$4.80 (120 req x $0.04 all day); $0.48 for run 002 (4 prompts x 3 x $0.04)
Cost ratio	Copilot ~1.7x cheaper (est.)	Copilot ~13x cheaper (full day) / ~208x cheaper (per session)
Break-even	~22 runs/month (est.)	<1 run/month (Copilot always wins)
OpenRouter measurement precision	Exact (confirmed)	Exact (auto-top-ups observable)
Copilot measurement precision	Deterministic formula	Resolved — user prompts x multiplier x $0.04
Key correction	Sonnet pricing undercounted OpenRouter by ~7.5x	Copilot bills per user prompt, not per turn; $0.028 was never valid
Recommendation	Collect actual OpenRouter costs	Data collected; Copilot is decisively cheaper