# OpenSpec Evaluation Plan

- **Date:** 2026-03-17
- **Status:** Proposed
- **Prerequisites:** Review of `OPENSPEC-ANALYSIS.md` and approval of Option C (Time-Boxed PoC)
- **Effort:** 1 development session (single architecture scenario)
## 1. Objective
Run a controlled evaluation of OpenSpec alongside the existing solution design workflow to determine whether OpenSpec adds measurable value for:
- AI agent workflow enforcement and guidance
- Behavioral specification capture (Given/When/Then)
- Change tracking and delta management
- Multi-tool AI agent portability
## 2. Pre-Conditions
Before starting the PoC:
- [ ] Run deep research prompt from DEEP-RESEARCH-PROMPT-OPENSPEC-EVALUATION.md
- [ ] Review deep research results — confirm custom schema feasibility and enterprise adoption evidence
- [ ] Confirm Option C (PoC) is still the right approach given research findings
- [ ] Identify a suitable NTK ticket for the evaluation (cross-domain, 2+ services, API contract changes)
## 3. Phase 1: Installation and Setup

### 3.1 Install OpenSpec
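Install the CLI as an isolated dev dependency. The npm package name below is an assumption — confirm it against the OpenSpec README — and pin the version so upstream changes are adopted deliberately rather than silently:

```shell
# Assumed package name -- verify against the OpenSpec README before running.
npm install --save-dev openspec
npx openspec --version   # confirm the CLI resolves from the local install
```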
### 3.2 Initialize in Workspace
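From the workspace root, scaffold the `openspec/` directory and the AI tool integration files. `openspec init` is the command referenced in section 3.5; no additional flags are assumed here:

```shell
# Review the generated files before committing -- init writes both the
# openspec/ scaffold and tool-specific skill/instruction files.
npx openspec init
```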
### 3.3 Create Custom Architecture Schema
This is the critical integration point. The custom schema must map to NovaTrek's architecture artifact structure.
```yaml
# openspec/schemas/novatrek-architecture/schema.yaml
name: novatrek-architecture
artifacts:
  - id: requirements
    generates: requirements.md
    requires: []
  - id: analysis
    generates: analysis.md
    requires: [requirements]
  - id: specs
    generates: specs/**/*.md
    requires: [requirements]
  - id: decisions
    generates: decisions.md
    requires: [specs]
  - id: impacts
    generates: impacts/**/*.md
    requires: [specs, decisions]
  - id: risks
    generates: risks.md
    requires: [specs]
  - id: tasks
    generates: tasks.md
    requires: [specs, decisions, impacts]
```
### 3.4 Create Project Configuration
```yaml
# openspec/config.yaml
schema: novatrek-architecture
context: |
  NovaTrek Adventures is a fictional adventure tourism platform with 19
  microservices across 9 bounded domains. Architecture decisions follow
  MADR format. API contracts are defined in OpenAPI YAML specs under
  architecture/specs/. All architecture work is ticket-driven (NTK-XXXXX).
rules:
  requirements:
    - Source requirements from JIRA ticket (mock tool) first
    - Include ticket ID (NTK-XXXXX) in the document title
    - Cross-reference with capability model in architecture/metadata/capabilities.yaml
  specs:
    - Use Given/When/Then format for all scenarios
    - Use RFC 2119 keywords (SHALL, MUST, SHOULD, MAY)
    - Organize by affected service domain
    - Reference OpenAPI spec paths where applicable
  decisions:
    - Use MADR format (Status, Date, Context, Decision Drivers, Options, Outcome, Consequences)
    - Require at minimum 2 genuinely considered options
    - Tie decision outcome to decision drivers
    - Include Positive, Negative, and Neutral consequences
  impacts:
    - Create one impact file per affected service
    - Focus on WHAT changes (API contracts, data models, integration points)
    - Do NOT include implementation code or deployment steps
  risks:
    - Assess ISO 25010 quality attributes (reliability, maintainability, compatibility at minimum)
    - Include mitigation strategies for each risk
```
### 3.5 Verify AI Tool Integration

Check that `openspec init` generated compatible skill/instruction files for GitHub Copilot:

```shell
ls -la .github/copilot/skills/openspec-*/ 2>/dev/null || echo "Check OpenSpec docs for Copilot integration path"
```

Verify no conflicts with the existing `copilot-instructions.md`.
## 4. Phase 2: Parallel Execution

### 4.1 Select Test Ticket
Choose an NTK ticket that:
- Touches at least 2 services (cross-domain preferred)
- Requires API contract changes
- Has clear behavioral requirements
- Has not been started yet
Check current candidates:
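One way to surface candidates — the paths are illustrative and depend on the mock-tool layout, so adjust them to the actual workspace:

```shell
# Solution folders already in flight (section 4.2's layout); a candidate
# ticket should NOT appear here, since it must not be started yet.
ls architecture/solutions/ 2>/dev/null || echo "no solutions directory here"
# Hypothetical: sweep the workspace for referenced ticket IDs.
grep -rho "NTK-[0-9]*" architecture/ 2>/dev/null | sort -u
```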
### 4.2 Execute: Traditional Workflow

Run the ticket through the existing solution design workflow as documented in `copilot-instructions.md`:
- Create branch `solution/NTK-XXXXX-slug`
- Create folder structure under `architecture/solutions/_NTK-XXXXX-slug/`
- Execute mock tools (JIRA, Elastic, GitLab)
- Create requirements, analysis, decisions, impacts, risks, user stories
- Update `capability-changelog.yaml`
- Run portal generators
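The branch and folder setup can be sketched as shell commands. The ticket ID and slug are placeholders, and a scratch repository is used here so the sketch is self-contained; in practice, branch from the architecture workspace:

```shell
# Placeholder ticket NTK-12345 in a throwaway repo.
git init -q /tmp/ntk-trad-demo && cd /tmp/ntk-trad-demo
git checkout -b solution/NTK-12345-example-slug
mkdir -p architecture/solutions/_NTK-12345-example-slug
```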
Record:
- Number of AI prompts required
- Quality of AI output (did the agent follow the workflow correctly?)
- Artifact completeness (did all required sections get populated?)
- Number of corrections needed
### 4.3 Execute: OpenSpec Workflow

On a separate branch, run the SAME ticket through the OpenSpec workflow:
- Create branch `solution/NTK-XXXXX-slug-openspec`
- Run `/opsx:propose NTK-XXXXX-description`
- Let OpenSpec create artifacts according to the custom schema
- Use `/opsx:continue` or `/opsx:ff` to build out all artifacts
- Compare the generated artifacts against the traditional workflow output
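The branch setup mirrors 4.2; the `/opsx:*` commands run inside the AI agent session, not the shell. Ticket ID and slug are placeholders, and a scratch repository keeps the sketch self-contained:

```shell
git init -q /tmp/ntk-openspec-demo && cd /tmp/ntk-openspec-demo
git checkout -b solution/NTK-12345-example-slug-openspec
# Then, in the agent session:
#   /opsx:propose NTK-12345-example-description
#   /opsx:continue   (step through artifacts, or /opsx:ff to fast-forward)
```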
Record:
- Number of AI prompts required
- Quality of AI output (did the agent produce architecture-quality artifacts?)
- Artifact completeness (did the custom schema capture all necessary information?)
- Delta spec quality (was the behavioral diff clear and useful?)
- Number of corrections needed
## 5. Phase 3: Evaluation

### 5.1 Comparison Criteria
| Criterion | Traditional Workflow | OpenSpec Workflow | Winner |
|---|---|---|---|
| AI guidance quality | Did the agent follow instructions correctly? | Did slash commands produce better-structured output? | |
| Artifact completeness | Were all required artifacts populated? | Did the custom schema capture all architecture needs? | |
| Behavioral spec quality | How well were requirements captured narratively? | Did Given/When/Then scenarios improve clarity? | |
| Change reviewability | How easy is the solution to review? | Do delta specs improve the review experience? | |
| Workflow enforcement | Did the agent follow the folder structure? | Did the schema enforce artifact ordering? | |
| Tool portability | Copilot-specific instructions | Would this work with other AI tools? | |
| Overhead | Familiar workflow, no additional tooling | Was the OpenSpec overhead justified? | |
| Missing artifacts | N/A (purpose-built for architecture) | What architecture artifacts were lost or gained? | |
### 5.2 Decision Gate
Based on the evaluation, decide:
| Decision | Condition |
|---|---|
| Adopt | OpenSpec adds measurable value across 4+ criteria AND custom schema accommodates all NovaTrek artifact needs |
| Defer | OpenSpec shows promise but custom schema needs work or critical features are missing — revisit on next major release |
| Reject | OpenSpec does not add sufficient value over the existing workflow or creates confusion about artifact locations |
## 6. Phase 4: Integration (Conditional)
Only execute this phase if the PoC decision gate result is Adopt.
### 6.1 Workflow Integration
```
Ticket received (NTK-XXXXX)
        │
        ▼
/opsx:propose (creates change with NovaTrek schema)
        │
        ├── requirements.md ──── Sources from mock JIRA
        ├── specs/ ──────────── Behavioral contracts (Given/When/Then)
        ├── decisions.md ────── MADR format (from custom template)
        ├── impacts/ ────────── Per-service impact assessments
        ├── risks.md ────────── ISO 25010 quality analysis
        └── tasks.md ────────── Implementation checklist
        │
        ▼
/opsx:archive (merge behavioral deltas into main specs)
        │
        ▼
Update capability-changelog.yaml
Run portal generators (bash portal/scripts/generate-all.sh)
```
### 6.2 File System Layout
```
openspec/
├── specs/                      # Behavioral source of truth
│   ├── operations/spec.md      # svc-check-in, svc-scheduling-orchestrator
│   ├── guest-identity/spec.md  # svc-guest-profiles
│   ├── booking/spec.md         # svc-reservations
│   ├── product-catalog/spec.md # svc-trip-catalog, svc-trail-management
│   ├── safety/spec.md          # svc-safety-compliance
│   ├── logistics/spec.md       # svc-transport-logistics, svc-gear-inventory
│   ├── guide-management/spec.md # svc-guide-management
│   ├── external/spec.md        # svc-partner-integrations
│   └── support/spec.md         # 8 support services
├── changes/                    # Active work
│   └── NTK-XXXXX-slug/
│       ├── requirements.md
│       ├── analysis.md
│       ├── decisions.md
│       ├── impacts/
│       ├── risks.md
│       ├── tasks.md
│       └── specs/              # Delta specs (ADDED/MODIFIED/REMOVED)
└── config.yaml
architecture/
├── metadata/                   # Metadata YAML (unchanged)
├── specs/                      # OpenAPI specs (unchanged)
├── events/                     # AsyncAPI specs (unchanged)
└── solutions/                  # Legacy solutions (preserved, read-only)
```
### 6.3 Source-of-Truth Boundaries
| Concern | Source of Truth | Location |
|---|---|---|
| API contracts | OpenAPI specs | architecture/specs/ |
| Event schemas | AsyncAPI specs | architecture/events/ |
| Behavioral requirements | OpenSpec specs | openspec/specs/ |
| Architecture decisions | MADR ADRs | decisions/ (global) |
| Capability model | Capability YAML | architecture/metadata/capabilities.yaml |
| Capability changes | Capability changelog | architecture/metadata/capability-changelog.yaml |
| Service metadata | Metadata YAML | architecture/metadata/ |
| Active solution work | OpenSpec changes | openspec/changes/ |
### 6.4 CI Updates
Add OpenSpec validation to the CI pipeline:
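A sketch assuming a GitLab CI pipeline (the workspace's mock tooling references GitLab) and an `openspec validate` command; the job name, image, and flags are illustrative, so verify the command against the OpenSpec docs:

```yaml
# .gitlab-ci.yml fragment (illustrative)
openspec-validate:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx openspec validate --strict   # assumed flag; fail on schema violations
  rules:
    - changes:
        - "openspec/**/*"
```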
## 7. Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Custom schema cannot accommodate all NovaTrek artifacts | Medium | High | Test in PoC; fall back to existing workflow |
| OpenSpec updates break custom schema | Medium | Medium | Pin version; test updates in CI |
| Fission AI abandons OpenSpec | Low | High | MIT license allows forking; evaluate governance trajectory |
| Behavioral specs drift from OpenAPI contracts | Medium | Medium | CI check comparing specs against OpenAPI endpoints |
| Confusion about artifact locations (solutions/ vs changes/) | Medium | Medium | Clear source-of-truth table; update copilot-instructions.md |
| npm dependency in architecture workspace | Low | Low | Isolate as dev dependency |
## 8. Success Criteria
The PoC is successful if ALL of the following are true:
- Custom schema produces all artifacts the traditional workflow produces, with equivalent or better quality
- AI agent output quality improves measurably (fewer corrections, better artifact structure)
- Delta specs provide review-time value that narrative analysis does not
- The overhead (learning curve, additional files, npm dependency) is justified by the workflow improvement
The PoC fails if ANY of the following are true:
- Custom schema cannot accommodate impacts, capabilities, or MADR decisions
- AI agent quality improvement is marginal
- Delta specs duplicate information already in capability-changelog.yaml without adding value
- The two-system overhead creates confusion about where artifacts live
## 9. Execution Sequence
| Step | Action | Dependency |
|---|---|---|
| 1 | Run deep research prompt | None |
| 2 | Review deep research results | Step 1 |
| 3 | Confirm PoC approach | Step 2 |
| 4 | Install OpenSpec and create custom schema | Step 3 |
| 5 | Select test ticket | Step 3 |
| 6 | Execute parallel workflows | Steps 4, 5 |
| 7 | Evaluate results against criteria | Step 6 |
| 8 | Decision gate: Adopt / Defer / Reject | Step 7 |
| 9 | (Conditional) Integration implementation | Step 8 = Adopt |