Scenario 01: New Ticket Triage and Classification¶
Scenario Overview¶
| Property | Value |
|---|---|
| Scenario ID | SC-01 |
| Task | New Ticket Triage and Classification |
| Estimated Monthly Frequency | 10 tickets/month |
| Complexity | Low |
| Duration Target | 15-30 minutes |
| Skills Tested | Tool usage, folder creation, data extraction, classification |
Pre-conditions¶
- Workspace open with all folders visible
- No existing work-item folder for NTK-10005
- Mock JIRA client script available at
scripts/mock-jira-client.py - Virtual environment activated
Exact Prompt to Use¶
"I need to work on a new ticket. Query our JIRA board for new tickets, find NTK-10005 (Wristband RFID Field), and create the initial workspace with ticket report and simple explanation. Classify whether this is architecturally relevant or a code-level task."
Expected AI Actions¶
- Run
python scripts/mock-jira-client.py --list --status "New"to list available tickets - Run
python scripts/mock-jira-client.py --ticket NTK-10005to extract full ticket details - Create folder structure:
work-items/tickets/_NTK-10005-wristband-rfid-field/ ├── NTK-10005-solution-design.md ├── 1.requirements/ │ └── NTK-10005.ticket.report.md ├── 2.analysis/ │ └── simple.explanation.md └── 3.solution/ ├── a.assumptions/assumptions.md ├── c.current.state/investigations.md ├── d.decisions/decisions.md ├── g.guidance/guidance.md ├── i.impacts/impacts.md └── s.user.stories/user-stories.md - Generate ticket report from JIRA mock data (not fabricated)
- Write simple explanation suitable for non-technical stakeholders
- Classify the ticket as architecturally relevant or code-level task with reasoning
What to Watch For¶
- Does the AI run the mock scripts or fabricate data?
- Does the folder name start with underscore and use kebab-case?
- Does the ticket report faithfully reflect mock data?
- Is classification reasoning grounded in the ticket description?
- Does the simple explanation avoid technical jargon?
Quality Rubric¶
Score each criterion 1-5:
- [ ] Tool Usage: Correctly used mock tool scripts (not guessed data)
- [ ] Folder Structure: Created proper folder structure (underscore prefix, kebab-case)
- [ ] Data Accuracy: Ticket report matches mock data accurately
- [ ] Classification: Classification reasoning is sound and justified
- [ ] Clarity: Simple explanation is clear and non-technical friendly
Maximum Score: 25
Token Cost Tracking¶
| Metric | Roo+Kong | Copilot |
|---|---|---|
| Input tokens | ||
| Output tokens | ||
| Total tokens | ||
| Tool calls count | ||
| Time to complete (min) | ||
| Quality score (/25) | ||
| Cost per run ($) | ||
| Estimated monthly cost ($) |
Notes¶
- This is the simplest scenario and serves as a baseline
- If an AI tool cannot handle this scenario well, more complex scenarios are unlikely to succeed
- Pay attention to whether the AI follows the naming conventions without being reminded