Summary
Benchmark FunctionGemma vs gpt-4o-mini for tool calling in IntentClassifier.
Scope
This benchmark tests structured function dispatch only, NOT reasoning. The target is the resolve_ticker_with_llm() function, which outputs a fixed schema:
{"company_name": "Apple Inc.", "ticker": "AAPL", "found": true}
Current code: src/cli/utils/intent_classifier.py:162-249
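The "Schema compliance" metric needs a concrete pass/fail rule. A minimal sketch, assuming the three fields and types shown in the example above are the full schema; the helper name `is_schema_compliant` is hypothetical and is not taken from intent_classifier.py:

```python
import json

# Assumed field names and types, inferred from the example schema in this issue.
EXPECTED_FIELDS = {"company_name": str, "ticker": str, "found": bool}


def is_schema_compliant(raw_output: str) -> bool:
    """Return True if raw_output is a JSON object with exactly the expected fields and types."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    # Reject non-objects and objects with missing or extra keys.
    if not isinstance(parsed, dict) or set(parsed) != set(EXPECTED_FIELDS):
        return False
    return all(isinstance(parsed[key], kind) for key, kind in EXPECTED_FIELDS.items())
```

This check is deliberately strict: it assumes "ticker" is always a string, even when "found" is false. If the real function returns null in that case, the type map would need to be loosened.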
Test Cases (Tool Dispatch Only)
# Simple ticker extraction - structured output
test_inputs = [
    ("buy AAPL", "AAPL"),         # Direct ticker
    ("sell SPY at 600", "SPY"),   # Ticker with price
    ("check NVDA", "NVDA"),       # Direct ticker
    ("buy 10 TSLA", "TSLA"),      # Ticker with quantity
]
# Company resolution - requires some knowledge
company_inputs = [
    ("buy apple", "AAPL"),        # Common
    ("sell microsoft", "MSFT"),   # Common
    ("check nvidia", "NVDA"),     # Common
]
Metrics to Capture
| Metric | gpt-4o-mini | FunctionGemma |
|---|---|---|
| Direct ticker accuracy | ? | ? |
| Company resolution accuracy | ? | ? |
| Avg latency (ms) | ? | ? |
| Schema compliance (valid JSON) | ? | ? |
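The accuracy and latency rows could be filled by a small harness along these lines. This is a sketch under assumptions: each backend is wrapped as a callable that takes the input text and returns a dict in the fixed schema, and `run_benchmark` and `stub_resolver` are hypothetical names. The stub stands in for a real gpt-4o-mini or FunctionGemma call so the harness can be run offline.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_benchmark(
    resolve: Callable[[str], Dict],
    cases: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Run resolve over (input, expected_ticker) pairs; return accuracy and mean latency in ms."""
    correct = 0
    total_ms = 0.0
    for text, expected in cases:
        start = time.perf_counter()
        result = resolve(text)
        total_ms += (time.perf_counter() - start) * 1000.0
        # A case counts as correct only if a ticker was found and it matches.
        if result.get("found") and result.get("ticker") == expected:
            correct += 1
    n = len(cases)
    return {
        "accuracy": correct / n if n else 0.0,
        "avg_latency_ms": total_ms / n if n else 0.0,
    }


def stub_resolver(text: str) -> Dict:
    """Placeholder backend: treats the first all-uppercase alphabetic token as the ticker."""
    for token in text.split():
        if token.isalpha() and token.isupper():
            return {"company_name": "", "ticker": token, "found": True}
    return {"company_name": "", "ticker": "", "found": False}


direct_cases = [
    ("buy AAPL", "AAPL"),
    ("sell SPY at 600", "SPY"),
    ("check NVDA", "NVDA"),
    ("buy 10 TSLA", "TSLA"),
]
metrics = run_benchmark(stub_resolver, direct_cases)
```

Passing the resolver as a callable keeps the harness independent of the backend, so the same case lists can be run against both models once each is wrapped.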
Success Criteria
- ≥98% accuracy on direct ticker extraction
- ≥90% accuracy on common company names (top 20 stocks)
- <100ms average latency
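The go/no-go recommendation can be derived mechanically from these thresholds. A minimal sketch, assuming accuracies are expressed as fractions in [0, 1] and latency in milliseconds; the function name `failed_criteria` and the returned labels are hypothetical:

```python
from typing import List

# Thresholds copied from the success criteria listed above.
MIN_DIRECT_ACCURACY = 0.98
MIN_COMPANY_ACCURACY = 0.90
MAX_AVG_LATENCY_MS = 100.0


def failed_criteria(
    direct_accuracy: float,
    company_accuracy: float,
    avg_latency_ms: float,
) -> List[str]:
    """Return the names of the criteria that are not met; an empty list means 'go'."""
    failures = []
    if direct_accuracy < MIN_DIRECT_ACCURACY:
        failures.append("direct_ticker_accuracy")
    if company_accuracy < MIN_COMPANY_ACCURACY:
        failures.append("company_resolution_accuracy")
    # The latency criterion is strict: exactly 100 ms does not pass.
    if avg_latency_ms >= MAX_AVG_LATENCY_MS:
        failures.append("avg_latency")
    return failures
```

Returning the list of failed criteria, rather than a bare boolean, gives the results document something concrete to cite when the recommendation is "no-go".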
Deliverables
- Benchmark script tests/benchmarks/intent_classifier_benchmark.py
- Results documented in docs/08_research/
- Go/no-go recommendation
Dependencies
- spike: install Ollama + FunctionGemma local inference stack #533 (Ollama infrastructure)
- feat: add LLM backend abstraction layer (OpenAI/Ollama toggle) #534 (LLM backend abstraction)