Executive Summary
✅ What Worked
- Training Infrastructure: Successfully trained 561M parameter model on 1,000 examples
- Model Quality: Generates valid JSON decisions with coherent reasoning
- Fast Backtesting: 10x speedup via pre-computed contexts (59 contexts/sec)
- Event Analysis: LLM effectively analyzes SEC filings and identifies catalysts
❌ What Didn't Work
- Position Sizing: 100% override rate - all positions capped from $150K+ to $50K
- Hindsight Bias: Model trained on outcomes, suggests unrealistic positions
- Architecture Mismatch: LLM making portfolio math decisions it can't properly learn
💡 Key Insight
The LLM should generate SIGNALS (conviction 0.0-1.0), not PORTFOLIO DECISIONS (dollar amounts).
Separate concerns: LLM for pattern recognition and conviction scoring, deterministic code for portfolio construction and risk management.
Training Results
- Model Architecture: 561M parameters (d20, 20 layers)
- Training Data: 1,000 examples with BUY/SELL/SKIP labels
- Final Validation Loss: 0.096 (96.5% improvement from start)
- Training Time: ~4 hours on an H100 GPU
Sample Model Output
```json
{
  "decision": "BUY",
  "position_size": 153000,
  "conviction": 0.95,
  "reasoning": "Strong acquisition catalyst (IDF 15.2)...",
  "risk_factors": ["Integration risk", "Market volatility"],
  "catalysts": ["announced_acquisition_major", "dividend_increase"],
  "hold_duration_days": 75
}
```
Problem: Position size of $153K is unrealistic (15% of portfolio!). Model learned from hindsight, not risk management.
The Position Sizing Problem
Evidence from Backtesting
| Ticker | Model Suggested | Actually Used | Override |
|--------|-----------------|---------------|----------|
| VTGN   | $115,619        | $50,000       | 57% cut  |
| GNL    | $200,000        | $50,000       | 75% cut  |
| TSLA   | $200,000        | $50,000       | 75% cut  |
| IMUX   | $153,000        | $50,000       | 67% cut  |
| NBEV   | $200,000        | $50,000       | 75% cut  |
- Override Rate: 100% (every position capped, 13/13 trades)
- Average Model Size: $150K (15% of portfolio per trade!)
- Average Actual Size: $50K (5% cap enforced by code)
Why This Happens
Training on hindsight outcomes:
- Model sees: "Stock returned +47% at 90 days"
- Model concludes: "Should have bought $200K!"
- Problem: At decision time the future return is unknown, and the model has no concept of risk management
Technical Infrastructure Achievements
1. Fast Backtesting System (10x Speedup)
Innovation: Pre-compute all contexts once, then run inference separately.
- Old approach: 4+ seconds per filing (live DB queries)
- New approach: 59 contexts/sec (pre-computed context generation)
Step 1: Generate contexts (once)
- Query DB for events, portfolio state
- Save to JSONL: {filing, past_events, portfolio_state}
- Speed: 1,745 filings in ~30 seconds
Step 2: Run backtest (fast)
- Load contexts from JSONL
- Model inference only (~0.5-1 sec/filing)
- Execute trades based on decisions
Result: Week-long backtest in ~15 minutes (vs 3+ hours)
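The two steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: `db.events_before`, `db.portfolio_at`, and `model.decide` are hypothetical interfaces standing in for the real DB and model calls.

```python
import json

def generate_contexts(filings, db, path="contexts.jsonl"):
    """Step 1 (run once): query the DB per filing and cache everything
    the model needs as one JSON line per filing."""
    with open(path, "w") as f:
        for filing in filings:
            ctx = {
                "filing": filing,
                # Hypothetical DB accessors -- only past data, no lookahead
                "past_events": db.events_before(filing["date"], filing["ticker"]),
                "portfolio_state": db.portfolio_at(filing["date"]),
            }
            f.write(json.dumps(ctx) + "\n")

def run_backtest(model, path="contexts.jsonl"):
    """Step 2 (fast, repeatable): model inference only, no DB access."""
    decisions = []
    with open(path) as f:
        for line in f:
            ctx = json.loads(line)
            decisions.append(model.decide(ctx))  # inference-only hot loop
    return decisions
```

Because Step 1 touches the database only once, Step 2 can be re-run cheaply for every model checkpoint or prompt variant.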
2. Comprehensive Decision Logging
Tracks ALL decisions (BUY, SELL, SKIP) with execution status and rejection reasons:
- Model reasoning and conviction scores
- Execution status (true/false)
- Rejection reasons (sector_limit, insufficient_cash, etc.)
- Portfolio state at decision time
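A single log record covering those fields might look like this (field names and values are illustrative, not the exact schema):

```python
# One record per decision, whether or not the trade executed
decision_record = {
    "ticker": "TSLA",
    "decision": "BUY",
    "conviction": 0.95,
    "reasoning": "Strong acquisition catalyst...",
    "executed": False,                    # execution status
    "rejection_reason": "sector_limit",   # why the trade was blocked
    "portfolio_state": {"cash": 412_000, "open_positions": 9},
}
```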
3. Nanochat Training Integration
- Custom tokenizer with portfolio vocabulary
- Properly configured training loop
- Checkpoint management (save every 100 steps)
- Validation loss tracking
- Training data generation pipeline
V7 Architecture: The Right Separation
SIGNAL GENERATOR (LLM)
Input: Filing + Events + Market Context
Output:
```json
{
  "decision": "BUY" | "SELL" | "SKIP",
  "conviction": 0.0-1.0,          // how confident
  "hold_duration_days": 30-90,
  "reasoning": "...",
  "risk_factors": [...],
  "catalysts": [...]
}
```
What it's GOOD at: Text analysis, pattern recognition, expressing confidence
What it DOESN'T do: No position_size, no portfolio math
↓
PORTFOLIO MANAGER (Code)
Input: Signal + Portfolio State
Logic:
```python
def position_size(conviction, portfolio_value, available_cash, sector_budget,
                  max_allocation=0.05):
    """Deterministic sizing: conviction scales the base, hard caps bound it."""
    base = conviction * max_allocation * portfolio_value
    return min(
        base,
        portfolio_value * 0.05,  # 5% position cap
        available_cash,          # can't spend money we don't have
        sector_budget,           # remaining budget for this ticker's sector
    )
```
What it's GOOD at: Math with constraints, risk management, portfolio optimization
Key Changes for V7
- Remove position_size from training: Train on BUY/SELL/SKIP + conviction only
- Calibrate conviction from outcomes: 0.9 for >20% return, 0.7 for >10%, 0.5 for >0%, 0.0 for losses
- Deterministic sizing formula: size = conviction × base_allocation × total_value
- Code enforces constraints: 5% position cap, sector limits, cash available
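The outcome-to-conviction calibration above is a simple threshold mapping; a sketch, using the thresholds listed (the function name is illustrative):

```python
def calibrate_conviction(return_pct):
    """Map a realized holding-period return to a conviction training label."""
    if return_pct > 0.20:   # >20% return
        return 0.9
    if return_pct > 0.10:   # >10% return
        return 0.7
    if return_pct > 0.0:    # any positive return
        return 0.5
    return 0.0              # losses get zero conviction
```

Note this still uses hindsight, but only to label confidence for training; sizing itself stays deterministic at inference time.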
Lessons Learned
1. Architecture Matters
Don't make the LLM solve problems it's not suited for. Use it for pattern recognition, not portfolio math.
2. Fast Iteration
Pre-computed contexts enable rapid experimentation. Generate once, iterate on inference fast.
3. Enforce Constraints
Training the model to respect constraints doesn't work. Enforce them in deterministic code.
What LLMs Are Good At vs. Bad At
| Good At ✅ | Bad At ❌ |
|-----------|----------|
| Analyzing text (filings, events) | Dollar-amount calculations with constraints |
| Identifying patterns and catalysts | Portfolio-wide optimization |
| Expressing conviction levels | Risk management math |
| Generating reasoning and explanations | Multi-asset allocation decisions |
Conclusion
V6 Proved the Concept
✅ LLM can learn from filing events
✅ Built fast backtesting infrastructure
✅ Created comprehensive decision logging
✅ Identified architecture limitations clearly
V6 Revealed the Flaw
❌ Position sizing fundamentally broken
❌ Architecture mismatch (LLM doing portfolio math)
❌ Can't enforce constraints through training
The Path Forward
V7 will implement signal generation + deterministic portfolio management
This enables:
- Better signal quality evaluation (independent of sizing)
- Parallel signal generation (faster backtesting)
- Deterministic portfolio construction (safer)
- Independent optimization of each component
Final Status: Experiment complete. Architecture redesign required for V7.
Model: d20 checkpoint 1499 (val_loss 0.096)
Date: November 7, 2025