
v6: Nanochat Portfolio Manager

Status: ✅ Complete | Focus: Architecture Insights
Experiment Period: November 2025 | Model: d20 (561M parameters) | Training: 1,500 steps

Custom LLM trained on SEC filing events to generate portfolio decisions. Success: Model generates valid decisions with reasoning. Discovery: LLM should generate signals (conviction), not portfolio decisions (dollar amounts).

Executive Summary

✅ What Worked

  • Training Infrastructure: Successfully trained 561M parameter model on 1,000 examples
  • Model Quality: Generates valid JSON decisions with coherent reasoning
  • Fast Backtesting: 10x speedup via pre-computed contexts (59 contexts/sec)
  • Event Analysis: LLM effectively analyzes SEC filings and identifies catalysts

❌ What Didn't Work

  • Position Sizing: 100% override rate - all positions capped from $150K+ to $50K
  • Hindsight Bias: Model trained on outcomes, suggests unrealistic positions
  • Architecture Mismatch: LLM making portfolio math decisions it can't properly learn

💡 Key Insight

The LLM should generate SIGNALS (conviction 0.0-1.0), not PORTFOLIO DECISIONS (dollar amounts).

Separate concerns: LLM for pattern recognition and conviction scoring, deterministic code for portfolio construction and risk management.

Training Results

  • Model Architecture: 561M parameters (d20, 20 layers)
  • Training Data: 1,000 examples labeled BUY/SELL/SKIP
  • Final Validation Loss: 0.096 (a 96.5% improvement from the starting loss)
  • Training Time: ~4 hours on an H100 GPU

Sample Model Output

{ "decision": "BUY", "position_size": 153000, "conviction": 0.95, "reasoning": "Strong acquisition catalyst (IDF 15.2)...", "risk_factors": ["Integration risk", "Market volatility"], "catalysts": ["announced_acquisition_major", "dividend_increase"], "hold_duration_days": 75 }

Problem: Position size of $153K is unrealistic (15% of portfolio!). Model learned from hindsight, not risk management.

The Position Sizing Problem

Evidence from Backtesting

Ticker | Model Suggested | Actually Used | Override
VTGN   | $115,619        | $50,000       | 57% cut
GNL    | $200,000        | $50,000       | 75% cut
TSLA   | $200,000        | $50,000       | 75% cut
IMUX   | $153,000        | $50,000       | 67% cut
NBEV   | $200,000        | $50,000       | 75% cut

  • Override Rate: 100% (every position capped, 13/13 trades)
  • Average Model-Suggested Size: $150K (15% of the portfolio per trade)
  • Average Actual Size: $50K (5% cap enforced by code)

Why This Happens

The model was trained on hindsight outcomes: the training labels reflect how well each trade eventually performed, not what a risk-aware allocator would have committed at decision time. The model therefore learned to suggest $150K+ positions for anything resembling a past winner, with no notion of the 5% cap, available cash, or portfolio-level exposure.
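Below is a hypothetical sketch of the failure mode. The exact labeling rule V6 used is not recorded here, so the hindsight scaling and the implied ~$1M portfolio value are assumptions for illustration only.

# Assumption for illustration: hindsight labels size positions by how well
# the trade eventually did, not by ex-ante risk limits.
PORTFOLIO_VALUE = 1_000_000   # implied by the $50K / 5% position cap

def hindsight_label(realized_return: float) -> int:
    """Bigger realized win -> bigger labeled position, with no risk constraints."""
    return int(min(0.20, max(0.0, realized_return)) * PORTFOLIO_VALUE)

def enforced_size(suggested: int) -> int:
    """What the live system actually allowed: a hard 5% cap."""
    return min(suggested, int(0.05 * PORTFOLIO_VALUE))

# A trade that ended up +15% gets labeled ~$150K; live, it is cut to $50K,
# which is exactly the 100% override pattern seen in the backtest.
label = hindsight_label(0.15)
print(label, enforced_size(label))   # 150000 50000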

Technical Infrastructure Achievements

1. Fast Backtesting System (10x Speedup)

Innovation: Pre-compute all contexts once, then run inference separately.

Old approach: 4+ seconds per filing (DB queries)
New approach: 59 contexts/second (pre-computed context generation)

Step 1: Generate contexts (once)
  • Query the DB for events and portfolio state
  • Save to JSONL: {filing, past_events, portfolio_state}
  • Speed: 1,745 filings in ~30 seconds

Step 2: Run backtest (fast)
  • Load contexts from JSONL
  • Model inference only (~0.5-1 sec/filing)
  • Execute trades based on the decisions

Result: a week-long backtest runs in ~15 minutes (vs. 3+ hours)
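A minimal sketch of the two-stage flow, assuming a JSONL context file; the helper names, stub bodies, and context schema are illustrative, not the experiment's actual code.

import json

def query_past_events(filing):
    # Placeholder for the DB query that made the old, per-filing approach slow.
    return [{"type": "announced_acquisition_major", "idf": 15.2}]

def snapshot_portfolio(filing):
    # Placeholder for the portfolio-state lookup at the filing's timestamp.
    return {"cash": 500_000, "positions": {}}

# Stage 1: generate every context once and persist as JSONL.
def generate_contexts(filings, out_path="contexts.jsonl"):
    with open(out_path, "w") as f:
        for filing in filings:
            ctx = {
                "filing": filing["id"],
                "past_events": query_past_events(filing),
                "portfolio_state": snapshot_portfolio(filing),
            }
            f.write(json.dumps(ctx) + "\n")

# Stage 2: the backtest reads the JSONL and only pays for model inference.
def run_backtest(model_fn, ctx_path="contexts.jsonl"):
    decisions = []
    with open(ctx_path) as f:
        for line in f:
            ctx = json.loads(line)
            decisions.append(model_fn(ctx))   # ~0.5-1 sec/filing is now the only cost
    return decisions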

2. Comprehensive Decision Logging

Every decision (BUY, SELL, SKIP) is logged with its execution status and, where applicable, the rejection reason. A hypothetical example of one logged record is sketched below.
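The field names and status values here are assumptions for illustration, not the experiment's actual schema; the IMUX numbers come from the backtest table above.

import json

record = {
    "ticker": "IMUX",
    "decision": "BUY",
    "conviction": 0.95,
    "model_position_size": 153_000,      # what the model asked for
    "executed_position_size": 50_000,    # what the 5% cap allowed
    "status": "EXECUTED_WITH_OVERRIDE",  # or "EXECUTED", "SKIPPED", "REJECTED"
    "rejection_reason": None,            # e.g. "insufficient_cash" when rejected
}

with open("decisions.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")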

3. Nanochat Training Integration

Fine-tuned the nanochat d20 model (561M parameters) on 1,000 labeled BUY/SELL/SKIP examples for 1,500 steps on an H100 (~4 hours), reaching a final validation loss of 0.096 (checkpoint 1499).

V7 Architecture: The Right Separation

SIGNAL GENERATOR (LLM)

Input: Filing + Events + Market Context

Output:

{ "decision": "BUY" | "SELL" | "SKIP", "conviction": 0.0 - 1.0, // How confident "hold_duration_days": 30-90, "reasoning": "...", "risk_factors": [...], "catalysts": [...] }

What it's GOOD at: Text analysis, pattern recognition, expressing confidence

What it DOESN'T do: No position_size, no portfolio math

PORTFOLIO MANAGER (Code)

Input: Signal + Portfolio State

Logic:

base = conviction * max_allocation * portfolio_value
position_size = min(
    base,
    portfolio_value * 0.05,   // 5% cap
    available_cash,
    sector_budget(ticker)
)

What it's GOOD at: Math with constraints, risk management, portfolio optimization
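A minimal, runnable sketch of this split. The min(...) structure is the logic above; the 10% base allocation, the sector_budget stand-in, and the example numbers are illustrative assumptions, not the experiment's actual parameters.

from dataclasses import dataclass

@dataclass
class Signal:
    """What the LLM emits: direction and confidence, no dollar amounts."""
    ticker: str
    decision: str          # "BUY" | "SELL" | "SKIP"
    conviction: float      # 0.0 - 1.0
    hold_duration_days: int

def sector_budget(ticker: str, portfolio_value: float) -> float:
    # Illustrative stand-in for a per-sector exposure limit.
    return 0.15 * portfolio_value

def position_size(signal: Signal, portfolio_value: float, available_cash: float,
                  max_allocation: float = 0.10) -> float:
    """Deterministic sizing: conviction scales the base, hard constraints cap it."""
    if signal.decision != "BUY":
        return 0.0
    base = signal.conviction * max_allocation * portfolio_value
    return min(
        base,
        0.05 * portfolio_value,                     # 5% position cap
        available_cash,
        sector_budget(signal.ticker, portfolio_value),
    )

# Example: a 0.95-conviction BUY on a $1M portfolio is capped at $50K,
# exactly the override V6 had to apply after the fact.
sig = Signal(ticker="IMUX", decision="BUY", conviction=0.95, hold_duration_days=75)
print(position_size(sig, portfolio_value=1_000_000, available_cash=400_000))  # 50000.0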

Key Changes for V7

  1. Remove position_size from training: Train on BUY/SELL/SKIP + conviction only
  2. Calibrate conviction from outcomes: 0.9 for >20% return, 0.7 for >10%, 0.5 for >0%, 0.0 for losses (see the sketch after this list)
  3. Deterministic sizing formula: size = conviction × base_allocation × total_value
  4. Code enforces constraints: 5% position cap, sector limits, cash available
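The calibration rule in item 2, written out as a labeling function; the thresholds come from the list above, while the function name is illustrative.

def conviction_label(realized_return: float) -> float:
    """Map a trade's realized return to the conviction used as its training label."""
    if realized_return > 0.20:
        return 0.9
    if realized_return > 0.10:
        return 0.7
    if realized_return > 0.0:
        return 0.5
    return 0.0   # losses get zero conviction

# A trade that returned +12% becomes a training example with conviction 0.7.
assert conviction_label(0.12) == 0.7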

Lessons Learned

1. Architecture Matters

Don't make the LLM solve problems it's not suited for. Use it for pattern recognition, not portfolio math.

2. Fast Iteration

Pre-computed contexts enable rapid experimentation. Generate once, iterate on inference fast.

3. Enforce Constraints

Training the model to respect constraints doesn't work. Enforce them in deterministic code.

What LLMs Are Good At vs. Bad At

Good At ✅                            | Bad At ❌
Analyzing text (filings, events)      | Dollar-amount calculations with constraints
Identifying patterns and catalysts    | Portfolio-wide optimization
Expressing conviction levels          | Risk-management math
Generating reasoning and explanations | Multi-asset allocation decisions

Conclusion

V6 Proved the Concept

✅ LLM can learn from filing events
✅ Built fast backtesting infrastructure
✅ Created comprehensive decision logging
✅ Identified architecture limitations clearly

V6 Revealed the Flaw

❌ Position sizing fundamentally broken
❌ Architecture mismatch (LLM doing portfolio math)
❌ Can't enforce constraints through training

The Path Forward

V7 will implement signal generation + deterministic portfolio management

This enables:

  • Better signal quality evaluation (independent of sizing)
  • Parallel signal generation (faster backtesting)
  • Deterministic portfolio construction (safer)
  • Independent optimization of each component

Final Status: Experiment complete. Architecture redesign required for V7.

Model: d20 checkpoint 1499 (val_loss 0.096)

Date: November 7, 2025