Executive Summary
✅ What Worked
- Training Infrastructure: Successfully trained 561M parameter model on 1,000 examples
- Model Quality: Generates valid JSON decisions with coherent reasoning
- Fast Backtesting: 10x speedup via pre-computed contexts (59 contexts/sec)
- Event Analysis: LLM effectively analyzes SEC filings and identifies catalysts
❌ What Didn't Work
- Position Sizing: 100% override rate - all positions capped from $150K+ to $50K
- Hindsight Bias: Model trained on outcomes, suggests unrealistic positions
- Architecture Mismatch: LLM making portfolio math decisions it can't properly learn
💡 Key Insight
The LLM should generate SIGNALS (conviction 0.0-1.0), not PORTFOLIO DECISIONS (dollar amounts).
Separate concerns: LLM for pattern recognition and conviction scoring, deterministic code for portfolio construction and risk management.
Training Results
- Model Architecture: 561M parameters (d20, 20 layers)
- Training Data: 1,000 examples with BUY/SELL/SKIP labels
- Final Validation Loss: 0.096 (96.5% improvement from start)
- Training Time: ~4 hours on an H100 GPU
Sample Model Output
```json
{
  "decision": "BUY",
  "position_size": 153000,
  "conviction": 0.95,
  "reasoning": "Strong acquisition catalyst (IDF 15.2)...",
  "risk_factors": ["Integration risk", "Market volatility"],
  "catalysts": ["announced_acquisition_major", "dividend_increase"],
  "hold_duration_days": 75
}
```
Problem: Position size of $153K is unrealistic (15% of portfolio!). Model learned from hindsight, not risk management.
The Position Sizing Problem
Evidence from Backtesting
| Ticker | Model Suggested | Actually Used | Override |
|--------|-----------------|---------------|----------|
| VTGN   | $115,619        | $50,000       | 57% cut  |
| GNL    | $200,000        | $50,000       | 75% cut  |
| TSLA   | $200,000        | $50,000       | 75% cut  |
| IMUX   | $153,000        | $50,000       | 67% cut  |
| NBEV   | $200,000        | $50,000       | 75% cut  |
- Override Rate: 100% (every position capped, 13/13 trades)
- Average Model Size: $150K (15% of portfolio per trade!)
- Average Actual Size: $50K (5% cap enforced by code)
Why This Happens
Training on hindsight outcomes:
- Model sees: "Stock returned +47% at 90 days"
- Model concludes: "Should have bought $200K!"
- Problem: At decision time the future return is unknown, and the model has no concept of risk management
Technical Infrastructure Achievements
1. Fast Backtesting System (10x Speedup)
Innovation: Pre-compute all contexts once, then run inference separately.
- Old approach: 4+ seconds per filing (live DB queries)
- New approach: 59 contexts/sec (pre-computed context generation)
Step 1: Generate contexts (once)
- Query DB for events, portfolio state
- Save to JSONL: {filing, past_events, portfolio_state}
- Speed: 1,745 filings in ~30 seconds
Step 2: Run backtest (fast)
- Load contexts from JSONL
- Model inference only (~0.5-1 sec/filing)
- Execute trades based on decisions
Result: Week-long backtest in ~15 minutes (vs 3+ hours)
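The two steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: `db.events_before`, `db.portfolio_at`, and `model.decide` are hypothetical interfaces standing in for the real DB and model calls.

```python
import json

def generate_contexts(filings, db, path="contexts.jsonl"):
    """Step 1 (run once): query the DB per filing and cache everything
    the model needs as one JSON line per filing."""
    with open(path, "w") as f:
        for filing in filings:
            ctx = {
                "filing": filing,
                # Hypothetical DB accessors -- only past data, no lookahead
                "past_events": db.events_before(filing["date"], filing["ticker"]),
                "portfolio_state": db.portfolio_at(filing["date"]),
            }
            f.write(json.dumps(ctx) + "\n")

def run_backtest(model, path="contexts.jsonl"):
    """Step 2 (fast, repeatable): model inference only, no DB access."""
    decisions = []
    with open(path) as f:
        for line in f:
            ctx = json.loads(line)
            decisions.append(model.decide(ctx))  # inference-only hot loop
    return decisions
```

Because Step 1 touches the database only once, Step 2 can be re-run cheaply for every model checkpoint or prompt variant.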
2. Comprehensive Decision Logging
Tracks ALL decisions (BUY, SELL, SKIP) with execution status and rejection reasons:
- Model reasoning and conviction scores
- Execution status (true/false)
- Rejection reasons (sector_limit, insufficient_cash, etc.)
- Portfolio state at decision time
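A single log record covering those fields might look like this (field names and values are illustrative, not the exact schema):

```python
# One record per decision, whether or not the trade executed
decision_record = {
    "ticker": "TSLA",
    "decision": "BUY",
    "conviction": 0.95,
    "reasoning": "Strong acquisition catalyst...",
    "executed": False,                    # execution status
    "rejection_reason": "sector_limit",   # why the trade was blocked
    "portfolio_state": {"cash": 412_000, "open_positions": 9},
}
```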
3. Nanochat Training Integration
- Custom tokenizer with portfolio vocabulary
- Properly configured training loop
- Checkpoint management (save every 100 steps)
- Validation loss tracking
- Training data generation pipeline
V7 Architecture: The Right Separation
SIGNAL GENERATOR (LLM)
Input: Filing + Events + Market Context
Output:
```json
{
  "decision": "BUY" | "SELL" | "SKIP",
  "conviction": 0.0-1.0,          // how confident
  "hold_duration_days": 30-90,
  "reasoning": "...",
  "risk_factors": [...],
  "catalysts": [...]
}
```
What it's GOOD at: Text analysis, pattern recognition, expressing confidence
What it DOESN'T do: No position_size, no portfolio math
↓
PORTFOLIO MANAGER (Code)
Input: Signal + Portfolio State
Logic:
```python
def position_size(conviction, portfolio_value, available_cash, sector_budget,
                  max_allocation=0.05):
    """Deterministic sizing: conviction scales the base, hard caps bound it."""
    base = conviction * max_allocation * portfolio_value
    return min(
        base,
        portfolio_value * 0.05,  # 5% position cap
        available_cash,          # can't spend money we don't have
        sector_budget,           # remaining budget for this ticker's sector
    )
```
What it's GOOD at: Math with constraints, risk management, portfolio optimization
Key Changes for V7
- Remove position_size from training: Train on BUY/SELL/SKIP + conviction only
- Calibrate conviction from outcomes: 0.9 for >20% return, 0.7 for >10%, 0.5 for >0%, 0.0 for losses
- Deterministic sizing formula: size = conviction × base_allocation × total_value
- Code enforces constraints: 5% position cap, sector limits, cash available
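The outcome-to-conviction calibration above is a simple threshold mapping; a sketch, using the thresholds listed (the function name is illustrative):

```python
def calibrate_conviction(return_pct):
    """Map a realized holding-period return to a conviction training label."""
    if return_pct > 0.20:   # >20% return
        return 0.9
    if return_pct > 0.10:   # >10% return
        return 0.7
    if return_pct > 0.0:    # any positive return
        return 0.5
    return 0.0              # losses get zero conviction
```

Note this still uses hindsight, but only to label confidence for training; sizing itself stays deterministic at inference time.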
Lessons Learned
1. Architecture Matters
Don't make the LLM solve problems it's not suited for. Use it for pattern recognition, not portfolio math.
2. Fast Iteration
Pre-computed contexts enable rapid experimentation. Generate once, iterate on inference fast.
3. Enforce Constraints
Training the model to respect constraints doesn't work. Enforce them in deterministic code.
What LLMs Are Good At vs. Bad At
| Good At ✅ | Bad At ❌ |
|-----------|----------|
| Analyzing text (filings, events) | Dollar-amount calculations with constraints |
| Identifying patterns and catalysts | Portfolio-wide optimization |
| Expressing conviction levels | Risk management math |
| Generating reasoning and explanations | Multi-asset allocation decisions |
Conclusion
V6 Proved the Concept
✅ LLM can learn from filing events
✅ Built fast backtesting infrastructure
✅ Created comprehensive decision logging
✅ Identified architecture limitations clearly
V6 Revealed the Flaw
❌ Position sizing fundamentally broken
❌ Architecture mismatch (LLM doing portfolio math)
❌ Can't enforce constraints through training
The Path Forward
V7 will implement signal generation + deterministic portfolio management
This enables:
- Better signal quality evaluation (independent of sizing)
- Parallel signal generation (faster backtesting)
- Deterministic portfolio construction (safer)
- Independent optimization of each component
Final Status: Experiment complete. Architecture redesign required for V7.
Model: d20 checkpoint 1499 (val_loss 0.096)
Date: November 7, 2025