The Core Innovation
Train a custom LLM specifically for portfolio management decisions.
Instead of:
- ❌ Q-learning (learns action values → black box)
- ❌ Fixed rules (rigid thresholds → can't adapt)
- ❌ API LLMs (expensive: ~$945 per 90k decisions; slow: 1-3 s per call; generic)
We train:
- ✅ Custom nanochat LLM on portfolio decisions
- ✅ Input: ALL signals (transformer, SEC events, insider, trends, portfolio state)
- ✅ Output: BUY/SKIP/SELL + position size + natural language reasoning
- ✅ Training: Hindsight labels from historical data
- ✅ Inference: Fast CPU (<500ms), zero API costs
At a glance:
- ~2M training examples (Phase 1)
- +15% Phase 1 target returns
Baby Steps: Phase 1 Strategy
Start with Events Only - Validate Core Architecture First
Key Decision: Instead of building everything at once, we're starting with transformer_v2 predictions only (multi-horizon returns + future events). This validates the core architecture before adding complexity.
Why this approach?
- Faster iteration: 3-4 weeks vs 6-8 weeks for full system
- Clear baseline: Know what events alone achieve before adding more signals
- Fewer dependencies: Just transformer_v2, easier debugging
- Economic validation: Prove the $20 training cost works
- Technical validation: Verify real-time inference, explainability, constraints
Phase 1 Success = Build Phase 2
If Phase 1 hits +15% returns, we add agreements + insider trading (Phase 2, target +18-22%). If Phase 2 succeeds, add financial data (Phase 3, target +25%). But first: show events working!
Perfect Alignment: transformer_v2 Integration
Discovery: transformer_v2 Outputs Exactly What We Need!
transformer_v2 (multi-task, multi-horizon) predicts:
- ✅ Return predictions at 4 horizons (1d, 1w, 1m, 3m) with confidence scores
- ✅ Future event probabilities (distress, growth, operational - 24 event types)
- ✅ Risk metrics per horizon (volatility, Sharpe ratio, max drawdown)
- ✅ Meta-predictions (best horizon recommendation, horizon confidence)
A 100% schema match, plus bonus signals: no format conversion needed. The portfolio manager gets rich inputs:
# transformer_v2 output (simplified)
{
  "best_horizon": "1m",
  "horizons": {
    "1m": {
      "return_class": "up_a_little",
      "confidence": 0.58,
      "volatility": 18.2,
      "sharpe": 2.34,
      "max_drawdown": -2.83
    }
  },
  "future_event_labels": {
    "distress": {"investigation": 0.18, "dismissed_auditor": 0.05},
    "growth": {"merger": 0.35, "expansion": 0.18, "upgraded": 0.25},
    "operational": {"workforce_reduction": 0.12}
  }
}
# Portfolio manager learns:
"Buy because 1m prediction 'up_a_little' (58% confidence),
merger probability 35% + expansion 18% = growth catalysts,
investigation risk 18% is manageable, volatility moderate."
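Turning that prediction dict into a prompt-ready summary is mechanical. A minimal sketch, using the field names from the sample above (the helper name and the "catalyst score" aggregation are illustrative assumptions, not the project's actual code):

```python
# Sketch: condense a transformer_v2 prediction dict into one summary line
# for the portfolio-manager prompt. Field names follow the sample above;
# the aggregation choices are illustrative assumptions.

def summarize_prediction(pred: dict) -> str:
    h = pred["best_horizon"]
    best = pred["horizons"][h]
    # Sum growth-event probabilities as a rough "catalyst score"
    growth = pred["future_event_labels"].get("growth", {})
    catalyst = sum(growth.values())
    # Take the single largest distress probability as the headline risk
    distress = pred["future_event_labels"].get("distress", {})
    risk_event, risk_p = max(distress.items(), key=lambda kv: kv[1],
                             default=("none", 0.0))
    return (
        f"{h} forecast: {best['return_class']} "
        f"(confidence {best['confidence']:.0%}), "
        f"growth catalysts {catalyst:.0%}, "
        f"top risk: {risk_event} {risk_p:.0%}"
    )

pred = {
    "best_horizon": "1m",
    "horizons": {"1m": {"return_class": "up_a_little", "confidence": 0.58,
                        "volatility": 18.2, "sharpe": 2.34,
                        "max_drawdown": -2.83}},
    "future_event_labels": {
        "distress": {"investigation": 0.18, "dismissed_auditor": 0.05},
        "growth": {"merger": 0.35, "expansion": 0.18, "upgraded": 0.25},
        "operational": {"workforce_reduction": 0.12},
    },
}
print(summarize_prediction(pred))
# e.g. "1m forecast: up_a_little (confidence 58%), growth catalysts 78%, ..."
```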
Why This is Genius (Not Crazy)
✅ We Already Have Everything
The nanochat infrastructure is already training on 1M SEC events on vortex; d20 (561M params) trains in 12 hours on 2x RTX 3090.
455k SEC events, historical returns, and stock prices are all ready.
💰 The Economics
Training cost: ~$10 one-time (12-20 hours electricity).
Inference cost: $0 forever (CPU).
vs Claude API: $945 per 90k decisions.
ROI: Infinite.
🧠 Explainable Intelligence
Natural language reasoning: "Buy because buyback + merger" vs "Q=0.82" black box.
Can understand WHY it made each decision. Auditable for compliance.
🎯 Custom to Our Strategy
Not generic GPT-4 finance knowledge. Trained on YOUR data, YOUR patterns, YOUR constraints.
Learns what works for us specifically.
The Breakthrough: Portfolio State Simulation
Key Innovation: Generate Millions of Examples from Historical Data
Instead of just 455k examples (one per filing), we simulate 10 different portfolio states per filing:
- Empty portfolio - Fresh start, $1M cash
- Healthy portfolio - 12 positions, +8% performance, diversified
- Overconcentrated - 42% tech (violates 40% sector limit)
- In drawdown - -18% drawdown, crisis mode
- Portfolio full - 20/20 positions (at capacity)
- Low cash - Only 8% cash (below 10% minimum)
- Already holding - Already own this ticker
- Hot streak - +22% last 30 days (5 wins in a row)
- Cold streak - -12% last 30 days (recent losses)
- Bear market - VIX 35, defensive positions
Result: 455k filings × 10 scenarios = 4.5M training examples!
Same filing gets DIFFERENT decisions based on portfolio context. This teaches the model discipline and risk management.
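The scenario multiplier above can be sketched in a few lines: cross every filing with a bank of simulated portfolio states. The state fields and values below are illustrative assumptions, not the project's schema (only five of the ten scenarios are shown):

```python
# Sketch: pair each filing with several simulated portfolio states so the
# same signal can yield different labels. Fields/values are illustrative.

from itertools import product

SCENARIOS = [
    {"name": "empty",            "cash": 1_000_000, "positions": 0,  "tech_pct": 0.00},
    {"name": "healthy",          "cash": 450_000,   "positions": 12, "tech_pct": 0.22},
    {"name": "overconcentrated", "cash": 300_000,   "positions": 14, "tech_pct": 0.42},
    {"name": "portfolio_full",   "cash": 120_000,   "positions": 20, "tech_pct": 0.30},
    {"name": "low_cash",         "cash": 80_000,    "positions": 15, "tech_pct": 0.25},
]

def make_examples(filings: list) -> list:
    """Cross filings with scenarios: N filings -> N * len(SCENARIOS) examples."""
    return [{"filing": f, "portfolio": s} for f, s in product(filings, SCENARIOS)]

filings = [{"ticker": "AAPL"}, {"ticker": "NVDA"}]
examples = make_examples(filings)
print(len(examples))  # 2 filings x 5 scenarios = 10 examples
```

With the full ten-scenario bank, the same cross product is what turns hundreds of thousands of filings into millions of context-dependent examples.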
Example: Context-Aware Decisions
AAPL strong buy signal + healthy portfolio
→ BUY $95,000
NVDA even stronger signal + overconcentrated tech
→ SKIP (discipline! respect sector limits)
Training Data Format
Input (Natural Language Prompt)
You are a portfolio manager analyzing an SEC filing for trading decisions.
COMPANY: AAPL
FILING: 10-Q filed 2024-03-15
CURRENT PRICE: $175.32
TRANSFORMER PREDICTIONS:
- Return forecast: up_a_lot (confidence: 0.72)
SEC EVENTS DETECTED:
- repurchased: $5B buyback authorized (confidence: 10)
- upgraded: Credit rating improved to AA+ (confidence: 9)
- expanded: Opening 3 new facilities (confidence: 8)
DISTRESS SIGNALS:
- investigation: SEC inquiry ongoing (probability: 0.18)
GROWTH SIGNALS:
- merger: Acquisition talks (probability: 0.35)
INSIDER ACTIVITY:
- Last 90 days: 3 buys ($2.3M), 0 sells
- CEO purchased 50K shares @ $168 (15 days ago)
STOCK TRENDS:
- 30-day return: +8.2%
- 90-day return: +15.7%
PORTFOLIO STATE:
- Cash available: $842,000
- Positions held: 12/20
- Recent performance: +4.2% (last 30 days)
MARKET CONTEXT:
- VIX: 18.5 (moderate volatility)
DECISION REQUIRED:
Should we BUY, SKIP, or SELL? If BUY, what position size?
Output (Trained with Hindsight)
{
  "decision": "BUY",
  "position_size": 85000,
  "conviction": 0.82,
  "hold_duration_days": 75,
  "reasoning": "Strong bullish case: (1) Transformer predicts up_a_lot with 72% confidence, (2) Major buyback ($5B) signals management confidence, (3) Credit upgrade to AA+ reduces cost of capital, (4) Manufacturing expansion shows growth investment, (5) Insider accumulation ($2.3M in 90 days) confirms conviction, (6) Technical momentum strong (+15.7% in 90 days), (7) Merger probability (35%) provides additional upside catalyst. Risk factors manageable: SEC investigation at 18% probability is elevated but company has clean history. Recommend above-average position size ($85K vs $50K base) given strong signal alignment across multiple factors.",
  "risk_factors": [
    "SEC investigation ongoing (18% probability)",
    "Tech sector concentration (already 3 tech positions)"
  ],
  "catalysts": [
    "Buyback execution ($5B authorization)",
    "Merger announcement (35% probability)"
  ],
  "exit_triggers": [
    "Investigation probability spikes above 30%",
    "Return drops below -8%",
    "Take profit if return exceeds +25%"
  ]
}
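Each (prompt, decision) pair then has to be wrapped as a chat-style training record. A minimal sketch; the exact conversation schema nanochat expects is an assumption here, so the role names and keys may need adjusting to the real training format:

```python
# Sketch: wrap one (prompt, decision) pair as a chat-style training record.
# The "messages"/role schema is an assumed format, not nanochat's confirmed one.

import json

def to_conversation(prompt: str, decision: dict) -> dict:
    return {
        "messages": [
            {"role": "user", "content": prompt},
            # Serialize the structured decision so the model learns to emit JSON
            {"role": "assistant", "content": json.dumps(decision)},
        ]
    }

record = to_conversation(
    "COMPANY: AAPL\nFILING: 10-Q ...\nDECISION REQUIRED: BUY, SKIP, or SELL?",
    {"decision": "BUY", "position_size": 85000, "conviction": 0.82},
)

# Round-trip check: the assistant turn must parse back to the same dict
assert json.loads(record["messages"][1]["content"])["decision"] == "BUY"
```

Keeping the assistant turn as serialized JSON makes the "generates valid, parseable JSON" success criterion directly checkable at training-data-generation time.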
Advantages Over Alternatives
| Feature | nanochat PM | v1 Q-Learning | Fixed Rules | Claude API |
| --- | --- | --- | --- | --- |
| Interpretability | ✅ Natural language reasoning | ❌ Black box Q-values | ✅ Clear rules | ✅ Natural language |
| Cost | ✅ $0 forever | ✅ $0 | ✅ $0 | ❌ $945 per backtest |
| Speed | ✅ <500ms CPU | ✅ Instant | ✅ Instant | ❌ 1-3 sec API |
| Flexibility | ✅ Learned patterns | ⚠️ Limited context | ❌ Rigid thresholds | ✅ Very flexible |
| Context Awareness | ✅ Full portfolio state | ⚠️ State vector only | ❌ Simple if/else | ✅ Full context |
| Custom Training | ✅ Trained on YOUR data | ✅ Learns YOUR strategy | ⚠️ Manual tuning | ❌ Generic finance |
| Privacy | ✅ On-prem | ✅ On-prem | ✅ On-prem | ❌ Data to API |
Phase 1 Timeline (Events Only)
✅ Week 0: Design & Planning (Complete)
Architecture design, portfolio state simulation strategy, transformer_v2 integration analysis. Documentation: PHASE_1_EVENTS_ONLY.md, ALIGNMENT_CONFIRMED.md, PIPELINE.txt, updated TODO.md.
⚡ Week 1: transformer_v2 Training (Immediate Next Step)
Generate multi-horizon dataset (2-4 hours). Train Model A (6-8 hours GPU). Generate predictions on all 267k filings. Output: transformer_v2_predictions.jsonl with return forecasts, event probabilities, and risk metrics.
Week 2: Portfolio Manager Training Data
Load transformer_v2 predictions, generate 10 portfolio scenarios per filing, create ~2M training examples with hindsight labels, format as nanochat conversations. Estimated: 6-10 hours.
Week 3: Model Training
Train d12 (200M params) on vortex with 2x RTX 3090. Automated training for 12-20 hours. Cost: ~$10 electricity.
Week 4: Backtest & Evaluate
Implement NanochatPortfolioAgent, run historical backtests, compare vs v1 Q-learning (+11.20%) and always-buy (+3.65%). Target: +15% returns. If successful, plan Phase 2 (add agreements + insider trading).
Phase 1: 3-4 Weeks to Validate Core Architecture
Show that events alone can beat baselines. If Phase 1 hits +15%, we know the architecture works and can confidently add more signals (agreements, insider trading, financial data) in future phases.
Future Phases (Only if Phase 1 Succeeds)
🔄 Phase 2: Agreements + Insider
Add agreement features (credit terms, M&A structure) and insider trading signals (cluster buying, C-suite purchases). Target: +18-22% returns.
📊 Phase 3: Financial Data
Add XBRL parsing (revenue, margins, cash flow) and financial ratios (P/E, debt/equity, ROE). Target: +25% returns.
📰 Phase 4: Alternative Data
Add earnings call transcripts, news sentiment, and options flow. Target: +30%+ returns. Only pursue if Phase 3 delivers.
Phase 1 Success Metrics (Events Only)
Minimum Success (Proof of Concept)
- ✅ transformer_v2 accuracy >55%
- ✅ Portfolio manager trains successfully (no crashes)
- ✅ Generates valid JSON outputs (parseable)
- ✅ Positive returns in backtest (> 0%)
Good Success
- ✅ transformer_v2 accuracy >60%
- ✅ PM beats always-buy baseline (+3.65%)
- ✅ Explainable decisions (can understand reasoning)
- ✅ Reasonable trade count (<300)
Excellent Performance (PRIMARY GOAL for Phase 1)
- ✅ transformer_v2 accuracy >65%
- ✅ PM beats v1 Q-learning (+11.20%)
- ✅ +15% returns or higher
- ✅ Works across market regimes (bull/bear/crash)
- ✅ Low drawdowns (<15%)
- ✅ Proceed to Phase 2 (add more signals)
Amazing Success (Phase 1)
- ✅ transformer_v2 accuracy >70%
- ✅ PM achieves +20% returns with events alone
- ✅ Fast inference (<500ms CPU)
- ✅ Validated on out-of-sample data
- ✅ Clear path to Phase 2, 3, 4 improvements
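The two headline numbers in these tiers, total return and maximum drawdown, are simple to compute from a backtest's daily equity curve. A minimal sketch (function names are illustrative; thresholds follow the +15% return and <15% drawdown targets above):

```python
# Sketch: headline backtest metrics from a daily equity curve.
# Helper names are illustrative, not the project's backtest API.

def total_return(equity: list) -> float:
    """Overall return from first to last equity value."""
    return equity[-1] / equity[0] - 1.0

def max_drawdown(equity: list) -> float:
    """Worst peak-to-trough decline; returned as a negative fraction."""
    peak, worst = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)  # drawdown is <= 0
    return worst

equity = [100.0, 108.0, 103.0, 112.0, 118.0]
print(f"return {total_return(equity):+.1%}, "
      f"max drawdown {max_drawdown(equity):.1%}")
# prints: return +18.0%, max drawdown -4.6%
```

Against the Phase 1 criteria, this example curve would clear both the "excellent" return bar (+18% > +15%) and the drawdown bar (4.6% < 15%).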
Why This Could Be Production-Ready
🎯 Custom to Our Strategy
Not generic GPT-4 finance knowledge. Trained on OUR data, OUR patterns, OUR style.
Learns what works for us specifically.
💰 Economically Viable
$10 training cost (one-time), $0 inference cost (forever).
vs $1,500/month for Qwen H200, vs $945 per backtest for Claude API.
📖 Explainable & Debuggable
Natural language reasoning. Can understand every decision.
Easy to debug failures. Auditable for compliance.
🔄 Iterative Improvement
Easy to retrain with new data. Fine-tune as markets change.
Add new signals as available. Continuous learning from outcomes.
🛡️ Controlled & Private
We own the model. No API dependencies.
No data leaves our infrastructure. Deploy anywhere (cloud, on-prem).
📊 Data-Rich Training
Phase 1: ~2M training examples from 267k filings via portfolio state simulation (10 simulated portfolio scenarios per filing).
Future phases can add more signals and examples.
The Vision
What success looks like:
# Load your custom portfolio manager
pm = NanochatPortfolioAgent(model='d12_portfolio')

# Analyze an opportunity
decision = pm.analyze_opportunity(
    ticker='AAPL',
    filing_date='2024-03-15',
    transformer_pred='up_a_lot',
    events=['repurchased', 'expanded'],
    insider='accumulation',
    portfolio_state=current_portfolio
)

# Get explainable decision
print(decision['decision'])       # "BUY"
print(decision['position_size'])  # 85000
print(decision['reasoning'])      # Full natural language explanation
Running Your Hedge Fund
- Custom LLM trained on your proprietary data
- Explains every decision in natural language
- Runs on your hardware ($0 cost)
- Learns from your investment style
- Beats professional benchmarks
- Respects YOUR risk limits and constraints
Key Technical Decisions
Model Size: d12 (200M params)
Why not d20 (561M)?
- d12 is 200M params - good balance of accuracy and speed
- Faster inference (important for backtesting and real-time)
- Easier to quantize and deploy
- Portfolio decisions aren't as complex as SEC extraction
- Can upgrade to d20 if needed
Training Strategy: Hindsight Labels
Why hindsight vs Q-learning teacher vs rules?
- Hindsight: Direct ground truth (we know what happened)
- Can generate millions of examples
- Covers all market conditions
- No bias from existing strategies
Hybrid approach:
- Primary: Hindsight labels (actual returns)
- Secondary: v1 Q-learning decisions (proven strategy)
- Tertiary: v5 rules (human expertise)
- Consensus: When all agree → high confidence label
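The hybrid labeling idea above can be sketched directly: derive the hindsight label from the realized forward return, then flag the example as high-confidence when v1 and v5 agree with it. The return thresholds and the confidence flag are illustrative assumptions:

```python
# Sketch of hybrid hindsight labeling. Thresholds (+/-5%) and the
# high-confidence rule are illustrative assumptions, not the project's values.

def hindsight_label(fwd_return: float,
                    buy_thr: float = 0.05,
                    sell_thr: float = -0.05) -> str:
    """Label from the realized forward return (the ground truth we know)."""
    if fwd_return >= buy_thr:
        return "BUY"
    if fwd_return <= sell_thr:
        return "SELL"
    return "SKIP"

def label_with_consensus(fwd_return: float,
                         v1_decision: str,
                         v5_decision: str) -> dict:
    label = hindsight_label(fwd_return)
    # All three sources agree -> treat as a high-confidence example
    return {"label": label,
            "high_confidence": {label, v1_decision, v5_decision} == {label}}

print(label_with_consensus(0.12, "BUY", "BUY"))   # agreement: high confidence
print(label_with_consensus(0.12, "SKIP", "BUY"))  # disagreement: keep, but flag
```

Disagreement cases need not be discarded; they can be kept with lower sample weight so the hindsight signal still dominates.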
CPU Inference: Multi-threaded
Why CPU vs GPU?
- Cost: $0 vs cloud GPU costs
- Latency: <500ms acceptable for research and production
- Deployment: Easier to scale CPUs
- Accessibility: Works anywhere
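The multi-threaded CPU setup can be sketched with a thread pool over prompts. The model call below is a stub standing in for the real d12 forward pass (numeric inference kernels typically release the GIL, which is what makes threads useful here); names and worker count are illustrative:

```python
# Sketch: multi-threaded CPU batch inference with a thread pool.
# analyze() is a stub for the real quantized-model call.

from concurrent.futures import ThreadPoolExecutor

def analyze(prompt: str) -> dict:
    # Stand-in for the real model call; returns a dummy decision
    return {"prompt": prompt, "decision": "SKIP"}

def analyze_batch(prompts: list, workers: int = 8) -> list:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map() preserves input order, so results line up with prompts
        return list(pool.map(analyze, prompts))

decisions = analyze_batch([f"filing {i}" for i in range(100)])
print(len(decisions))  # one decision per prompt
```

For backtesting, this pattern matters more than single-call latency: 100 filings at <500ms each become a few seconds wall-clock with 8 workers instead of nearly a minute serially.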
Current Status
Week 0: Phase 1 Planning Complete ✅
- ✅ Complete architecture designed (docs/ARCHITECTURE.md - 31KB)
- ✅ Portfolio state simulation strategy (docs/TRAINING_DATA_STRATEGY.md - 11KB)
- ✅ Training data format specified
- ✅ transformer_v2 integration analyzed (ALIGNMENT_CONFIRMED.md - 15KB)
- ✅ Phase 1 strategy documented (PHASE_1_EVENTS_ONLY.md - 13KB)
- ✅ Complete pipeline visualization (PIPELINE.txt)
- ✅ Updated task list (TODO.md - 7KB)
- ✅ Directory structure created
Week 1: transformer_v2 Training (IMMEDIATE NEXT STEP) ⚡
Location: /home/kee/code/tester/temporal/transformer_v2
Tasks:
- Navigate to transformer_v2 directory
- Generate multi-horizon dataset (2-4 hours):
python data/prepare_data_optimized.py --num-classes 5
- Train Model A (6-8 hours GPU):
python training/train.py --config configs/model_a_balanced.yaml
- Generate predictions on all 267k filings
- Save to:
/data2/kee/transformer_v2_predictions.jsonl
Week 2-4: Portfolio Manager (Next!) 🎯
Once transformer_v2 completes:
- Generate portfolio manager training data (~2M examples)
- Train nanochat d12 on vortex (12-20 hours)
- Backtest and compare vs baselines
- If +15%: Plan Phase 2. If not: Iterate and improve.
Related Projects
- transformer_v2 - Multi-task, multi-horizon predictor (ready to train this week)
- v1: Transformer + Q-learning - Baseline at +11.20% returns (proven)
- v5: Signal-based rules - Design phase (projected +12-18%)
- nanochat SEC extraction - Training 1M events on vortex (infrastructure proven)
References & Documentation
Project location: /home/kee/code/tester/rl_trading/experiments/v6_nanochat_portfolio
Phase 1 documentation (NEW):
PHASE_1_EVENTS_ONLY.md - Events-only focused plan (13KB)
ALIGNMENT_CONFIRMED.md - transformer_v2 integration analysis (15KB)
PIPELINE.txt - Complete production pipeline visualization
TODO.md - Updated task list with phased approach (7KB)
Core documentation:
README.md - Project overview (16KB)
STATUS.md - Current state and design details (20KB)
QUICKSTART.md - Quick context for resuming work (3KB)
docs/ARCHITECTURE.md - Complete technical design (31KB)
docs/TRAINING_DATA_STRATEGY.md - Portfolio state simulation strategy (11KB)
docs/LLM_AGENT_ARCHITECTURE.md - LLM comparison analysis (28KB)
nanochat:
Databases:
- Events:
/data2/kee/events.db (11M SEC extractions)
- Returns:
/data2/kee/filing_returns_rl.db
- Prices:
/data2/kee/stock_cache.db