🎯 Phase 1: Events Only - Ready to Build
November 6, 2025

nanochat Portfolio Manager: Custom LLM for Trading Decisions

Train a custom 200M-parameter LLM specifically for portfolio management decisions. Natural-language reasoning for every trade. Fast CPU inference, zero API costs, fully explainable. Phase 1: start with transformer_v2 events only. Target: beat v1 Q-learning (+11.20%) with interpretable intelligence.

The Core Innovation

Train a custom LLM specifically for portfolio management decisions.

Instead of: renting a generic model (Claude API at $945 per backtest, generic GPT-4 finance knowledge).

We train: our own 200M-parameter model on our data, our patterns, our constraints, for ~$10 of electricity.

- 3-4 weeks to production
- $20 total training cost
- 2M training examples (Phase 1)
- +15% Phase 1 target returns

Baby Steps: Phase 1 Strategy

Start with Events Only - Validate Core Architecture First

Key Decision: Instead of building everything at once, we're starting with transformer_v2 predictions only (multi-horizon returns + future events). This validates the core architecture before adding complexity.

Why this approach? It validates the core architecture with minimal complexity, keeps the first training run cheap and fast, and isolates whether event signals alone carry predictive value before we layer on agreements, insider activity, and financial data.

Phase 1 Success = Build Phase 2

If Phase 1 hits +15% returns, we add agreements + insider trading (Phase 2, target +18-22%). If Phase 2 succeeds, add financial data (Phase 3, target +25%). But first: show events working!

Perfect Alignment: transformer_v2 Integration

Discovery: transformer_v2 Outputs Exactly What We Need!

transformer_v2 (multi-task, multi-horizon) predicts:

A 100% schema match, plus bonus fields. No format conversion needed; the portfolio manager gets rich signals:

```python
# transformer_v2 output (simplified)
{
    "best_horizon": "1m",
    "horizons": {
        "1m": {
            "return_class": "up_a_little",
            "confidence": 0.58,
            "volatility": 18.2,
            "sharpe": 2.34,
            "max_drawdown": -2.83
        }
    },
    "future_event_labels": {
        "distress": {"investigation": 0.18, "dismissed_auditor": 0.05},
        "growth": {"merger": 0.35, "expansion": 0.18, "upgraded": 0.25},
        "operational": {"workforce_reduction": 0.12}
    }
}

# Portfolio manager learns: "Buy because 1m prediction 'up_a_little'
# (58% confidence), merger probability 35% + expansion 18% = growth
# catalysts, investigation risk 18% is manageable, volatility moderate."
```

Why This is Genius (Not Crazy)

We Already Have Everything
nanochat infrastructure training 1M SEC events on vortex. d20 (561M params) trains in 12 hours on 2x 3090. 455k SEC events, historical returns, stock prices all ready.
💰 The Economics
Training cost: ~$10 one-time (12-20 hours electricity). Inference cost: $0 forever (CPU). vs Claude API: $945 per 90k decisions. ROI: Infinite.

🧠 Explainable Intelligence
Natural language reasoning: "Buy because buyback + merger" vs "Q=0.82" black box. Can understand WHY it made each decision. Auditable for compliance.

🎯 Custom to Our Strategy
Not generic GPT-4 finance knowledge. Trained on YOUR data, YOUR patterns, YOUR constraints. Learns what works for us specifically.

The Breakthrough: Portfolio State Simulation

Key Innovation: Generate Millions of Examples from Historical Data

Instead of just 455k examples (one per filing), we simulate 10 different portfolio states per filing:

  1. Empty portfolio - Fresh start, $1M cash
  2. Healthy portfolio - 12 positions, +8% performance, diversified
  3. Overconcentrated - 42% tech (violates 40% sector limit)
  4. In drawdown - -18% drawdown, crisis mode
  5. Portfolio full - 20/20 positions (at capacity)
  6. Low cash - Only 8% cash (below 10% minimum)
  7. Already holding - Already own this ticker
  8. Hot streak - +22% last 30 days (5 wins in a row)
  9. Cold streak - -12% last 30 days (recent losses)
  10. Bear market - VIX 35, defensive positions

Result: 455k filings × 10 scenarios ≈ 4.55M potential training examples. Phase 1 draws on the 267k filings that get transformer_v2 predictions, which is where the ~2M-example figure comes from.

Same filing gets DIFFERENT decisions based on portfolio context. This teaches the model discipline and risk management.
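The scenario expansion is mechanically simple; here is a minimal sketch. The state fields and scenario parameters are illustrative assumptions (only five of the ten scenarios are spelled out), not the project's actual schema:

```python
# Illustrative portfolio-state templates (hypothetical field names and values).
SCENARIOS = [
    {"name": "empty",            "cash": 1_000_000, "positions": 0,  "drawdown": 0.00},
    {"name": "healthy",          "cash": 842_000,   "positions": 12, "drawdown": 0.00},
    {"name": "overconcentrated", "cash": 300_000,   "positions": 15, "drawdown": 0.00, "tech_weight": 0.42},
    {"name": "in_drawdown",      "cash": 500_000,   "positions": 10, "drawdown": -0.18},
    {"name": "full",             "cash": 200_000,   "positions": 20, "drawdown": 0.00},
    # ... remaining five scenarios (low cash, already holding, hot/cold streak, bear market)
]

def expand_filings(filings):
    """Yield one (filing, portfolio state) training example per scenario."""
    for filing in filings:
        for scenario in SCENARIOS:
            yield {"filing": filing, "portfolio": dict(scenario)}

examples = list(expand_filings([{"ticker": "AAPL"}, {"ticker": "NVDA"}]))
# 2 filings x 5 listed scenarios = 10 examples
```

With all ten scenarios in place, the same loop over 455k filings produces the multi-million-example dataset described above.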

Example: Context-Aware Decisions

AAPL strong buy signal + healthy portfolio → BUY $95,000
NVDA even stronger signal + overconcentrated tech → SKIP (discipline! respect sector limits)

Training Data Format

Input (Natural Language Prompt)

```text
You are a portfolio manager analyzing an SEC filing for trading decisions.

COMPANY: AAPL
FILING: 10-Q filed 2024-03-15
CURRENT PRICE: $175.32

TRANSFORMER PREDICTIONS:
- Return forecast: up_a_lot (confidence: 0.72)

SEC EVENTS DETECTED:
- repurchased: $5B buyback authorized (confidence: 10)
- upgraded: Credit rating improved to AA+ (confidence: 9)
- expanded: Opening 3 new facilities (confidence: 8)

DISTRESS SIGNALS:
- investigation: SEC inquiry ongoing (probability: 0.18)

GROWTH SIGNALS:
- merger: Acquisition talks (probability: 0.35)

INSIDER ACTIVITY:
- Last 90 days: 3 buys ($2.3M), 0 sells
- CEO purchased 50K shares @ $168 (15 days ago)

STOCK TRENDS:
- 30-day return: +8.2%
- 90-day return: +15.7%

PORTFOLIO STATE:
- Cash available: $842,000
- Positions held: 12/20
- Recent performance: +4.2% (last 30 days)

MARKET CONTEXT:
- VIX: 18.5 (moderate volatility)

DECISION REQUIRED: Should we BUY, SKIP, or SELL? If BUY, what position size?
```
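Assembling such a prompt from the structured inputs is plain string formatting. A minimal sketch, with assumed field names rather than the project's actual code, and abbreviated to a few of the prompt sections:

```python
def build_prompt(filing, prediction, portfolio):
    """Render one decision prompt from structured inputs (illustrative fields)."""
    lines = [
        "You are a portfolio manager analyzing an SEC filing for trading decisions.",
        f"COMPANY: {filing['ticker']}",
        f"FILING: {filing['form']} filed {filing['date']}",
        f"CURRENT PRICE: ${filing['price']:.2f}",
        "TRANSFORMER PREDICTIONS:",
        f"- Return forecast: {prediction['return_class']} (confidence: {prediction['confidence']:.2f})",
        "PORTFOLIO STATE:",
        f"- Cash available: ${portfolio['cash']:,}",
        f"- Positions held: {portfolio['positions']}/20",
        "DECISION REQUIRED: Should we BUY, SKIP, or SELL? If BUY, what position size?",
    ]
    return "\n".join(lines)

prompt = build_prompt(
    {"ticker": "AAPL", "form": "10-Q", "date": "2024-03-15", "price": 175.32},
    {"return_class": "up_a_lot", "confidence": 0.72},
    {"cash": 842_000, "positions": 12},
)
```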

Output (Trained with Hindsight)

```json
{
  "decision": "BUY",
  "position_size": 85000,
  "conviction": 0.82,
  "hold_duration_days": 75,
  "reasoning": "Strong bullish case: (1) Transformer predicts up_a_lot with 72% confidence, (2) Major buyback ($5B) signals management confidence, (3) Credit upgrade to AA+ reduces cost of capital, (4) Manufacturing expansion shows growth investment, (5) Insider accumulation ($2.3M in 90 days) confirms conviction, (6) Technical momentum strong (+15.7% in 90 days), (7) Merger probability (35%) provides additional upside catalyst. Risk factors manageable: SEC investigation at 18% probability is elevated but company has clean history. Recommend above-average position size ($85K vs $50K base) given strong signal alignment across multiple factors.",
  "risk_factors": [
    "SEC investigation ongoing (18% probability)",
    "Tech sector concentration (already 3 tech positions)"
  ],
  "catalysts": [
    "Buyback execution ($5B authorization)",
    "Merger announcement (35% probability)"
  ],
  "exit_triggers": [
    "Investigation probability spikes above 30%",
    "Return drops below -8%",
    "Take profit if return exceeds +25%"
  ]
}
```
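The "hindsight" part means the decision label is derived from the return the stock actually went on to deliver. A minimal sketch of that labeling rule; the thresholds and the BUY/SELL/SKIP cutoffs are illustrative assumptions, and the real labels would also fold in portfolio constraints and risk limits:

```python
def hindsight_label(forward_return, buy_thresh=0.05, sell_thresh=-0.05):
    """Label one historical example from its realized forward return.

    Thresholds are illustrative: a filing followed by a move above
    +5% becomes a BUY example, below -5% a SELL, otherwise SKIP.
    """
    if forward_return >= buy_thresh:
        return "BUY"
    if forward_return <= sell_thresh:
        return "SELL"
    return "SKIP"

assert hindsight_label(0.157) == "BUY"   # e.g. the +15.7% 90-day move above
assert hindsight_label(-0.12) == "SELL"
assert hindsight_label(0.01) == "SKIP"
```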

Advantages Over Alternatives

| Feature | nanochat PM | v1 Q-Learning | Fixed Rules | Claude API |
|---|---|---|---|---|
| Interpretability | ✅ Natural language reasoning | ❌ Black-box Q-values | ✅ Clear rules | ✅ Natural language |
| Cost | ✅ $0 forever | ✅ $0 | ✅ $0 | ❌ $945 per backtest |
| Speed | ✅ <500ms CPU | ✅ Instant | ✅ Instant | ❌ 1-3 sec API |
| Flexibility | ✅ Learned patterns | ⚠️ Limited context | ❌ Rigid thresholds | ✅ Very flexible |
| Context Awareness | ✅ Full portfolio state | ⚠️ State vector only | ❌ Simple if/else | ✅ Full context |
| Custom Training | ✅ Trained on YOUR data | ✅ Learns YOUR strategy | ⚠️ Manual tuning | ❌ Generic finance |
| Privacy | ✅ On-prem | ✅ On-prem | ✅ On-prem | ❌ Data to API |

Phase 1 Timeline (Events Only)

Week 0: Design & Planning (Complete)

Architecture design, portfolio state simulation strategy, transformer_v2 integration analysis. Documentation: PHASE_1_EVENTS_ONLY.md, ALIGNMENT_CONFIRMED.md, PIPELINE.txt, updated TODO.md.

Week 1: transformer_v2 Training (IMMEDIATE NEXT STEP)

Generate multi-horizon dataset (2-4 hours). Train Model A (6-8 hours GPU). Generate predictions on all 267k filings. Output: transformer_v2_predictions.jsonl with return forecasts, event probabilities, and risk metrics.

Week 2: Portfolio Manager Training Data

Load transformer_v2 predictions, generate 10 portfolio scenarios per filing, create ~2M training examples with hindsight labels, format as nanochat conversations. Estimated: 6-10 hours.
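Formatting a labeled example as a conversation can be sketched as below. The user/assistant message schema here is an assumption for illustration, not nanochat's documented training format:

```python
import json

def to_conversation(prompt, decision):
    """Wrap one (prompt, hindsight-labeled decision) pair as a chat example.

    NOTE: the {"messages": [{"role": ..., "content": ...}]} shape is an
    assumed fine-tuning format, not necessarily what nanochat expects.
    """
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": json.dumps(decision)},
        ]
    }

example = to_conversation(
    "COMPANY: AAPL ... DECISION REQUIRED: Should we BUY, SKIP, or SELL?",
    {"decision": "BUY", "position_size": 85000, "conviction": 0.82},
)
```

Serializing the decision as JSON in the assistant turn keeps the output machine-parseable while the reasoning field stays free-form.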

Week 3: Model Training

Train d12 (200M params) on vortex with 2x RTX 3090. Automated training for 12-20 hours. Cost: ~$10 electricity.

Week 4: Backtest & Evaluate

Implement NanochatPortfolioAgent, run historical backtests, compare vs v1 Q-learning (+11.20%) and always-buy (+3.65%). Target: +15% returns. If successful, plan Phase 2 (add agreements + insider trading).
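The success check in that comparison is simple arithmetic. A sketch using the baseline numbers from this page; the function names are illustrative, not the backtest harness's actual API:

```python
def total_return(equity_curve):
    """Total return from a backtest equity curve (list of portfolio values)."""
    return equity_curve[-1] / equity_curve[0] - 1.0

# Baselines quoted on this page.
BASELINES = {"v1_q_learning": 0.1120, "always_buy": 0.0365}

def phase1_success(equity_curve, target=0.15):
    """Phase 1 passes if we hit the +15% target and beat every baseline."""
    r = total_return(equity_curve)
    return r >= target and all(r > b for b in BASELINES.values())

# Hypothetical backtest ending 16% up on $1M starting capital
assert phase1_success([1_000_000, 1_080_000, 1_160_000])
```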

Phase 1: 3-4 Weeks to Validate Core Architecture

Show that events alone can beat baselines. If Phase 1 hits +15%, we know the architecture works and can confidently add more signals (agreements, insider trading, financial data) in future phases.

Future Phases (Only if Phase 1 Succeeds)

🔄 Phase 2: Agreements + Insider
Add agreement features (credit terms, M&A structure) and insider trading signals (cluster buying, C-suite purchases). Target: +18-22% returns.

📊 Phase 3: Financial Data
Add XBRL parsing (revenue, margins, cash flow) and financial ratios (P/E, debt/equity, ROE). Target: +25% returns.

📰 Phase 4: Alternative Data
Add earnings call transcripts, news sentiment, and options flow. Target: +30%+ returns. Only pursue if Phase 3 delivers.

Phase 1 Success Metrics (Events Only)

Minimum Success (Proof of Concept)

Good Success

Excellent Performance (PRIMARY GOAL for Phase 1)

Amazing Success (Phase 1)

Why This Could Be Production-Ready

🎯 Custom to Our Strategy
Not generic GPT-4 finance knowledge. Trained on OUR data, OUR patterns, OUR style. Learns what works for us specifically.

💰 Economically Viable
$10 training cost (one-time), $0 inference cost (forever). vs $1,500/month for Qwen H200, vs $945 per backtest for Claude API.

📖 Explainable & Debuggable
Natural language reasoning. Can understand every decision. Easy to debug failures. Auditable for compliance.

🔄 Iterative Improvement
Easy to retrain with new data. Fine-tune as markets change. Add new signals as available. Continuous learning from outcomes.

🛡️ Controlled & Private
We own the model. No API dependencies. No data leaves our infrastructure. Deploy anywhere (cloud, on-prem).
📊 Data-Rich Training
Phase 1: ~2M training examples from 267k filings via portfolio state simulation. Roughly 10 training tokens per parameter (at ~1k tokens per example against 200M parameters). Future phases can add more signals and examples.

The Vision

What success looks like:

```python
# Load your custom portfolio manager
pm = NanochatPortfolioAgent(model='d12_portfolio')

# Analyze an opportunity
decision = pm.analyze_opportunity(
    ticker='AAPL',
    filing_date='2024-03-15',
    transformer_pred='up_a_lot',
    events=['repurchased', 'expanded'],
    insider='accumulation',
    portfolio_state=current_portfolio,
)

# Get explainable decision
print(decision['decision'])       # "BUY"
print(decision['position_size'])  # 85000
print(decision['reasoning'])      # Full natural language explanation
```

Running Your Hedge Fund

Key Technical Decisions

Model Size: d12 (200M params)

Why not d20 (561M)?

Training Strategy: Hindsight Labels

Why hindsight vs Q-learning teacher vs rules?

Hybrid approach:

CPU Inference: Multi-threaded

Why CPU vs GPU?
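Multi-threaded CPU inference can be sketched with a thread pool. `DummyAgent` below stands in for the real `NanochatPortfolioAgent`, whose API may differ; the point is fanning independent decisions across CPU threads so a backtest over thousands of filings stays tractable at <500ms per call:

```python
from concurrent.futures import ThreadPoolExecutor

class DummyAgent:
    """Stand-in for the portfolio manager; analyze() must be thread-safe."""
    def analyze(self, ticker):
        return {"ticker": ticker, "decision": "SKIP"}

def analyze_batch(agent, tickers, workers=8):
    """Run one decision per ticker across a pool of CPU threads.

    pool.map preserves input order, so results line up with tickers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(agent.analyze, tickers))

decisions = analyze_batch(DummyAgent(), ["AAPL", "NVDA", "MSFT"])
```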

Current Status

Week 0: Phase 1 Planning Complete ✅

Week 1: transformer_v2 Training (IMMEDIATE NEXT STEP) ⚡

Location: /home/kee/code/tester/temporal/transformer_v2

Tasks:

  1. Navigate to transformer_v2 directory
  2. Generate multi-horizon dataset (2-4 hours): python data/prepare_data_optimized.py --num-classes 5
  3. Train Model A (6-8 hours GPU): python training/train.py --config configs/model_a_balanced.yaml
  4. Generate predictions on all 267k filings
  5. Save to: /data2/kee/transformer_v2_predictions.jsonl
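Downstream steps can stream that JSONL file one prediction per line; a minimal loader sketch (the sample data here is illustrative, matching the simplified transformer_v2 output schema shown earlier):

```python
import io
import json

def load_predictions(fh):
    """Stream transformer_v2 predictions from a JSONL file handle."""
    for line in fh:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

# In production: open("/data2/kee/transformer_v2_predictions.jsonl")
sample = io.StringIO('{"best_horizon": "1m"}\n{"best_horizon": "3m"}\n')
preds = list(load_predictions(sample))
```

Streaming (a generator rather than reading the whole file) matters at 267k records: the training-data builder can consume predictions filing by filing without holding them all in memory.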

Week 2-4: Portfolio Manager (Next!) 🎯

Once transformer_v2 completes:

  1. Generate portfolio manager training data (~2M examples)
  2. Train nanochat d12 on vortex (12-20 hours)
  3. Backtest and compare vs baselines
  4. If +15%: Plan Phase 2. If not: Iterate and improve.

Related Projects

References & Documentation

Project location: /home/kee/code/tester/rl_trading/experiments/v6_nanochat_portfolio

Phase 1 documentation (NEW):

Core documentation:

nanochat:

Databases: