The Core Innovation
Train a custom LLM specifically for portfolio management decisions.
Instead of:
- ❌ Q-learning (learns action values → black box)
- ❌ Fixed rules (rigid thresholds → can't adapt)
- ❌ API LLMs (expensive: ~$945 per 90k decisions; slow: 1-3 s per call; generic)
We train:
- ✅ Custom nanochat LLM on portfolio decisions
- ✅ Input: ALL signals (transformer, SEC events, insider, trends, portfolio state)
- ✅ Output: BUY/SKIP/SELL + position size + natural language reasoning
- ✅ Training: Hindsight labels from historical data
- ✅ Inference: Fast CPU (<500ms), zero API costs
At a glance:
- ~2M training examples (Phase 1)
- +15% Phase 1 target returns
Baby Steps: Phase 1 Strategy
Start with Events Only - Validate Core Architecture First
Key Decision: Instead of building everything at once, we're starting with transformer_v2 predictions only (multi-horizon returns + future events). This validates the core architecture before adding complexity.
Why this approach?
- Faster iteration: 3-4 weeks vs 6-8 weeks for full system
- Clear baseline: Know what events alone achieve before adding more signals
- Fewer dependencies: Just transformer_v2, easier debugging
- Economic validation: Prove the $20 training cost works
- Technical validation: Verify real-time inference, explainability, constraints
Phase 1 Success = Build Phase 2
If Phase 1 hits +15% returns, we add agreements + insider trading (Phase 2, target +18-22%). If Phase 2 succeeds, add financial data (Phase 3, target +25%). But first: show events working!
Perfect Alignment: transformer_v2 Integration
Discovery: transformer_v2 Outputs Exactly What We Need!
transformer_v2 (multi-task, multi-horizon) predicts:
- ✅ Return predictions at 4 horizons (1d, 1w, 1m, 3m) with confidence scores
- ✅ Future event probabilities (distress, growth, operational - 24 event types)
- ✅ Risk metrics per horizon (volatility, Sharpe ratio, max drawdown)
- ✅ Meta-predictions (best horizon recommendation, horizon confidence)
A 100% schema match, plus bonus signals: no format conversion needed. The portfolio manager gets rich inputs:
# transformer_v2 output (simplified)
{
  "best_horizon": "1m",
  "horizons": {
    "1m": {
      "return_class": "up_a_little",
      "confidence": 0.58,
      "volatility": 18.2,
      "sharpe": 2.34,
      "max_drawdown": -2.83
    }
  },
  "future_event_labels": {
    "distress": {"investigation": 0.18, "dismissed_auditor": 0.05},
    "growth": {"merger": 0.35, "expansion": 0.18, "upgraded": 0.25},
    "operational": {"workforce_reduction": 0.12}
  }
}
# Portfolio manager learns:
"Buy because 1m prediction 'up_a_little' (58% confidence),
merger probability 35% + expansion 18% = growth catalysts,
investigation risk 18% is manageable, volatility moderate."
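Turning that prediction dict into a prompt-ready summary is mechanical. A minimal sketch, using the field names from the sample above (the helper name and the "catalyst score" aggregation are illustrative assumptions, not the project's actual code):

```python
# Sketch: condense a transformer_v2 prediction dict into one summary line
# for the portfolio-manager prompt. Field names follow the sample above;
# the aggregation choices are illustrative assumptions.

def summarize_prediction(pred: dict) -> str:
    h = pred["best_horizon"]
    best = pred["horizons"][h]
    # Sum growth-event probabilities as a rough "catalyst score"
    growth = pred["future_event_labels"].get("growth", {})
    catalyst = sum(growth.values())
    # Take the single largest distress probability as the headline risk
    distress = pred["future_event_labels"].get("distress", {})
    risk_event, risk_p = max(distress.items(), key=lambda kv: kv[1],
                             default=("none", 0.0))
    return (
        f"{h} forecast: {best['return_class']} "
        f"(confidence {best['confidence']:.0%}), "
        f"growth catalysts {catalyst:.0%}, "
        f"top risk: {risk_event} {risk_p:.0%}"
    )

pred = {
    "best_horizon": "1m",
    "horizons": {"1m": {"return_class": "up_a_little", "confidence": 0.58,
                        "volatility": 18.2, "sharpe": 2.34,
                        "max_drawdown": -2.83}},
    "future_event_labels": {
        "distress": {"investigation": 0.18, "dismissed_auditor": 0.05},
        "growth": {"merger": 0.35, "expansion": 0.18, "upgraded": 0.25},
        "operational": {"workforce_reduction": 0.12},
    },
}
print(summarize_prediction(pred))
# e.g. "1m forecast: up_a_little (confidence 58%), growth catalysts 78%, ..."
```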
Why This is Genius (Not Crazy)
✅ We Already Have Everything
The nanochat infrastructure is already training on 1M SEC events on vortex; d20 (561M params) trains in 12 hours on 2x RTX 3090.
455k SEC events, historical returns, and stock prices are all ready.
💰 The Economics
Training cost: ~$10 one-time (12-20 hours electricity).
Inference cost: $0 forever (CPU).
vs Claude API: $945 per 90k decisions.
ROI: Infinite.
🧠 Explainable Intelligence
Natural language reasoning: "Buy because buyback + merger" vs "Q=0.82" black box.
Can understand WHY it made each decision. Auditable for compliance.
🎯 Custom to Our Strategy
Not generic GPT-4 finance knowledge. Trained on YOUR data, YOUR patterns, YOUR constraints.
Learns what works for us specifically.
The Breakthrough: Portfolio State Simulation
Key Innovation: Generate Millions of Examples from Historical Data
Instead of just 455k examples (one per filing), we simulate 10 different portfolio states per filing:
- Empty portfolio - Fresh start, $1M cash
- Healthy portfolio - 12 positions, +8% performance, diversified
- Overconcentrated - 42% tech (violates 40% sector limit)
- In drawdown - -18% drawdown, crisis mode
- Portfolio full - 20/20 positions (at capacity)
- Low cash - Only 8% cash (below 10% minimum)
- Already holding - Already own this ticker
- Hot streak - +22% last 30 days (5 wins in a row)
- Cold streak - -12% last 30 days (recent losses)
- Bear market - VIX 35, defensive positions
Result: 455k filings × 10 scenarios = 4.5M training examples!
Same filing gets DIFFERENT decisions based on portfolio context. This teaches the model discipline and risk management.
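The scenario multiplier above can be sketched in a few lines: cross every filing with a bank of simulated portfolio states. The state fields and values below are illustrative assumptions, not the project's schema (only five of the ten scenarios are shown):

```python
# Sketch: pair each filing with several simulated portfolio states so the
# same signal can yield different labels. Fields/values are illustrative.

from itertools import product

SCENARIOS = [
    {"name": "empty",            "cash": 1_000_000, "positions": 0,  "tech_pct": 0.00},
    {"name": "healthy",          "cash": 450_000,   "positions": 12, "tech_pct": 0.22},
    {"name": "overconcentrated", "cash": 300_000,   "positions": 14, "tech_pct": 0.42},
    {"name": "portfolio_full",   "cash": 120_000,   "positions": 20, "tech_pct": 0.30},
    {"name": "low_cash",         "cash": 80_000,    "positions": 15, "tech_pct": 0.25},
]

def make_examples(filings: list) -> list:
    """Cross filings with scenarios: N filings -> N * len(SCENARIOS) examples."""
    return [{"filing": f, "portfolio": s} for f, s in product(filings, SCENARIOS)]

filings = [{"ticker": "AAPL"}, {"ticker": "NVDA"}]
examples = make_examples(filings)
print(len(examples))  # 2 filings x 5 scenarios = 10 examples
```

With the full ten-scenario bank, the same cross product is what turns hundreds of thousands of filings into millions of context-dependent examples.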
Example: Context-Aware Decisions
AAPL strong buy signal + healthy portfolio
→ BUY $95,000
NVDA even stronger signal + overconcentrated tech
→ SKIP (discipline! respect sector limits)
Training Data Format
Input (Natural Language Prompt)
You are a portfolio manager analyzing an SEC filing for trading decisions.
COMPANY: AAPL
FILING: 10-Q filed 2024-03-15
CURRENT PRICE: $175.32
TRANSFORMER PREDICTIONS:
- Return forecast: up_a_lot (confidence: 0.72)
SEC EVENTS DETECTED:
- repurchased: $5B buyback authorized (confidence: 10)
- upgraded: Credit rating improved to AA+ (confidence: 9)
- expanded: Opening 3 new facilities (confidence: 8)
DISTRESS SIGNALS:
- investigation: SEC inquiry ongoing (probability: 0.18)
GROWTH SIGNALS:
- merger: Acquisition talks (probability: 0.35)
INSIDER ACTIVITY:
- Last 90 days: 3 buys ($2.3M), 0 sells
- CEO purchased 50K shares @ $168 (15 days ago)
STOCK TRENDS:
- 30-day return: +8.2%
- 90-day return: +15.7%
PORTFOLIO STATE:
- Cash available: $842,000
- Positions held: 12/20
- Recent performance: +4.2% (last 30 days)
MARKET CONTEXT:
- VIX: 18.5 (moderate volatility)
DECISION REQUIRED:
Should we BUY, SKIP, or SELL? If BUY, what position size?
Output (Trained with Hindsight)
{
  "decision": "BUY",
  "position_size": 85000,
  "conviction": 0.82,
  "hold_duration_days": 75,
  "reasoning": "Strong bullish case: (1) Transformer predicts up_a_lot with 72% confidence, (2) Major buyback ($5B) signals management confidence, (3) Credit upgrade to AA+ reduces cost of capital, (4) Manufacturing expansion shows growth investment, (5) Insider accumulation ($2.3M in 90 days) confirms conviction, (6) Technical momentum strong (+15.7% in 90 days), (7) Merger probability (35%) provides additional upside catalyst. Risk factors manageable: SEC investigation at 18% probability is elevated but company has clean history. Recommend above-average position size ($85K vs $50K base) given strong signal alignment across multiple factors.",
  "risk_factors": [
    "SEC investigation ongoing (18% probability)",
    "Tech sector concentration (already 3 tech positions)"
  ],
  "catalysts": [
    "Buyback execution ($5B authorization)",
    "Merger announcement (35% probability)"
  ],
  "exit_triggers": [
    "Investigation probability spikes above 30%",
    "Return drops below -8%",
    "Take profit if return exceeds +25%"
  ]
}
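Each (prompt, decision) pair then has to be wrapped as a chat-style training record. A minimal sketch; the exact conversation schema nanochat expects is an assumption here, so the role names and keys may need adjusting to the real training format:

```python
# Sketch: wrap one (prompt, decision) pair as a chat-style training record.
# The "messages"/role schema is an assumed format, not nanochat's confirmed one.

import json

def to_conversation(prompt: str, decision: dict) -> dict:
    return {
        "messages": [
            {"role": "user", "content": prompt},
            # Serialize the structured decision so the model learns to emit JSON
            {"role": "assistant", "content": json.dumps(decision)},
        ]
    }

record = to_conversation(
    "COMPANY: AAPL\nFILING: 10-Q ...\nDECISION REQUIRED: BUY, SKIP, or SELL?",
    {"decision": "BUY", "position_size": 85000, "conviction": 0.82},
)

# Round-trip check: the assistant turn must parse back to the same dict
assert json.loads(record["messages"][1]["content"])["decision"] == "BUY"
```

Keeping the assistant turn as serialized JSON makes the "generates valid, parseable JSON" success criterion directly checkable at training-data-generation time.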
Advantages Over Alternatives
| Feature | nanochat PM | v1 Q-Learning | Fixed Rules | Claude API |
| --- | --- | --- | --- | --- |
| Interpretability | ✅ Natural language reasoning | ❌ Black box Q-values | ✅ Clear rules | ✅ Natural language |
| Cost | ✅ $0 forever | ✅ $0 | ✅ $0 | ❌ $945 per backtest |
| Speed | ✅ <500ms CPU | ✅ Instant | ✅ Instant | ❌ 1-3 sec API |
| Flexibility | ✅ Learned patterns | ⚠️ Limited context | ❌ Rigid thresholds | ✅ Very flexible |
| Context Awareness | ✅ Full portfolio state | ⚠️ State vector only | ❌ Simple if/else | ✅ Full context |
| Custom Training | ✅ Trained on YOUR data | ✅ Learns YOUR strategy | ⚠️ Manual tuning | ❌ Generic finance |
| Privacy | ✅ On-prem | ✅ On-prem | ✅ On-prem | ❌ Data to API |
Phase 1 Timeline (Events Only)
✅ Week 0: Design & Planning (Complete)
Architecture design, portfolio state simulation strategy, transformer_v2 integration analysis. Documentation: PHASE_1_EVENTS_ONLY.md, ALIGNMENT_CONFIRMED.md, PIPELINE.txt, updated TODO.md.
⚡ Week 1: transformer_v2 Training (Immediate Next Step)
Generate multi-horizon dataset (2-4 hours). Train Model A (6-8 hours GPU). Generate predictions on all 267k filings. Output: transformer_v2_predictions.jsonl with return forecasts, event probabilities, and risk metrics.
Week 2: Portfolio Manager Training Data
Load transformer_v2 predictions, generate 10 portfolio scenarios per filing, create ~2M training examples with hindsight labels, format as nanochat conversations. Estimated: 6-10 hours.
Week 3: Model Training
Train d12 (200M params) on vortex with 2x RTX 3090. Automated training for 12-20 hours. Cost: ~$10 electricity.
Week 4: Backtest & Evaluate
Implement NanochatPortfolioAgent, run historical backtests, compare vs v1 Q-learning (+11.20%) and always-buy (+3.65%). Target: +15% returns. If successful, plan Phase 2 (add agreements + insider trading).
Phase 1: 3-4 Weeks to Validate Core Architecture
Show that events alone can beat baselines. If Phase 1 hits +15%, we know the architecture works and can confidently add more signals (agreements, insider trading, financial data) in future phases.
Future Phases (Only if Phase 1 Succeeds)
🔄 Phase 2: Agreements + Insider
Add agreement features (credit terms, M&A structure) and insider trading signals (cluster buying, C-suite purchases). Target: +18-22% returns.
📊 Phase 3: Financial Data
Add XBRL parsing (revenue, margins, cash flow) and financial ratios (P/E, debt/equity, ROE). Target: +25% returns.
📰 Phase 4: Alternative Data
Add earnings call transcripts, news sentiment, and options flow. Target: +30%+ returns. Only pursue if Phase 3 delivers.
Phase 1 Success Metrics (Events Only)
Minimum Success (Proof of Concept)
- ✅ transformer_v2 accuracy >55%
- ✅ Portfolio manager trains successfully (no crashes)
- ✅ Generates valid JSON outputs (parseable)
- ✅ Positive returns in backtest (> 0%)
Good Success
- ✅ transformer_v2 accuracy >60%
- ✅ PM beats always-buy baseline (+3.65%)
- ✅ Explainable decisions (can understand reasoning)
- ✅ Reasonable trade count (<300)
Excellent Performance (PRIMARY GOAL for Phase 1)
- ✅ transformer_v2 accuracy >65%
- ✅ PM beats v1 Q-learning (+11.20%)
- ✅ +15% returns or higher
- ✅ Works across market regimes (bull/bear/crash)
- ✅ Low drawdowns (<15%)
- ✅ Proceed to Phase 2 (add more signals)
Amazing Success (Phase 1)
- ✅ transformer_v2 accuracy >70%
- ✅ PM achieves +20% returns with events alone
- ✅ Fast inference (<500ms CPU)
- ✅ Validated on out-of-sample data
- ✅ Clear path to Phase 2, 3, 4 improvements
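The two headline numbers in these tiers, total return and maximum drawdown, are simple to compute from a backtest's daily equity curve. A minimal sketch (function names are illustrative; thresholds follow the +15% return and <15% drawdown targets above):

```python
# Sketch: headline backtest metrics from a daily equity curve.
# Helper names are illustrative, not the project's backtest API.

def total_return(equity: list) -> float:
    """Overall return from first to last equity value."""
    return equity[-1] / equity[0] - 1.0

def max_drawdown(equity: list) -> float:
    """Worst peak-to-trough decline; returned as a negative fraction."""
    peak, worst = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)  # drawdown is <= 0
    return worst

equity = [100.0, 108.0, 103.0, 112.0, 118.0]
print(f"return {total_return(equity):+.1%}, "
      f"max drawdown {max_drawdown(equity):.1%}")
# prints: return +18.0%, max drawdown -4.6%
```

Against the Phase 1 criteria, this example curve would clear both the "excellent" return bar (+18% > +15%) and the drawdown bar (4.6% < 15%).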
Why This Could Be Production-Ready
🎯 Custom to Our Strategy
Not generic GPT-4 finance knowledge. Trained on OUR data, OUR patterns, OUR style.
Learns what works for us specifically.
💰 Economically Viable
$10 training cost (one-time), $0 inference cost (forever).
vs $1,500/month for Qwen H200, vs $945 per backtest for Claude API.
📖 Explainable & Debuggable
Natural language reasoning. Can understand every decision.
Easy to debug failures. Auditable for compliance.
🔄 Iterative Improvement
Easy to retrain with new data. Fine-tune as markets change.
Add new signals as available. Continuous learning from outcomes.
🛡️ Controlled & Private
We own the model. No API dependencies.
No data leaves our infrastructure. Deploy anywhere (cloud, on-prem).
📊 Data-Rich Training
Phase 1: ~2M training examples from 267k filings via portfolio state simulation (10 simulated portfolio scenarios per filing).
Future phases can add more signals and examples.
The Vision
What success looks like:
# Load your custom portfolio manager
pm = NanochatPortfolioAgent(model='d12_portfolio')

# Analyze an opportunity
decision = pm.analyze_opportunity(
    ticker='AAPL',
    filing_date='2024-03-15',
    transformer_pred='up_a_lot',
    events=['repurchased', 'expanded'],
    insider='accumulation',
    portfolio_state=current_portfolio
)

# Get explainable decision
print(decision['decision'])       # "BUY"
print(decision['position_size'])  # 85000
print(decision['reasoning'])      # Full natural language explanation
Running Your Hedge Fund
- Custom LLM trained on your proprietary data
- Explains every decision in natural language
- Runs on your hardware ($0 cost)
- Learns from your investment style
- Beats professional benchmarks
- Respects YOUR risk limits and constraints
Key Technical Decisions
Model Size: d12 (200M params)
Why not d20 (561M)?
- d12 is 200M params - good balance of accuracy and speed
- Faster inference (important for backtesting and real-time)
- Easier to quantize and deploy
- Portfolio decisions aren't as complex as SEC extraction
- Can upgrade to d20 if needed
Training Strategy: Hindsight Labels
Why hindsight vs Q-learning teacher vs rules?
- Hindsight: Direct ground truth (we know what happened)
- Can generate millions of examples
- Covers all market conditions
- No bias from existing strategies
Hybrid approach:
- Primary: Hindsight labels (actual returns)
- Secondary: v1 Q-learning decisions (proven strategy)
- Tertiary: v5 rules (human expertise)
- Consensus: When all agree → high confidence label
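The hybrid labeling idea above can be sketched directly: derive the hindsight label from the realized forward return, then flag the example as high-confidence when v1 and v5 agree with it. The return thresholds and the confidence flag are illustrative assumptions:

```python
# Sketch of hybrid hindsight labeling. Thresholds (+/-5%) and the
# high-confidence rule are illustrative assumptions, not the project's values.

def hindsight_label(fwd_return: float,
                    buy_thr: float = 0.05,
                    sell_thr: float = -0.05) -> str:
    """Label from the realized forward return (the ground truth we know)."""
    if fwd_return >= buy_thr:
        return "BUY"
    if fwd_return <= sell_thr:
        return "SELL"
    return "SKIP"

def label_with_consensus(fwd_return: float,
                         v1_decision: str,
                         v5_decision: str) -> dict:
    label = hindsight_label(fwd_return)
    # All three sources agree -> treat as a high-confidence example
    return {"label": label,
            "high_confidence": {label, v1_decision, v5_decision} == {label}}

print(label_with_consensus(0.12, "BUY", "BUY"))   # agreement: high confidence
print(label_with_consensus(0.12, "SKIP", "BUY"))  # disagreement: keep, but flag
```

Disagreement cases need not be discarded; they can be kept with lower sample weight so the hindsight signal still dominates.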
CPU Inference: Multi-threaded
Why CPU vs GPU?
- Cost: $0 vs cloud GPU costs
- Latency: <500ms acceptable for research and production
- Deployment: Easier to scale CPUs
- Accessibility: Works anywhere
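The multi-threaded CPU setup can be sketched with a thread pool over prompts. The model call below is a stub standing in for the real d12 forward pass (numeric inference kernels typically release the GIL, which is what makes threads useful here); names and worker count are illustrative:

```python
# Sketch: multi-threaded CPU batch inference with a thread pool.
# analyze() is a stub for the real quantized-model call.

from concurrent.futures import ThreadPoolExecutor

def analyze(prompt: str) -> dict:
    # Stand-in for the real model call; returns a dummy decision
    return {"prompt": prompt, "decision": "SKIP"}

def analyze_batch(prompts: list, workers: int = 8) -> list:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map() preserves input order, so results line up with prompts
        return list(pool.map(analyze, prompts))

decisions = analyze_batch([f"filing {i}" for i in range(100)])
print(len(decisions))  # one decision per prompt
```

For backtesting, this pattern matters more than single-call latency: 100 filings at <500ms each become a few seconds wall-clock with 8 workers instead of nearly a minute serially.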
Current Status
Week 0: Phase 1 Planning Complete ✅
- ✅ Complete architecture designed (docs/ARCHITECTURE.md - 31KB)
- ✅ Portfolio state simulation strategy (docs/TRAINING_DATA_STRATEGY.md - 11KB)
- ✅ Training data format specified
- ✅ transformer_v2 integration analyzed (ALIGNMENT_CONFIRMED.md - 15KB)
- ✅ Phase 1 strategy documented (PHASE_1_EVENTS_ONLY.md - 13KB)
- ✅ Complete pipeline visualization (PIPELINE.txt)
- ✅ Updated task list (TODO.md - 7KB)
- ✅ Directory structure created
Week 1: transformer_v2 Training (IMMEDIATE NEXT STEP) ⚡
Location: /home/kee/code/tester/temporal/transformer_v2
Tasks:
- Navigate to transformer_v2 directory
- Generate multi-horizon dataset (2-4 hours):
python data/prepare_data_optimized.py --num-classes 5
- Train Model A (6-8 hours GPU):
python training/train.py --config configs/model_a_balanced.yaml
- Generate predictions on all 267k filings
- Save to:
/data2/kee/transformer_v2_predictions.jsonl
Week 2-4: Portfolio Manager (Next!) 🎯
Once transformer_v2 completes:
- Generate portfolio manager training data (~2M examples)
- Train nanochat d12 on vortex (12-20 hours)
- Backtest and compare vs baselines
- If +15%: Plan Phase 2. If not: Iterate and improve.
Related Projects
- transformer_v2 - Multi-task, multi-horizon predictor (ready to train this week)
- v1: Transformer + Q-learning - Baseline at +11.20% returns (proven)
- v5: Signal-based rules - Design phase (projected +12-18%)
- nanochat SEC extraction - Training 1M events on vortex (infrastructure proven)
References & Documentation
Project location: /home/kee/code/tester/rl_trading/experiments/v6_nanochat_portfolio
Phase 1 documentation (NEW):
PHASE_1_EVENTS_ONLY.md - Events-only focused plan (13KB)
ALIGNMENT_CONFIRMED.md - transformer_v2 integration analysis (15KB)
PIPELINE.txt - Complete production pipeline visualization
TODO.md - Updated task list with phased approach (7KB)
Core documentation:
README.md - Project overview (16KB)
STATUS.md - Current state and design details (20KB)
QUICKSTART.md - Quick context for resuming work (3KB)
docs/ARCHITECTURE.md - Complete technical design (31KB)
docs/TRAINING_DATA_STRATEGY.md - Portfolio state simulation strategy (11KB)
docs/LLM_AGENT_ARCHITECTURE.md - LLM comparison analysis (28KB)
nanochat:
Databases:
- Events:
/data2/kee/events.db (11M SEC extractions)
- Returns:
/data2/kee/filing_returns_rl.db
- Prices:
/data2/kee/stock_cache.db