Context
You built Ten-K Wizard (sold to Morningstar in 2008) and are now building a Q-learning trading system on SEC filings. You understand the domain deeply, so let's talk about the REAL production challenges.
- "We're in a serious bull market, P/Es sky high"
- "Probably partially in an AI bubble"
- "Biotech going crazy - AI enabling breakthroughs"
- "At some point we'll have interesting discussions on how these affect model building"
The Big Picture Problem: Regime Change
Market Regime Shifts
Your Model Was Trained on 2015-2023
What your model learned:
2015-2016: Bull market (learned: M&A = good)
2018: Correction (-20%)
2020: COVID crash → V-shaped recovery (learned: buy the dip)
2021-2022: Mega bull → Bear market (learned: volatility)
2023: Recovery (learned: resilience)
Overall: 8 years of mostly UP markets
What it HASN'T seen:
2000-2002: Dot-com crash (-78% for NASDAQ)
2008-2009: Financial crisis (-57% for S&P)
1970s: Stagflation (inflation + recession)
1987: Black Monday (-22% in ONE DAY)
2025+: AI bubble burst? (maybe)
The risk: Model trained on bull markets fails catastrophically in bear markets
Example: What Happens When Bubble Pops
# Bull market behavior (2023-2024)
Filing: "NVIDIA announced new AI chip"
Transformer prediction: +25%
Q-learning: BUY
Actual result: +30% ✅
# After AI bubble pops (2025?)
Filing: "NVIDIA announced new AI chip"
Transformer prediction: +25% ← Still bullish (trained on bull market!)
Q-learning: BUY ← Follows prediction
Actual result: -15% ❌ ← Market doesn't care anymore
Model's Blind Spots:
- Trained when "AI" in filing = automatic +20%
- Trained when M&A = good (cheap debt era)
- Trained when growth > profitability
- Hasn't learned: When narratives stop working
Real Example (Ten-K Wizard Era)
2000: "Partnership with Amazon" → +50% (dot-com bubble)
2002: "Partnership with Amazon" → -10% (post-crash, nobody cares)
Same event, different regime, different outcome!
Mitigation Strategies
A. Regime Detection
def detect_market_regime():
    """Detect if market regime changed"""
    vix = get_vix()
    pe_ratio = get_sp500_pe()
    market_return_3m = get_market_return()  # 3-month return, in percent
    if vix > 30:
        regime = 'HIGH_VOLATILITY'  # Crisis mode
    elif pe_ratio > 25:
        regime = 'OVERVALUED'  # Bubble territory
    elif market_return_3m < -10:
        regime = 'BEAR_MARKET'
    else:
        regime = 'NORMAL'
    return regime

# Adjust Q-learning behavior
if detect_market_regime() != 'NORMAL':
    # Reduce position sizes
    # Increase cash holdings
    # Only trade highest-confidence predictions
    action = conservative_action(state)
B. Market-Adjusted Returns
# Instead of:
reward = stock_return_3m # Absolute return
# Use:
reward = stock_return_3m - sp500_return_3m # Market-adjusted (alpha)
# This teaches Q-learning to beat the market, not just make money
# Works in both bull and bear markets
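To make the difference concrete, here is a minimal sketch of the market-adjusted reward. It assumes returns are decimal fractions measured over the same 3-month window; the function name `market_adjusted_reward` is made up for illustration, and this is plain excess return over the index, not a beta-adjusted alpha.

```python
def market_adjusted_reward(stock_return: float, benchmark_return: float) -> float:
    """Excess return of the stock over the benchmark for the same window."""
    return stock_return - benchmark_return


# Returns are decimal fractions (0.12 means +12%).
bull = market_adjusted_reward(0.12, 0.10)    # stock +12% while the index is +10%
bear = market_adjusted_reward(-0.05, -0.15)  # stock -5% while the index is -15%

print(f"bull-market reward: {bull:+.2%}")
print(f"bear-market reward: {bear:+.2%}")
```

Under this reward, losing 5% while the index loses 15% scores higher than gaining 12% while the index gains 10%, which is the behavior you want the agent to learn across regimes.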
C. VIX-Based Position Sizing
vix = get_vix()
if vix < 15:
    position_size = 100  # Normal (percent of standard size)
elif vix < 25:
    position_size = 50   # Cautious
elif vix < 35:
    position_size = 25   # Very cautious
else:
    position_size = 0    # Cash only
AI Bubble Risk (Your Specific Concern)
The Setup:
- 2023-2024: "AI" in filing = automatic stock boost
- NVIDIA, Meta, Google, Microsoft: AI mentions = +50-100%
- Biotech + AI = "breakthrough" narrative = +200%
Your Transformer Learned:
Pattern learned:
"announced AI partnership" → +30% return
"AI drug discovery platform" → +50% return
"implementing AI in operations" → +15% return
This is REAL in 2024... but what about 2026?
When Bubble Pops (maybe 2025-2026?):
Phase 1 (Now): AI hype = free money
"Using AI for customer service" → +10%
Phase 2 (Bubble peak, maybe Q2 2025): Peak euphoria
"AI" mentioned in footnote → +5%
Companies adding "AI" to name
Phase 3 (Pop, maybe Q3-Q4 2025): Reality check
"Using AI for customer service" → -20%
Market: "Show me revenue, not buzzwords"
Phase 4 (Shakeout, 2026): Only real AI wins
Real AI companies: Still valuable
"AI washing" companies: Crushed
Your Model's Risk:
- Trained on Phase 1-2 (hype works)
- Will fail in Phase 3-4 (hype fails)
- Can't distinguish real AI from AI washing
Historical Parallel: Dot-com Bubble (1999-2002)
You saw this with Ten-K Wizard:
1999: "E-commerce strategy" → +100%
2000: "Internet ready" → +50%
2001: "E-commerce strategy" → -50%
2002: "E-commerce strategy" → Nobody cares
Real winners: Amazon, eBay (survived)
Losers: Pets.com, Webvan, etc. (died)
Mitigation Strategies
A. Keyword Penalty During Bubble Phases
import re

def adjust_prediction_for_bubble(prediction, filing_text):
    """Reduce prediction if suspicious AI hype"""
    text = filing_text.lower()
    # Count AI mentions; \b also catches "AI," and "AI." that a ' ai ' search misses
    ai_mentions = text.count('artificial intelligence')
    ai_mentions += len(re.findall(r'\bai\b', text))
    # Check for substance (crude keyword proxies)
    has_ai_revenue = 'ai revenue' in text
    has_ai_product = 'ai product' in text
    # If lots of mentions but no substance = AI washing
    if ai_mentions > 10 and not (has_ai_revenue or has_ai_product):
        bubble_discount = 0.5  # 50% haircut
        print("⚠️ AI washing detected, discounting prediction")
        return prediction * bubble_discount
    return prediction
B. Sector Rotation Detection
from statistics import mean, stdev

def check_sector_bubble(sector):
    """Check if sector is overheated"""
    sector_pe = get_sector_pe(sector)
    # Assumed to return a sequence of past P/E readings
    historical_pe = get_historical_pe(sector, years=10)
    # If 2+ standard deviations above the historical mean = bubble
    if sector_pe > mean(historical_pe) + 2 * stdev(historical_pe):
        return True, 'BUBBLE_RISK'
    return False, 'NORMAL'

# Usage
is_bubble, _ = check_sector_bubble('Technology')
if is_bubble:
    # Reduce tech exposure
    # Increase defensive sectors (healthcare, utilities)
    pass  # placeholder for the rebalancing logic
Biotech + AI Bubble (Double Bubble!)
Your observation: "Biotech going crazy, AI allowing breakthroughs"
The Reality: Some breakthroughs are REAL, some are hype
Real Breakthroughs (probably sustainable):
- AlphaFold (protein folding)
- AI-assisted drug discovery (shortens timelines)
- Personalized medicine with AI
Hype (bubble risk):
- "We use AI for drug discovery" (every biotech says this now)
- Valuations 10x based on AI mention alone
- No actual drugs in pipeline yet
The Challenge: How does your model distinguish?
Example Filings:
Company A (REAL):
"Our AI-designed drug candidate ABC-123 showed 85% efficacy in Phase 2 trials.
FDA granted Breakthrough Therapy designation. Projected $2B peak sales."
Company B (HYPE):
"We are leveraging cutting-edge AI and machine learning to revolutionize
drug discovery. Our platform has analyzed over 1 million compounds."
Your transformer sees BOTH as bullish → +30% prediction
Reality:
Company A: Real drug, real revenue → Actually +50%
Company B: No pipeline, no revenue → Actually -40%
Mitigation: Look for Substance, Not Buzzwords
def biotech_substance_check(filing):
    """Check if biotech filing has real substance"""
    has_phase_data = any(p in filing for p in ['Phase 1', 'Phase 2', 'Phase 3'])
    has_fda_action = any(f in filing for f in ['FDA approved', 'FDA granted', 'Breakthrough'])
    has_revenue_projection = 'peak sales' in filing.lower()
    substance_score = sum([has_phase_data, has_fda_action, has_revenue_projection])
    if substance_score >= 2:
        return 'REAL'
    else:
        return 'HYPE'

# Adjust prediction
if sector == 'Biotech':
    substance = biotech_substance_check(filing)
    if substance == 'HYPE':
        prediction *= 0.5  # 50% discount
Overfitting to Recent Patterns
The problem: Your model is VERY good at 2020-2024, but...
What it Memorized (2020-2024 patterns that might not hold):
- "Work from home" products = bullish (COVID era)
  Post-COVID: Return to office = bearish for some
- "Supply chain disruption" = bullish (scarcity premium)
  Post-COVID: Normalized supply = no premium
- "Zero interest rate" = growth stocks soar
  2024+: Higher rates = value stocks win
- "Inflation hedge" = commodities bullish
  If inflation cools: Commodities crash
- "ESG focus" = valuation premium
  2024: ESG fatigue = premium gone
Historical Example (from Ten-K Wizard days):
2005-2007: "Subprime mortgage growth" = bullish
Pattern: More loans = more revenue = stock up
2008: "Subprime mortgage growth" = death sentence
Same pattern, catastrophic result!
Your model can't learn this from 2015-2023 data (no subprime crisis)
Mitigation: Add Macro Features
state = (
    prediction_bucket,
    price_bucket,
    interest_rate_bucket,  # NEW: Fed funds rate
    inflation_bucket,      # NEW: CPI
    vix_bucket,            # NEW: Volatility
    has_position,
)
# Now Q-learning can learn:
# "Buybacks good when rates low, bad when rates high"
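Tabular Q-learning needs discrete states, so each macro series has to be bucketed before it goes into the tuple. Here is one possible sketch. The bucket edges for rates and inflation are invented placeholders, the VIX edges reuse the cut points from the VIX sizing rule, and `bucket` and `build_state` are hypothetical helper names.

```python
from bisect import bisect_right

# Hypothetical bucket edges; tune these against your own data.
RATE_EDGES = [1.0, 3.0, 5.0]       # Fed funds rate, percent
INFLATION_EDGES = [2.0, 4.0, 6.0]  # CPI year-over-year, percent
VIX_EDGES = [15.0, 25.0, 35.0]     # same cut points as the VIX sizing rule


def bucket(value: float, edges: list[float]) -> int:
    """Map a continuous value to a bucket index in 0..len(edges)."""
    return bisect_right(edges, value)


def build_state(prediction_bucket, price_bucket, fed_funds, cpi_yoy, vix, has_position):
    """Assemble the discrete Q-learning state, including macro buckets."""
    return (
        prediction_bucket,
        price_bucket,
        bucket(fed_funds, RATE_EDGES),
        bucket(cpi_yoy, INFLATION_EDGES),
        bucket(vix, VIX_EDGES),
        has_position,
    )


# Same filing signal, two different rate environments.
low_rate_state = build_state(2, 1, fed_funds=0.25, cpi_yoy=1.8, vix=14.0, has_position=False)
high_rate_state = build_state(2, 1, fed_funds=5.25, cpi_yoy=3.2, vix=14.0, has_position=False)
print(low_rate_state)
print(high_rate_state)
```

Because the two states differ, the Q-table can hold different values for the same filing signal under low and high rates. Keep the bucket count small: every extra bucket multiplies the number of states the agent has to visit.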
Transaction Costs Will Destroy You
The hidden killer: Your backtest shows +10% annual returns, but...
Costs You Haven't Included:
Bid-ask spread: 0.1-0.5% per trade
Commission: $0 (Robinhood) to $5 (traditional)
Slippage: 0.2-1.0% (can't trade at exact price)
Market impact: 0.1-2.0% (your order moves price)
Short-term capital gains tax: up to 37% of the gain (top federal rate, if held 1 year or less)
Total: 0.4-3.5% per ROUND TRIP (buy + sell), before commission and tax
Reality vs Backtest:
# Backtest (no costs)
Filing arrives: Prediction +10%
Action: BUY
3 months later: Actual +12%
Action: SELL
Net return: +12% ✅
# Reality (with costs)
Filing arrives: Prediction +10%
Action: BUY
Costs: -0.5% (bid-ask + slippage)
3 months later: Actual +12%
Action: SELL
Costs: -0.5% (bid-ask + slippage)
Tax: -4.0% (37% of 11% gain)
Net return: +7% ❌ (58% of backtest!)
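The arithmetic in the "Reality" case can be checked with a few lines. This is only a sketch of that one example: `net_return` is a made-up helper, all inputs are decimal fractions, and the 37% rate is the top short-term figure used above, which will not apply to every investor.

```python
def net_return(gross: float, entry_cost: float, exit_cost: float, tax_rate: float) -> float:
    """Net return after trading costs and tax on any remaining gain."""
    pre_tax = gross - entry_cost - exit_cost
    tax = pre_tax * tax_rate if pre_tax > 0 else 0.0  # losses are not taxed
    return pre_tax - tax


net = net_return(gross=0.12, entry_cost=0.005, exit_cost=0.005, tax_rate=0.37)
share_of_backtest = net / 0.12

print(f"net return: {net:.2%}")
print(f"share of backtest return kept: {share_of_backtest:.0%}")
```

The 12% gross becomes 11% after costs, and tax takes 37% of that, leaving roughly 6.9%, or about 58% of the backtest figure.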
For Ten-Q Capital:
Backtest annual return: +15%
After costs: +8-10%
After taxes: +5-7%
Still good! But that is a haircut of more than 50% from the backtest
Mitigation: Include Costs in Training
def execute_action_with_costs(filing, action):
    """Realistic reward function with costs"""
    reward = 0.0  # Default for actions that take no position
    if action == BUY:
        entry_cost = 0.003  # 0.3% (bid-ask + slippage)
        exit_cost = 0.003   # 0.3%
        reward = filing.actual_return - entry_cost - exit_cost
        # Tax on gains (if held 1 year or less)
        if reward > 0:
            tax = reward * 0.37  # Short-term cap gains, top federal rate
            reward -= tax
    return reward
Capacity Constraints
The problem: Your strategy works at $1M, but what about $100M?
Example:
Small fund ($1M):
Filing: TSLA earnings beat
Signal: BUY 100 shares ($25k)
Execution: Instant, no market impact ✅
Large fund ($100M):
Filing: SMCI (small cap) earnings beat
Signal: BUY $5M worth
Problem: Daily volume is $10M
Your order: 50% of daily volume!
Result: Price moves UP 5% before you're done buying
Slippage: -5% ❌
Capacity Estimate for Ten-Q Capital:
For $1M fund:
- Can trade stocks with $10M daily volume
- Covers most mid-caps and above ✅
For $10M fund:
- Need $100M daily volume
- Covers large caps ✅
For $100M fund:
- Need $1B daily volume
- Only mega-caps (AAPL, MSFT, GOOGL) ✅
- Small/mid-caps excluded ❌
For $1B fund:
- Need $10B daily volume
- Only top 50 stocks ❌ Strategy breaks down
Your strategy's capacity: $50-100M before slippage kills returns
Mitigation: Liquidity Filter
def check_liquidity(ticker, position_size):
    """Only trade if liquid enough"""
    daily_volume = get_daily_volume(ticker)
    daily_dollar_volume = daily_volume * get_price(ticker)
    # Our position should be < 5% of daily volume
    if position_size < daily_dollar_volume * 0.05:
        return True   # OK to trade
    else:
        return False  # Skip (too illiquid)
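Instead of skipping an illiquid name outright, you could spread the order over several days so each slice stays under the same 5% cap. The sketch below estimates how many trading days that takes. `days_to_fill` is a hypothetical helper, and it assumes daily dollar volume stays constant, which is optimistic.

```python
import math


def days_to_fill(order_dollars: float, daily_dollar_volume: float,
                 max_participation: float = 0.05) -> int:
    """Trading days needed if each day's slice stays within the participation cap."""
    if daily_dollar_volume <= 0 or not 0 < max_participation <= 1:
        raise ValueError("volume must be positive and participation in (0, 1]")
    daily_capacity = daily_dollar_volume * max_participation
    # Small tolerance so float rounding does not add a spurious extra day.
    return max(1, math.ceil(order_dollars / daily_capacity - 1e-9))


# The large-fund example: a $5M order in a stock trading $10M a day.
large_fund_days = days_to_fill(5_000_000, 10_000_000)
# The small-fund example: a $25k order against the same volume.
small_fund_days = days_to_fill(25_000, 10_000_000)
print(large_fund_days, small_fund_days)
```

Ten trading days is a long time for a signal that is stale after 4 hours, which is another way of seeing why capacity tops out.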
Data Quality Issues
The problem: SEC filings are MESSY (you know this from Ten-K Wizard!)
Real Issues:
- Filing delays: Filed Friday after close, you see Monday morning, stock already moved 10%
- Amended filings: Original: "Revenue $100M", Amendment next day: "Revenue $50M (oops)", traded on wrong data
- XBRL errors: Company tagged data wrong, parser extracted wrong numbers
- Non-GAAP adjustments: "Adjusted EBITDA" = $50M vs GAAP loss = -$20M
- Language changes: 2020 "Impacted by COVID-19" = bad, 2023 "COVID-19 headwinds abating" = good (same keyword, opposite meaning)
Your advantage (from Ten-K Wizard experience): You KNOW the data is messy, you built a company parsing this stuff, you understand the edge cases
Mitigation: Filing Timestamp Checks
from datetime import datetime, timezone

def get_filing_age(filing):
    """Check how old filing is when we see it"""
    filed_time = filing.filing_datetime  # assumed timezone-aware (UTC)
    our_time = datetime.now(timezone.utc)
    age_hours = (our_time - filed_time).total_seconds() / 3600
    return age_hours

# Skip if stale
age = get_filing_age(filing)
if age > 4:  # More than 4 hours old
    print("Filing too old, price already moved")
    action = HOLD
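The amended-filing problem listed above needs its own guard. EDGAR marks amendments with a "/A" suffix on the form type (10-K/A, 10-Q/A), so one simple approach is to keep only the latest version per company and period. The `Filing` dataclass and its field names here are assumptions for illustration, not your actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Filing:
    ticker: str
    form_type: str      # e.g. "10-Q" or "10-Q/A"
    period: str         # fiscal period covered, e.g. "2024Q1"
    filed_at: datetime  # timezone-aware filing time


def is_amendment(filing: Filing) -> bool:
    """EDGAR form types for amended filings end in "/A"."""
    return filing.form_type.upper().endswith("/A")


def latest_per_period(filings: list[Filing]) -> dict[tuple[str, str], Filing]:
    """Keep only the most recently filed version for each (ticker, period)."""
    latest: dict[tuple[str, str], Filing] = {}
    for f in sorted(filings, key=lambda f: f.filed_at):
        latest[(f.ticker, f.period)] = f  # later filings overwrite earlier ones
    return latest


t0 = datetime(2024, 5, 1, 20, 5, tzinfo=timezone.utc)
original = Filing("XYZ", "10-Q", "2024Q1", t0)
amended = Filing("XYZ", "10-Q/A", "2024Q1", t0 + timedelta(days=1))

current = latest_per_period([amended, original])[("XYZ", "2024Q1")]
print(current.form_type, is_amendment(current))
```

For training data, decide deliberately which version to use: the original is what the market saw at the time, while the amendment holds the corrected numbers.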
Regulatory Risk
The problem: SEC might not like your system
Potential Issues:
- Market manipulation? Your system trades immediately after filings, could be seen as "front-running" public info
- Material non-public information? Your transformer is VERY good (42.8% correlation), SEC: "How are you so accurate? Do you have inside info?"
- Algorithm trading rules: Need to register as algorithmic trader? Need kill switch?
- Investment advisor registration: If managing outside money, need to register
From your Ten-K Wizard experience: You understand SEC regulations, you know what's legal vs. gray area, you can structure Ten-Q Capital properly
Mitigation: Use Public Info Only
# ONLY use data available to everyone
sources = [
'EDGAR filings (public)',
'Stock prices (public)',
'Form 3/4/5 (public)',
]
# DO NOT use:
bad_sources = [
'Expert networks',
'Non-public company calls',
'Leaked documents',
]
Summary: Top 5 Risks for Ten-Q Capital
1. Market Regime Shift (HIGHEST RISK)
   Model trained on 2015-2023 (mostly bull market). AI bubble could pop → Patterns break.
   Mitigation: Regime detection, market-adjusted returns, VIX-based sizing
2. Overfitting to Recent Patterns
   Model knows 2020-2024 very well. Might not generalize to 2025-2030.
   Mitigation: Periodic retraining, macro features, rolling windows
3. Transaction Costs
   Backtest ignores bid-ask, slippage, taxes. Real returns = 50-70% of backtest.
   Mitigation: Include costs in training, reduce turnover, batch trades
4. Capacity Constraints
   Works at $1-10M. Breaks down at $100M+ (illiquidity).
   Mitigation: Liquidity filters, position scaling
5. AI Bubble Risk (Your Specific Concern)
   "AI" mention = free money in 2024. If bubble pops, model still thinks "AI" = bullish.
   Mitigation: AI washing detection, substance checks, fade hype
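"Periodic retraining" and "rolling windows" can be combined as a walk-forward schedule: train on a fixed-length window, test on the next year, then slide forward. This is a minimal sketch with yearly granularity. The 5-year training length is an arbitrary assumption and `walk_forward_windows` is a made-up name.

```python
def walk_forward_windows(first_year: int, last_year: int,
                         train_years: int, test_years: int = 1):
    """Yield (train_start, train_end, test_start, test_end), all inclusive years."""
    start = first_year
    while start + train_years + test_years - 1 <= last_year:
        train_end = start + train_years - 1
        yield (start, train_end, train_end + 1, train_end + test_years)
        start += test_years  # slide the window forward by one test period


windows = list(walk_forward_windows(2015, 2023, train_years=5))
for train_start, train_end, test_start, test_end in windows:
    print(f"train {train_start}-{train_end} -> test {test_start}-{test_end}")
```

Each test year is scored by a model that never saw it, which gives a more honest read on regime sensitivity than a single train/test split.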
What Hedge Funds Actually Do
From talking to quant PMs:
1. Conservative Position Sizing
# Theoretical: Risk 10% per trade
# Reality: Risk 0.5-2% per trade
# Why: Preserve capital through regime changes
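One common way to turn "risk X% per trade" into a dollar position is to size the trade so that hitting a stop-loss costs exactly that fraction of equity. The sketch below assumes you use a stop at all, which the notes above do not specify. `position_dollars` and the 20% stop distance are illustrative only.

```python
def position_dollars(equity: float, risk_fraction: float, stop_loss_pct: float) -> float:
    """Dollar position such that hitting the stop loses risk_fraction of equity."""
    if not 0 < stop_loss_pct <= 1:
        raise ValueError("stop_loss_pct must be in (0, 1]")
    return equity * risk_fraction / stop_loss_pct


equity = 1_000_000
theoretical = position_dollars(equity, risk_fraction=0.10, stop_loss_pct=0.20)
realistic = position_dollars(equity, risk_fraction=0.01, stop_loss_pct=0.20)

print(f"10% risk per trade: ${theoretical:,.0f} position")
print(f" 1% risk per trade: ${realistic:,.0f} position")
```

At 10% risk, a handful of consecutive losers during a regime change does serious damage to the account. At 1%, the same streak is survivable.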
2. Multiple Models
# Don't rely on ONE model
# Run 5-10 models simultaneously
# Take signal only when they agree
models = [
    'q_learning_model',
    'transformer_only',
    'insider_features_model',
    'macro_adjusted_model',
    'ensemble_model',
]
if agree_threshold(models) > 0.7:  # 70%+ agree
    action = BUY
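The agreement check itself is small. One possible version is sketched below, assuming each model emits a "BUY", "SELL", or "HOLD" string. The names `agreement` and `ensemble_action` are illustrative and stand in for the `agree_threshold` call above.

```python
def agreement(signals: list[str], action: str) -> float:
    """Fraction of model signals that equal the given action."""
    if not signals:
        return 0.0
    return sum(s == action for s in signals) / len(signals)


def ensemble_action(signals: list[str], threshold: float = 0.7) -> str:
    """Return BUY or SELL only when more than `threshold` of models agree, else HOLD."""
    for candidate in ("BUY", "SELL"):
        if agreement(signals, candidate) > threshold:
            return candidate
    return "HOLD"


strong = ensemble_action(["BUY", "BUY", "BUY", "BUY", "HOLD"])   # 4 of 5 agree
split = ensemble_action(["BUY", "BUY", "BUY", "SELL", "HOLD"])   # 3 of 5 agree
print(strong, split)
```

With five models, a 70% threshold means at least four must agree, so a 3-2 split falls back to HOLD. Agreement only adds information if the models are meaningfully different from one another.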
3. Human Override
# Model suggests BUY
# But PM sees:
# - VIX > 40 (crisis)
# - Sector bubble
# - Recent news negative
# PM overrides to HOLD
# Algorithms suggest, humans decide (at start)
4. Slow Ramp-Up
# Year 1: Paper trading
# Year 2: $100k real money
# Year 3: $1M
# Year 4: $10M
# Year 5: $50M (if still working)
# NOT: $10M day one
"In God we trust, all others must bring data" - W. Edwards Deming
"Markets can remain irrational longer than you can remain solvent" - Keynes
"Past performance is not indicative of future results" - Every hedge fund prospectus ever
Your Domain Expertise Matters
You understand SEC regulations, you've seen bubbles burst, you know data quality issues. Your Ten-K Wizard experience (2000-2008, through dot-com crash and financial crisis) + ML/RL system = Potentially very powerful combination.
But you need to respect regime shifts, bubbles, and capacity constraints.