Production Pitfalls for Ten-Q Capital - Hedge Fund Reality Check

Context

You built Ten-K Wizard (sold to Morningstar in 2008) and are now building a Q-learning trading system on SEC filings. You understand the domain deeply, so let's talk about the REAL production challenges.

Your Observations:

"We're in a serious bull market, P/Es sky high"
"Probably partially in an AI bubble"
"Biotech going crazy - AI enabling breakthroughs"
"At some point we'll have interesting discussions on how these affect model building"

The Big Picture Problem: Regime Change

Pitfall #1

HIGHEST RISK

Market Regime Shifts

Your Model Was Trained on 2015-2023

What your model learned:

2015-2016: Bull market (learned: M&A = good)
2018: Correction (-20%)
2020: COVID crash → V-shaped recovery (learned: buy the dip)
2021-2022: Mega bull → Bear market (learned: volatility)
2023: Recovery (learned: resilience)

Overall: 9 years of mostly UP markets

What it HASN'T seen:

2000-2002: Dot-com crash (-78% for NASDAQ)
2008-2009: Financial crisis (-57% for S&P)
1970s: Stagflation (inflation + recession)
1987: Black Monday (-22% in ONE DAY)
2025+: AI bubble burst? (maybe)

The risk: Model trained on bull markets fails catastrophically in bear markets

Example: What Happens When Bubble Pops

# Bull market behavior (2023-2024)
Filing: "NVIDIA announced new AI chip"
Transformer prediction: +25%
Q-learning: BUY
Actual result: +30% ✅

# After AI bubble pops (2025?)
Filing: "NVIDIA announced new AI chip"
Transformer prediction: +25%  ← Still bullish (trained on bull market!)
Q-learning: BUY  ← Follows prediction
Actual result: -15% ❌  ← Market doesn't care anymore

Model's Blind Spots:

Trained when "AI" in filing = automatic +20%
Trained when M&A = good (cheap debt era)
Trained when growth > profitability
Hasn't learned: When narratives stop working

Real Example (Ten-K Wizard Era)

2000: "Partnership with Amazon" → +50% (dot-com bubble)
2002: "Partnership with Amazon" → -10% (post-crash, nobody cares)

Same event, different regime, different outcome!

Mitigation Strategies

A. Regime Detection

def detect_market_regime():
    """Detect if market regime changed"""
    vix = get_vix()
    pe_ratio = get_sp500_pe()
    market_return_3m = get_market_return()

    if vix > 30:
        regime = 'HIGH_VOLATILITY'  # Crisis mode
    elif pe_ratio > 25:
        regime = 'OVERVALUED'  # Bubble territory
    elif market_return_3m < -10:
        regime = 'BEAR_MARKET'
    else:
        regime = 'NORMAL'

    return regime

# Adjust Q-learning behavior
if detect_market_regime() != 'NORMAL':
    # Reduce position sizes
    # Increase cash holdings
    # Only trade highest-confidence predictions
    action = conservative_action(state)
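
The three commented steps above (smaller positions, more cash, only the highest-confidence trades) could be sketched roughly as follows. The function name `regime_adjusted_action`, its signature, and every number in `REGIME_SETTINGS` are hypothetical placeholders rather than calibrated values or part of the original system.

```python
# Hypothetical per-regime settings: (position scale, minimum confidence to trade).
# The numbers are illustrative placeholders, not calibrated values.
REGIME_SETTINGS = {
    'NORMAL': (1.0, 0.55),
    'OVERVALUED': (0.5, 0.70),
    'BEAR_MARKET': (0.5, 0.70),
    'HIGH_VOLATILITY': (0.25, 0.85),
}


def regime_adjusted_action(signal, confidence, regime, base_size=1.0):
    """Return (action, size) after applying regime-dependent caution.

    signal: 'BUY' or 'SELL' as proposed by the model
    confidence: model confidence in [0, 1]
    regime: one of the labels returned by detect_market_regime()
    """
    scale, min_confidence = REGIME_SETTINGS[regime]
    if confidence < min_confidence:
        return 'HOLD', 0.0  # Stay in cash when conviction is too low
    return signal, base_size * scale  # Trade, but at a reduced size


normal = regime_adjusted_action('BUY', 0.75, 'NORMAL')
crisis = regime_adjusted_action('BUY', 0.75, 'HIGH_VOLATILITY')
print(normal, crisis)
```

The same signal that trades at full size in a NORMAL regime is held back in HIGH_VOLATILITY because it no longer clears the higher confidence bar.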

B. Market-Adjusted Returns

# Instead of:
reward = stock_return_3m  # Absolute return

# Use:
reward = stock_return_3m - sp500_return_3m  # Market-adjusted (excess return, a simple proxy for alpha)

# This teaches Q-learning to beat the market, not just make money
# Works in both bull and bear markets
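
A small worked example of why the market-adjusted reward behaves sensibly in both regimes. The return figures are made up for illustration, and `market_adjusted_reward` is a hypothetical helper name.

```python
def market_adjusted_reward(stock_return, benchmark_return):
    """Excess return of the stock over the benchmark (both as fractions)."""
    return stock_return - benchmark_return


# Bull market: the stock gained 12% but the index gained 15%.
bull = market_adjusted_reward(0.12, 0.15)

# Bear market: the stock lost 5% while the index lost 20%.
bear = market_adjusted_reward(-0.05, -0.20)

print(f"bull: {bull:+.2%}, bear: {bear:+.2%}")
# prints: bull: -3.00%, bear: +15.00%
```

The agent is penalized for a gain that lagged the market and rewarded for a loss that beat it, which is the behavior the reward is meant to teach.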

C. VIX-Based Position Sizing

vix = get_vix()

if vix < 15:
    position_size = 100  # Normal (percent of standard position size)
elif vix < 25:
    position_size = 50   # Cautious
elif vix < 35:
    position_size = 25   # Very cautious
else:
    position_size = 0    # Cash only

AI Bubble Risk

Pitfall #2

HIGH RISK

AI Bubble Risk (Your Specific Concern)

The Setup:

2023-2024: "AI" in filing = automatic stock boost
NVIDIA, Meta, Google, Microsoft: AI mentions = +50-100%
Biotech + AI = "breakthrough" narrative = +200%

Your Transformer Learned:

Pattern learned:
  "announced AI partnership" → +30% return
  "AI drug discovery platform" → +50% return
  "implementing AI in operations" → +15% return

This is REAL in 2024... but what about 2026?

When Bubble Pops (maybe 2025-2026?):

Phase 1 (Now): AI hype = free money
  "Using AI for customer service" → +10%

Phase 2 (Bubble peak, maybe Q2 2025): Peak euphoria
  "AI" mentioned in footnote → +5%
  Companies adding "AI" to name

Phase 3 (Pop, maybe Q3-Q4 2025): Reality check
  "Using AI for customer service" → -20%
  Market: "Show me revenue, not buzzwords"

Phase 4 (Shakeout, 2026): Only real AI wins
  Real AI companies: Still valuable
  "AI washing" companies: Crushed

Your Model's Risk:

Trained on Phase 1-2 (hype works)
Will fail in Phase 3-4 (hype fails)
Can't distinguish real AI from AI washing

Historical Parallel: Dot-com Bubble (1999-2002)

You saw this with Ten-K Wizard:

1999: "E-commerce strategy" → +100%
2000: "Internet ready" → +50%
2001: "E-commerce strategy" → -50%
2002: "E-commerce strategy" → Nobody cares

Real winners: Amazon, eBay (survived)
Losers: Pets.com, Webvan, etc. (died)

Mitigation Strategies

A. Keyword Penalty During Bubble Phases

import re

def adjust_prediction_for_bubble(prediction, filing_text):
    """Reduce prediction if suspicious AI hype"""
    text = filing_text.lower()

    # Count AI mentions; \bai\b also catches "AI," and "AI." which ' ai ' misses
    ai_mentions = text.count('artificial intelligence')
    ai_mentions += len(re.findall(r'\bai\b', text))

    # Check for substance
    has_ai_revenue = 'ai revenue' in text
    has_ai_product = 'ai product' in text

    # If lots of mentions but no substance = AI washing
    if ai_mentions > 10 and not (has_ai_revenue or has_ai_product):
        bubble_discount = 0.5  # 50% haircut
        print("⚠️ AI washing detected, discounting prediction")
        return prediction * bubble_discount

    return prediction

B. Sector Rotation Detection

def check_sector_bubble(sector):
    """Check if sector is overheated"""

    sector_pe = get_sector_pe(sector)
    historical_pe = get_historical_pe(sector, years=10)

    # historical_pe is a series of past P/E values for the sector
    # If 2+ standard deviations above the historical mean = bubble
    if sector_pe > mean(historical_pe) + 2 * std(historical_pe):
        return True, 'BUBBLE_RISK'

    return False, 'NORMAL'

# Usage
is_bubble, _ = check_sector_bubble('Technology')
if is_bubble:
    # Reduce tech exposure
    # Increase defensive sectors (healthcare, utilities)
    pass  # Placeholder so the block is valid Python; rebalancing logic goes here
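
The "2+ standard deviations above historical" test can be written as a z-score. Below is a self-contained sketch using only the standard library; the P/E history is made-up illustrative data and `sector_pe_zscore` is a hypothetical helper, not an existing API.

```python
from statistics import mean, stdev


def sector_pe_zscore(current_pe, historical_pes):
    """Number of sample standard deviations current_pe sits above the historical mean."""
    return (current_pe - mean(historical_pes)) / stdev(historical_pes)


# Made-up yearly sector P/E values, for illustration only
history = [18.0, 20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 17.0, 20.0, 20.0]

z = sector_pe_zscore(30.0, history)
is_bubble = z > 2.0  # Same 2-sigma threshold as check_sector_bubble
print(f"z-score: {z:.2f}, bubble risk: {is_bubble}")
```

With only ten yearly observations the standard deviation estimate is noisy, so the threshold is best treated as a rough warning light rather than a precise signal.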

Biotech + AI Bubble

Pitfall #3

HIGH RISK

Biotech + AI Bubble (Double Bubble!)

Your observation: "Biotech going crazy, AI allowing breakthroughs"

The Reality: Some breakthroughs are REAL, some are hype

Real Breakthroughs (probably sustainable):

AlphaFold (protein folding)
AI-assisted drug discovery (shortens timelines)
Personalized medicine with AI

Hype (bubble risk):

"We use AI for drug discovery" (every biotech says this now)
Valuations 10x based on AI mention alone
No actual drugs in pipeline yet

The Challenge: How does your model distinguish?

Example Filings:

Company A (REAL):
"Our AI-designed drug candidate ABC-123 showed 85% efficacy in Phase 2 trials.
FDA granted Breakthrough Therapy designation. Projected $2B peak sales."

Company B (HYPE):
"We are leveraging cutting-edge AI and machine learning to revolutionize
drug discovery. Our platform has analyzed over 1 million compounds."

Your transformer sees BOTH as bullish → +30% prediction

Reality:
Company A: Real drug, real revenue → Actually +50%
Company B: No pipeline, no revenue → Actually -40%

Mitigation: Look for Substance, Not Buzzwords

def biotech_substance_check(filing):
    """Check if biotech filing has real substance"""

    has_phase_data = any(p in filing for p in ['Phase 1', 'Phase 2', 'Phase 3'])
    has_fda_action = any(f in filing for f in ['FDA approved', 'FDA granted', 'Breakthrough'])
    has_revenue_projection = 'peak sales' in filing.lower()

    substance_score = sum([has_phase_data, has_fda_action, has_revenue_projection])

    if substance_score >= 2:
        return 'REAL'
    else:
        return 'HYPE'

# Adjust prediction
if sector == 'Biotech':
    substance = biotech_substance_check(filing)
    if substance == 'HYPE':
        prediction *= 0.5  # 50% discount

Overfitting to Recent Patterns

Pitfall #4

MEDIUM RISK

Overfitting to Recent Patterns

The problem: Your model is VERY good at 2020-2024, but...

What it Memorized (2020-2024 patterns that might not hold):

"Work from home" products = bullish (COVID era) → Post-COVID: Return to office = bearish for some
"Supply chain disruption" = bullish (scarcity premium) → Post-COVID: Normalized supply = no premium
"Zero interest rate" = growth stocks soar → 2024+: Higher rates = value stocks win
"Inflation hedge" = commodities bullish → If inflation cools: Commodities crash
"ESG focus" = valuation premium → 2024: ESG fatigue = premium gone

Historical Example (from Ten-K Wizard days):

2005-2007: "Subprime mortgage growth" = bullish
Pattern: More loans = more revenue = stock up

2008: "Subprime mortgage growth" = death sentence
Same pattern, catastrophic result!

Your model can't learn this from 2015-2023 data (no subprime crisis)

Mitigation: Add Macro Features

state = (
    prediction_bucket,
    price_bucket,
    interest_rate_bucket,  # NEW: Fed funds rate
    inflation_bucket,       # NEW: CPI
    vix_bucket,            # NEW: Volatility
    has_position
)

# Now Q-learning can learn:
# "Buybacks good when rates low, bad when rates high"
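
Tabular Q-learning needs discrete states, so each macro reading has to be mapped to a bucket index before it joins the state tuple. A minimal sketch follows, assuming hypothetical bucket edges (the cut points are placeholders to tune, except the VIX edges, which reuse the thresholds from the VIX-based sizing earlier) and a hypothetical `build_state` helper.

```python
from bisect import bisect_right

# Hypothetical bucket edges; these cut points are placeholders, not tuned values.
RATE_EDGES = [1.0, 3.0, 5.0]       # Fed funds rate, percent
INFLATION_EDGES = [2.0, 4.0]       # CPI year-over-year, percent
VIX_EDGES = [15.0, 25.0, 35.0]     # Same cut points as the VIX-based sizing


def bucket(value, edges):
    """Index of the bucket that value falls into, from 0 to len(edges)."""
    return bisect_right(edges, value)


def build_state(prediction_bucket, price_bucket, fed_funds, cpi_yoy, vix, has_position):
    """Assemble the discrete state tuple used as the Q-table key."""
    return (
        prediction_bucket,
        price_bucket,
        bucket(fed_funds, RATE_EDGES),
        bucket(cpi_yoy, INFLATION_EDGES),
        bucket(vix, VIX_EDGES),
        has_position,
    )


state = build_state(2, 1, fed_funds=5.25, cpi_yoy=3.1, vix=18.0, has_position=False)
print(state)  # (2, 1, 3, 1, 1, False)
```

Each added feature multiplies the number of states (here by 4 × 3 × 4 = 48), so coarse buckets help keep the Q-table from becoming too sparse to learn from.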

Transaction Costs Will Destroy You

Pitfall #5

HIGH RISK

Transaction Costs

The hidden killer: Your backtest shows +10% annual returns, but...

Costs You Haven't Included:

Bid-ask spread: 0.1-0.5% per trade
Commission: $0 (Robinhood) to $5 (traditional)
Slippage: 0.2-1.0% (can't trade at exact price)
Market impact: 0.1-2.0% (your order moves price)
Short-term capital gains tax: up to 37% federal (if held < 1 year)

Total: 0.4-3.5% per ROUND TRIP (buy + sell)

Reality vs Backtest:

# Backtest (no costs)
Filing arrives: Prediction +10%
Action: BUY
3 months later: Actual +12%
Action: SELL
Net return: +12% ✅

# Reality (with costs)
Filing arrives: Prediction +10%
Action: BUY
Costs: -0.5% (bid-ask + slippage)
3 months later: Actual +12%
Action: SELL
Costs: -0.5% (bid-ask + slippage)
Tax: -4.0% (37% of 11% gain)
Net return: +7% ❌ (58% of backtest!)
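
The arithmetic behind the "Reality" case can be checked with a few lines. This is a simplified sketch: it subtracts costs additively and taxes the whole remaining gain at a flat rate, and `net_return` is a hypothetical helper name.

```python
def net_return(gross_return, cost_per_side, tax_rate):
    """Return after paying costs on entry and exit, then tax on any remaining gain."""
    after_costs = gross_return - 2 * cost_per_side
    tax = max(after_costs, 0.0) * tax_rate  # No tax is charged on a loss
    return after_costs - tax


# Numbers from the example above: +12% gross, 0.5% per side, 37% tax rate
net = net_return(gross_return=0.12, cost_per_side=0.005, tax_rate=0.37)
print(f"net: {net:.2%}, share of backtest: {net / 0.12:.0%}")
# prints: net: 6.93%, share of backtest: 58%
```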

For Ten-Q Capital:

Backtest annual return: +15%
After costs: +8-10%
After taxes: +5-7%

Still good! But that is a 50%+ haircut from the backtest

Mitigation: Include Costs in Training

def execute_action_with_costs(filing, action):
    """Realistic reward function with costs"""

    # Default for any non-BUY action (assumed: no position, no cost, no reward);
    # without this, `reward` is unbound when action != BUY
    reward = 0.0

    if action == BUY:
        entry_cost = 0.003  # 0.3% (bid-ask + slippage)
        exit_cost = 0.003   # 0.3%
        reward = filing.actual_return - entry_cost - exit_cost

        # Tax on gains (if < 1 year)
        if reward > 0:
            tax = reward * 0.37  # Short-term cap gains
            reward -= tax

    return reward

Capacity Constraints

Pitfall #6

MEDIUM RISK

Capacity Constraints

The problem: Your strategy works at $1M, but what about $100M?

Example:

Small fund ($1M):
  Filing: TSLA earnings beat
  Signal: BUY 100 shares ($25k)
  Execution: Instant, no market impact ✅

Large fund ($100M):
  Filing: SMCI (small cap) earnings beat
  Signal: BUY $5M worth
  Problem: Daily volume is $10M
  Your order: 50% of daily volume!
  Result: Price moves UP 5% before you're done buying
  Slippage: -5% ❌

Capacity Estimate for Ten-Q Capital:

For $1M fund:
- Can trade stocks with $10M daily volume
- Covers most mid-caps and above ✅

For $10M fund:
- Need $100M daily volume
- Covers large caps ✅

For $100M fund:
- Need $1B daily volume
- Only mega-caps (AAPL, MSFT, GOOGL) ✅
- Small/mid-caps excluded ❌

For $1B fund:
- Need $10B daily volume
- Only top 50 stocks ❌ Strategy breaks down

Your strategy's capacity: $50-100M before slippage kills returns

Mitigation: Liquidity Filter

def check_liquidity(ticker, position_size):
    """Only trade if liquid enough"""

    daily_volume = get_daily_volume(ticker)
    daily_dollar_volume = daily_volume * get_price(ticker)

    # Our position should be < 5% of daily volume
    if position_size < daily_dollar_volume * 0.05:
        return True  # OK to trade
    else:
        return False  # Skip (too illiquid)
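
To connect the filter to the large-fund example above, it helps to look at the participation rate directly. A sketch under the same 5% cap; the helper names `participation_rate` and `max_order_size` are hypothetical.

```python
def participation_rate(order_dollars, daily_dollar_volume):
    """Fraction of a day's dollar volume that the order represents."""
    return order_dollars / daily_dollar_volume


def max_order_size(daily_dollar_volume, max_participation=0.05):
    """Largest order, in dollars, that stays under the participation cap."""
    return daily_dollar_volume * max_participation


# Large-fund example: a $5M order in a stock trading $10M per day
rate = participation_rate(5_000_000, 10_000_000)
cap = max_order_size(10_000_000)
print(f"participation: {rate:.0%}, max order at 5% cap: ${cap:,.0f}")
# prints: participation: 50%, max order at 5% cap: $500,000
```

At a 5% cap the $5M order is ten times too large for that stock, so it either has to be worked over many days or skipped.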

Data Quality Issues

Pitfall #7

MEDIUM RISK

Data Quality Issues

The problem: SEC filings are MESSY (you know this from Ten-K Wizard!)

Real Issues:

Filing delays: A filing lands Friday after the close, you see it Monday morning, and the stock has already moved 10%
Amended filings: The original says "Revenue $100M", the amendment next day says "Revenue $50M (oops)", and you traded on the wrong data
XBRL errors: The company tagged its data wrong, so the parser extracted the wrong numbers
Non-GAAP adjustments: "Adjusted EBITDA" = $50M vs GAAP loss = -$20M
Language changes: 2020 "Impacted by COVID-19" = bad, 2023 "COVID-19 headwinds abating" = good (same keyword, opposite meaning)

Your advantage (from Ten-K Wizard experience): You KNOW the data is messy, you built a company parsing this stuff, you understand the edge cases

Mitigation: Filing Timestamp Checks

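from datetime import datetime, timezone

def get_filing_age(filing):
    """Check how old filing is (in hours) when we see it"""

    # Assumes filing.filing_datetime is timezone-aware (UTC); subtracting a
    # naive datetime from an aware one raises TypeError
    filed_time = filing.filing_datetime
    our_time = datetime.now(timezone.utc)
    age_hours = (our_time - filed_time).total_seconds() / 3600

    return age_hours

# Skip if stale
age = get_filing_age(filing)
if age > 4:  # More than 4 hours old
    print("Filing too old, price already moved")
    action = HOLD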
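
The amended-filings issue listed above can be guarded against in a similar way. EDGAR marks amendments with a "/A" suffix on the form type (for example 10-Q/A), so one option is to keep only the most recently filed version per company, form, and period. The `FilingRecord` class and its field names below are hypothetical, as is the sample data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FilingRecord:
    # Hypothetical minimal record; field names are illustrative only
    cik: str
    form_type: str   # e.g. '10-Q' or '10-Q/A'
    period: str      # Fiscal period covered, e.g. '2024Q1'
    filed_at: datetime


def is_amendment(record):
    """EDGAR amendments carry a '/A' suffix on the form type."""
    return record.form_type.endswith('/A')


def latest_versions(records):
    """Keep only the most recently filed version per company, base form, and period."""
    latest = {}
    for rec in records:
        base_form = rec.form_type.split('/')[0]  # '10-Q/A' -> '10-Q'
        key = (rec.cik, base_form, rec.period)
        if key not in latest or rec.filed_at > latest[key].filed_at:
            latest[key] = rec
    return list(latest.values())


original = FilingRecord('0000000001', '10-Q', '2024Q1', datetime(2024, 5, 1, 16, 5))
amended = FilingRecord('0000000001', '10-Q/A', '2024Q1', datetime(2024, 5, 2, 9, 30))
current = latest_versions([original, amended])
print([r.form_type for r in current])  # ['10-Q/A']
```

This does not undo a trade already placed on the original filing, but it keeps training data and later decisions from using superseded numbers.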

Regulatory Risk

Pitfall #8

LOW-MEDIUM RISK

Regulatory Risk

The problem: SEC might not like your system

Potential Issues:

Market manipulation? Your system trades immediately after filings, could be seen as "front-running" public info
Material non-public information? Your transformer is VERY good (42.8% correlation), SEC: "How are you so accurate? Do you have inside info?"
Algorithmic trading rules: Do you need to register as an algorithmic trader? Do you need a kill switch?
Investment advisor registration: If managing outside money, need to register

From your Ten-K Wizard experience: You understand SEC regulations, you know what's legal vs. gray area, you can structure Ten-Q Capital properly

Mitigation: Use Public Info Only

# ONLY use data available to everyone
sources = [
    'EDGAR filings (public)',
    'Stock prices (public)',
    'Form 3/4/5 (public)',
]

# DO NOT use:
bad_sources = [
    'Expert networks',
    'Non-public company calls',
    'Leaked documents',
]

Summary: Top 5 Risks for Ten-Q Capital

1. Market Regime Shift (HIGHEST RISK)
Model trained on 2015-2023 (mostly bull market). AI bubble could pop → Patterns break.
Mitigation: Regime detection, market-adjusted returns, VIX-based sizing
2. Overfitting to Recent Patterns
Model knows 2020-2024 very well. Might not generalize to 2025-2030.
Mitigation: Periodic retraining, macro features, rolling windows
3. Transaction Costs
Backtest ignores bid-ask, slippage, taxes. Real returns = 50-70% of backtest after costs, less after taxes.
Mitigation: Include costs in training, reduce turnover, batch trades
4. Capacity Constraints
Works at $1-10M. Breaks down at $100M+ (illiquidity).
Mitigation: Liquidity filters, position scaling
5. AI Bubble Risk (Your Specific Concern)
"AI" mention = free money in 2024. If bubble pops, model still thinks "AI" = bullish.
Mitigation: AI washing detection, substance checks, fade hype

What Hedge Funds Actually Do

From talking to quant PMs:

1. Conservative Position Sizing

# Theoretical: Risk 10% per trade
# Reality: Risk 0.5-2% per trade

# Why: Preserve capital through regime changes

2. Multiple Models

# Don't rely on ONE model
# Run 5-10 models simultaneously
# Take signal only when they agree

models = [
    'q_learning_model',
    'transformer_only',
    'insider_features_model',
    'macro_adjusted_model',
    'ensemble_model',
]

if agree_threshold(models) > 0.7:  # 70%+ agree
    action = BUY
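
One way the agreement check could work is a simple vote over each model's latest signal. This is a sketch only: `agreement_fraction` is a hypothetical stand-in for `agree_threshold`, and the signals are made up.

```python
def agreement_fraction(signals, action):
    """Fraction of models whose signal equals the given action."""
    if not signals:
        return 0.0
    matching = sum(1 for s in signals.values() if s == action)
    return matching / len(signals)


# Made-up signals for illustration, keyed by model name
signals = {
    'q_learning_model': 'BUY',
    'transformer_only': 'BUY',
    'insider_features_model': 'BUY',
    'macro_adjusted_model': 'HOLD',
    'ensemble_model': 'BUY',
}

# 4 of 5 models agree (80%), which clears the 70% threshold
action = 'BUY' if agreement_fraction(signals, 'BUY') > 0.7 else 'HOLD'
print(action)  # BUY
```

A vote like this only adds safety if the models make different mistakes; five models trained on the same data and features will tend to agree even when they are all wrong.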

3. Human Override

# Model suggests BUY
# But PM sees:
# - VIX > 40 (crisis)
# - Sector bubble
# - Recent news negative

# PM overrides to HOLD

# Algorithms suggest, humans decide (at start)

4. Slow Ramp-Up

# Year 1: Paper trading
# Year 2: $100k real money
# Year 3: $1M
# Year 4: $10M
# Year 5: $50M (if still working)

# NOT: $10M day one

"In God we trust, all others must bring data" - W. Edwards Deming

"Markets can remain irrational longer than you can remain solvent" - Keynes

"Past performance is not indicative of future results" - Every hedge fund prospectus ever

Your Domain Expertise Matters

You understand SEC regulations, you've seen bubbles burst, you know data quality issues. Your Ten-K Wizard experience (2000-2008, through dot-com crash and financial crisis) + ML/RL system = Potentially very powerful combination.

But you still need to respect regime shifts, bubbles, and capacity constraints.