← Back to Ideas

Insider Trading Features: From Raw Events to Predictive Signals

We already have insider data from Forms 3/4/5/13D/13F. The realization: raw events aren't enough. We need feature engineering to unlock the predictive power that academic research shows: cluster buying (+13%), C-suite purchases (+8%), activist stakes (+7-12%).

💡 The Realization

"We DO have insider info - we parse Forms 3/4/5/13D/13F!"

The question: Raw events vs. engineered features? Add now or later?

The answer: Feature engineering transforms isolated events into powerful predictive signals backed by decades of academic research.

Forms We're Already Parsing

FORM 3

Initial Ownership Statement

Event: "filed_initial_ownership_statement"
Who: New officers, directors, 10% owners
When: Within 10 days of becoming insider
FORM 4

Changes in Beneficial Ownership

Event: "acquired_shares", "disposed_shares"
Who: Insiders (officers, directors, 10% owners)
When: Within 2 days of transaction
Detail: Buy/sell, shares, price
FORM 5

Annual Statement

Event: "filed_annual_ownership_statement"
Who: Insiders with small transactions
When: Annually (February)
FORM 13D

Activist/Large Shareholder

Event: "filed_beneficial_ownership_major"
Who: Large shareholders >5%, activists
Detail: Purpose, plans for control
FORM 13F

Institutional Holdings

Event: "disclosed_institutional_holdings"
Who: Investment managers ($100M+ AUM)
When: Quarterly
Detail: All holdings >$200k

The Gap: Raw Events vs. Engineered Features

❌ Current: Raw Events

Event stream:
2024-03-10: "acquired_shares"
  (CEO bought 10k shares)

2024-03-12: "acquired_shares"
  (CFO bought 5k shares)

2024-03-15: "acquired_shares"
  (Director bought 2k shares)

Transformer sees: Three separate "acquired_shares" events with sentiment/materiality. NO aggregation, NO trend detection!

✓ Enhanced: Engineered Features

insider_features = {
  # Trend features
  'insider_buy_count_30d': 3,
  'insider_buy_count_90d': 5,
  'insider_sell_count_30d': 0,

  # Consensus features
  'pct_insiders_buying': 0.75,
  'buy_sell_ratio': 5.0,

  # Size features
  'total_purchase_value': 2500000,
  'avg_purchase_size': 833333,

  # Type features
  'c_suite_buying': True,
  'cluster_buying': True,
}

Much more informative! Transformer learns from patterns, not just isolated events.

Academic Evidence: Why This Matters

Decades of peer-reviewed research shows insider trading features predict stock returns

Lakonishok & Lee (2001) - "Are Insider Trades Informative?"

Finding: Cluster buying (multiple insiders) predicts +13% return over 12 months
Method: Analyzed 80,000 insider transactions
Key Insight: Single insider trades have weaker signal (+3%), multiple insiders = strong signal (+13%)

Seyhun (1986) - "Insiders' Profits, Costs of Trading, and Market Efficiency"

Finding: Insider purchases predict +8% return over 6-12 months
Method: Analyzed Form 4 filings from 1975-1981
Conclusion: Market underreacts to insider trading signals - exploitable alpha opportunity

Brav, Jiang, Partnoy, Thomas (2008) - "Hedge Fund Activism"

Finding: Activist 13D filings predict +7-12% return
Method: 1,059 activist events from 2001-2006
Conclusion: Activists create value through engagement and operational improvements

Cohen, Malloy & Pomorski (2012) - "Routine vs. Opportunistic Insider Trading"

Finding: "Routine" trades (predictable patterns) → 0% alpha
"Opportunistic" trades (unusual timing) → +10% alpha
Implication: Raw events don't distinguish routine vs. opportunistic - feature engineering required!

Best Insider Signals (Ranked by Academic Evidence)

1

Cluster Buying (Multiple Insiders)

Feature: insider_buy_count_30d >= 3

Academic Return: +13% over 12 months

★★★★★
NOT extracted
2

C-Suite Purchases (CEO, CFO)

Feature: c_suite_buying = True

Academic Return: +8% over 6-12 months

★★★★☆
Can extract
3

Large Purchases (>$1M or >1% shares)

Feature: purchase_size, pct_of_shares

Academic Return: Strong predictor

★★★★☆
NOT extracted
4

Opportunistic Timing (Unusual, Not Routine)

Feature: is_routine_trade = False

Academic Return: +10% alpha

★★★★☆
Needs patterns
5

Activist Stakes (13D with Control Intent)

Feature: activist_stake_pct, stated_intent

Academic Return: +7-12% over 12 months

★★★★☆
Have events, need parsing
6

Smart Money Clustering (Multiple 13Fs)

Feature: num_institutions_buying

Academic Return: +15% for multiple funds

★★★☆☆
Need aggregation

Integration Strategies

Option 1: Add to Transformer Input RECOMMENDED

Architecture: 512 events + Insider features → Transformer → Prediction

# Enhanced transformer input
sequence_with_features = {
  'event_ids': [event_id_1, ..., event_id_512],
  'insider_buy_count_30d': 3,
  'c_suite_buying': True,
  'total_purchase_value': 2500000,
  'buy_sell_ratio': 5.0,
  'activist_stake_pct': 0.0,
}

Pros

  • Transformer learns to weight insider signals vs. events
  • Features available for all predictions
  • Can learn complex interactions (insider + revenue growth)
  • Maximum predictive power

Cons

  • Requires retraining transformer
  • More complex input format
  • Longer development time

Option 2: Add to Q-Learning State EASIER

Architecture: Transformer → Prediction + Insider features → Q-learning → Action

# Enhanced Q-learning state
state = (
  return_prediction,    # From transformer (512 events)
  confidence,
  price_change_1d,
  has_position,
  # NEW: Insider features
  insider_buy_count_30d,
  c_suite_buying,
  buy_sell_ratio
)

Pros

  • No transformer retraining needed
  • Quick to implement
  • Q-learning learns when insider signals matter
  • Measurable improvement immediately

Cons

  • Transformer doesn't see insider features (missed interactions)
  • Q-learning has simpler model (less pattern detection)
  • May not capture full alpha potential

Option 3: Hybrid Approach BEST LONG-TERM

Architecture: 512 events + Insider → Transformer → Prediction + Real-time insider → Q-learning → Action

# Transformer training (historical)
# Learns: "When 3+ insiders buy + revenue growth → +15% return"

# Q-learning (live trading)
# Learns: "When prediction +15% + NEW insider buying today → BUY"

Pros

  • Transformer learns historical insider patterns
  • Q-learning adapts to real-time insider activity
  • Best of both worlds
  • Maximum alpha generation

Cons

  • Most complex implementation
  • Requires both transformer retrain AND Q-learning update
  • Longer timeline

Phased Implementation Plan

Phase 1: Basic System (Now - Next 2 Weeks)

Timeline: Immediate | Status: In Progress

Focus:

  • Learn RL fundamentals (current phase)
  • Load transformer predictions from checkpoint
  • Implement Q-learning with basic state (prediction + market context)
  • Backtest, validate, prove concept
# Simple state design
state = (
  return_prediction,    # From transformer (512 events)
  confidence,
  price_change_1d,
  has_position
)

Why: Prove the architecture works, build confidence

Phase 2: Add Insider Features to Q-Learning (2-4 Weeks)

Timeline: 2-4 weeks | Status: Planned

Focus:

  • Extract insider features (Forms 3/4/5)
  • Add to Q-learning state (easier than transformer retrain)
  • Compare performance (with vs. without insider features)
  • Validate improvement
# Enhanced state design
state = (
  return_prediction,
  confidence,
  price_change_1d,
  has_position,
  # NEW: Top insider signals
  insider_buy_count_30d,     # Cluster buying
  c_suite_buying,            # CEO/CFO buying
  buy_sell_ratio,            # Consensus
  activist_stake_pct         # 13D activism
)

Why: Quick wins, no transformer retrain, measurable impact

Phase 3: Add to Transformer (1-2 Months)

Timeline: 1-2 months | Status: Conditional on Phase 2 results

Focus (if Phase 2 shows strong impact):

  • Add insider features to transformer input
  • Retrain with 10-year dataset + insider features
  • Let transformer learn interactions (insider + events)
  • Measure correlation improvement (42.8% → 50%+?)

Why: Maximum predictive power, but only if Phase 2 proves value

The Bottom Line

We have insider data from Forms 3/4/5/13D/13F, but we're not using it optimally. Raw events capture isolated transactions. Feature engineering transforms them into powerful predictive signals backed by decades of academic research.

Expected Impact: +5-8% improvement over transformer-only baseline
Implementation: Python extraction code already written and ready
Next Step: Add to Q-learning state after Phase 1 validation