← Back to Ideas

Insider Trading Features: From Raw Events to Predictive Signals

We already have insider data from Forms 3/4/5/13D/13F. The realization: raw events aren't enough. We need feature engineering to unlock the predictive power that academic research shows: cluster buying (+13%), C-suite purchases (+8%), activist stakes (+7-12%).

💡 The Realization

"We DO have insider info - we parse Forms 3/4/5/13D/13F!"

The question: Raw events vs. engineered features? Add now or later?

The answer: Feature engineering transforms isolated events into powerful predictive signals backed by decades of academic research.

Forms We're Already Parsing

FORM 3

Initial Ownership Statement

Event: "filed_initial_ownership_statement"

Who: New officers, directors, 10% owners

When: Within 10 days of becoming insider

FORM 4

Changes in Beneficial Ownership

Event: "acquired_shares", "disposed_shares"

Who: Insiders (officers, directors, 10% owners)

When: Within 2 days of transaction

Detail: Buy/sell, shares, price

FORM 5

Annual Statement

Event: "filed_annual_ownership_statement"

Who: Insiders with small transactions

When: Annually (February)

FORM 13D

Activist/Large Shareholder

Event: "filed_beneficial_ownership_major"

Who: Large shareholders >5%, activists

Detail: Purpose, plans for control

FORM 13F

Institutional Holdings

Event: "disclosed_institutional_holdings"

Who: Investment managers ($100M+ AUM)

When: Quarterly

Detail: All holdings >$200k

The Gap: Raw Events vs. Engineered Features

❌ Current: Raw Events

Event stream:

2024-03-10: "acquired_shares"

  (CEO bought 10k shares)

2024-03-12: "acquired_shares"

  (CFO bought 5k shares)

2024-03-15: "acquired_shares"

  (Director bought 2k shares)

Transformer sees: Three separate "acquired_shares" events with sentiment/materiality. NO aggregation, NO trend detection!

✓ Enhanced: Engineered Features

insider_features = {

  # Trend features

  'insider_buy_count_30d': 3,

  'insider_buy_count_90d': 5,

  'insider_sell_count_30d': 0,

  # Consensus features

  'pct_insiders_buying': 0.75,

  'buy_sell_ratio': 5.0,

  # Size features

  'total_purchase_value': 2500000,

  'avg_purchase_size': 833333,

  # Type features

  'c_suite_buying': True,

  'cluster_buying': True,

}

Much more informative! Transformer learns from patterns, not just isolated events.

Academic Evidence: Why This Matters

Decades of peer-reviewed research shows insider trading features predict stock returns

Lakonishok & Lee (2001) - "Are Insider Trades Informative?"

Finding: Cluster buying (multiple insiders) predicts +13% return over 12 months

Method: Analyzed 80,000 insider transactions

Key Insight: Single insider trades have weaker signal (+3%), multiple insiders = strong signal (+13%)

Seyhun (1986) - "Insiders' Profits, Costs of Trading, and Market Efficiency"

Finding: Insider purchases predict +8% return over 6-12 months

Method: Analyzed Form 4 filings from 1975-1981

Conclusion: Market underreacts to insider trading signals - exploitable alpha opportunity

Brav, Jiang, Partnoy, Thomas (2008) - "Hedge Fund Activism"

Finding: Activist 13D filings predict +7-12% return

Method: 1,059 activist events from 2001-2006

Conclusion: Activists create value through engagement and operational improvements

Cohen, Malloy & Pomorski (2012) - "Routine vs. Opportunistic Insider Trading"

Finding: "Routine" trades (predictable patterns) → 0% alpha
"Opportunistic" trades (unusual timing) → +10% alpha

Implication: Raw events don't distinguish routine vs. opportunistic - feature engineering required!

Best Insider Signals (Ranked by Academic Evidence)

Cluster Buying (Multiple Insiders)

Feature: insider_buy_count_30d >= 3

Academic Return: +13% over 12 months

★★★★★

NOT extracted

C-Suite Purchases (CEO, CFO)

Feature: c_suite_buying = True

Academic Return: +8% over 6-12 months

★★★★☆

Can extract

Large Purchases (>$1M or >1% shares)

Feature: purchase_size, pct_of_shares

Academic Return: Strong predictor

★★★★☆

NOT extracted

Opportunistic Timing (Unusual, Not Routine)

Feature: is_routine_trade = False

Academic Return: +10% alpha

★★★★☆

Needs patterns

Activist Stakes (13D with Control Intent)

Feature: activist_stake_pct, stated_intent

Academic Return: +7-12% over 12 months

★★★★☆

Have events, need parsing

Smart Money Clustering (Multiple 13Fs)

Feature: num_institutions_buying

Academic Return: +15% for multiple funds

★★★☆☆

Need aggregation

Integration Strategies

Option 1: Add to Transformer Input RECOMMENDED

Architecture: 512 events + Insider features → Transformer → Prediction

# Enhanced transformer input

sequence_with_features = {

  'event_ids': [event_id_1, ..., event_id_512],

  'insider_buy_count_30d': 3,

  'c_suite_buying': True,

  'total_purchase_value': 2500000,

  'buy_sell_ratio': 5.0,

  'activist_stake_pct': 0.0,

}

Pros

Transformer learns to weight insider signals vs. events
Features available for all predictions
Can learn complex interactions (insider + revenue growth)
Maximum predictive power

Cons

Requires retraining transformer
More complex input format
Longer development time

Option 2: Add to Q-Learning State EASIER

Architecture: Transformer → Prediction + Insider features → Q-learning → Action

# Enhanced Q-learning state

state = (

  return_prediction,    # From transformer (512 events)

  confidence,

  price_change_1d,

  has_position,

  # NEW: Insider features

  insider_buy_count_30d,

  c_suite_buying,

  buy_sell_ratio

)

Pros

No transformer retraining needed
Quick to implement
Q-learning learns when insider signals matter
Measurable improvement immediately

Cons

Transformer doesn't see insider features (missed interactions)
Q-learning has simpler model (less pattern detection)
May not capture full alpha potential

Option 3: Hybrid Approach BEST LONG-TERM

Architecture: 512 events + Insider → Transformer → Prediction + Real-time insider → Q-learning → Action

# Transformer training (historical)

# Learns: "When 3+ insiders buy + revenue growth → +15% return"

# Q-learning (live trading)

# Learns: "When prediction +15% + NEW insider buying today → BUY"

Pros

Transformer learns historical insider patterns
Q-learning adapts to real-time insider activity
Best of both worlds
Maximum alpha generation

Cons

Most complex implementation
Requires both transformer retrain AND Q-learning update
Longer timeline

Phased Implementation Plan

Phase 1: Basic System (Now - Next 2 Weeks)

Timeline: Immediate | Status: In Progress

Focus:

Learn RL fundamentals (current phase)
Load transformer predictions from checkpoint
Implement Q-learning with basic state (prediction + market context)
Backtest, validate, prove concept

# Simple state design

state = (

  return_prediction,    # From transformer (512 events)

  confidence,

  price_change_1d,

  has_position

)

Why: Prove the architecture works, build confidence

Phase 2: Add Insider Features to Q-Learning (2-4 Weeks)

Timeline: 2-4 weeks | Status: Planned

Focus:

Extract insider features (Forms 3/4/5)
Add to Q-learning state (easier than transformer retrain)
Compare performance (with vs. without insider features)
Validate improvement

# Enhanced state design

state = (

  return_prediction,

  confidence,

  price_change_1d,

  has_position,

  # NEW: Top insider signals

  insider_buy_count_30d,     # Cluster buying

  c_suite_buying,            # CEO/CFO buying

  buy_sell_ratio,            # Consensus

  activist_stake_pct         # 13D activism

)

Why: Quick wins, no transformer retrain, measurable impact

Phase 3: Add to Transformer (1-2 Months)

Timeline: 1-2 months | Status: Conditional on Phase 2 results

Focus (if Phase 2 shows strong impact):

Add insider features to transformer input
Retrain with 10-year dataset + insider features
Let transformer learn interactions (insider + events)
Measure correlation improvement (42.8% → 50%+?)

Why: Maximum predictive power, but only if Phase 2 proves value

The Bottom Line

We have insider data from Forms 3/4/5/13D/13F, but we're not using it optimally. Raw events capture isolated transactions. Feature engineering transforms them into powerful predictive signals backed by decades of academic research.

Expected Impact: +5-8% improvement over transformer-only baseline
Implementation: Python extraction code already written and ready
Next Step: Add to Q-learning state after Phase 1 validation