Insider Trading Features: From Raw Events to Predictive Signals
We already have insider data from Forms 3/4/5/13D/13F. The realization: raw events aren't enough.
We need feature engineering to unlock the predictive power that academic research shows:
cluster buying (+13%), C-suite purchases (+8%), activist stakes (+7-12%).
💡 The Realization
"We DO have insider info - we parse Forms 3/4/5/13D/13F!"
The question: Raw events vs. engineered features? Add now or later?
The answer: Feature engineering transforms isolated events into powerful predictive signals backed by decades of academic research.
Academic Evidence: Why This Matters
Decades of peer-reviewed research shows insider trading features predict stock returns
Lakonishok & Lee (2001) - "Are Insider Trades Informative?"
Finding: Cluster buying (multiple insiders) predicts +13% return over 12 months
Method: Analyzed 80,000 insider transactions
Key Insight: Single insider trades have weaker signal (+3%), multiple insiders = strong signal (+13%)
Seyhun (1986) - "Insiders' Profits, Costs of Trading, and Market Efficiency"
Finding: Insider purchases predict +8% return over 6-12 months
Method: Analyzed Form 4 filings from 1975-1981
Conclusion: Market underreacts to insider trading signals - exploitable alpha opportunity
Brav, Jiang, Partnoy, Thomas (2008) - "Hedge Fund Activism"
Finding: Activist 13D filings predict +7-12% return
Method: 1,059 activist events from 2001-2006
Conclusion: Activists create value through engagement and operational improvements
Cohen, Malloy & Pomorski (2012) - "Decoding Inside Information" (routine vs. opportunistic insider trading)
Finding: "Routine" trades (predictable patterns) → 0% alpha
"Opportunistic" trades (unusual timing) → +10% alpha
Implication: Raw events don't distinguish routine from opportunistic trades - feature engineering is required!
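Cohen, Malloy & Pomorski operationalize "routine" with a simple calendar rule: an insider who traded in the same calendar month in each of the past three years is routine. That rule is easy to reproduce as a feature. A minimal sketch; the flat `(insider_id, year, month)` input is a hypothetical view of parsed Form 4 data, not this project's actual schema:

```python
from collections import defaultdict

def classify_insiders(trades, lookback_years=3):
    """Label each insider 'routine' or 'opportunistic' using the
    Cohen/Malloy/Pomorski calendar rule: an insider who traded in the
    same calendar month in each of the last `lookback_years` years is
    routine; everyone else is opportunistic.

    `trades` is a list of (insider_id, year, month) tuples.
    """
    year_months = defaultdict(set)
    for insider_id, year, month in trades:
        year_months[insider_id].add((year, month))

    labels = {}
    for insider_id, ym in year_months.items():
        latest_year = max(y for y, _ in ym)
        # Routine if some month appears in each of the last N years.
        routine = any(
            all((latest_year - k, m) in ym for k in range(lookback_years))
            for m in range(1, 13)
        )
        labels[insider_id] = 'routine' if routine else 'opportunistic'
    return labels
```

Only the opportunistic subset would then feed the +10%-alpha features downstream.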
Integration Strategies
Option 1: Add to Transformer Input
RECOMMENDED
Architecture: 512 events + Insider features → Transformer → Prediction
```python
# Enhanced transformer input (illustrative values)
sequence_with_features = {
    'event_ids': [event_id_1, ..., event_id_512],  # existing 512-event sequence
    'insider_buy_count_30d': 3,       # cluster buying (Lakonishok & Lee)
    'c_suite_buying': True,           # CEO/CFO purchases
    'total_purchase_value': 2500000,
    'buy_sell_ratio': 5.0,            # consensus among insiders
    'activist_stake_pct': 0.0,        # 13D activism (Brav et al.)
}
```
Pros
- Transformer learns to weight insider signals vs. events
- Features available for all predictions
- Can learn complex interactions (insider + revenue growth)
- Maximum predictive power
Cons
- Requires retraining transformer
- More complex input format
- Longer development time
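Concretely, "adding insider features to the transformer input" could mean projecting the feature vector into embedding space and prepending it as one extra token ahead of the 512 event embeddings. A minimal NumPy sketch; the shapes, the `proj` matrix, and the function name are illustrative (in the real model the projection would be a learned layer):

```python
import numpy as np

def add_insider_token(event_embeddings, insider_features, proj):
    """Project the insider feature vector into embedding space and
    prepend it as an extra "token", so attention can weight it
    against the 512 event embeddings.
    """
    insider_token = insider_features @ proj  # (n_features,) @ (n_features, d) -> (d,)
    return np.vstack([insider_token, event_embeddings])  # (513, d)

# Toy shapes: 512 events, embedding dim 64, 5 insider features
events = np.zeros((512, 64))
features = np.array([3.0, 1.0, 2_500_000.0, 5.0, 0.0])  # values from the dict above
proj = np.zeros((5, 64))                                 # stand-in for a learned projection
seq = add_insider_token(events, features, proj)
```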
Option 2: Add to Q-Learning State
EASIER
Architecture: Transformer → Prediction + Insider features → Q-learning → Action
```python
# Enhanced Q-learning state
state = (
    return_prediction,      # from transformer (512 events)
    confidence,
    price_change_1d,
    has_position,
    # NEW: insider features
    insider_buy_count_30d,
    c_suite_buying,
    buy_sell_ratio,
)
```
Pros
- No transformer retraining needed
- Quick to implement
- Q-learning learns when insider signals matter
- Measurable improvement immediately
Cons
- Transformer doesn't see insider features (missed interactions)
- Q-learning has simpler model (less pattern detection)
- May not capture full alpha potential
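One practical detail Option 2 glosses over: tabular Q-learning needs a small, discrete state space, so continuous inputs like `return_prediction` and `buy_sell_ratio` must be bucketed before they go into the state tuple. A sketch with illustrative bucket edges, not tuned values from this system:

```python
def discretize_state(return_prediction, confidence, price_change_1d,
                     has_position, insider_buy_count_30d,
                     c_suite_buying, buy_sell_ratio):
    """Bucket continuous inputs into a small, hashable tuple so the
    tabular Q-learning state space stays tractable. Edges are
    illustrative placeholders.
    """
    def bucket(x, edges):
        return sum(x >= e for e in edges)  # index of the bucket x falls in

    return (
        bucket(return_prediction, (-0.05, 0.0, 0.05)),  # strong-sell .. strong-buy
        bucket(confidence, (0.5, 0.8)),                 # low / medium / high
        bucket(price_change_1d, (-0.02, 0.02)),         # down / flat / up
        int(has_position),
        min(insider_buy_count_30d, 3),                  # cap cluster size at "3+"
        int(c_suite_buying),
        bucket(buy_sell_ratio, (1.0, 3.0)),             # net-sell / mixed / net-buy
    )
```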
Option 3: Hybrid Approach
BEST LONG-TERM
Architecture: 512 events + Insider → Transformer → Prediction + Real-time insider → Q-learning → Action
```python
# Transformer training (historical):
#   learns "when 3+ insiders buy + revenue growth → +15% return"

# Q-learning (live trading):
#   learns "when prediction +15% + NEW insider buying today → BUY"
```
Pros
- Transformer learns historical insider patterns
- Q-learning adapts to real-time insider activity
- Best of both worlds
- Maximum alpha generation
Cons
- Most complex implementation
- Requires both transformer retrain AND Q-learning update
- Longer timeline
Phased Implementation Plan
Phase 1: Basic System (Now - Next 2 Weeks)
Timeline: Immediate | Status: In Progress
Focus:
- Learn RL fundamentals (current phase)
- Load transformer predictions from checkpoint
- Implement Q-learning with basic state (prediction + market context)
- Backtest, validate, prove concept
```python
# Simple state design
state = (
    return_prediction,  # from transformer (512 events)
    confidence,
    price_change_1d,
    has_position,
)
```
Why: Prove the architecture works, build confidence
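The Phase 1 agent over this simple state can be sketched as standard tabular Q-learning with an epsilon-greedy policy. The action set and hyperparameters below are illustrative placeholders, not this project's settings:

```python
import random
from collections import defaultdict

ACTIONS = ('BUY', 'SELL', 'HOLD')
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # illustrative hyperparameters

Q = defaultdict(float)  # (state, action) -> estimated value

def choose_action(state):
    """Epsilon-greedy policy over the discrete state tuple."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard tabular Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

In the backtest loop, `reward` would be the realized P&L of the chosen action over the next bar.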
Phase 2: Add Insider Features to Q-Learning (2-4 Weeks)
Timeline: 2-4 weeks | Status: Planned
Focus:
- Extract insider features (Forms 3/4/5)
- Add to Q-learning state (easier than transformer retrain)
- Compare performance (with vs. without insider features)
- Validate improvement
```python
# Enhanced state design
state = (
    return_prediction,
    confidence,
    price_change_1d,
    has_position,
    # NEW: top insider signals
    insider_buy_count_30d,  # cluster buying
    c_suite_buying,         # CEO/CFO buying
    buy_sell_ratio,         # consensus
    activist_stake_pct,     # 13D activism
)
```
Why: Quick wins, no transformer retrain, measurable impact
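The Phase 2 extraction step could look like the sketch below. The transaction schema (`date`, `side`, `title`, `value`), the C-suite title set, and the count-based ratio fallback are all assumptions, not the project's actual parser output:

```python
from datetime import date, timedelta

C_SUITE_TITLES = {'CEO', 'CFO', 'COO', 'President'}  # illustrative title set

def insider_features(transactions, as_of, window_days=30):
    """Compute the Phase 2 signals from parsed Form 3/4/5 rows.

    Each row is assumed to be a dict with 'date', 'side'
    ('buy'/'sell'), 'title', and 'value' keys.
    """
    start = as_of - timedelta(days=window_days)
    recent = [t for t in transactions if start <= t['date'] <= as_of]

    buys = [t for t in recent if t['side'] == 'buy']
    sells = [t for t in recent if t['side'] == 'sell']

    return {
        'insider_buy_count_30d': len(buys),
        'c_suite_buying': any(t['title'] in C_SUITE_TITLES for t in buys),
        # Count-based ratio; guard against division by zero (illustrative choice)
        'buy_sell_ratio': len(buys) / max(len(sells), 1),
        # 13D activist stakes come from a separate feed; default to 0 here
        'activist_stake_pct': 0.0,
    }
```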
Phase 3: Add to Transformer (1-2 Months)
Timeline: 1-2 months | Status: Conditional on Phase 2 results
Focus (if Phase 2 shows strong impact):
- Add insider features to transformer input
- Retrain with 10-year dataset + insider features
- Let transformer learn interactions (insider + events)
- Measure correlation improvement (42.8% → 50%+?)
Why: Maximum predictive power, but only if Phase 2 proves value
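For the correlation measurement, one concrete choice is a plain Pearson correlation between predicted and realized returns; the source doesn't specify which correlation the 42.8% baseline uses, so treat this as one possible yardstick:

```python
import math

def pearson(predicted, realized):
    """Pearson correlation between predicted and realized returns."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(realized) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(predicted, realized))
    sx = math.sqrt(sum((x - mx) ** 2 for x in predicted))
    sy = math.sqrt(sum((y - my) ** 2 for y in realized))
    return cov / (sx * sy)
```

Running this on the held-out set before and after the retrain gives the 42.8%-vs-50%+ comparison directly.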
The Bottom Line
We have insider data from Forms 3/4/5/13D/13F, but we're not using it optimally.
Raw events capture isolated transactions. Feature engineering transforms them into
powerful predictive signals backed by decades of academic research.
Expected Impact: +5-8% improvement over transformer-only baseline
Implementation: Python extraction code already written and ready
Next Step: Add to Q-learning state after Phase 1 validation