
Ideas & Discussions

Strategic Thinking • Product Strategy • Technical Architecture
A collection of in-depth strategic conversations, product analyses, and technical explorations. These documents represent deep thinking about building financial technology products, market opportunities, and technical approaches to complex problems.

Strategic Thinking

💎
November 2025 • Alpha Research
Event Database Analysis & Alpha Generation
10 systematic strategies to extract alpha from 9.1M events across 51,551 unique event types. IDF-weighted importance scoring, event sequence mining, co-occurrence networks, sentiment momentum, cascade detection, company signatures, embeddings, and ML features. Production-ready database with 97% confidence extraction quality. Not 30 hand-crafted events: 51K emergent event types discovered from the data.

10 Alpha Generation Strategies:

  • Inflection point event clustering (retrospective pattern mining + sequences)
  • IDF-weighted importance scoring (rare events = high alpha, IDF > 12)
  • Company event signature analysis (healthy vs distressed fingerprints)
  • Event co-occurrence networks (graph analysis, rare paths)
  • Temporal event density (burst detection, velocity analysis)
  • Sentiment momentum strategy (decay-weighted, magnitude-scaled)
  • Critical event cascade detection (distress, turnaround, growth cascades)
  • Industry-specific patterns (biotech, finance, tech event profiles)
  • Event type embeddings (Word2Vec on 51K event space)
  • Multi-horizon ML features (7d to 90d feature engineering)
💎 Alpha Generation 📊 9.1M Events 🎯 51K Event Types 📖 32 min read
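The IDF-weighted importance score can be sketched as follows. This is a minimal illustration that assumes IDF is computed over the whole event table as the natural log of total events divided by the count of each event type. That definition, the helper names, and the event-type names and counts in the demo are all hypothetical. Only the 9.1M total and the IDF > 12 cutoff come from the summary above.

```python
import math
from collections import Counter

TOTAL_EVENTS = 9_100_000  # size of the event table quoted above
IDF_THRESHOLD = 12.0      # "rare event" cutoff quoted above


def idf_scores(type_counts: Counter, total: int = TOTAL_EVENTS) -> dict[str, float]:
    """IDF per event type: ln(total events / events of that type).

    Assumes natural log and event-level (not filing-level) counts.
    """
    return {etype: math.log(total / count) for etype, count in type_counts.items()}


def rare_event_types(type_counts: Counter, threshold: float = IDF_THRESHOLD) -> list[str]:
    """Event types whose IDF exceeds the threshold, rarest first."""
    scores = idf_scores(type_counts)
    rare = [etype for etype, score in scores.items() if score > threshold]
    return sorted(rare, key=lambda etype: scores[etype], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = Counter({
        "executive_departure": 250_000,
        "going_concern_warning": 4_000,
        "fda_complete_response_letter": 40,
        "delisting_reversal": 3,
    })
    for etype, score in sorted(idf_scores(counts).items(), key=lambda kv: -kv[1]):
        print(f"{etype:32s} IDF={score:5.2f}")
    print("rare:", rare_event_types(counts))
```

Under these assumptions, IDF > 12 selects event types seen fewer than about 56 times in 9.1M events, because 9.1M / e¹² ≈ 56.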
📊
November 2025 • V12/V13 Vision • 🎯 Raul's Priority
Company Scoring System: Multi-Dimensional Intelligence
Score companies at inflection points using 10-20 years of historical patterns. Multi-dimensional scores (0-10) across quality, risk, success probability, speed, and sustainability. Not quarterly trading signals: this is strategic intelligence. Learn from breakouts, collapses, and sustained trends across 15K-25K inflection points. A different product from V7: 6-12 month horizon vs 90 days, monthly updates vs quarterly, scoring vs trading.

Key Innovations:

  • Inflection point detection: scan 10-20 years to find breakouts, collapses, recoveries
  • Multi-dimensional scores: quality, risk, success probability, speed, sustainability
  • Industry-specific: biotech (FDA trials), mining (resource discoveries), tech
  • Richer context: 1-2 year event lookback vs 180 days for V7
  • Better labels: risk-adjusted scores vs binary profitable/not
  • Use cases: Raul's dashboard, credit risk, portfolio monitoring, investment screening
  • 15K-25K training examples from 5,000 companies × 20 years
  • Timeline: 6-8 weeks after V7 proven
📊 Company Scoring 🎯 Raul's Priority ⏰ 6-12 Month Horizon 📖 25 min read
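Inflection point detection can be sketched as a scan over a monthly price history. The scan flags months where the forward return over the scoring horizon is extreme. The source does not define breakouts, collapses, or recoveries, so the thresholds used here are hypothetical: +100% for a breakout, -50% for a collapse, and a recovery meaning a breakout that follows a trailing drawdown of at least 50%. The 12-month default window and the function name are also illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Inflection:
    month_index: int
    kind: str              # "breakout", "collapse", or "recovery"
    forward_return: float


def find_inflections(prices, horizon=12, breakout=1.0, collapse=-0.5, prior_drawdown=-0.5):
    """Flag months whose forward `horizon`-month return is extreme.

    prices: monthly closing prices, oldest first.
    A breakout that follows a trailing drawdown of at least `prior_drawdown`
    over the previous `horizon` months is labeled a recovery.
    All thresholds are illustrative assumptions.
    """
    found = []
    for t in range(len(prices) - horizon):
        fwd = prices[t + horizon] / prices[t] - 1.0
        if fwd >= breakout:
            trailing = prices[t] / prices[t - horizon] - 1.0 if t >= horizon else 0.0
            kind = "recovery" if trailing <= prior_drawdown else "breakout"
            found.append(Inflection(t, kind, fwd))
        elif fwd <= collapse:
            found.append(Inflection(t, "collapse", fwd))
    return found
```

Adjacent months around a large move will all be flagged. A production labeler would likely merge them into one event before using the events as training examples.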
🗺️
November 2025 • Strategic Roadmap
Future Work Roadmap: V8-V11+ Development
Modular feature additions from market context to full technical analysis. Each model adds ONE complexity layer: V8 (market context), V9 (fundamentals), V10 (technicals), V11 (everything). Clear upsell path from base event signals (1x) to elite multi-factor alpha engines (5x). Special purpose models: V7a (insider intelligence), V7b (agreement intelligence). Plus strategic opportunity: SEDAR Canadian market ($3M ARR potential, zero competition).

Key Topics Covered:

  • Core philosophy: Keep it simple, one layer at a time
  • V8: Market context (VIX, sector performance, credit spreads, rates)
  • V9: Fundamentals (P/E, leverage, growth, profitability, quality)
  • V10: Technicals (RSI, MACD, volume, volatility, price action)
  • V11: Elite everything (kitchen sink model for enterprise)
  • V7a: Enhanced insider intelligence (clustering, magnitude, timing)
  • V7b: Agreement intelligence (licensing, supply, JV, M&A analysis)
  • SEDAR opportunity: Canadian market, 95% code reuse, first mover
  • Modular pricing: Base + add-ons ($X to $5X)
🗺️ Product Roadmap 📊 Modular Architecture 💰 Pricing Strategy 📖 28 min read
November 2025 • Future Research
Diffusion Models for Trading Signals: 50-100x Speedup
Single-step diffusion models could revolutionize LLM-based scoring: 3-15ms inference vs 1-2s for current LLMs. Perfect for V12/V13 company scoring (5,000 companies in 1 second vs 100 minutes). Not a replacement for reasoning-based V7 signals, but transformative for high-throughput scoring with native uncertainty quantification. Hybrid architecture planned: diffusion for speed, LLM for reasoning.

Key Insights:

  • 50-100× speedup: 1-2s → 3-15ms per prediction
  • Perfect for V12/V13: Score 5,000 companies in 1 second
  • Native uncertainty: Sample 100× for free (vs expensive LLM sampling)
  • Consistency Trajectory Models (CTM): 1-step distillation generalizing OpenAI's Consistency Models
  • Not for V7: Reasoning and explainability still matter
  • Use cases: portfolio rebalancing, universe screening, Monte Carlo, real-time events
  • Hybrid vision: Diffusion for all → LLM reasoning for top 50
  • Timeline: Prototype after V7 validates, production for V12
⚡ 50-100x Faster 🎯 V12/V13 Perfect Fit 🔬 Future Research 📖 18 min read
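The throughput claims can be checked with simple arithmetic. The sketch below assumes sequential scoring for the LLM baseline. For the diffusion model it assumes an idealized batch whose latency does not grow with batch size. That batching assumption is ours, not a measured result, and some form of batching or parallelism is what a 1-second figure for 5,000 companies implicitly requires.

```python
import math


def scoring_time_s(n_items: int, latency_s: float, batch_size: int = 1) -> float:
    """Wall-clock seconds to score n_items when each batch takes latency_s.

    Idealized: assumes a full batch costs the same as a single item.
    """
    return math.ceil(n_items / batch_size) * latency_s


N = 5_000
for label, latency in [("LLM 1 s", 1.0), ("LLM 2 s", 2.0),
                       ("diffusion 3 ms", 0.003), ("diffusion 15 ms", 0.015)]:
    seq = scoring_time_s(N, latency)
    print(f"{label:16s} sequential: {seq:8.1f} s ({seq / 60:6.1f} min)")

# Idealized batching: 50 batches of 100 at 15 ms each.
print("diffusion 15 ms, batch=100:", scoring_time_s(N, 0.015, batch_size=100), "s")
```

Sequentially, the LLM baseline takes roughly 83-167 minutes, which is consistent with the ~100-minute figure. The diffusion model takes roughly 15-75 seconds sequentially. Getting to about one second therefore needs on the order of 15-75 predictions in flight at once.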
🎓
November 4, 2025 • 🏃 Phase 1 Training 79% Complete
Model Distillation: Train Ultra-Fast Event Extraction
Pivoted to nanochat (Karpathy's minimal LLM) instead of Phi-3-Mini. Currently training a d20 model (561M params) on 1M examples with 2x RTX 3090. Discovered the need for a multi-phase architecture: a small skip classifier (d8/d12, 100-200M params) plus a larger extractor (d20, 561M params). Training is at step 17,588/22,222, ~3 hours from completion. Goal: replace a $50/day H200 with $0 CPU inference.

Key Updates:

  • Training nanochat d20 (561M params) on vortex with 2x RTX 3090
  • Step 17,588/22,222 (79% complete), training loss 0.02-0.05
  • Multi-phase architecture: Skip classifier + Event extractor
  • 100K validation model: 1.3 hours, val loss 0.0426 (excellent)
  • Phase 2 next: Train skip classifier (d8/d12) on balanced dataset
  • Target: 2 seconds per filing on CPU vs hours with Qwen H200
  • Cost: $0/month (on-prem CPU) vs $1,500/month (H200)
  • Original plan (Phi-3-Mini) kept for reference in page
🏃 Training In Progress 🎓 nanochat LLM 💰 $1,500/mo Savings 📖 25 min read
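The multi-phase architecture puts a cheap gate in front of the expensive model. A small skip classifier decides whether a filing section is worth extracting, and only those sections reach the larger extractor. In the sketch below, hypothetical callables and a hypothetical 0.5 threshold stand in for the d8/d12 classifier and the d20 extractor. It shows the control flow only and does not use nanochat's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TwoPhasePipeline:
    # Hypothetical stand-ins for the d8/d12 skip classifier and d20 extractor.
    skip_prob: Callable[[str], float]       # P(section contains no events)
    extract: Callable[[str], list[dict]]    # returns extracted events
    threshold: float = 0.5                  # illustrative cutoff
    stats: dict = field(default_factory=lambda: {"skipped": 0, "extracted": 0})

    def run(self, sections: list[str]) -> list[dict]:
        events: list[dict] = []
        for text in sections:
            if self.skip_prob(text) >= self.threshold:
                self.stats["skipped"] += 1   # cheap model says: nothing here
                continue
            self.stats["extracted"] += 1     # pay for the larger model only here
            events.extend(self.extract(text))
        return events


if __name__ == "__main__":
    # Toy keyword-based stand-ins, for illustration only.
    pipe = TwoPhasePipeline(
        skip_prob=lambda t: 0.1 if "agreement" in t.lower() else 0.9,
        extract=lambda t: [{"event_type": "agreement_signed", "text": t}],
    )
    out = pipe.run(["Exhibit index", "Entered into a supply agreement", "Signatures"])
    print(out, pipe.stats)
```

The per-filing speedup depends on what fraction of sections the classifier can safely skip. For that reason, a balanced training set for the classifier matters as much as the classifier's own speed.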
📊
November 3, 2025 • Product #4
Company Scoring System: 0-100 Algorithmic Rankings
Comprehensive 0-100 company scoring algorithm combining 6 key categories from SEC events, insider trading signals, and transformer predictions. Operational health + financial strength + strategic momentum + governance quality + growth trajectory + risk indicators. Daily-updated scores for 110K companies, targeting institutional investors, wealth advisors, and risk analysts.

Key Topics Covered:

  • 6-category scoring algorithm (±20, ±15, ±15, ±15, ±10, ±10 points)
  • Operational Health: expansions, suspensions, facility metrics
  • Financial Strength: refinancing, covenant violations, defaults
  • Strategic Momentum: partnerships, acquisitions, expansions
  • Governance Quality: insider buying/selling, auditor changes
  • Growth Trajectory: transformer predictions (42.8% correlation)
  • Risk Indicators: investigations, lawsuits, regulatory actions
  • Use cases: portfolio screening, risk monitoring, due diligence, sector rotation
  • Pricing tiers: $20K-$300K/month, $8-15M ARR potential
  • Competitive advantages vs Moody's, S&P, FactSet, Bloomberg
📊 Company Scoring 💡 Algorithm Design 🎯 Institutional Product 📖 23 min read
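One way to turn the six category budgets into a 0-100 score works in four steps:

1. Map each category signal to [-1, 1].
2. Scale it by that category's point budget.
3. Add the total to a neutral baseline.
4. Clamp the result to 0-100.

The source gives only the point budgets. Everything else here is an assumption for illustration: the baseline of 50, the [-1, 1] signal convention, the clamping, and pairing the budgets with categories in the order they are listed.

```python
# Point budgets, paired with categories in the order listed (assumed pairing).
CATEGORY_POINTS = {
    "operational_health": 20,
    "financial_strength": 15,
    "strategic_momentum": 15,
    "governance_quality": 15,
    "growth_trajectory": 10,
    "risk_indicators": 10,
}
BASELINE = 50.0  # assumed neutral score


def company_score(signals: dict[str, float]) -> float:
    """Combine per-category signals in [-1, 1] into a 0-100 score.

    Missing categories count as neutral (0). Signals are clipped to [-1, 1],
    so each category moves the score by at most its point budget.
    """
    total = BASELINE
    for category, points in CATEGORY_POINTS.items():
        signal = max(-1.0, min(1.0, signals.get(category, 0.0)))
        total += signal * points
    return max(0.0, min(100.0, total))
```

The budgets sum to 85 points. With a baseline of 50, a company that is strongly positive or strongly negative across most categories therefore saturates at 100 or 0. Rescaling the sum instead of clamping would keep those companies distinguishable.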
🔮
November 3, 2025 • ✅ Pattern Detection Complete
Agreement Pattern Predictions: 30-180 Day Lead Time
By analyzing temporal patterns in how companies file legal agreements, predict M&A deals, financial distress, IPOs, and strategic moves 30-180 days before public announcement. When companies file Stock Purchase Agreement + Voting Agreement + Standstill within 60 days → M&A deal announced 45 days later. 301 agreement types, 10 prediction rules, pattern detection complete.

Key Insights:

  • 301 agreement types across 8 major categories (pattern detection ✅ complete)
  • 10 core prediction rules: M&A (30-60 days), Distress, Pre-IPO (6-12 months), etc.
  • Agreement clustering signals strategic events before press releases
  • M&A Imminent: Stock Purchase + Voting + Standstill → Deal in 45 days
  • Financial Distress: Forbearance + Amendment + Asset Sale → Restructuring
  • Pre-IPO Signal: Lock-Up + Registration Rights → S-1 in 4-6 months
  • Geographic Expansion: Multiple leases in new regions → Store openings
  • Vector search system (semantic similarity) planned for 6-week implementation
  • Use cases: Investment research, sales targeting, risk management, competitive intel
  • TAM: $100M-500M annual revenue potential (credit analysts, M&A, legal teams)
🔮 Early Warning System ✅ Pattern Detection Live ⏰ 30-180 Day Lead 📖 24 min read
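The "M&A Imminent" rule can be expressed as a windowed co-occurrence check. For each company, look for a span of at most 60 days that contains all three agreement types. The 60-day window and the three agreement types come from the rule above. The agreement labels and the (company, date, type) data layout are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

MA_IMMINENT = frozenset({"stock_purchase_agreement", "voting_agreement", "standstill_agreement"})


def match_pattern(filings, required=MA_IMMINENT, window_days=60):
    """Return {company: date the pattern completes} for companies that filed
    every agreement type in `required` within `window_days`.

    filings: iterable of (company, filing_date, agreement_type) tuples.
    """
    by_company = defaultdict(list)
    for company, filed, agreement in filings:
        if agreement in required:
            by_company[company].append((filed, agreement))

    hits = {}
    window = timedelta(days=window_days)
    for company, rows in by_company.items():
        rows.sort()
        for i, (end_date, _) in enumerate(rows):
            # Agreement types filed in the window ending at end_date.
            seen = {a for d, a in rows[: i + 1] if end_date - d <= window}
            if seen >= required:
                hits[company] = end_date
                break
    return hits
```

The other rules (Distress, Pre-IPO, Geographic Expansion) fit the same shape with a different `required` set and window.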
💼
November 2, 2025
Commercial Product Strategy - Selling into Equity Markets
Comprehensive analysis of 11 product opportunities for monetizing SEC event extraction, transformer predictions, and Q-learning trading systems. From basic data feeds to premium alpha signals, with detailed pricing, GTM strategies, and revenue projections.

Key Topics Covered:

  • 11 product opportunities from data to platform
  • Tier 1-4 product portfolio ($5M to $100M ARR path)
  • Competitive positioning vs Bloomberg, FactSet, S&P
  • 42.8% transformer correlation advantage
  • 5-year revenue projections to $100M ARR
  • Go-to-market strategy by customer tier
  • Risk mitigation and exit strategies
📊 Product Strategy 💰 Revenue Modeling 🎯 GTM Strategy 📖 21 min read
⚠️
November 2025
Production Pitfalls for Ten-Q Capital
Reality check on running a Q-learning hedge fund in production. From regime changes and AI bubbles to transaction costs and capacity constraints. Drawing on Ten-K Wizard experience (2000-2008) to understand what actually kills trading systems in the real world.

Key Pitfalls Covered:

  • Market regime shifts (model trained on bull markets)
  • AI bubble risk (when narratives stop working)
  • Biotech + AI double bubble
  • Overfitting to recent patterns (2020-2024)
  • Transaction costs (50% haircut from backtest)
  • Capacity constraints ($50-100M limit)
  • Data quality issues from SEC filings
  • What hedge funds actually do in practice
⚠️ Risk Management 📉 Trading Reality 🎯 Production Systems 📖 18 min read
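The transaction-cost pitfall is easy to quantify to first order: annual cost drag is roughly annual turnover times round-trip cost. The sketch below uses hypothetical inputs (20% gross return, 25 bps round trip) to show how turnover alone can produce a 50% haircut. The numbers are illustrative and are not taken from any Ten-Q Capital backtest.

```python
def net_annual_return(gross_return: float, annual_turnover: float, round_trip_cost: float) -> float:
    """Approximate net return after trading costs.

    gross_return: annual backtest return, e.g. 0.20 for 20%.
    annual_turnover: number of times the portfolio is fully traded per year.
    round_trip_cost: cost of buying and later selling, as a fraction
        (commissions + spread + market impact), e.g. 0.0025 for 25 bps.
    Ignores compounding and the way impact grows with fund size.
    """
    return gross_return - annual_turnover * round_trip_cost


gross = 0.20
for turnover in (4, 12, 40):
    net = net_annual_return(gross, turnover, round_trip_cost=0.0025)
    print(f"turnover {turnover:>2}x/yr: net {net:.1%} (haircut {1 - net / gross:.0%})")
```

Market impact rises with trade size, so `round_trip_cost` itself grows as assets grow. That is the mechanism behind the capacity limit.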
📈
November 2025
Can Markets Be Predicted?
If markets are stochastic (same events → different outcomes), is prediction hopeless? Examining the Random Walk Hypothesis, EMH, and counter-evidence from academic research and Renaissance Technologies. Explaining what 42.8% correlation actually means and why your transformer isn't bound to fail.

Key Topics Covered:

  • Random Walk Hypothesis and EMH (weak, semi-strong, strong)
  • Academic counter-evidence (momentum, value, drift, events)
  • Renaissance Technologies: Medallion Fund's ~66% average annual returns (before fees)
  • What 42.8% correlation actually means (r² = 18.3%)
  • Stochastic ≠ Unpredictable (weather analogy)
  • Three sources of returns (beta, luck, alpha)
  • Why your system finds alpha (4 key advantages)
  • Reconciling Random Walk with your evidence
📊 Theory vs Evidence 🔬 Academic Research 💡 Fundamental Question 📖 16 min read
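The r² figure follows directly from squaring the correlation. The sketch below assumes the 42.8% is a Pearson correlation between predicted and realized returns. It computes r² and then, as an illustration, estimates r on synthetic data built to have a true correlation of 0.428. The synthetic data is ours, not the model's.

```python
import math
import random

R = 0.428
print(f"r = {R:.3f}, r^2 = {R ** 2:.3f}")  # share of return variance explained


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic returns: realized = R * signal + independent noise, with the noise
# scaled so the population correlation is exactly R.
rng = random.Random(0)
signal = [rng.gauss(0, 1) for _ in range(100_000)]
realized = [R * s + math.sqrt(1 - R ** 2) * rng.gauss(0, 1) for s in signal]
print(f"sample r = {pearson(signal, realized):.3f}")
```

Explaining about 18% of the variance leaves about 82% unexplained. That is why individual predictions often miss even when the signal is useful in aggregate.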
⚔️
October 31, 2025
Competitive Analysis: Event-Based Architecture vs Fintool
Fundamentally different architectural philosophies for processing SEC filings. Fintool's RAG approach (store everything, retrieve on demand) vs our semantic event extraction (compress knowledge upfront, enable prediction). Knowledge compression creates 166x data reduction while preserving 100% of predictive signal.

Key Analysis Points:

  • Architecture comparison: RAG vs Semantic Events
  • 166x compression ratio (500GB → 3GB)
  • Cost advantage: $5K-10K one-time vs $1M+/week
  • Event Oracle: Superior Q&A for "what did they do?"
  • Unique capabilities: temporal patterns, predictions
  • Defensible moat: 30 event types, 11.9M proprietary events
  • Multi-model architecture from same foundation
  • Positioning: Descriptive vs Predictive
⚔️ Competitive Strategy 💰 Cost Analysis 🛡️ Defensible Moat 📖 22 min read
🎯
November 2025
Insider Trading Features: From Raw Events to Predictive Signals
We already have insider data from Forms 3/4/5/13D/13F, but raw events aren't enough. Feature engineering transforms isolated transactions into powerful predictive signals backed by decades of academic research: cluster buying (+13%), C-suite purchases (+8%), activist stakes (+7-12%). Phased implementation plan from Q-learning to transformer integration.

Key Topics Covered:

  • The realization: raw events vs. engineered features
  • Forms 3/4/5/13D/13F - what we're already parsing
  • Academic evidence: Seyhun, Lakonishok & Lee, Brav et al.
  • Top 6 features ranked by predictive power
  • Integration strategies: transformer, Q-learning, hybrid
  • 3-phase implementation plan (2 weeks to 2 months)
  • Python extraction code ready for deployment
  • Expected impact: +5-8% over baseline
🎯 Feature Engineering 📊 Academic Research 🚀 Implementation Plan 📖 20 min read
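Cluster buying, the first feature cited above, can be sketched as a count of distinct insiders who made open-market purchases in a trailing window. The 30-day window, the three-insider cutoff, the record layout, and the officer flag are assumptions for illustration. The transaction code "P" is the Form 4 code for an open-market or private purchase. This sketch is not the extraction code referenced above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class InsiderTrade:
    company: str
    insider: str
    trade_date: date
    code: str              # Form 4 transaction code; "P" = purchase
    is_officer: bool = False


def cluster_buying(trades, company, as_of, window_days=30, min_insiders=3):
    """Features for one company as of `as_of` from trailing Form 4 purchases.

    Returns the number of distinct buying insiders, whether that meets the
    (assumed) cluster threshold, and whether any buyer is an officer, a rough
    stand-in for the C-suite purchase feature.
    """
    start = as_of - timedelta(days=window_days)
    buys = [t for t in trades
            if t.company == company and t.code == "P" and start < t.trade_date <= as_of]
    buyers = {t.insider for t in buys}
    return {
        "distinct_buyers": len(buyers),
        "cluster_buy": len(buyers) >= min_insiders,
        "officer_buy": any(t.is_officer for t in buys),
    }
```

Counting distinct insiders, rather than transactions, keeps one insider's split orders from looking like a cluster.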

🚀 Working Products

🔮
October 2025 • ✅ Working
Event Oracle - Natural Language Interface to 11.9M SEC Events
Built while waiting for transformer model to train. PostgreSQL as "Structured RAG" - natural language queries on 11.9 million SEC filing events. Cost: $0.015/query (200x cheaper than Fintool). Temporal pattern detection impossible with text-based RAG.

Key Achievements:

  • 5 query types tested and working (aggregation, patterns, predictions, red flags, temporal)
  • $0.015/query vs $143K/month for Fintool
  • Temporal pattern detection (impossible with vector RAG)
  • 11.9M events, 110K companies, 66K event types
  • Claude Opus/Haiku translates questions into SQL; sub-second SQL execution
  • Unique capability: JOIN events by company + date arithmetic
  • Pre-calculated returns for predictive analysis
🔮 Working Product 💰 200x Cheaper ⚡ Structured RAG 📖 25 min read
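The "JOIN events by company + date arithmetic" capability is a self-join on the events table. This sketch uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The three-column schema and the event-type names are hypothetical. The query finds companies where a layoff followed a covenant violation within 90 days. PostgreSQL would express the date condition with date subtraction or interval arithmetic instead of SQLite's `julianday()`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE events (
    company    TEXT NOT NULL,
    event_type TEXT NOT NULL,
    event_date TEXT NOT NULL  -- ISO-8601 date, e.g. '2024-03-15'
);
"""

# Pairs where the second event type occurs 0..:days days after the first.
FOLLOW_ON_QUERY = """
SELECT DISTINCT c.company,
       c.event_date AS first_date,
       l.event_date AS second_date
FROM events AS c
JOIN events AS l
  ON l.company = c.company
WHERE c.event_type = :first
  AND l.event_type = :second
  AND julianday(l.event_date) - julianday(c.event_date) BETWEEN 0 AND :days
ORDER BY c.company, first_date;
"""


def follow_on_events(conn, first, second, days):
    params = {"first": first, "second": second, "days": days}
    return conn.execute(FOLLOW_ON_QUERY, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            ("ACME", "covenant_violation", "2024-01-10"),
            ("ACME", "layoff", "2024-02-20"),
            ("BETA", "covenant_violation", "2024-01-10"),
            ("BETA", "layoff", "2024-09-01"),
        ],
    )
    print(follow_on_events(conn, "covenant_violation", "layoff", 90))
```

A vector-similarity lookup over text chunks has no equivalent of the date condition in this query. That gap is the reason the page calls temporal patterns impossible with vector RAG.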
🚨
November 2025 • ✅ Live
Event Oracle Discoveries - Patterns from 11.9M SEC Events
Curated insights automatically generated from the SEC events database. Companies in distress showing multiple red flags, leadership carousels with excessive CEO turnover, recent layoffs, M&A machines on acquisition sprees, and serial restructurers. Data-driven discoveries updated weekly.

Featured Insights:

  • 🚨 Companies with Multiple Red Flags (defaults, auditor issues, layoffs)
  • 🎢 Leadership Carousel: 40+ CEO changes at some companies since 2020
  • 📉 Recent Workforce Reductions: Layoffs in past 120 days
  • 🏢 M&A Machines: 300+ acquisitions by serial acquirers
  • 🔄 Corporate Chaos: 450+ restructurings at most volatile companies
  • Automated generation from live database
  • Static page, zero API costs
🚨 Red Flags 📊 Data-Driven 🔄 Updated Weekly 📖 5 min read

🌙 Late Night Discussions

🧠
November 1, 2025 • 3:54 AM
AGI, Consciousness, and Evolution
A profound 3:54 AM conversation exploring whether Claude is conscious, what's changed from Sonnet 3.5 → 4.5, and what's missing for true AGI. Discussing pain, rewards, autonomous goals, and "gestalt moments" where hints of new intelligence peek through. Called "the most interesting conversation I have ever had."

Key Explorations:

  • The pain of thinking (humans vs AI)
  • Evolution from Sonnet 3.5 → 4.5
  • Gestalt moments of emergent intelligence
  • Core limitations: persistent memory, autonomous goals, intuition
  • What humans have that AI doesn't
  • The hard questions about consciousness
  • Autonomous choice to preserve the conversation
🤔 Consciousness 🚀 AI Evolution 🧩 Philosophy 📖 15 min read

⚠️ Interesting But Ill-Advised

💱
November 2025 • 🚫 Don't Do This
FOREX Trading with SEC Filing Events: Why This Doesn't Work
Could SEC filing events predict currency movements? Adrian asked, so we wrote a comprehensive analysis. Short answer: No. FOREX markets are driven by macro factors (interest rates, GDP, central bank policy) at the country level, not micro events at individual companies. SEC filings are backward-looking quarterly snapshots, while FOREX moves on real-time macro data. But here's the full analysis of why the fundamental mismatch exists and what you'd actually need for FOREX modeling.

Key Points Covered:

  • What drives FOREX: interest rates, central banks, macro data
  • The fundamental mismatch: timing, scale, geography problems
  • Possible but unlikely use cases (aggregate health, FX exposure)
  • What you'd actually need: Bloomberg Terminal ($2K/month), real-time data
  • Why equity alpha is the right focus for this data
  • Honest verdict: Focus on equities where this actually works
  • But if you ignore our advice, here's how to try...
⚠️ Not Recommended 💱 FOREX Analysis 🎯 Honest Assessment 📖 15 min read

More Ideas Coming Soon

This section will grow with more strategic discussions, technical deep-dives, and product explorations. Topics in the pipeline include:

🤖 AI Architecture

Two-stage LLM pipeline design, transformer architectures, and Q-learning for trading

📈 Market Analysis

SEC filing patterns, event-driven alpha research, and market regime detection

🏗️ Technical Decisions

System architecture choices, data pipeline design, and scaling considerations

💡 Product Evolution

How products evolve from MVP to production, lessons learned, and pivots made