In ML/AI research, failed experiments are often more valuable than successes. They reveal edge cases, expose flawed assumptions, and guide future work. This is a collection of real experiments—what worked, what didn't, and what we learned.
Philosophy: If an experiment "fails" but teaches you something important about the problem space, it's actually a success. The only true failure is not learning from the attempt.
"I have not failed. I've just found 10,000 ways that won't work." — Thomas Edison
Q-learning worked correctly, but the state representation was flawed: 99.8% of predictions fell into one bucket because the transformer only produced positive predictions (+1.68% to +10.74%), while the bucketing thresholds assumed a -10% to +10% range. Lesson: State representation is critical in RL.
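A minimal sketch of the failure mode (illustrative, not the original code): bucket edges assume a symmetric -10% to +10% range, but the model only emits a narrow positive band, so the negative buckets never fire and most predictions pile into one state. The edge values and the prediction distribution below are assumptions for illustration.

```python
import numpy as np

# Hypothetical bucket edges spanning the assumed -10%..+10% range
EDGES = np.linspace(-0.10, 0.10, num=6)

def state_bucket(predicted_return: float) -> int:
    """Discretize a predicted return into a Q-learning state index."""
    return int(np.digitize(predicted_return, EDGES))

# Simulated transformer outputs: all positive, clustered in a narrow band
# (the distribution is illustrative; the real model's outputs behaved similarly)
preds = np.random.normal(loc=0.03, scale=0.01, size=10_000).clip(0.0168, 0.1074)
states, counts = np.unique([state_bucket(p) for p in preds], return_counts=True)

for s, c in zip(states, counts):
    print(f"bucket {s}: {c / len(preds):.1%} of predictions")
# Buckets covering negative returns never appear, so the agent effectively
# sees one or two states instead of the full range it was designed for.
```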
Q-learning worked as a diagnostic tool - it confirmed the algorithm itself runs correctly, but revealed that the transformer's confidence levels don't correlate with actual returns. High-confidence predictions don't outperform average predictions.
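One way to run that diagnostic, sketched with assumed inputs: given per-trade arrays of model confidence and realized returns (names here are hypothetical, not the project's actual fields), compare the top-confidence decile against the overall average.

```python
import numpy as np

def confidence_diagnostic(confidence: np.ndarray, realized_return: np.ndarray) -> None:
    """Check whether model confidence carries information about realized returns."""
    corr = np.corrcoef(confidence, realized_return)[0, 1]
    print(f"confidence/return correlation: {corr:+.3f}")

    top_decile = confidence >= np.quantile(confidence, 0.9)
    print(f"top-decile mean return: {realized_return[top_decile].mean():+.4f}")
    print(f"overall mean return:    {realized_return.mean():+.4f}")
    # In the experiment above, the two means were roughly equal:
    # high confidence did not translate into higher returns.
```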
Per-trade metrics (v2's +2.12%) don't tell the full story. A real portfolio simulation with capital concentration and compounding turned the same Q-learning algorithm into a +11.20% return. The agent learned that 20 well-timed trades beat 7,440 average trades.
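The arithmetic behind the gap, with illustrative numbers rather than the actual trade log: capital spread equally across many simultaneous trades earns only the average trade return, while a few sequential full-capital trades compound.

```python
# Scenario A: capital spread thinly across many simultaneous trades.
# The portfolio return is just the average per-trade return.
per_trade_a = 0.0212
portfolio_a = per_trade_a
print(f"spread-thin portfolio:      {portfolio_a:+.2%}")   # +2.12%

# Scenario B: a handful of sequential, full-capital trades that compound.
per_trade_b = 0.0053   # a smaller edge per trade...
n_trades = 20
portfolio_b = (1 + per_trade_b) ** n_trades - 1
print(f"concentrated + compounded: {portfolio_b:+.2%}")    # ~ +11%
```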
The +11.18% gap between transformer-based Q-learning (v1) and event-count Q-learning (v4) quantifies the value of transformers. This valuable negative result proves transformers are necessary for extracting trading signals from SEC filings - they're not optional complexity.
Separation of concerns is essential: the LLM handles pattern recognition and conviction scoring (0.0-1.0); deterministic code handles portfolio construction and risk management. Pre-computing filing contexts (59 contexts/sec) also delivered a 10x backtesting speedup.
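A sketch of that split, under assumed interfaces (the class, function names, and thresholds below are hypothetical, not the project's actual API): the LLM's only output is a conviction score in [0.0, 1.0], and everything that touches capital is plain, testable Python.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    ticker: str
    conviction: float  # 0.0-1.0, produced by the LLM

def size_position(signal: Signal, capital: float,
                  max_position_pct: float = 0.10,
                  conviction_floor: float = 0.5) -> float:
    """Deterministic sizing: scale exposure by conviction, cap per-position risk."""
    if signal.conviction < conviction_floor:
        return 0.0
    return capital * max_position_pct * signal.conviction

# The LLM's only job is to emit the conviction score (hardcoded here);
# sizing and risk limits stay deterministic and reproducible in the backtest.
signal = Signal(ticker="ACME", conviction=0.8)
print(size_position(signal, capital=100_000))  # 8000.0
```

Keeping the boundary at a single numeric score is also what makes pre-computed contexts pay off: the expensive LLM step is scored once per filing, and the deterministic backtest can be re-run as often as needed.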