In other words: To predict tomorrow, you only need to know today. Yesterday doesn't matter (if you already know today).
Who was he? Andrey Markov, a Russian mathematician who studied random processes.
His big discovery? Some random processes have a special property: the future depends only on the present, not the past.
Why it matters: This "memoryless" property makes many complex problems tractable! Without it, Q-learning wouldn't work.
Markov initially studied vowel/consonant patterns in Russian poetry (Pushkin's Eugene Onegin) to show that the law of large numbers still holds for dependent events, not just independent ones. Later his work became fundamental to probability theory, AI, and machine learning. He probably never imagined his poetry analysis would one day power stock trading systems!
"If I know the current state, does knowing the past help me predict the future?"
If NO: It's Markovian! ✓
If YES: Not Markovian, need to expand the state ✗
Current board state contains ALL information needed to play next move.
Don't need: How you got to this position (move history)
Only need: Current position
Future moves depend on: Current board
Future moves independent of: Past moves (given current board)
Tomorrow's weather depends on today's weather.
If it's sunny today → 80% sunny tomorrow
(Regardless of what happened last week)
Knowing yesterday doesn't help if you already know today!
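Here's a minimal sketch of that weather chain in Python. The 80% sunny-after-sunny figure comes from above; the other probabilities are made up for illustration. Notice that the sampling function only ever sees today:

```python
import random

# P(tomorrow | today): only today's weather matters.
# 80% sunny-after-sunny is from the example above; the rest are illustrative.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_tomorrow(today: str) -> str:
    """Sample tomorrow's weather from ONLY today's weather (Markov property)."""
    outcomes, probs = zip(*TRANSITIONS[today].items())
    return random.choices(outcomes, weights=probs)[0]

# Simulate a week: the function never needs last week, only the current day.
day = "sunny"
for _ in range(7):
    day = sample_tomorrow(day)
    print(day)
```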
Your decision depends on MORE than current cards.
Current cards (state)
Past betting patterns (reveals information)
Opponent tendencies (learned from history)
Fix: Expand state to include betting history
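A sketch of what that fix can look like in code (the field names are hypothetical): fold the decision-relevant history into the state itself, so the expanded state is Markovian again.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical illustration: the cards alone are NOT Markovian for poker,
# so we bake the decision-relevant history into the state itself.
@dataclass(frozen=True)
class PokerState:
    my_cards: Tuple[str, ...]          # the original "state": current cards
    betting_history: Tuple[str, ...]   # e.g. ("raise", "call", "raise")
    opponent_aggression: float         # summary statistic learned from past hands

# With this richer state, the next decision needs nothing outside the state;
# the relevant history is already inside it.
state = PokerState(
    my_cards=("Ah", "Kd"),
    betting_history=("raise", "call"),
    opponent_aggression=0.7,
)
```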
Tomorrow's return depends on many factors.
Today's price
Today's volume
News sentiment
Historical volatility
Seasonal patterns
Fix: Expand state or accept approximation
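Here's one way the "expand the state" option might look in code. The features and the 20-day window are illustrative assumptions, not recommendations; the point is that summaries of history become part of today's state:

```python
import numpy as np

def build_state(prices, volumes, sentiment_today, window=20):
    """Sketch: compress recent history into today's state vector.

    The features and the 20-day window are illustrative assumptions.
    The point is that "history" (volatility, recent drift) becomes part of
    the present state, restoring an approximate Markov property.
    """
    returns = np.diff(np.log(prices[-(window + 1):]))
    return np.array([
        prices[-1],        # today's price
        volumes[-1],       # today's volume
        sentiment_today,   # news sentiment score
        returns.std(),     # historical volatility, summarized as one number
        returns.mean(),    # recent drift, a crude stand-in for seasonal patterns
    ])
```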
What: A sequence of states where transitions follow the Markov property.
Key point: Transitions depend ONLY on current state, not how you got there!
An MDP is defined by: a set of states, a set of actions, transition probabilities P(s' | s, a), rewards R(s, a), and a discount factor γ.
This is Markovian because: Future state depends ONLY on (current state, action chosen). Don't need to know "How did I get to state 0?" Only need: "I'm at state 0, what action should I take?"
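To make that concrete, here's a toy MDP written out as plain data (the numbers are invented). From state 0, the distribution over next states depends only on (state, action), never on how the agent got there:

```python
# A toy 2-state MDP written out as plain data (the numbers are invented).
STATES = [0, 1]
ACTIONS = ["stay", "move"]

# Transition probabilities: P[(state, action)] = [(next_state, probability), ...]
P = {
    (0, "stay"): [(0, 0.9), (1, 0.1)],
    (0, "move"): [(1, 1.0)],
    (1, "stay"): [(1, 1.0)],
    (1, "move"): [(0, 0.5), (1, 0.5)],
}

# Immediate rewards: R[(state, action)]
R = {
    (0, "stay"): 0.0,
    (0, "move"): 1.0,
    (1, "stay"): 0.5,
    (1, "move"): 0.0,
}

GAMMA = 0.9  # discount factor

# Everything needed to act in state 0 is right here: the distribution over
# next states depends only on (0, action), not on the path into state 0.
```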
ALL of reinforcement learning assumes the environment is (approximately) an MDP!
When we do Q-learning, we assume the current state is a sufficient summary of the past: every update uses only (state, action, reward, next state), never the path that led there.
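The update rule itself makes that assumption visible. A minimal sketch with a dictionary Q-table; nothing about the past appears in the formula:

```python
from collections import defaultdict

# Q[s][a] starts at 0.0 for every state-action pair we encounter.
Q = defaultdict(lambda: defaultdict(float))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step. Only (s, a, r, s_next) appear here:
    no trajectory, no history. That IS the Markov assumption."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Example step (the state, action, and reward values are placeholders):
q_update(s=0, a="move", r=1.0, s_next=1)
```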
If the environment violated the Markov property severely, the learned Q-values would lump together situations that need different treatment, and the resulting policy would perform poorly.
Problem: Stock returns depend on historical volatility
Nothing is perfectly Markovian. But as long as state captures "most" relevant information, Q-learning still works!
Problem: State can't include infinite history
Solution: Let a recurrent neural network learn to remember the important history
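Here's a sketch of that idea (assuming PyTorch; the layer sizes are arbitrary). A GRU reads a window of recent observations, and its hidden state becomes the agent's "present"; the Q-head downstream treats that hidden state as if it were Markovian:

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Sketch: a GRU compresses recent observations into a hidden state,
    and Q-values are computed from that hidden state alone. The network
    learns which pieces of history are worth carrying forward."""

    def __init__(self, obs_dim=8, hidden_dim=32, n_actions=3):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_window):
        # obs_window: (batch, time, obs_dim), e.g. the last 20 observations
        _, h = self.gru(obs_window)        # h: (1, batch, hidden_dim)
        return self.q_head(h.squeeze(0))   # Q-value for each action

# Shape check with random data: (batch=4, time=20, obs_dim=8) -> (4, 3)
q_values = RecurrentQNet()(torch.randn(4, 20, 8))
```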
Question: Given current (prediction, price_change, position), does knowing what happened last week help?
Answer: Probably not much!
→ State is approximately Markovian ✓
Markov: "Hey, want to predict the weather tomorrow?"
You: "Sure! Let me check last week's patterns..."
Markov: "Stop! Don't care about last week. Tell me about TODAY."
You: "But the historical data..."
Markov: "ONLY TODAY MATTERS. The present contains all you need!"
You: "That seems... limiting?"
Markov: "Limiting? It's LIBERATING! Infinite past compressed to one moment. Beautiful!"
You: "Okay but what about stock trading? Surely past matters..."
Markov: "If your state is well-designed, past is already IN the present! Price today reflects all past news. Prediction reflects all past patterns. See?"
You: "Huh... so by expanding state, I make it Markovian?"
Markov: "EXACTLY! Now you're thinking like me! ๐ฉ"
Every RL algorithm assumes the Markov property: Q-learning, SARSA, policy gradients, and actor-critic methods all update from the current state (and the next one), never from the full history.
Questions to ask: Does my state contain everything needed to predict the next state and reward? Given the current state, would knowing the history change my decision?
Conclusion: Your state design is approximately Markovian! Q-learning will work well.
Symptoms: the agent behaves inconsistently in what looks like the same state, Q-values keep oscillating instead of converging, and performance plateaus no matter how long you train.
Solutions: add the missing history or features to the state, stack recent observations, or use a recurrent network.
What is the Markov Property?
Future depends only on present, not past (given present)
What is a Markov Decision Process?
Markov process + Actions + Rewards
Why Does it Matter?
ALL reinforcement learning assumes approximate Markov property!
How to Use It?
1. Design state to include relevant information
2. Test: "Does history help given current state?" (see the sketch after this list)
3. If not Markovian: Expand state or use recurrent networks
4. If close enough: Q-learning will work!
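Here's a rough way to run the test in step 2 empirically, sketched with synthetic stand-in data and scikit-learn (both are assumptions, not part of the original system): fit one predictor on the current state alone and one with lagged features added, then compare. If history barely helps, your state is approximately Markovian:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 4 state features, in the spirit of
# (prediction, confidence, price_change, position).
rng = np.random.default_rng(0)
T = 500
state_now = rng.normal(size=(T, 4))
state_yesterday = np.roll(state_now, 1, axis=0)   # crude "history" features
next_reward = state_now @ rng.normal(size=4) + 0.1 * rng.normal(size=T)

r2_now = cross_val_score(Ridge(), state_now, next_reward, cv=5).mean()
r2_with_history = cross_val_score(
    Ridge(), np.hstack([state_now, state_yesterday]), next_reward, cv=5
).mean()

print(f"R^2, current state only: {r2_now:.3f}")
print(f"R^2, history added:      {r2_with_history:.3f}")
# If the two scores are close, history isn't adding much: approximately Markovian.
```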
Your SEC Filing System:
Approximately Markovian with:
state = (transformer_prediction, confidence, price_change, position)
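As a closing sketch (the action set, thresholds, and buckets are all placeholder assumptions): discretize that tuple and tabular Q-learning runs directly on it, precisely because the tuple is approximately Markovian:

```python
from collections import defaultdict
import random

ACTIONS = ["buy", "hold", "sell"]   # illustrative action set
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def discretize(transformer_prediction, confidence, price_change, position):
    """Bucket the continuous state so it can index a Q-table.
    The thresholds are placeholder assumptions, not tuned values."""
    return (
        int(transformer_prediction > 0),   # predicted up vs. down
        int(confidence > 0.7),             # high vs. low confidence
        int(price_change > 0),             # price rose vs. fell
        position,                          # e.g. -1 short, 0 flat, +1 long
    )

def choose_action(s, epsilon=0.1):
    """Epsilon-greedy policy: decides from the current state only."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(Q[s], key=Q[s].get)

def learn(s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Same Q-learning update as before: only (s, a, r, s_next) are needed."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])

# One illustrative step with made-up numbers:
s = discretize(transformer_prediction=0.8, confidence=0.9, price_change=0.01, position=0)
a = choose_action(s)
s_next = discretize(transformer_prediction=-0.2, confidence=0.6, price_change=-0.02, position=1)
learn(s, a, r=0.5, s_next=s_next)
```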