โ† Back to Ideas
Educational Guide • Foundation Concept

The Markov Property: The Future Depends Only on the Present

Meet Andrey Markov and his "memoryless" property that makes Q-learning possible. Understanding this concept is the key to understanding why Q-learning works for stock trading.

🎯 The Big Idea

"The future depends only on the present, not the past."

In other words: To predict tomorrow, you only need to know today. Yesterday doesn't matter (if you already know today).

๐Ÿ‘จโ€๐Ÿ”ฌ Meet Andrey Markov (1856-1922)

Who was he? Russian mathematician who studied random processes.

His big discovery? Some random processes have a special property: the future depends only on the present, not the past.

Why it matters: This "memoryless" property makes many complex problems tractable! Without it, Q-learning wouldn't work.

🎭 Fun Fact

Markov first applied his chains to vowel/consonant patterns in Russian poetry (Pushkin's Eugene Onegin), showing that the law of large numbers holds even for chains of dependent events. His work later became fundamental to probability theory, AI, and machine learning. He probably never imagined his poetry analysis would one day power stock trading systems!

๐Ÿ” What is the Markov Property?

Simple Test

Ask Yourself:

"If I know the current state, does knowing the past help me predict the future?"

If NO: It's Markovian! ✅
If YES: Not Markovian, need to expand the state ❌

Examples: Markovian vs Not Markovian

✅ Chess Position (Markovian)

The current board state contains ALL the information needed to choose the next move (provided the state also records rule details like castling rights and repetition counts).

Don't need: How you got to this position (move history)
Only need: Current position

Future moves depend on: Current board
Future moves independent of: Past moves (given current board)

✅ Weather (Simple Model)

Tomorrow's weather depends on today's weather.

If it's sunny today → 80% sunny tomorrow
(Regardless of what happened last week)

Knowing yesterday doesn't help if you already know today!

โŒ Poker Hand (Not Markovian)

Your decision depends on MORE than current cards.

Current cards (state)
Past betting patterns (reveals information)
Opponent tendencies (learned from history)

Fix: Expand state to include betting history

โŒ Stock Returns (Reality)

Tomorrow's return depends on many factors.

Today's price
Today's volume
News sentiment
Historical volatility
Seasonal patterns

Fix: Expand state or accept approximation

🎲 From Markov Chains to Markov Decision Processes

Markov Chain: Simple Random Walk

What: Sequence of states where transitions follow Markov property.

Example: Weather Model

States: {Sunny, Rainy, Cloudy}

Transition probabilities (given current state):
  Sunny → Sunny: 0.8    Sunny → Rainy: 0.1    Sunny → Cloudy: 0.1
  Rainy → Sunny: 0.2    Rainy → Rainy: 0.6    Rainy → Cloudy: 0.2

If today is Sunny:
  → 80% chance tomorrow is Sunny
  → 10% chance tomorrow is Rainy
  → 10% chance tomorrow is Cloudy
  (Don't care about yesterday!)

Key point: Transitions depend ONLY on current state, not how you got there!
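To make this concrete, here is a minimal Python sketch of the weather chain above. Note that the Cloudy row of transition probabilities is not given in the table, so its values are assumed purely for illustration.

import random

# Transition table for the weather chain above.
# The Cloudy row is NOT specified in the example; its values are assumed.
TRANSITIONS = {
    "Sunny":  {"Sunny": 0.8, "Rainy": 0.1, "Cloudy": 0.1},
    "Rainy":  {"Sunny": 0.2, "Rainy": 0.6, "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.4, "Rainy": 0.3, "Cloudy": 0.3},  # assumed values
}

def next_day(today):
    # Tomorrow is sampled from today's row alone -- no older history is consulted.
    states = list(TRANSITIONS[today])
    weights = [TRANSITIONS[today][s] for s in states]
    return random.choices(states, weights=weights, k=1)[0]

day = "Sunny"
for _ in range(7):
    day = next_day(day)
    print(day)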

Markov Decision Process (MDP): Markov Chain + Actions + Rewards

MDP = Markov Chain + Actions + Rewards + Goal

An MDP is defined by:

  1. States (S): the possible situations
  2. Actions (A): the choices available in each state
  3. Transition probabilities P(s'|s,a): where you end up
  4. Rewards R(s,a,s'): what you earn along the way
  5. A discount factor γ: how much future rewards count

Example: Gridworld MDP

States (S): 12 positions in a grid

  [0]  [1]  [2]        [3 = Goal]
  [4]  [5]  [6 = Trap] [7]
  [8]  [9]  [10]       [11]

Actions (A): {UP, DOWN, LEFT, RIGHT}

Rewards R(s,a,s'):
  Any state → Goal (3): +10
  Any state → Trap (6): -10
  Any other transition:  -1

This is Markovian because: Future state depends ONLY on (current state, action chosen). Don't need to know "How did I get to state 0?" Only need: "I'm at state 0, what action should I take?"
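A minimal Python sketch of this gridworld makes the point visible in code: the step function takes only (state, action) and returns (next_state, reward), with no reference to the path taken.

ROWS, COLS = 3, 4
GOAL, TRAP = 3, 6
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}

def step(state, action):
    # Next state and reward depend ONLY on (state, action) -- the Markov property.
    row, col = divmod(state, COLS)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), ROWS - 1)
    col = min(max(col + d_col, 0), COLS - 1)
    next_state = row * COLS + col
    if next_state == GOAL:
        reward = +10
    elif next_state == TRAP:
        reward = -10
    else:
        reward = -1
    return next_state, reward

print(step(2, "RIGHT"))   # (3, 10): moving right from state 2 reaches the Goal
print(step(5, "RIGHT"))   # (6, -10): moving right from state 5 hits the Trap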

🔗 Why MDPs Matter for Q-Learning

The Critical Connection

ALL of reinforcement learning assumes the environment is (approximately) an MDP!

When we do Q-learning, we assume:

Q(state, action) = reward + γ × max(Q(next_state, next_action))
                      ↑               ↑
                  immediate     depends ONLY on next_state
                   reward       (not on how we got to the current state!)

If the environment violated the Markov property severely:

  1. next_state would not contain enough information to predict future rewards
  2. The update target (reward + γ × max Q) would be systematically misleading
  3. Q-values would fail to converge to useful estimates, and the learned policy would be unreliable
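To see exactly where the assumption enters, here is a minimal tabular sketch of the update; the action names and constants are illustrative, not part of your system.

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95                # illustrative learning rate and discount
ACTIONS = ["BUY", "HOLD", "SELL"]       # illustrative action set

Q = defaultdict(float)                  # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state):
    # The target uses only the immediate reward and next_state.
    # Nothing about HOW we reached `state` appears anywhere in the update.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])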

💡 Your SEC Filing System as an MDP

Attempt 1: Simple State (Not Quite Markovian)

State: (return_prediction_3m)

Problem: Future depends on MORE than just prediction!
  - What was the prediction confidence?
  - Has stock price already moved on this news?
  - Are we already holding this stock?

→ Violates Markov property (need more in state)

Attempt 2: Expanded State (Closer to Markovian)

State: (return_prediction_3m, confidence, price_change_1d, has_position)

Better! Future now depends mainly on current state.
  - Prediction: What transformer thinks
  - Confidence: How sure it is
  - Price change: Market reaction so far
  - Position: What we currently hold

→ Approximately Markovian (good enough!)

Attempt 3: Even More Markovian (If Needed)

State: (
  return_prediction_3m,
  confidence,
  price_change_1d,
  price_change_5d,
  volume_spike,
  has_position,
  days_held,
  unrealized_pnl
)

Very Markovian! Includes:
  - Prediction information
  - Market context (recent history)
  - Position status
  - Time information

→ Highly Markovian (may be overkill)
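As a sketch, Attempt 3 could be written as a small dataclass; the field names simply mirror the list above and are illustrative, not a fixed API.

from dataclasses import dataclass

@dataclass(frozen=True)
class TradingState:
    return_prediction_3m: float   # what the transformer predicts
    confidence: float             # how sure it is
    price_change_1d: float        # market reaction so far
    price_change_5d: float        # recent history folded into the state
    volume_spike: float           # e.g. today's volume vs. a rolling average
    has_position: bool            # are we already holding the stock?
    days_held: int                # time information
    unrealized_pnl: float         # position status

# The Q-function only ever sees a TradingState -- no raw history is passed around.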

๐Ÿ› ๏ธ How to Make Your Problem Markovian

Strategy 1: Expand the State

Problem: Stock returns depend on historical volatility

Before (Not Markovian)

state = (price_today)

After (Markovian)

state = (
    price_today,
    volatility_30d,
    volume_avg
)
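A hedged sketch of how the expanded state might be built from raw history: `prices` and `volumes` are assumed to be lists of daily values (newest last), and the feature definitions are illustrative.

import statistics

def build_state(prices, volumes):
    # Daily returns over the last 30 days (history folded into the state).
    returns_30d = [(p2 - p1) / p1 for p1, p2 in zip(prices[-31:-1], prices[-30:])]
    return (
        prices[-1],                        # price_today
        statistics.stdev(returns_30d),     # volatility_30d: std dev of daily returns
        sum(volumes[-30:]) / 30,           # volume_avg: 30-day average volume
    )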

Strategy 2: Accept Approximation

Reality Check

Nothing is perfectly Markovian. But as long as state captures "most" relevant information, Q-learning still works!

# Your system
state = (transformer_prediction, market_context, position)

# Not perfectly Markovian because:
# - Doesn't include macro sentiment
# - Doesn't include sector trends
# - Doesn't include Fed policy
# - ...

# But "good enough" because:
# - Captures main signal (transformer)
# - Captures immediate context (price, volume)
# - Captures position state
# → Q-learning will still learn useful policy!

Strategy 3: Use Recurrent Networks

Problem: State can't include infinite history

Solution: Let neural network learn to remember important history

# LSTM / GRU / Transformer can maintain hidden state
# Effectively learns what history to remember

state_visible = (current_observation)
state_hidden = lstm.hidden_state   # Learned representation of history

# Combined state is Markovian!
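A hedged sketch of the same idea using PyTorch (the choice of library and the dimensions are assumptions): the LSTM's hidden state summarizes everything seen so far, so (current observation, hidden state) together behave like a Markovian state.

import torch
import torch.nn as nn

obs_dim, hidden_dim = 8, 32
lstm = nn.LSTM(input_size=obs_dim, hidden_size=hidden_dim, batch_first=True)

hidden = None                                  # empty at the start of an episode
for t in range(100):
    obs = torch.randn(1, 1, obs_dim)           # placeholder for one real observation
    out, hidden = lstm(obs, hidden)            # hidden now encodes the history so far
    markov_state = torch.cat([obs.flatten(), hidden[0].flatten()])
    # markov_state (observation + learned memory) is what the Q-network would see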

🧪 Testing if Your MDP is Markovian

Simple Test for Your SEC Filing System

Question: Given current (prediction, price_change, position), does knowing what happened last week help?

Answer: Probably not much!

→ State is approximately Markovian ✅
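One way to check this empirically (a sketch, with synthetic data standing in for your real features): fit a simple predictor on the current state alone, then on the current state plus lagged features, and compare the scores. If they are close, history adds little once the present is known.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def history_helps(current_features, history_features, target, cv=5):
    # Compare cross-validated R^2 with and without lagged features.
    score_now = cross_val_score(Ridge(), current_features, target, cv=cv).mean()
    both = np.hstack([current_features, history_features])
    score_both = cross_val_score(Ridge(), both, target, cv=cv).mean()
    return score_now, score_both

# Synthetic example where history is redundant given the current state.
rng = np.random.default_rng(0)
current = rng.normal(size=(500, 4))   # stand-ins for prediction, confidence, price change, position
history = current[:, :2] + rng.normal(scale=0.01, size=(500, 2))   # nearly duplicates the present
target = current @ np.array([0.5, 0.3, -0.2, 0.1]) + rng.normal(scale=0.1, size=500)

print(history_helps(current, history, target))   # the two scores should be nearly identical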

🎭 Hanging Out with Markov Would Be Like...

Markov: "Hey, want to predict the weather tomorrow?"

You: "Sure! Let me check last week's patterns..."

Markov: "Stop! Don't care about last week. Tell me about TODAY."

You: "But the historical data..."

Markov: "ONLY TODAY MATTERS. The present contains all you need!"

You: "That seems... limiting?"

Markov: "Limiting? It's LIBERATING! Infinite past compressed to one moment. Beautiful!"

You: "Okay but what about stock trading? Surely past matters..."

Markov: "If your state is well-designed, past is already IN the present! Price today reflects all past news. Prediction reflects all past patterns. See?"

You: "Huh... so by expanding state, I make it Markovian?"

Markov: "EXACTLY! Now you're thinking like me! ๐ŸŽฉ"

📚 Why "Markov" Shows Up Everywhere in RL

Every RL algorithm assumes Markov property:

Q-Learning:

Q(s,a) = Q(s,a) + α[r + γ max Q(s',a') - Q(s,a)]
                              ↑
              Assumes s' is sufficient (Markovian!)

Policy Gradient:

π(a|s) = probability of action a given state s
            ↑
  Assumes s is sufficient (Markovian!)

Value Iteration:

V(s) = max_a Σ P(s'|s,a)[R(s,a,s') + γV(s')]
                  ↑
  Transition depends only on (s,a), not history!
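As a self-contained sketch, here is value iteration on the small gridworld from earlier (goal = 3, trap = 6); the backup uses only (s, a, s') and a reward, never the path that led to s.

ROWS, COLS, GOAL, TRAP = 3, 4, 3, 6
GAMMA, THETA = 0.9, 1e-6
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}

def step(s, a):
    # Deterministic transition: depends only on (s, a).
    row, col = divmod(s, COLS)
    row = min(max(row + MOVES[a][0], 0), ROWS - 1)
    col = min(max(col + MOVES[a][1], 0), COLS - 1)
    s2 = row * COLS + col
    return s2, 10 if s2 == GOAL else -10 if s2 == TRAP else -1

V = [0.0] * (ROWS * COLS)
while True:
    delta = 0.0
    for s in range(ROWS * COLS):
        if s in (GOAL, TRAP):              # terminal states keep value 0
            continue
        best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in MOVES))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:
        break

print([round(v, 1) for v in V])            # higher values near the Goal, lower near the Trap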

✅ Practical Implications for Your Production System

Goal: Design State to be "Markovian Enough"

Questions to ask:

  1. Does state include transformer prediction? ✅
  2. Does state include market context? ✅
  3. Does state include position status? ✅
  4. Given this state, does history matter? ❌ (Not much!)

Conclusion: Your state design is approximately Markovian! Q-learning will work well.

If State Not Markovian Enough

Symptoms:

  1. Q-values keep oscillating instead of converging
  2. The same state keeps producing very different outcomes and rewards
  3. Performance plateaus no matter how long you train

Solutions:

  1. Add more features to state
  2. Use recurrent network (LSTM)
  3. Increase state space (discretize history)
  4. Accept approximation (often works anyway!)

🎯 Summary

Key Takeaways

What is Markov Property?
Future depends only on present, not past (given present)

What is Markov Decision Process?
Markov process + Actions + Rewards

Why Does it Matter?
ALL reinforcement learning assumes approximate Markov property!

How to Use It?
1. Design state to include relevant information
2. Test: "Does history help given current state?"
3. If not Markovian: Expand state or use recurrent networks
4. If close enough: Q-learning will work!

Your SEC Filing System:
Approximately Markovian with:
state = (transformer_prediction, confidence, price_change, position)