Mars Expansion Plan

Scaling from 48K to 1M+ Private Companies

  • Company Scale: 48K → 1M+
  • Total Investment: $1M-$1.8M
  • Timeline: 18-24 Months

Executive Summary

Mars currently tracks 48,622 companies discovered through funding announcements in news articles (Benzinga). To compete with Crunchbase (millions of companies) and PitchBook (institutional-grade coverage), we need to expand beyond funding announcements to capture the vast majority of private companies that never announce deals.

Key Insight: Crunchbase's millions of companies include many that have never announced a funding deal. We need alternative data sources to discover and track these companies.

Current State

What We Have

  • 48,622 companies tracked via funding announcements
  • 42,700 funding deals from news articles (3 years)
  • 251,611 people/executives
  • Real-time news monitoring (Benzinga)
  • Web scraping infrastructure (Apify, ScrapingBee)
  • LLM extraction pipeline (OpenAI, Claude, Grok)

The Gap

  • Coverage: 48K vs millions for competitors
  • Discovery Method: Limited to companies that announce funding
  • Missing: The majority of private companies that never announce deals

Primary Data Sources: Path to Millions

1. Secretary of State Databases (OpenCorporates)

⭐⭐⭐⭐⭐ Highest Priority

Every incorporated company must register with its state of incorporation. OpenCorporates aggregates 200M+ companies globally.

• Coverage: 200M+ companies globally
• Cost: FREE (API available)
• Timeline: Month 1-2 (quick start)

Data Available:

• Company name & DBA
• Formation date
• Registered agent
• Officers/directors
• Business status (active/dissolved)
• Business type (LLC, Corp, etc.)

2. LinkedIn Company Pages

⭐⭐⭐⭐ High Priority

Most active private companies maintain LinkedIn pages with rich company data and growth signals.

• Coverage: 10M+ company pages
• Cost: LOW (scraping via Apify)
• Timeline: Month 3-6 (Phase 1)

Data Available:

• Employee count (growth signal)
• Location & offices
• Industry classification
• Company description
• Job postings (hiring activity)
• Website URL

3. SEC Filings (Private Company Mentions)

⭐⭐⭐⭐⭐ Highest Priority + Unique

Mine 26M SEC filings to extract private company names from public company disclosures. We already have the infrastructure.

• Coverage: 50K-100K private companies
• Cost: FREE (excluding LLM processing costs)
• Timeline: Month 2-4 (add to pipeline)

Filing Types to Mine:

• 10-K "Customers" sections
• 10-K "Competition" sections
• S-1 filings (IPO documents)
• 8-K partnership announcements
• 10-Q supplier/customer disclosures
• Acquisition descriptions

🎯 Competitive Advantage: We already process SEC filings for the Agreements system, so private company extraction can be added to the existing pipeline at minimal incremental cost.

4. Government Contracts (SAM.gov)

⭐⭐⭐⭐ High Priority

Every company doing business with the federal government must register in SAM.gov. These registrants include many high-quality, tech-focused companies.

• Coverage: 300K active entities
• Cost: FREE (bulk download)
• Timeline: Month 1-2 (quick win)

Data Available:

• Legal business name & DBAs
• Physical address
• Business type & size
• Industry codes (NAICS)
• Point of contact
• CAGE codes, Unique Entity IDs (UEI, which replaced DUNS numbers in SAM.gov in April 2022)

5. USPTO Patent Data

⭐⭐⭐⭐ Already Planned

Patent assignees reveal innovative companies, often pre-funding or in stealth mode. This source is already on the Mars 2026 roadmap (Q1).

• Coverage: 1M+ patent assignees
• Cost: FREE (USPTO bulk data)
• Timeline: Month 4-6 (Q1 2026)

Implementation Roadmap

PHASE 1: Scale & Discovery

• Timeline: 6-12 Months
• Target: 500K companies
• Workstreams: OpenCorporates API, SAM.gov entity extract, SEC filing mining, LinkedIn scraping, USPTO patent data
• Investment: $200K-300K (Infrastructure + Data + Engineering)

PHASE 2: Enrichment

• Timeline: 12-18 Months
• Target: Deep data on 500K companies
• Workstreams: Website scraping (all 500K), job posting tracking, news monitoring expansion, social media presence, data freshness tracking
• Investment: $300K-500K (Scaling + LLM + Engineering)

PHASE 3: Market Launch

• Timeline: 18-24 Months
• Target: 1M+ companies
• Workstreams: International expansion, product UI/UX launch, API & integrations, sales & marketing, data quality team
• Investment: $500K-1M (GTM + Product + Operations)
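
As a quick consistency check, the three phase budgets can be summed to confirm they match the $1M-$1.8M headline figure. This is a minimal sketch; the variable names are illustrative and the figures are copied from the phases above.

```python
# Phase budgets in thousands of USD as (low, high), taken from the roadmap.
phase_budgets_k = {
    "Phase 1: Scale & Discovery": (200, 300),
    "Phase 2: Enrichment": (300, 500),
    "Phase 3: Market Launch": (500, 1000),
}

# Sum the low ends and the high ends separately to get the total range.
total_low_k = sum(low for low, _ in phase_budgets_k.values())
total_high_k = sum(high for _, high in phase_budgets_k.values())

print(f"Total: ${total_low_k / 1000:.1f}M-${total_high_k / 1000:.1f}M")
```

The low ends add up to $1.0M and the high ends to $1.8M, which agrees with the total investment quoted in the summary and conclusion.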

Mars' Competitive Advantages

Real-Time News (Already Have)

Benzinga newsfeed provides immediate funding announcements. Competitors rely on manual research teams (slower).

SEC Filing Analysis (Building)

26M filings to mine for private company mentions. Competitors don't deeply mine SEC data. Unique differentiator.

AI/LLM Automation (Already Have)

Automated extraction using Claude & Grok vs. manual research teams. Lower cost per company profile, faster scaling.

Low-Cost Data Collection

Free sources (OpenCorporates, SAM.gov, USPTO, SEC) + automated scraping vs. PitchBook's 2,000+ researchers.

Can AI Replace PitchBook's 2,000 Analysts?

Deep dive into the strategic question: competing with PitchBook using AI instead of a massive human research team

• 70-80%: Analyst work AI can automate
• 96%: Cost reduction possible
• 40-90x: Faster SEC filing analysis

Quick Win Projects (Next 60 Days)

Project 1: OpenCorporates Ingestion (2 weeks)

Build a Python script to query the OpenCorporates API, filter for US tech companies (active, < 10 years old), and insert the results into MongoDB.

Target: 50K new companies in 2 weeks
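
The filtering step of Project 1 could look roughly like the sketch below. It makes no network or database calls. The field names (`jurisdiction_code`, `incorporation_date`, `current_status`) and the `us_` jurisdiction prefix are assumptions that should be checked against the actual OpenCorporates API response, and the sample records are made up. The "tech" filter is left out because the plan does not say how tech companies will be identified.

```python
from datetime import date

MAX_AGE_YEARS = 10  # "< 10 years old" from the project description


def is_candidate(record: dict, today: date) -> bool:
    """Return True for active US companies formed less than 10 years ago."""
    # Assumption: US jurisdictions are coded like "us_de" or "us_ca".
    if not (record.get("jurisdiction_code") or "").startswith("us_"):
        return False
    # Assumption: the status field reads "Active" for live companies.
    if (record.get("current_status") or "").strip().lower() != "active":
        return False
    raw_date = record.get("incorporation_date")
    if not raw_date:
        return False  # no formation date, so age cannot be checked
    formed = date.fromisoformat(raw_date)
    # Whole years elapsed, minus one if this year's anniversary has not passed.
    age_years = today.year - formed.year - (
        (today.month, today.day) < (formed.month, formed.day)
    )
    return age_years < MAX_AGE_YEARS


# Hypothetical records for illustration only.
sample = [
    {"name": "Acme Robotics LLC", "jurisdiction_code": "us_de",
     "incorporation_date": "2021-03-15", "current_status": "Active"},
    {"name": "Old Mill Corp", "jurisdiction_code": "us_ny",
     "incorporation_date": "1998-07-01", "current_status": "Active"},
    {"name": "Gone Labs Inc", "jurisdiction_code": "us_ca",
     "incorporation_date": "2022-01-10", "current_status": "Dissolved"},
    {"name": "Maple Data Ltd", "jurisdiction_code": "ca_on",
     "incorporation_date": "2020-05-05", "current_status": "Active"},
]
kept = [r["name"] for r in sample if is_candidate(r, today=date(2025, 6, 1))]
print(kept)
```

Passing `today` explicitly keeps the filter deterministic, which makes it easy to test before pointing it at live data.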

Project 2: SAM.gov Import (1 week)

Download SAM.gov entity extract, parse and normalize data, import to MongoDB with deduplication.

Target: 100K-200K government contractors
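
One simple way to approach the deduplication step is a normalized name key, sketched below. Matching on names alone is an assumption made for illustration: it will merge unrelated companies that share a name, so a production import would likely combine it with address or website. The suffix list and the company names are illustrative, not exhaustive or real.

```python
import re

# Common US legal suffixes to ignore when matching (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation",
                  "co", "company", "ltd", "lp", "llp"}


def dedup_key(name: str) -> str:
    """Build a matching key: lowercase, no punctuation, no trailing legal suffix."""
    # Drop periods first so "L.L.C." collapses to "llc" instead of "l l c".
    cleaned = name.lower().replace(".", "")
    tokens = re.sub(r"[^a-z0-9 ]+", " ", cleaned).split()
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def merge_new(existing_names, incoming_names):
    """Return incoming names whose key is not already known (first occurrence wins)."""
    seen = {dedup_key(n) for n in existing_names}
    fresh = []
    for name in incoming_names:
        key = dedup_key(name)
        if key and key not in seen:
            seen.add(key)
            fresh.append(name)
    return fresh


# Hypothetical names: one already tracked, three arriving from the import.
already_tracked = ["Acme Robotics, L.L.C."]
incoming = ["ACME ROBOTICS LLC", "Borealis Systems Inc.", "Borealis Systems, Inc"]
print(merge_new(already_tracked, incoming))
```

Here only the first "Borealis Systems" record survives: the Acme record matches a company already tracked, and the second Borealis record duplicates the first within the same import.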

Project 3: SEC Mining Proof-of-Concept (2 weeks)

Take 1,000 recent 10-K filings, use an LLM to extract private company names, and validate extraction quality.

Target: 5K-10K private company mentions
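
For the "validate extraction quality" step, one option is to hand-label a small sample of filings and score the LLM output with precision and recall. The sketch below assumes exact matching on lowercased names, which is a simplification; the function name and the sample data are illustrative. The LLM call itself is out of scope here.

```python
def normalize(names: set[str]) -> set[str]:
    """Lowercase and trim names so trivial differences do not count as errors."""
    return {n.strip().lower() for n in names}


def precision_recall(extracted: set[str], labeled: set[str]) -> tuple[float, float]:
    """Score LLM-extracted names against a hand-labeled set for the same filing."""
    got, want = normalize(extracted), normalize(labeled)
    true_positives = len(got & want)
    precision = true_positives / len(got) if got else 0.0
    recall = true_positives / len(want) if want else 0.0
    return precision, recall


# Hypothetical example: the extractor wrongly returned a public company
# (Microsoft) and missed one private company (Cobalt Health).
extracted = {"Acme Robotics", "Borealis Systems", "Microsoft"}
labeled = {"acme robotics", "Borealis Systems", "Cobalt Health"}
precision, recall = precision_recall(extracted, labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision would mean the pipeline floods the database with public companies or non-company strings. Low recall would mean the 5K-10K target understates what the filings actually contain.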

Success Metrics

• Q1 2026: 100K companies (2x)
• Q2 2026: 250K companies (5x)
• Q3 2026: 500K companies (10x)
• Q4 2026: 1M+ companies (20x)
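
The growth multiples above are rounded against the current base of 48,622 companies. A short sketch of the arithmetic, treating the Q4 "1M+" target as exactly 1,000,000 for the calculation:

```python
BASELINE = 48_622  # companies tracked today, per the Executive Summary

# Quarterly targets from the list above; Q4 uses the 1M lower bound.
targets = {
    "Q1 2026": 100_000,
    "Q2 2026": 250_000,
    "Q3 2026": 500_000,
    "Q4 2026": 1_000_000,
}

# Multiple of today's company count that each target represents.
multiples = {quarter: count / BASELINE for quarter, count in targets.items()}

for quarter, multiple in multiples.items():
    print(f"{quarter}: {targets[quarter]:>9,} companies = {multiple:.1f}x baseline")
```

The exact multiples come out slightly above the rounded 2x, 5x, 10x, and 20x labels because the baseline is a little under 50K.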

Conclusion

Mars can scale from 48K to 500K-1M companies within 18-24 months using primarily free, scalable data sources. The key is leveraging existing infrastructure (web scraping, LLM extraction, MongoDB) and focusing on sources that provide both breadth (OpenCorporates, SAM.gov) and depth (SEC filings, LinkedIn).

Total investment of $1M-$1.8M positions Mars as a credible Crunchbase competitor with unique advantages in real-time news monitoring and SEC filing analysis.

Next Milestone

100K companies by Q1 2026 using OpenCorporates + SAM.gov (achievable in 60 days)