Scaling from 48K to 1M+ Private Companies
Mars currently tracks 48,622 companies discovered through funding announcements in news articles (Benzinga). To compete with Crunchbase (millions of companies) and PitchBook (institutional-grade coverage), we need to expand beyond funding announcements to capture the vast majority of private companies that never announce deals.
Key Insight: Crunchbase's millions of companies include many that never did funding deals. We need alternative data sources to discover and track these companies.
Every incorporated company must register with its state of incorporation. OpenCorporates aggregates 200M+ companies globally.
Most active private companies maintain LinkedIn pages with rich company data and growth signals.
Mine 26M SEC filings to extract private company names from public company disclosures. The infrastructure for this already exists.
🎯 Competitive Advantage: Mars already processes SEC filings for the Agreements system, so private company extraction can be added to the existing pipeline at minimal incremental cost.
Every company doing business with the federal government must register in SAM.gov. Registrants include many high-quality, tech-focused companies.
Patent assignees reveal innovative companies, often pre-funding or in stealth mode. Already on the Mars 2026 roadmap (Q1).
Benzinga newsfeed provides immediate funding announcements. Competitors rely on manual research teams (slower).
26M filings to mine for private company mentions. Competitors don't deeply mine SEC data. Unique differentiator.
Automated extraction using Claude & Grok vs. manual research teams. Lower cost per company profile, faster scaling.
Free sources (OpenCorporates, SAM.gov, USPTO, SEC) plus automated scraping vs. PitchBook's 2,000+ researchers.
Deep dive into the strategic question: competing with PitchBook using AI instead of a massive human research team
Build a Python script to query the OpenCorporates API, filter for US tech companies (active, < 10 years old), and insert the results into MongoDB.
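The filtering step could look roughly like the sketch below. It is a minimal illustration, not the planned script: the record fields (`name`, `company_number`, `jurisdiction_code`, `incorporation_date`, `inactive`) are assumed to resemble OpenCorporates company objects and should be checked against the actual API response, and the "tech" classification is left out because the source data needed for it is not specified here.

```python
from datetime import date

MAX_AGE_YEARS = 10  # "< 10 years old" from the pilot description


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end (negative if start is in the future)."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def is_candidate(company: dict, today: date) -> bool:
    """Keep active US companies incorporated fewer than MAX_AGE_YEARS ago.

    Field names are assumptions modeled on OpenCorporates company records.
    """
    if not str(company.get("jurisdiction_code", "")).startswith("us_"):
        return False
    if company.get("inactive"):
        return False
    raw = company.get("incorporation_date")
    if not raw:
        return False
    try:
        incorporated = date.fromisoformat(raw)
    except ValueError:
        return False  # skip records with unparseable dates
    return 0 <= years_between(incorporated, today) < MAX_AGE_YEARS


def to_mongo_doc(company: dict) -> dict:
    """Build a document with a stable _id so repeated imports can upsert."""
    return {
        "_id": f"{company['jurisdiction_code']}:{company['company_number']}",
        "name": company["name"].strip(),
        "incorporation_date": company["incorporation_date"],
        "source": "opencorporates",
    }
```

Because `_id` is derived from the jurisdiction and company number, each document could be written with an upsert keyed on `_id`, so re-running the import would not create duplicates.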
Download SAM.gov entity extract, parse and normalize data, import to MongoDB with deduplication.
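One possible approach to the normalization and deduplication step is sketched below. The record fields `legal_name` and `state` are hypothetical placeholders, since the actual column layout of the SAM.gov extract would need to be mapped first, and the suffix list is a small illustrative sample rather than a complete one.

```python
import re

# Illustrative, not exhaustive: legal suffixes stripped before comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation",
                  "co", "ltd", "lp", "llp"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def dedup_key(record: dict) -> tuple:
    """Key on normalized name plus state (both field names are assumptions)."""
    return (normalize_name(record["legal_name"]), record.get("state", "").upper())


def select_new(existing_keys: set, records: list) -> list:
    """Return records whose key is not already known.

    existing_keys is updated in place, so duplicates within the same
    batch are also dropped.
    """
    fresh = []
    for record in records:
        key = dedup_key(record)
        if not key[0] or key in existing_keys:
            continue
        existing_keys.add(key)
        fresh.append(record)
    return fresh
```

Matching on normalized name and state is a deliberately simple heuristic. It will merge "Acme Robotics, Inc." with "ACME ROBOTICS LLC" in the same state, which may or may not be desired, so a stricter key (for example, a registry identifier where one exists) could be substituted.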
Take 1,000 recent 10-K filings, use an LLM to extract private company names, and validate extraction quality.
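Validating extraction quality could be done by comparing the LLM's output against a hand-labeled sample of filings. The sketch below assumes such a labeled set exists and uses precision and recall as the quality measures; both the metric choice and the function name `score_extraction` are illustrative assumptions, not part of the current pipeline.

```python
def score_extraction(predicted: set, labeled: set) -> dict:
    """Compare extracted company names with a hand-labeled set.

    Names are matched case-insensitively after trimming whitespace.
    """
    pred = {p.strip().lower() for p in predicted}
    gold = {g.strip().lower() for g in labeled}
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {
        "true_positives": true_positives,
        "precision": precision,  # share of extracted names that are correct
        "recall": recall,        # share of labeled names that were found
    }
```

Low precision would indicate the model is inventing or misclassifying names (for example, tagging public companies as private), while low recall would indicate it is missing mentions. Exact string matching is strict, so a fuzzy or normalized comparison may give a fairer picture.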
Mars can scale from 48K to 500K-1M companies within 18-24 months using primarily free, scalable data sources. The key is leveraging existing infrastructure (web scraping, LLM extraction, MongoDB) and focusing on sources that provide both breadth (OpenCorporates, SAM.gov) and depth (SEC filings, LinkedIn).
Total investment of $1M-$1.8M positions Mars as a credible Crunchbase competitor with unique advantages in real-time news monitoring and SEC filing analysis.
Next Milestone
100K companies by Q1 2026 using OpenCorporates + SAM.gov (achievable in 60 days)
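As a rough check on what this milestone implies, the required ingestion rate can be computed from the figures above (48,622 companies today, 100K target, 60 days). This is simple arithmetic and assumes a constant daily rate.

```python
def required_daily_rate(current: int, target: int, days: int) -> float:
    """Average number of net new companies needed per day to hit the target."""
    return (target - current) / days


rate = required_daily_rate(current=48_622, target=100_000, days=60)
print(f"{rate:.0f} net new companies per day")  # 51,378 / 60 is about 856
```

The figure is net of deduplication, so the raw import volume from OpenCorporates and SAM.gov would need to be higher than this.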