The most successful AI companies of 2025–2026 aren't winning on model intelligence. They're winning on product, integrations, and infrastructure. Cursor hit $2B+ ARR using off-the-shelf LLMs wrapped in exceptional editor UX. Databricks reached a $134B valuation by becoming the data backbone enterprises can't rip out. Meanwhile, AI wrapper startups fail at twice the rate of regular tech companies, with 94% never reaching $1M in revenue. The pattern is unmistakable: founders who bet on rapidly changing model capabilities get steamrolled by the next release cycle, while those who build on durable human needs (intuitive workflows, deep integrations, reliable infrastructure) compound value over years. Jeff Bezos articulated this principle two decades ago. The AI era is proving him right at unprecedented speed.
The ground is shifting at 250x per year
To understand why building on model capabilities is dangerous, you first need to grasp how fast those capabilities change. Context windows expanded 250x in just 15 months, from GPT-3.5's 4,096 tokens in November 2022 to Gemini 1.5 Pro's 1 million tokens in February 2024. By April 2025, Meta's Llama 4 Scout reached 10 million tokens. As of March 2026, multiple frontier models offer 1M+ token contexts at standard pricing, with no surcharge.
Cost declines are equally violent. The Stanford HAI 2025 AI Index documented a 280x cost reduction for GPT-3.5-level performance over 24 months: from $20 per million tokens to $0.07. At the GPT-4 capability tier, prices fell from $37.50 per million tokens at launch to $0.14 by August 2025, a 267x decline. Epoch AI's March 2025 analysis found median price decline rates of 200x per year when examining post-January 2024 data, with some capability milestones dropping 900x per year.
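The annualization behind these multiples is simple compounding, and it is easy to check. A minimal sketch, using the GPT-3.5-tier prices quoted above:

```python
def annualized_decline(price_start: float, price_end: float, months: float) -> float:
    """Convert a total price drop over a period into an equivalent per-year multiple."""
    total_multiple = price_start / price_end
    return total_multiple ** (12 / months)

# ~286x over 24 months works out to roughly 17x per year
rate = annualized_decline(20.0, 0.07, 24)
```

Epoch AI's higher 200x-per-year figure comes from a different sample (post-January 2024 data only), which is why it exceeds this two-year average.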
Benchmark performance tells the same story of relentless, compounding improvement. On SWE-bench, which tests real-world software engineering against GitHub issues, AI systems jumped from solving 4.4% of problems in 2023 to 71.7% in 2024, a 16x improvement in a single year. On IMO-qualifying math problems, performance leaped from 9.3% (GPT-4o) to 74.4% (o1) in four months. The reasoning model revolution (from OpenAI's o1 in September 2024 through DeepSeek R1's open-source release in January 2025 to o3 and o4-mini in April 2025) compressed years of expected progress into quarters.
Perhaps most telling is the convergence phenomenon. The Elo difference between the top-ranked and tenth-ranked model on Chatbot Arena shrank from 11.9% to 5.4% in just one year. The gap between open-weight and closed models narrowed from 8.04% to 1.70%. The US-China gap on MMLU collapsed from 17.5 percentage points to 0.3. Models are becoming commodities faster than anyone predicted, and the smallest model scoring above 60% on MMLU went from requiring 540 billion parameters in 2022 to just 3.8 billion in 2024, a 142-fold reduction. Any startup whose moat depends on a specific model capability occupies ground that will be undercut within months.
What Bezos understood that most AI founders don't
In a 2007 Harvard Business Review interview, Jeff Bezos laid out a deceptively simple strategic principle: "It helps to base your strategy on things that won't change. At Amazon we're always trying to figure that out, because you can really spin up flywheels around those things. All the energy you invest in them today will still be paying you dividends ten years from now." He elaborated at AWS re:Invent in 2012 with what became his most famous strategic insight: "I very frequently get the question: 'What's going to change in the next 10 years?' I almost never get the question: 'What's not going to change in the next 10 years?' And I submit to you that that second question is actually the more important of the two."
Bezos identified three invariants for Amazon (low prices, fast delivery, vast selection) and invested relentlessly in all three. No customer would ever wake up wanting higher prices or slower shipping. The energy invested in those flywheel effects compounds indefinitely.
The AI equivalent is straightforward. No developer will ever want a code editor that's harder to use. No enterprise will ever want integrations that break. No user will ever prefer a slower, less reliable workflow. The invariants in AI are product quality, workflow integration, infrastructure reliability, and meeting users where they already work. These are the things that won't change in ten years, regardless of whether GPT-7 has a context window of 100 million tokens or reasoning capabilities that surpass PhD-level expertise.
Warren Buffett arrived at a convergent conclusion from a different direction. His "circle of competence" framework and insistence on "durable competitive advantage" historically kept him away from technology companies precisely because technical advantages were too ephemeral. His famous observation ("I have no idea where Microsoft will be in 20 years, but I can say that people will still chew gum") captures the same insight. The Bezos-Buffett convergence, as Farnam Street noted, is remarkable: the most innovative tech CEO and the most conservative investor both built fortunes by investing in enduring fundamentals rather than chasing what's changing.
The graveyard of companies that built on shifting sand
The evidence for what happens when AI startups build on temporary model limitations is now extensive and devastating. Jasper AI is the canonical cautionary tale. Founded in 2021 as an AI writing tool built on GPT-3, Jasper peaked at $120M ARR and a $1.5 billion valuation after raising $125 million in October 2022. Then ChatGPT launched six weeks later, offering the same underlying capability directly to consumers for $20 per month. Jasper's ARR forecast was revised down by at least 30%. Revenue dropped to an estimated $35–55 million, a decline of more than 50% from peak. Both founders stepped down by September 2023. The company still exists, but its original differentiation (a user-friendly interface over GPT-3) evaporated the moment OpenAI shipped its own interface.
Inflection AI raised $1.525 billion at a $4 billion valuation to build Pi, an "emotionally intelligent" chatbot. Despite having a DeepMind co-founder and backing from Microsoft, NVIDIA, and Bill Gates, Inflection couldn't compete once ChatGPT, Claude, and Gemini offered comparable conversational abilities. In March 2024, Microsoft effectively acqui-hired the company, paying $650 million to license the technology while poaching the CEO and roughly 70 staff. Co-founder Mustafa Suleyman admitted what the market had already concluded: models were "fundamentally a commodity."
The pattern extends across entire product categories. When OpenAI added native PDF upload to ChatGPT in November 2023, AI researcher Alex Ker posted that "many startups just died today." The "Chat with PDF" category, which had spawned companies like ChatPDF, Humata ($3.5M from Google's Gradient Ventures), and AskYourPDF, saw its collective value proposition erased by a single feature update. The DEV Community documented 73 clones launching the same week, only for users to realize they could do the same thing for free in ChatGPT.
The broader statistics are grim. According to CB Insights, 78% of AI startups launched in 2024 were essentially API wrappers. These companies exhibit a 65% churn rate within 90 days, nearly double the SaaS average. Only 3–5% surpass $10,000 in monthly revenue. Google VP Darren Mowry, speaking on TechCrunch's Equity podcast in February 2026, declared that LLM wrappers and AI aggregators have their "check engine light on," warning that "the industry does not have a lot of patience for that anymore." SimpleClosure's 2025 shutdown report confirmed the first major wave of AI company closures had arrived, with wrappers and application-layer tools facing the sharpest correction.
Prompt engineering as a standalone discipline met a similar fate. Fast Company reported that by mid-2025, 68% of firms provided prompt engineering as standard training across all roles rather than hiring specialists. Microsoft's survey of 31,000 workers ranked "Prompt Engineer" second to last among new roles companies planned to add. The field didn't vanish. It evolved into what Andrej Karpathy termed "context engineering," but the companies and tools built around artisanal prompt crafting lost their reason to exist.
Why Cursor, Replit, and Databricks keep winning
The contrast with companies building on durable foundations could not be sharper. Cursor uses off-the-shelf LLMs (Claude, GPT, Gemini) yet reached $100M ARR faster than any SaaS company in history and crossed $2B+ in annualized revenue by early 2026. Its valuation trajectory tells the story: $2.6 billion in January 2025, $9.9 billion by June, $29.3 billion in November, and discussions at $50 billion as of March 2026. The magic isn't the model. Cursor's moat is a VS Code fork with zero learning curve, Composer's multi-file diff-viewer UX, codebase-level semantic search that understands relationships across files, and multi-agent orchestration running up to eight parallel agents with isolated workspaces. As one analysis noted: "The real innovation is how it lets humans talk to the machine: directly, intuitively."
Replit followed the same playbook from a different angle. Rather than starting with AI and looking for an application, Replit built an entire development environment (IDE, hosting, databases, authentication, one-click deployment) and then layered AI on top. Revenue exploded from roughly $2.8M ARR in early 2024 to $240 million in 2025, with a target of $1 billion by end of 2026. Valuation tripled from $3 billion to $9 billion in six months. The infrastructure is the moat: Replit integrates Anthropic, OpenAI, and Google models, routing workloads to whichever performs best for each task. When models improve, Replit gets better automatically instead of getting disrupted.
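The model-agnostic routing idea can be sketched abstractly. The provider names and scores below are hypothetical placeholders, not Replit's actual implementation; the point is that the product layer stays fixed while the model layer swaps freely:

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    name: str
    # Tracked quality score per task type, e.g. from internal evals (hypothetical numbers).
    scores: dict = field(default_factory=dict)

def route(task_type: str, models: list) -> Model:
    """Model-agnostic routing: send each task to the current best performer."""
    return max(models, key=lambda m: m.scores.get(task_type, 0.0))

fleet = [
    Model("provider-a/frontier", {"codegen": 0.82, "chat": 0.74}),
    Model("provider-b/frontier", {"codegen": 0.71, "chat": 0.90}),
]
best = route("codegen", fleet)  # picks the strongest codegen scorer
```

When a new model ships, only the score table changes; the product improves automatically, which is exactly the dynamic described above.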
Vercel's v0 demonstrates the ecosystem integration thesis. The AI UI generation tool succeeds not because it has a superior model (it runs Claude, Grok, Gemini, and a custom AutoFix model simultaneously) but because it's the only tool where the AI agent and the production infrastructure are the same company. Generate code in v0, deploy to Vercel in one click, import any GitHub repo, auto-pull environment variables. The team explicitly stated their philosophy: "Your product's moat cannot be your system prompt." Their moat is the composite engineering pipeline plus deployment infrastructure.
Databricks represents the infrastructure play at scale. With $5.4 billion in annualized revenue growing 65% year-over-year, a $134B valuation, and more than 60% of the Fortune 500 as customers, Databricks built a data platform so deeply integrated into enterprise workflows that replacing it would be organizational surgery. AI products alone now account for over $1.4 billion in revenue (25%+ of total), but they're layered on top of the data backbone, not standalone. Each new AI product expands TAM and drives net retention above 140%.
The common thread is unmistakable. GitHub Copilot reached 15 million users and 90% Fortune 100 adoption not because it had the best model, but because it was embedded directly in VS Code, the world's most popular IDE. Perplexity grew to $148M ARR and a $20B valuation by building a search UX with citations and distribution partnerships (Samsung TVs, Snapchat, Airtel) rather than competing on model quality. Midjourney bootstrapped to $200M+ annual revenue with 40 employees by cultivating a Discord community with a distinctive aesthetic. Every breakout AI company has a non-model moat: editor UX, data platform lock-in, ecosystem integration, community, or distribution.
The Dennard scaling analogy and what comes next
There is a useful historical parallel for what happens when a core technology capability plateaus. From 1986 to 2003, single-core CPU performance improved roughly 52% per year, driven by Dennard scaling, the principle that as transistors shrink, their power density remains constant, enabling continuous clock speed increases. Then, around 2004–2007, Dennard scaling broke down. Leakage current and quantum effects at nanometer scales made power density unmanageable. Clock frequencies stagnated at 4–6 GHz. Intel cancelled its next-generation single-core processors.
The industry's response was not despair but architectural reinvention. Value shifted from faster single cores to multi-core processors, parallelism, system-level optimization, and heterogeneous computing. Performance improvement rates dropped to 23% annually from 2003–2011, then just 7% from 2011–2018, but total system performance kept climbing through coordination, not raw capability.
The AI analogy is increasingly concrete. Individual model intelligence improvements appear to be encountering diminishing returns on traditional benchmarks: MMLU, GSM8K, and HumanEval are "nearly saturated." The frontier has shifted to harder benchmarks where top models still score poorly: Humanity's Last Exam (best score: 8.80%), FrontierMath (AI solves only 2%), and BigCodeBench (35.5% vs. human 97%). Meanwhile, the emerging scaling axis is parallelism and orchestration. Andrew Ng observed in August 2025 that "parallel agents are emerging as an important new direction for scaling up AI," framing them alongside training data and compute as a fundamental scaling technique.
Google Research provided quantitative evidence in January 2026. Across 180 agent configurations, centralized multi-agent coordination improved performance on parallelizable tasks by 80.9% over a single agent. The implications for startup strategy are direct: as single-model intelligence plateaus, value migrates to the orchestration layer: evaluation, reliability, multi-agent coordination, and infrastructure. The companies building the "multi-core architecture" of AI will capture the next phase of value, just as ARM, AMD, and systems companies captured value in the post-Dennard era rather than the companies optimizing single-core clock speeds.
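The centralized coordination pattern behind those numbers has a simple shape: one coordinator splits a parallelizable task, fans subtasks out to worker agents with isolated state, and merges the results. A minimal sketch, where `solve_subtask` stands in for a scoped LLM call (this is an illustration of the pattern, not Google's setup):

```python
from concurrent.futures import ThreadPoolExecutor

def solve_subtask(subtask: str) -> str:
    # Stand-in for one agent working an isolated slice of the task.
    return f"done: {subtask}"

def coordinate(subtasks, max_agents: int = 8):
    """Centralized coordination: fan parallelizable subtasks out to
    worker agents and merge their results in the original order."""
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        return list(pool.map(solve_subtask, subtasks))

results = coordinate(["parse spec", "write tests", "draft impl"])
```

The speedup comes only on tasks that decompose cleanly; sequential, interdependent work gains nothing from the fan-out, which mirrors the study's restriction to parallelizable tasks.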
The agent infrastructure market already reflects this shift. Valued at $7.4 billion in 2025 with a projected CAGR of 45%, the agentic AI infrastructure space attracted $2.8 billion in venture funding in H1 2025 alone. LangChain's 2026 State of AI Agents report found that 57% of organizations now have agents in production, with quality, not capability, cited as the top deployment barrier by 32% of respondents. The infrastructure needs are enormous: evaluation-driven development (analogous to test-driven development in software), observability platforms like Arize AI ($70M Series C), orchestration frameworks like LangGraph and CrewAI, and verification loops that ensure agent reliability. These are picks-and-shovels plays that benefit regardless of which model dominates.
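The verification-loop idea mentioned above fits in a few lines. In this sketch, `generate` and `verify` are placeholders for an LLM call and an evaluation check; any specific retry policy is an assumption, not a reference to a particular framework:

```python
def verified(generate, verify, max_attempts: int = 3):
    """Evaluation-driven pattern: keep regenerating until an output
    passes the verification check, then return it."""
    for _ in range(max_attempts):
        candidate = generate()
        if verify(candidate):
            return candidate
    raise RuntimeError("no candidate passed verification")
```

This is the same discipline as test-driven development: the checks, not the generator, define what counts as done.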
Human behavior is the ultimate invariant
The deepest reason to build on product, UX, and integrations rather than model capabilities is that human behavior changes on geological timescales compared to technology. The QWERTY keyboard, designed in 1868 to prevent mechanical jamming on typewriters, persists on every digital device despite being suboptimal for modern typing. Email, invented in 1971, remains the dominant professional communication channel 55 years later. The spreadsheet metaphor from VisiCalc (1979) is still the organizing principle for data work. File-and-folder hierarchies from physical filing cabinets still structure every operating system.
Jakob Nielsen of Nielsen Norman Group identified a powerful mechanism behind this persistence: mental model inertia. "There's great inertia in users' mental models: stuff that people know well tends to stick, even when it's not helpful." When new design patterns conflict with existing mental models, users experience systematic failures, misinterpreting everything without questioning their basic assumptions. This isn't laziness; it's how human cognition works. We navigate the world through learned patterns, and cognitive switching costs are real and high.
Behavioral science confirms this at a deeper level. Research published in PMC found that methods used to create behavior change "tend to inhibit, rather than erase, the original behavior," and that behavior change is "often specific to the context in which it is learned," meaning people revert to old habits when contexts shift. Hyperbolic discounting means humans systematically overweight the immediate efficiency of familiar tools over the future benefits of new ones. Bentley University UX research found that the most successful digital products are "digitized versions of existing practices": Uber is hailing a cab, Kindle is reading a book, online shopping is browsing a store. Introducing genuinely new behaviors is dramatically harder than enhancing existing ones.
This has direct implications for AI startup strategy. Forrester Research found that embedded AI (functionality integrated into existing workflows) delivers 50% average time savings and 40% faster time-to-value compared to standalone AI tools. IBM Consulting's Manish Goyal put it bluntly: "You can have all the best AI, but if it's not in the workflow where people use it, it's not going to get adoption." Grove Ventures' enterprise AI playbook reached the same conclusion: "If users have to change the way they work to accommodate AI, they probably won't adopt it."
This is why Copilot in VS Code beats standalone coding assistants. Why Notion AI inside Notion beats standalone AI writing tools. Why v0 deploying to Vercel beats standalone code generators. The integration moat compounds: distribution drives usage, usage generates feedback, feedback improves the product, and accumulated context creates switching costs. As Bessemer Venture Partners noted, "When your product understands a user's world better than anything else, replacing it feels like starting over."
Conclusion: the durable playbook
The venture consensus has crystallized around a single litmus test, articulated by Wing VC's Jake Flomenberg: "If OpenAI launches a model 10x better tomorrow, does this company still have a reason to exist?" Companies that answer yes are building on invariants. Companies that answer no are building on borrowed time.
The evidence assembled here points to a clear strategic framework. Build for how humans actually work, not for what current models can't do. Invest in integrations that embed your product into existing workflows until removing it would be organizational surgery. Create infrastructure that gets better, not obsolete, as models improve. Choose model-agnostic architectures that benefit from commoditization rather than being threatened by it. The AI companies compounding value today share these traits: Cursor's editor UX, Databricks' data platform, Replit's full-stack environment, Perplexity's search experience, Vercel's deployment pipeline.
The rate of change in model capabilities is not slowing down. Context windows, reasoning ability, cost per token, and multimodal fluency will continue improving at rates that make current limitations unrecognizable within months. But the rate of change in human behavior, workflow preferences, and the need for reliable infrastructure will remain roughly constant: slow, predictable, and deeply rooted in cognition. That asymmetry is not a problem to solve. It is the foundation to build on.