Engineering Is Automated. Research Is the Residual. (ThorstenMeyerAI.com)
TL;DR
AI has made significant progress in automating AI engineering tasks, with several benchmarks nearing saturation. Research remains less automated, though the evidence suggests this gap may narrow soon.
Recent results across multiple AI benchmarks suggest that core engineering tasks in AI research are becoming automatable, with systems handling reproduction, optimization, and kernel design at or near human-expert level. Research automation is less mature but advancing quickly, which suggests the two may converge in the near future.
According to Thorsten Meyer’s analysis of Jack Clark’s recent essay, six key benchmarks measuring AI capability on research-related tasks show performance approaching saturation. For example, CORE-Bench, which tests research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025, and the benchmark’s author has declared it ‘solved.’ Similarly, MLE-Bench, which evaluates performance on Kaggle competitions, rose from 16.9% to 64.4% over sixteen months and now surpasses mid-tier human performance.
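The CORE-Bench and MLE-Bench figures are easier to compare on a common footing. The Python sketch below computes each benchmark's absolute gain, its average gain per month, and the headroom left below a 100% ceiling. It assumes September 2024 to December 2025 counts as 15 months and uses a straight-line average, which says nothing about the actual shape of either curve; `BenchmarkChange` is a hypothetical helper, not part of either benchmark's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkChange:
    """Hypothetical helper: two benchmark scores and the months between them."""
    name: str
    start_pct: float  # score at the earlier measurement, in percent
    end_pct: float    # score at the later measurement, in percent
    months: int       # elapsed months between the two measurements

    def point_gain(self) -> float:
        """Absolute improvement in percentage points."""
        return self.end_pct - self.start_pct

    def points_per_month(self) -> float:
        """Straight-line average gain in percentage points per month."""
        return self.point_gain() / self.months

    def remaining_headroom(self) -> float:
        """Points left before an assumed 100% ceiling."""
        return 100.0 - self.end_pct


changes = [
    # Assumption: September 2024 to December 2025 is counted as 15 months.
    BenchmarkChange("CORE-Bench", 21.5, 95.5, 15),
    # The source gives "sixteen months" for MLE-Bench directly.
    BenchmarkChange("MLE-Bench", 16.9, 64.4, 16),
]

for c in changes:
    print(
        f"{c.name}: +{c.point_gain():.1f} pp, "
        f"{c.points_per_month():.2f} pp/month, "
        f"{c.remaining_headroom():.1f} pp headroom"
    )
```

With these inputs, CORE-Bench gained 74.0 points (about 4.9 per month) and has 4.5 points of headroom left, while MLE-Bench gained 47.5 points (about 3.0 per month) with 35.6 points remaining. Headroom is what matters for the ‘solved’ label: a benchmark within a few points of its ceiling has little room left to separate strong systems from one another.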
Clark’s analysis suggests that engineering work, such as reproducing experiments, optimizing code, and designing kernels, is rapidly ceasing to be a bottleneck for AI. The evidence indicates that AI can now perform many engineering tasks at a level comparable to or exceeding that of human experts, which could significantly reduce the need for human labor in these areas.
However, Clark notes that research, which involves generating new hypotheses and producing conceptual breakthroughs and creative insights, remains less automated. Progress is evident, but it has not reached the saturation seen in engineering tasks. Still, the pace of recent developments hints that the residual gap between engineering and research automation may close faster than previously expected, especially if research turns out to be a form of engineering at scale.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Any single benchmark could be noise; six moving together trace a curve. The pattern is the same cascade observed across the broader Clark series, visible here in the specific domain of R&D skills.
Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Evidence for: Erdős-1051, a centaur (human-AI) math discovery, and sporadic Move-37-style moments. Evidence against: low yield, dependence on how problems are framed, and no sign of acceleration. A mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration, and more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence but leaves five strategic dimensions underdeveloped. They matter for the institutional response, which the synthesis of the Clark series argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale, a barrier that dissolves with more shots on goal? Both readings are consistent with the current data. Their consequences differ by orders of magnitude.
Figure: two equilibria, labeled “Productivity multiplier years” and “Recursive loop operational.”
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, institutional capacity built now is still useful. If it closes, that capacity is necessary. The asymmetric cost of being wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI Development and Research Strategies
The automation of core engineering tasks could drastically accelerate AI development cycles, reduce costs, and shift the focus of AI research from engineering to pure scientific inquiry. As AI systems increasingly handle reproduction, optimization, and kernel design, human researchers may focus more on conceptual innovation, potentially transforming the research landscape. This shift could also influence institutional priorities, funding, and the future workforce in AI.
Moreover, if research begins to be automated at scale, AI capabilities could accelerate rapidly, possibly to the point where human intervention is needed only for high-level strategic decisions. This transition raises questions about the future role of human researchers and the governance of increasingly autonomous AI systems.
Recent Benchmarks and the Progress in AI-Driven Engineering
Capabilities have surged across multiple benchmarks over little more than a year. CORE-Bench, focused on reproducing research, moved from 21.5% to 95.5%, with its author declaring it ‘solved.’ MLE-Bench, assessing Kaggle competition performance, improved from 16.9% to 64.4%, and its leaderboard was temporarily paused for review amid rapid capability growth. Advances in kernel design, such as automated GPU kernel generation, also show AI producing production-grade infrastructure components.
This pattern suggests that AI’s ability to automate core engineering tasks is approaching saturation, with gains arriving across diverse domains at once. Taken together, these developments indicate that the engineering side of AI R&D is nearing full automation, while research remains a frontier still under active development.
“Clark’s conclusion is correct and possibly understated for engineering. The residual research question is real but may be less binding than the framing suggests.”
— Thorsten Meyer
Uncertainties About the Future of Research Automation
While current benchmarks show engineering tasks nearing full automation, the extent to which AI can automate research, particularly creative and hypothesis-driven work, remains uncertain. Clark leaves open whether research can be fully automated or will always require human insight, especially for conceptual breakthroughs. The timeline for research automation to reach the saturation levels seen in engineering is also unclear.
Next Steps in Monitoring AI Automation Progress
Researchers and institutions will closely monitor benchmark developments, especially as new AI models are released. The pause on MLE-Bench leaderboard submissions signals that capabilities are advancing faster than measurement tools can keep up. Future work will include developing more comprehensive benchmarks for research automation and studying how AI-generated research affects scientific progress and human roles.
Expect further announcements on AI’s ability to generate novel hypotheses, design experiments, and contribute to scientific discovery in the coming months, as well as discussions on the ethical and governance implications of increasingly autonomous AI in research.
Key Questions
What does it mean that engineering tasks are now automatable?
It indicates that AI systems can handle tasks like reproducing research, optimizing code, and designing infrastructure components at a level comparable to human experts, reducing the need for manual effort.
Will AI be able to fully automate research soon?
It is not yet clear. While progress is rapid, the automation of creative, hypothesis-driven research remains a challenge, and experts differ on how soon this gap will close.
What are the implications for AI research teams?
Teams may shift focus from engineering to conceptual innovation, with AI systems increasingly taking over routine and technical tasks, potentially speeding up development cycles.
Could this change the pace of scientific discovery?
Potentially. If research automation reaches a high level, it could accelerate scientific breakthroughs, but it would also raise questions about oversight, ethics, and the role of human intuition.
Source: ThorstenMeyerAI.com