Six Insights Buried in the Latest AI Research
- Andreas

- Jan 7
- 8 min read
It’s impossible to escape the constant stream of news about Artificial Intelligence. Headlines announce breathtaking breakthroughs and speculate on a future transformed by ever-smarter machines.
This relentless cycle of hype can make it difficult to see the full picture, often oscillating between narratives of unchecked progress and imminent doom.
The reality is that the current trajectory of AI development is running into fundamental, non-obvious limits: not limits of imagination, but partly of resources and, more pressingly, of fairness and utility.
This article distills some of the takeaways from major reports and studies, including the Stanford Institute for Human-Centered AI (HAI) 2025 AI Index Report, plus about 70 other sources I collected via NotebookLM. I recommend reading the overview of the above-mentioned report as well, if you have not already.
My goal was to get a clearer picture of where the technology stands and where it might be heading, before delving deeper into what interests me more: the ethics of AI.
The data reveals an industry grappling with imminent data shortages, a scaling paradox that amplifies societal harm, a battle for dominance and money, and a surprising inversion of who actually benefits from AI-driven productivity. Compare, for instance, what Sam Altman of OpenAI has been spreading with what the researchers at DeepMind have done: one lies, the others won a Nobel Prize.
1. The Great Data Shortage: AI May Soon Run Out of Fuel
A common assumption is that AI's growth is limitless, driven by an endless internet of data to feed ever-larger models and the potential to train itself through reinforcement learning.
According to an analysis by Epoch AI featured in the Stanford HAI report, we might be running out of high-quality training data. The research team projects with 80% confidence that the current stock of training data will be fully utilized between 2026 and 2032. The reality, therefore, is that the industry may soon hit a wall in the way it has been operating until now.
The critical implication here is a potential bottleneck for the entire current paradigm of AI development, which relies on scaling up models with more and more data. We can already see this with various LLMs learning from sources like Reddit, where bots rehash topics that LLMs then learn from in turn. (And we wonder why we get bad output: sh*t in, sh*t out.)
This insatiable demand for resources also comes at a significant environmental cost. The Stanford HAI report highlights the staggering growth in carbon emissions from model training. In 2020, training GPT-3 emitted an estimated 588 tons of carbon. By 2024, training Llama 3.1 405B emitted a staggering 8,930 tons. To put that in perspective, the average American emits about 18 tons of carbon per year.
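To make the scale of that comparison concrete, here is a quick back-of-the-envelope calculation; this is a rough sketch, using the training-run figures quoted above from the Stanford HAI report and the approximate per-capita figure for the average American:

```python
# Back-of-the-envelope comparison of training emissions.
# Training-run figures as quoted from the Stanford HAI 2025 AI Index Report;
# the per-capita value for the average American is an approximation.
gpt3_tons = 588                   # estimated tons of CO2 to train GPT-3 (2020)
llama_3_1_405b_tons = 8_930       # estimated tons of CO2 to train Llama 3.1 405B (2024)
avg_american_tons_per_year = 18   # rough annual per-capita emissions in the US

growth = llama_3_1_405b_tons / gpt3_tons
person_years = llama_3_1_405b_tons / avg_american_tons_per_year

print(f"Emissions grew roughly {growth:.0f}x from GPT-3 to Llama 3.1 405B")
print(f"One Llama 3.1 405B run is roughly {person_years:.0f} person-years of average US emissions")
```

In other words, a single large training run now corresponds to roughly five hundred person-years of average American emissions.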
Nvidia, ASML, Intel, and others can barely keep up with chip demand, and operators are resorting to repurposed turbines to power data centers.
All of this means that, even now, a small disruption in the global logistics sector could send massive ripples through an industry in which many companies are overvalued.
2. The Productivity Twist: AI's Biggest Boost Is for Novices, Not Experts
The prevailing narrative often focuses on how AI will augment or replace high-skilled knowledge workers. But multiple studies cited in the Stanford HAI report reveal a surprising twist: AI provides the most significant productivity gains to lower-skilled workers, not experts, effectively leveling the playing field.
A key study by Brynjolfsson et al. on customer support agents found that AI assistance increased the productivity of low-skill workers by an impressive 34%. For high-skill workers, however, the productivity gain was "indistinguishable from zero."
This pattern isn't isolated. Similar findings have been documented in studies on consulting (Dell’Acqua et al., 2023) and software engineering (Cui et al., 2024).
This trend challenges our core assumptions about who benefits most from AI, suggesting its primary role may be in augmenting skills and bridging experience gaps rather than simply replacing top-tier talent. What I see is that AI significantly reduces repetitive tasks and skills up novice workers to take on more complex ones, which, for instance, frees doctors to help patients instead of taking notes and filling out forms, because others can now do that work.
I think vibecoding is another good example: using only prompts and a lot of critical thinking, it allows non-developers with ideas to build and deploy a fully functional app within a few days. (satoricheck is fully vibecoded.) An expert developer focusing on IT architecture, however, will hardly use such tools, and if they do, only to find code and dependencies faster, i.e. for repetitive, easy tasks.
3. The Scaling Paradox: Bigger Isn't Always Better (or Fairer)
In the world of AI development, "scaling laws" have been a dominant philosophy: the belief that bigger models trained on more data are inherently more capable.
However, the Stanford HAI report highlights a crucial paradox that complicates this push for ever-larger models: scaling can introduce or amplify unintended biases in troubling ways.
One study by Birhane et al. (2024) found that for vision models, increasing the amount of training data consistently raised the likelihood of an image being classified with the label "criminal."
Another study discovered that as models scale, their implicit biases increase, for example, associating men with leadership roles or negative terms with Black individuals. This creates an "illusion of neutrality," where models appear to improve on standard benchmarks while simultaneously reinforcing harmful societal stereotypes under the surface.
What this reveals is the critical need for transparent dataset curation and independent audits to ensure that "more capable" doesn't also mean "more biased." These are not things the current frontier leaders do, wish for, or work towards. On the contrary.
As a result, we see an increase in agentic automation on the market: smaller, specialised agents trained only on specific data to intelligently automate what previously had to be hardcoded. I expect this to be the fastest-growing sector in the coming two years, because these models solve actual problems.
4. The Crowded Frontier: The AI Race Is Tighter Than You Think
While public perception often centers on one or two dominant companies, with OpenAI taking a lead (not only in terms of how much money it burns), the data shows that the AI frontier is becoming incredibly competitive.
According to the Stanford HAI report, the performance gap between the top models is shrinking rapidly, challenging the narrative of a winner-take-all market.
On the Chatbot Arena Leaderboard, which ranks models based on human-preference ratings, the Elo score difference between the top-ranked model and the 10th-ranked model narrowed from 11.9% in 2023 to just 5.4% by early 2025. Even more striking is the gap between the top two models, which shrank from 4.9% to a mere 0.7% in the same period. (The short sketch at the end of this section shows what an Elo gap means in head-to-head terms.) A similar convergence is happening between proprietary closed-weight models and their open-weight counterparts, with the performance gap having "nearly disappeared" by 2024.
This intense competition means high-quality models are now available from a growing number of developers, which could democratize access to powerful AI and accelerate innovation. We see this in the variety of models with slightly different foci: Claude, Gemini, GPT, Llama, DeepSeek, Grok, Kimi, Mistral, BLOOM, and roughly 40 other significant models around the globe all take slightly different approaches and will specialise further. Language- and country-specific models are catching up, and I think in the midterm we will see a bit of a shift, with local models integrated into daily usage versus business-related models that take a more global approach.
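For readers less familiar with Elo ratings, here is a minimal sketch of the standard Elo expected-score formula, which translates a rating gap into an expected head-to-head preference rate. The point gaps used below are purely illustrative assumptions; the report cites relative percentage differences rather than raw point gaps.

```python
# Minimal sketch of the standard Elo expected-score formula, the kind of
# rating scheme human-preference leaderboards like Chatbot Arena build on.
# The gaps below are illustrative assumptions, not figures from the report.

def expected_win_rate(rating_gap: float) -> float:
    """Probability that the higher-rated model wins a head-to-head comparison."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

for gap in (100, 50, 10):  # hypothetical Elo point gaps between two models
    print(f"Elo gap of {gap:>3} points -> expected win rate ~ {expected_win_rate(gap):.1%}")
```

The point: as the gap shrinks from around a hundred points towards single digits, the expected head-to-head win rate drifts from roughly 64% back towards a coin flip, which is what a crowded frontier looks like in practice.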
5. The Tortoise and the Hare: Why Humans Still Outsmart AI Agents
AI systems are now routinely beating humans on specific, well-defined benchmark tasks, particularly in perfect-information settings. (Chess is a perfect-information setting: you always know everything on the board and can, in principle, calculate all outcomes in a fixed manner.)
But when it comes to complex, open-ended problem-solving, a more nuanced reality emerges. DeepMind showed this with how difficult it was for an AI to beat professional gamers at StarCraft II.
The Stanford HAI report discusses the RE-Bench benchmark, which evaluates AI agents on complex tasks and reveals a classic "tortoise and the hare" dynamic.
In short time-horizon settings with a two-hour budget, top AI systems (the hare) score four times higher than human experts. However, when the time budget increases to 32 hours, the tables turn dramatically. Human performance (the tortoise) surpasses AI, outscoring it two-to-one. This suggests that while AI excels at rapid, tactical execution, humans still hold a decisive advantage in tasks requiring strategic, long-term planning and deep reasoning. As one academic paper notes:
"While AI shows proficiency in observation and communication, it lacks the nuanced judgment and intentionality intrinsic to human cognition."
This finding emphasizes that for now, the most complex forms of reasoning and strategic planning remain uniquely human strengths.
I would go even further and say that until we have artificial general intelligence, this will not even be close. And once we have it, we had better test it really, really well in a closed environment.
6. The Ethics Debate Is Surprisingly Shallow
For all the headline-grabbing panels and corporate pledges on AI ethics, a starkly different picture emerges from academic literature: the intellectual foundation for this conversation is surprisingly fragile.
An analysis published in an MDPI journal reveals that many ethical debates in research are superficial, often focusing on technical limitations rather than engaging with fundamental principles.
The implication of this focus is that researchers are often debating how to patch symptoms (e.g., reduce a model's toxicity score by 5%) rather than addressing the root cause (e.g., the nature of the data and incentives that create toxicity in the first place). The paper notes that human-centered research on AI's ethical challenges remains "in its infancy," with most work concentrating on theoretical explorations instead of offering practical, grounded solutions.
This reveals a critical gap not just in the public discourse, but in the academic engine meant to guide it. Without a shift from theoretical explorations to grounded, actionable frameworks, we risk building policy and educational initiatives on a foundation of unresolved, and largely unexamined, ethical first principles.
I do find this alarming, yet not too surprising. Sadly.
I can think of dozens of cases where people get models to create content that is clearly immoral, harmful, misleading, or damaging. Then comes some news about new guidelines and added guardrails.
Yes, there might be technical difficulties; however, if the current leadership actually wanted to reduce harmful content, they would adjust the models, which they clearly can.
For example, one of the tell-tale signals of AI writing used to be the long em dash "—", which, I thought, is the same dash I used for inserted subordinate clauses in my papers at university back in the day. In November 2025, OpenAI was in the news for "finally fixing the em dash".
I believe it is important to start more discussions about the ethics of AI. Practical. Applicable. And then hold companies accountable.
Conclusion: Navigating a More Complex Future
The dominant narratives of AI often miss the bigger picture. The reality is far more nuanced than stories of either unstoppable progress or imminent doom.
Many of the industry's most pressing challenges, from the data shortage and the scaling paradox to the shallow ethics debate, are interconnected symptoms of a "bigger is better" mindset that has overlooked foundational constraints.
We hear about this whenever terms like "AI bubble" are thrown around. Yes, some companies are overvalued and we will see a correction at some point. But AI has fundamentally added value and will continue to do so. Just think about the impact AlphaFold had when DeepMind released its millions of predicted protein structures to researchers around the world, effectively fast-tracking research into diseases and illnesses across the globe.
As we move forward, a critical question must be about the principles we want to pursue regarding artificial intelligence:
How do we shift our focus from building merely a more powerful AI to building a more understood, sustainable, and equitable one?
I will release the notebook via
once I find some time to do so.
Here's the obligatory dinopreneur picture.



