
Why Our AI Agents Are Built for the Real Future — Not the Inflated Bubble of Hyper-scaled LLMs

  • Writer: Emanouil Angelov
  • 57 minutes ago
  • 6 min read

The AI industry is filled with warnings about a looming bubble, and the concerns are legitimate. Valuations have reached extreme levels, data-center spending has exploded (Microsoft, Google, Meta, and Amazon alone have committed more than $320 billion for 2025), and yet many enterprise AI pilots still fail to produce clear, measurable returns on investment. Some analysts compare the situation to the dot-com mania, while others see a genuine infrastructure build-out similar to the early internet or railroads. As of late 2025, Nvidia’s order backlogs stretch years into the future, yet companies like CoreWeave are already cutting capital-expenditure guidance because power-grid constraints cannot keep pace.


These worries are valid, but the bubble is not in AI as a whole. It is in one specific bet: that endlessly scaling a single giant language model (or thin wrappers around one) will somehow deliver artificial general intelligence, profitable products, and transformative economics. That bet is showing deep cracks. Recent research backs this up. A 2025 analysis found that transformer models hit a hard mathematical ceiling on creativity: they can only remix past data, and they can't be both highly novel and highly accurate at the same time. The more "creative" they get, the less reliable they become. This isn't a tuning issue; it's baked into the architecture itself.

Large language models are remarkable pattern-matchers that predict the next token with impressive fluency because they have internalized statistical correlations across trillions of tokens. They are not, however, on a path to general intelligence, any more than a mechanical calculator was on a path to becoming a human mathematician, no matter how large the gears became. Prominent researchers including Yann LeCun, François Chollet, and Gary Marcus have long argued that pure transformer scaling faces architectural limits. By late 2025 even insiders at frontier labs acknowledge that pre-training gains have largely flattened and that most apparent progress now comes from test-time compute and elaborate scaffolding rather than from raw scale.


The current frontier strategy pursued by OpenAI, xAI, Anthropic, Google, and others rests on a single hypothesis: pour exponentially more data and compute into one monolithic model and AGI will eventually emerge. This is the modern equivalent of believing that if the ENIAC team had simply built a room-sized computer the size of a city block, or put it in orbit for better cooling, the iPhone would have arrived in 1955.


Old room-sized computers

The economics of this approach are equally unsustainable. Frontier-model inference already draws hundreds to thousands of watts of accelerator power per active query, training runs produce emissions comparable to the lifetimes of dozens of cars, and serving millions of users requires data-center power demands that scale linearly with usage. These systems remain viable today only because of heavy subsidies in the form of cheap GPUs, tax credits, and venture capital. When those dry up, subscription pricing cannot possibly cover the marginal cost of inference, especially for tools that still hallucinate or fail on roughly one in ten complex tasks and require constant cloud connectivity.

A second issue is architectural: generative AI is fundamentally bounded by its statistical nature. Recent work demonstrates that these models reach only “amateur-level creativity,” unable to generate truly original or expert-grade reasoning because they operate within the limits of known token distributions. That ceiling may be acceptable for chatbots, marketing copy, or entertainment apps, but not for surveillance, security, ISR, autonomous drones, or contested environments where rare events, edge cases, and ambiguous signals define the reality of the mission.


We reject the dogma that bigger is always better. Instead, we draw inspiration from the true computing revolutions: the shift from room-sized mainframes to personal computers in the 1970s and 1980s, and from centralized servers to smartphones in every pocket starting in 2007. Real scaling has always come through radical efficiency gains, not brute-force size.


Timeline of miniaturization and integration of digital technology

That is why our AI agents are designed from the ground up to be modular, efficient, and edge-first. They combine lightweight language models with specialized convolutional networks, reinforcement learners, and symbolic modules, each optimized for a single function with near-perfect reliability in its domain. The entire system runs locally on device or on-premise hardware with latency below 50 milliseconds, no data exfiltration, full offline capability, and total power consumption in the 5-to-50-watt range, comparable to the human brain rather than a small power plant.

Generative models, by contrast, collapse under edge conditions because their entire design optimizes for a statistical average, not for operational certainty. The research community now acknowledges a built-in tension between "novel" and "correct" output. When visibility degrades, sensors fail, or noise increases, these models lose coherence rapidly. Our agents instead lean on deterministic primitives and physics-driven modules that do not degrade into guesswork under pressure. When additional compute and inference time are available, the models can take advantage of it, expanding their reasoning horizon, integrating multi-sensor context more deeply, and refining outputs with higher-order logic and temporal analysis.
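As an illustrative sketch only (the module names and signatures here are hypothetical, not our actual API), the modular pattern described above can be expressed as a thin deterministic pipeline over single-purpose specialists, where a perception module produces labels and a symbolic policy decides the outcome:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical module signatures: each specialist does exactly one job.
Detector = Callable[[bytes], list[str]]            # e.g. a small CNN returning object labels
RulesCheck = Callable[[list[str]], Optional[str]]  # deterministic symbolic policy over labels

@dataclass
class EdgeAgent:
    detect: Detector
    rules: RulesCheck

    def process(self, frame: bytes) -> str:
        labels = self.detect(frame)   # specialized perception module
        alert = self.rules(labels)    # symbolic policy: same labels in, same verdict out
        return alert if alert else "clear"

# Toy stand-ins for the specialized modules:
agent = EdgeAgent(
    detect=lambda frame: ["person", "vehicle"] if frame else [],
    rules=lambda labels: "alert: person detected" if "person" in labels else None,
)
print(agent.process(b"frame-bytes"))  # -> alert: person detected
```

The point of the sketch is that each stage is independently testable and deterministic: the same frame always yields the same verdict, which is what makes the pipeline auditable in a way a monolithic generative model is not.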


But when compute is limited or the environment is degraded, the system contracts gracefully, falling back to fast, deterministic primitives that preserve mission-critical reliability. This elastic architecture ensures that whether the agent is running on a drone in heavy smoke, a security camera during a power event, or a ground station with full servers behind it, performance remains stable, predictable, and operationally useful. The result is an AI system that scales with the environment rather than demanding that the environment scale with it, something traditional end-to-end vision models simply cannot do. This is the opposite of how generative models behave: they degrade unpredictably under stress because their outputs are governed by probability distributions, not operational constraints. The mathematical ceiling documented in recent research explains why: these models cannot maintain both reliability and novelty, so when pushed outside familiar distributions, they default to hallucination.
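The elastic expand/contract behavior can be sketched as a simple budget gate. This is a minimal illustration, not our production logic; the thresholds and function names are assumptions chosen to echo the figures in the text (the 5-to-50-watt envelope and sub-50-millisecond latency target):

```python
POWER_BUDGET_W = 50      # assumed upper bound of the 5-50 W envelope
MIN_DEADLINE_MS = 50.0   # assumed latency floor for the deep-reasoning path

def deep_reasoning(query: str) -> str:
    # Placeholder for the expanded path: multi-sensor fusion, temporal analysis.
    return f"reasoned({query})"

def deterministic_primitive(query: str) -> str:
    # Placeholder for the fast, physics-driven fallback path.
    return f"primitive({query})"

def answer(query: str, available_watts: float, deadline_ms: float) -> str:
    """Expand reasoning when power and time allow; contract to
    deterministic primitives when they do not."""
    if available_watts >= POWER_BUDGET_W and deadline_ms >= MIN_DEADLINE_MS:
        return deep_reasoning(query)
    return deterministic_primitive(query)
```

Because the fallback decision is an explicit branch rather than an emergent property of a probability distribution, the degraded-mode behavior is known in advance and can be certified against mission requirements.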


Ownership is outright rather than rented: a one-time purchase or enterprise license, with the ability to fine-tune models locally without ever sending proprietary data to the cloud. These agents are built specifically for missions where privacy, speed, and deterministic behavior are non-negotiable: surveillance, security, defense, autonomous drones, secure facilities, and contested environments. In these settings, an agent that hallucinates a threat or misses one because of statistical confusion is not merely inconvenient; it is unacceptable.


By late 2025 the hardware ecosystem for this vision is mature. Apple, Qualcomm, Nvidia, and Intel all ship system-on-chip designs with 40 to more than 100 TOPS of dedicated AI acceleration. Gartner reports that roughly seventy-five percent of enterprise data is already processed at the edge. The market demand is clear and growing.


Let’s address some common counter-arguments. Some claim that scaling has delivered every leap so far, from GPT-4 to o3-style models. In reality, most recent gains come from test-time compute and scaffolding, not pure pre-training scale; the low-hanging fruit is exhausted and marginal returns are collapsing. Others insist edge devices can never match cloud performance. They do not need to match token throughput; they already outperform massive cloud models on latency, cost, privacy, and resilience for real-world embodied tasks. Still others argue that defense and surveillance require massive central compute. On the contrary, the hardest challenges (perception, real-time decision loops, and operation in denied environments) are inherently edge problems, while centralized models create single points of failure and potential backdoors.

The coming years will separate two very different futures for AI. One future consolidates into a handful of cloud monopolies dependent on unlimited energy and constant connectivity, carrying severe geopolitical and privacy risks. The other future distributes intelligence efficiently to the edge, putting reliable, private, and affordable AI into every device and facility that needs it.

We are building for the second future: AI that works reliably when lives and security are on the line, without subscriptions to distant oracles that might hallucinate at the worst possible moment. This approach has the added benefit of easing public worries about a fictional HAL 9000 or Skynet gaining total control across all hardware.

The bubble is not in AI itself but in the belief that one architecture, trained on everything, can solve anything. The past year’s research shows that generative AI has hard limits in accuracy, originality, and reliability, limits that cannot be overcome with more data or more GPUs. True intelligence for the real world requires modularity, determinism, and architectures built to handle uncertainty rather than collapse under it.


Written by the Absentia Leadership Team



Absentia Tech builds AI systems for video enhancement and analysis, processing footage from security cameras, drones, and autonomous vehicles—both in real-time for live monitoring and post-mission for incident investigation. Everything runs on NVIDIA hardware.




References


Cropley, D. H. (2025). “The Cat Sat on the …?” Why Generative AI Has Limited Creativity. Journal of Creative Behavior. DOI: 10.1002/jocb.70077


Dolan, E. W. (2025, November 24). A mathematical ceiling limits generative AI to amateur-level creativity. PsyPost.


de Rooij, A., & Biskjaer, M. M. (2025, August 30). Despite the hype, generative AI hasn’t outshined humans in creative idea generation. PsyPost.


Haase, J., Hanel, P. H. P., & Pokutta, S. (2025). Has the Creativity of Large-Language Models Peaked? An analysis of inter- and intra-LLM variability. arXiv.


Keon, M., Karim, A., Lohana, B., Nguyen, T., Hamilton, T., Abbas, A., … & Nguyen, T. (2025). Galton’s Law of Mediocrity: Why Large Language Models Regress to the Mean and Fail at Creativity in Advertising. arXiv.


Schapiro, S., Shashidhar, S., Gladstone, A., Black, J., Moon, R., Hakkani-Tur, D., & Varshney, L. R. (2025). Combinatorial Creativity: A New Frontier in Generalization Abilities. arXiv.


Rondini, S., Álvarez-Martín, C., Angermair-Barkai, P., Penacchio, O., Paz, M., Pelowski, M., … & Cerda-Company, X. (2025). Stable diffusion models reveal a persisting human and AI gap in visual creativity. arXiv.


