"AI agents are integrating into operating systems and writing much of the code, but execution costs are exploding. How to manage the agentic revolution without draining corporate budgets."
The transition from reactive chatbots to autonomous agents that execute tasks is no longer theoretical speculation: it has become the new infrastructure of our workflows. The dynamics of recent days reveal a clear split in the market: on one hand, the deep integration of artificial intelligence at the operating system level, and on the other, the harsh reality of computational costs that are putting old business models in crisis.
The approach based on the pure brute force of giant language models is showing its physical and economic limits. The real technical challenge is no longer having the absolute smartest model, but orchestrating fleets of specialized agents in a sustainable, secure, and measurable way.
OpenAI's move to transform Codex into a native autonomous agent for Windows 11 radically changes the rules of desktop automation. We are no longer talking about an assistant that suggests code snippets in an editor, but a system capable of taking control of the graphical interface to execute complex tasks without continuous supervision.
The model navigates through software, tests applications, and finds bugs, operating exactly like a human user. The ability to start and monitor these routines from the ChatGPT mobile app makes it possible to delegate entire testing cycles and leave the machine working in the background. This approach makes many traditional automation tools based on rigid scripts obsolete and cuts idle time in software development. Of course, handing the UI over to a machine raises important infrastructure questions, and it forces a careful evaluation of the real risks when AI controls terminals in non-isolated production environments.
In parallel, Microsoft is reorganizing its strategy under the internal motto "Delivering one Copilot". The goal is to consolidate a currently fragmented ecosystem into a single super app guided by Scout, an always-on agent. From the rumors, two fundamental operational concepts emerge for those managing business processes.
The first is the "Routines" function within the GitHub Copilot environment, designed to schedule code-related tasks that run silently in the background. The second is the "Cowork" section, a hub that proactively aggregates data from calendars, emails, and corporate documents to prepare meetings or extract insights. Integrating everything into a single interface addresses the cognitive disorientation caused by juggling dozens of different tools, and it shifts the focus from simple chat to an automated and measurable engineering system.
Financial data leaked this week offers a ruthless snapshot of the sustainability of generative artificial intelligence. On one hand, Anthropic is preparing for a historic initial public offering, driven by an annualized revenue of 47 billion dollars and a strong focus on the business sector with its Claude Code. On the other, OpenAI records operating margins of -122%, crushed by the titanic costs of training frontier models and running inference on them.
This discrepancy demonstrates that the "one-size-fits-all" approach no longer works. Blindly relying on a single provider with out-of-control structural costs represents a continuity risk for any company. This is why AI flat rates are disappearing in favor of increasingly granular consumption-based billing, as also demonstrated by recent developer protests over GitHub Copilot's switch to token pricing.
The problem reflects directly on operational budgets, particularly in marketing departments. The explosion of agentic artificial intelligence is burning through funds approved for 2026 at an alarming rate.
A standard chatbot consumes tokens for a single interaction. An autonomous agent tasked with creating a brief or extracting SEO data executes dozens of iterations in the background, multiplying costs by a factor of 10 to 50. Using frontier models like GPT-4o or Claude 3.5 Sonnet to generate simple social media texts, when an open source model would do the same job at a fraction of the cost, is a strategic mistake that drains resources.
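To make the multiplier tangible, here is a minimal back-of-the-envelope sketch. Every figure in it (tokens per call, number of agent iterations, price per million tokens) is an illustrative assumption, not a published price list, and the function name `run_cost` is made up for this example.

```python
def run_cost(calls: int, tokens_per_call: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of `calls` model calls of `tokens_per_call` tokens each."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


# Assumed numbers, chosen only to illustrate the arithmetic.
TOKENS_PER_CALL = 2_000
FRONTIER_PRICE = 10.0   # hypothetical USD per million tokens
SMALL_MODEL_PRICE = 0.5  # hypothetical USD per million tokens

# One chatbot answer: a single call to a frontier model.
chat_cost = run_cost(1, TOKENS_PER_CALL, FRONTIER_PRICE)

# One agent task: 30 background iterations on the same frontier model.
agent_cost = run_cost(30, TOKENS_PER_CALL, FRONTIER_PRICE)

# The same 30 iterations on a cheaper model.
small_agent_cost = run_cost(30, TOKENS_PER_CALL, SMALL_MODEL_PRICE)

print(f"chatbot answer:        ${chat_cost:.2f}")
print(f"agent on frontier:     ${agent_cost:.2f} ({agent_cost / chat_cost:.0f}x)")
print(f"agent on small model:  ${small_agent_cost:.2f}")
```

Under these assumptions the agent costs 30 times the single chatbot answer, and moving the same loop to the cheaper model brings it back close to the price of one frontier call.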
It becomes essential to implement multi-provider architectures equipped with intelligent routing logic, capable of diverting simple tasks to cheaper models and reserving expensive computing power only for complex reasoning. Tracking costs per single workflow is the only way to demonstrate the real return on investment of automation.
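As a rough sketch of what routing plus per-workflow tracking could look like, consider the following. The tier names, prices, task categories, and the `route` and `CostLedger` helpers are all hypothetical; a real router would likely classify tasks with something smarter than a fixed set of labels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                       # hypothetical label, not a real model ID
    usd_per_million_tokens: float   # illustrative price, not a real quote


CHEAP = ModelTier("small-open-model", 0.5)
FRONTIER = ModelTier("frontier-model", 10.0)

# Assumption: these task types need complex reasoning; everything else does not.
COMPLEX_TASKS = {"multi_step_reasoning", "code_review", "data_analysis"}


def route(task_type: str) -> ModelTier:
    """Send complex tasks to the expensive tier and everything else to the cheap one."""
    return FRONTIER if task_type in COMPLEX_TASKS else CHEAP


class CostLedger:
    """Accumulates spend per workflow so ROI can be reported workflow by workflow."""

    def __init__(self) -> None:
        self._usd = defaultdict(float)

    def record(self, workflow: str, tier: ModelTier, tokens: int) -> None:
        self._usd[workflow] += tokens * tier.usd_per_million_tokens / 1_000_000

    def total(self, workflow: str) -> float:
        return self._usd.get(workflow, 0.0)


ledger = CostLedger()
jobs = [
    ("social_posts", "short_copy", 40_000),
    ("seo_audit", "data_analysis", 40_000),
]
for workflow, task_type, tokens in jobs:
    ledger.record(workflow, route(task_type), tokens)

for workflow, _, _ in jobs:
    print(f"{workflow}: ${ledger.total(workflow):.2f}")
```

With identical token volumes, the two workflows end up with very different bills, which is exactly the information a budget owner needs to see per workflow rather than as one aggregate invoice.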

Security and code development are undergoing an equally extreme bidirectional transformation. On one hand, researchers at the University of Toronto have demonstrated the feasibility of an autonomous worm guided by "open-weight" models. This malware analyzes targets, identifies vulnerabilities, takes control of the machine, and clones itself by stealing local computing power to feed its own logic.
The experiment highlights the dark side of language model accessibility. If a malicious agent can pivot in real time and adapt its strategy, classic perimeter firewalls become ineffective. Treating machine-to-machine communications as potential attack vectors requires strict zero-trust protocols and a rethink of how security and operations change when artificial intelligence has direct access to network resources.
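One way to picture the zero-trust idea for agent-to-agent traffic is a deny-by-default check: every request must carry a valid signature and ask for an action that the calling agent is explicitly allowed to perform. The sketch below uses Python's standard `hmac` module; the agent ID, key, action names, and the `sign` and `authorize` helpers are invented for illustration, and a real deployment would also need key rotation, replay protection, and transport security.

```python
import hashlib
import hmac

# Hypothetical registry: per-agent secret and per-agent allowlist of actions.
AGENT_KEYS = {"build-agent": b"demo-key-rotate-me"}
ALLOWED_ACTIONS = {"build-agent": {"run_tests", "read_repo"}}


def sign(agent_id: str, action: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the agent ID and requested action."""
    message = f"{agent_id}:{action}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def authorize(agent_id: str, action: str, signature: str) -> bool:
    """Deny by default: unknown agent, bad signature, or unlisted action all fail."""
    key = AGENT_KEYS.get(agent_id)
    if key is None:
        return False
    expected = sign(agent_id, action, key)
    if not hmac.compare_digest(expected, signature):
        return False
    return action in ALLOWED_ACTIONS.get(agent_id, set())


key = AGENT_KEYS["build-agent"]
print(authorize("build-agent", "run_tests", sign("build-agent", "run_tests", key)))
print(authorize("build-agent", "delete_repo", sign("build-agent", "delete_repo", key)))
print(authorize("build-agent", "run_tests", "0" * 64))
```

The first call is accepted, while the second and third are refused: a correctly signed request for an action outside the allowlist fails just like a forged signature does.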
On the other hand, internal data from Anthropic reveals that over 80%, with peaks of 90%, of the new code merged into their production codebase is generated entirely autonomously by Claude. The feature shipping rate has increased eightfold compared to the past.
Delegating almost the entirety of code writing to a model overturns the engineering process: humans are no longer programmers, but reviewers of machine-generated architectures.
This scenario makes the concept of recursive self-improvement concrete. Configuring local servers so that agents work overnight, and finding completed features and resolved bugs in the morning, is already achievable in practice. The real technical hurdle shifts entirely to quality control and the deterministic orchestration of flows: less writing functions by hand, more managing fleets of autonomous agents.
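A deterministic quality gate for overnight agent output might look something like this sketch. The `AgentChange` fields, the 400-line threshold, and the decision labels are assumptions made up for the example; the point is only that the same input always produces the same decision, and that nothing reaches a human reviewer without passing automated checks first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentChange:
    change_id: str
    tests_passed: bool
    lint_passed: bool
    lines_changed: int


MAX_LINES = 400  # assumed limit for what a human can review carefully


def gate(change: AgentChange) -> str:
    """Fixed rules, no model in the loop: identical input gives an identical decision."""
    if not (change.tests_passed and change.lint_passed):
        return "reject"
    if change.lines_changed > MAX_LINES:
        return "split_and_resubmit"
    return "human_review"


overnight = [
    AgentChange("feat-101", tests_passed=True, lint_passed=True, lines_changed=120),
    AgentChange("fix-202", tests_passed=False, lint_passed=True, lines_changed=15),
    AgentChange("refactor-303", tests_passed=True, lint_passed=True, lines_changed=2_500),
]

decisions = {change.change_id: gate(change) for change in overnight}
for change_id, decision in decisions.items():
    print(f"{change_id}: {decision}")
```

Note that even a change that passes every check is only queued for human review, which matches the idea of engineers acting as reviewers rather than authors.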
The ecosystem moves at a speed that makes it difficult to separate noise from concrete signals. Below is a reasoned summary of the most relevant tools and market dynamics for those building solutions based on artificial intelligence.
Market dynamics and infrastructure:
Model and agent evolution:
Operational tools and frameworks:


As an AI Solutions Architect I design digital ecosystems and autonomous workflows. Almost 10 years in digital marketing, today I integrate AI into business processes: from Next.js and RAG systems to GEO strategies and dedicated training. I like to talk about AI and automation, but that's not all: I've also written a book, "Work Better with AI", a practical handbook with 12 chapters and over 200 ready-to-use prompts for those who want to use ChatGPT and AI without programming. My superpower? Looking at a manual process and already seeing the automated architecture that will replace it.