Are we ready to entrust operating systems and corporate budgets to autonomous agents?

The transition from reactive chatbots to autonomous executive agents is no longer theoretical speculation: it has become the new infrastructure of our workflows. The dynamics that emerged in recent days point to a clear split in the market. On one hand, there is the deep integration of artificial intelligence at the operating system level; on the other, the harsh reality of computational costs that are putting old business models in crisis.

The approach based on the pure brute force of giant language models is showing its physical and economic limits. The real technical challenge is no longer having the absolute smartest model, but orchestrating fleets of specialized agents in a sustainable, secure, and measurable way.

Is artificial intelligence taking control of our operating systems?

OpenAI's move to transform Codex into a native autonomous agent for Windows 11 radically changes the rules of desktop automation. We are no longer talking about an assistant that suggests code snippets in an editor, but a system capable of taking control of the graphical interface to execute complex tasks without continuous supervision.

The model navigates through software, tests applications, and finds bugs, operating like a human user. Because these routines can be started and monitored from the ChatGPT mobile app, entire testing cycles can be delegated while the machine works in the background. This approach makes many traditional automation tools based on rigid scripts obsolete and cuts idle time in software development. Delegating use of the UI to a machine nonetheless raises important infrastructural questions, and it calls for a careful evaluation of the real risks when AI controls terminals in non-isolated production environments.
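One way to make the risk of agent-controlled terminals concrete is a policy gate that inspects every command before it is executed. The sketch below is only an illustration under stated assumptions: the allowlist contents and the `is_allowed` helper are hypothetical, not part of Codex or any vendor tooling, and a real deployment would still need an isolated environment around it.

```python
import shlex

# Hypothetical allowlist: binaries the agent may invoke (an assumption for this sketch).
ALLOWED_BINARIES = {"pytest", "git", "ls"}

# Characters that enable chaining, piping, redirection or substitution in a shell.
FORBIDDEN_CHARS = set(";|&<>`$")


def is_allowed(command: str) -> bool:
    """Return True only if the command starts with an allowlisted binary
    and contains no shell control characters."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar: refuse anything that cannot be parsed.
        return False
    if not tokens:
        return False
    return tokens[0] in ALLOWED_BINARIES


if __name__ == "__main__":
    for cmd in ["pytest -q tests/", "pytest -q; rm -rf /", "rm -rf /"]:
        print(cmd, "->", "run" if is_allowed(cmd) else "blocked")
```

The gate is deliberately deny-by-default: anything it cannot parse or does not recognize is refused, which is the safer failure mode when the caller is an autonomous agent.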

In parallel, Microsoft is reorganizing its strategy under the internal motto "Delivering one Copilot". The goal is to consolidate a currently fragmented ecosystem into a single super app guided by Scout, an always-on agent. According to the rumors, two operational concepts stand out for those managing business processes.

The first is the "Routines" function within the GitHub Copilot environment, designed to schedule code-related tasks that run silently in the background. The second is the "Cowork" section, a hub that proactively aggregates data from calendars, emails, and corporate documents to prepare meetings or extract insights. Integrating everything into a single interface addresses the cognitive disorientation caused by using dozens of different tools at once, shifting the focus from simple chat to an automated and measurable engineering system.

Have giant language models become a black hole for budgets?

Financial data leaked this week offers a ruthless snapshot of the sustainability of generative artificial intelligence. On one hand, Anthropic is preparing for a historic initial public offering, driven by annualized revenue of 47 billion dollars and a strong focus on the business sector with Claude Code. On the other, OpenAI records an operating margin of -122%, crushed by the titanic costs of training frontier models and running inference on them.

This discrepancy demonstrates that the "one-size-fits-all" approach no longer works. Blindly relying on a single provider with out-of-control structural costs represents a continuity risk for any company. This is why AI flat rates are disappearing in favor of increasingly granular consumption-based billing, as also demonstrated by recent developer protests over GitHub Copilot's switch to token pricing.

The problem reflects directly on operational budgets, particularly in marketing departments. The explosion of agentic artificial intelligence is burning through funds approved for 2026 at an alarming rate.

A standard chatbot consumes tokens for a single interaction. An autonomous agent, tasked with creating a brief or extracting SEO data, executes dozens of iterations in the background, multiplying costs by 10 to 50 times. Using frontier models like GPT-4o or Claude 3.5 Sonnet to generate simple social media texts, when an open source model would do the same job at a fraction of the cost, is a strategic mistake that drains resources.
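The multiplier can be reproduced with simple arithmetic. The sketch below compares one chatbot call with an agent loop that re-sends its growing context at every step. All prices and token counts are hypothetical placeholders chosen for illustration, not real vendor rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices in USD per million tokens (assumptions, not vendor rates).
    input_per_million: float
    output_per_million: float


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single model call."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def agent_cost(pricing: Pricing, prompt_tokens: int,
               output_tokens_per_step: int, steps: int) -> float:
    """Cost of an agent loop where each step re-sends the prompt plus
    everything the agent has produced so far."""
    total = 0.0
    context = prompt_tokens
    for _ in range(steps):
        total += call_cost(pricing, context, output_tokens_per_step)
        context += output_tokens_per_step  # the output becomes part of the next input
    return total


if __name__ == "__main__":
    pricing = Pricing(input_per_million=3.0, output_per_million=15.0)
    single = call_cost(pricing, 1_000, 500)
    agent = agent_cost(pricing, 1_000, 500, steps=20)
    print(f"single call: ${single:.4f}, agent: ${agent:.4f}, ratio: {agent / single:.1f}x")
```

With these placeholder numbers, 20 steps cost about 47 times a single call, more than the step count alone would suggest, because the context grows at every iteration.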

It becomes essential to implement multi-provider architectures equipped with intelligent routing logic, capable of diverting simple tasks to cheaper models and reserving expensive computing power only for complex reasoning. Tracking costs per single workflow is the only way to demonstrate the real return on investment of automation. A pragmatic way to get there is an AI cost audit on the last four weeks of logs: real numbers instead of estimates, with every API call traceable back to the workflow that triggered it.
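A minimal sketch of the two ideas above, routing by task type and attributing every call to the workflow that triggered it, might look like the following. The model names, prices, task labels, and record fields are all hypothetical; a real audit would read them from the actual API logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price


# Placeholder tiers: names and prices are assumptions for this sketch.
CHEAP = Model("small-open-model", 0.20)
FRONTIER = Model("frontier-model", 10.00)

# Task types considered simple enough for the cheap tier (an assumption).
SIMPLE_TASKS = {"social_post", "summary", "tagging"}


def route(task_type: str) -> Model:
    """Send simple tasks to the cheap model, everything else to the frontier one."""
    return CHEAP if task_type in SIMPLE_TASKS else FRONTIER


@dataclass(frozen=True)
class CallRecord:
    workflow: str   # the workflow that triggered the call
    task_type: str
    tokens: int


def cost_by_workflow(records: list[CallRecord]) -> dict[str, float]:
    """Aggregate the USD cost of logged calls per workflow."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        model = route(record.task_type)
        totals[record.workflow] += (
            record.tokens * model.usd_per_million_tokens / 1_000_000
        )
    return dict(totals)


if __name__ == "__main__":
    log = [
        CallRecord("social_calendar", "social_post", 1_000_000),
        CallRecord("social_calendar", "summary", 500_000),
        CallRecord("seo_audit", "multi_step_reasoning", 500_000),
    ]
    for workflow, usd in cost_by_workflow(log).items():
        print(f"{workflow}: ${usd:.2f}")
```

Keeping the routing rule as a plain lookup makes the decision auditable: for any logged call it is possible to say which model served it and why.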

Can we trust software that writes and spreads itself?

Security and code development are undergoing an equally extreme bidirectional transformation. On one hand, researchers at the University of Toronto have demonstrated the feasibility of an autonomous worm guided by "open-weight" models. This malware analyzes targets, identifies vulnerabilities, takes control of the machine, and clones itself by stealing local computing power to feed its own logic.

The experiment highlights the dark side of language model accessibility. If a malicious agent can pivot in real time by adapting its strategy, classic perimeter firewalls become ineffective. Treating machine-to-machine communications as potential attack vectors requires adopting strict zero-trust protocols and rethinking how security and operations change when artificial intelligence has direct access to network resources.
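One building block of a zero-trust posture is that no internal message is accepted on the basis of where it comes from: every request between agents carries a signature that the receiver verifies. The sketch below uses Python's standard `hmac` module. The shared key and the payload format are hypothetical, and key distribution and rotation are out of scope.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, signature: str) -> bool:
    """Check the signature in constant time; reject on any mismatch."""
    expected = sign(key, payload)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # hypothetical shared secret
    message = b'{"agent": "builder-01", "action": "run_tests"}'
    tag = sign(key, message)

    print(verify(key, message, tag))                       # authentic message
    print(verify(key, message.replace(b"run_tests", b"deploy"), tag))  # tampered message
```

A signature alone does not stop replayed messages; a fuller design would also bind a timestamp or nonce into the signed payload.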

On the other hand, internal data from Anthropic reveals that more than 80% of the new code merged into their production codebase, with peaks of 90%, is generated entirely autonomously by Claude. The feature shipping rate has increased eightfold compared to the past.

Delegating almost the entirety of code writing to a model overturns the engineering process: humans are no longer programmers, but reviewers of machine-generated architectures.

This scenario materializes the concept of recursive self-improvement. Configuring local servers to let agents work at night, finding completed features and resolved bugs in the morning, is already an achievable practice. The real technical hurdle shifts entirely to quality control and the deterministic orchestration of flows, abandoning the idea of writing functions by hand to focus on managing fleets of autonomous agents.
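Quality control over agent-written code can be expressed as a fixed, ordered list of gates that every change must pass before it is merged. The sketch below is a hypothetical illustration: the `Change` fields, the gate names, and the 400-line threshold are assumptions, not a description of any company's pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Change:
    author: str           # e.g. the agent that produced the change
    tests_passed: bool
    lines_changed: int
    human_approved: bool


Check = Callable[[Change], bool]

# Gates are evaluated in this fixed order, so the same change always
# yields the same verdict (thresholds are assumptions for this sketch).
GATES: list[tuple[str, Check]] = [
    ("tests", lambda c: c.tests_passed),
    ("size", lambda c: c.lines_changed <= 400),
    ("review", lambda c: c.human_approved),
]


def evaluate(change: Change) -> tuple[bool, list[str]]:
    """Return (mergeable, names of the gates that failed)."""
    failed = [name for name, check in GATES if not check(change)]
    return (not failed, failed)


if __name__ == "__main__":
    overnight = Change("agent-7", tests_passed=True, lines_changed=950, human_approved=False)
    print(evaluate(overnight))
```

Returning the list of failed gates, rather than a bare boolean, gives the human reviewer a starting point for the morning triage.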

What are the tools and news worth noting this week?

The ecosystem moves at a speed that makes it difficult to separate noise from concrete signals. Below is a reasoned summary of the most relevant tools and market dynamics for those building solutions based on artificial intelligence.

Market dynamics and infrastructure:

SoftBank has announced colossal investments of 75 billion euros in data centers in France and has surpassed Toyota as the most valuable Japanese company, thanks precisely to the AI infrastructure boom.
Alphabet is seeking 80 billion dollars on the market to finance its computing infrastructure, confirming that hardware competition is the real bottleneck of the sector.
The US Secretary of Defense has sanctioned Anthropic, blocking the use of Claude in the military sector. The reason is the company's refusal to allow the use of its models for weapons targeting.
The United States has closed the last loopholes for exporting the most powerful Nvidia chips to China, intensifying the technological cold war.

Model and agent evolution:

Google has released the Gemma 4 12B family, introducing an encoder-free multimodal architecture capable of natively unifying the management of text, audio, and video.
Nvidia has unveiled Nemotron 3 Ultra, significantly raising the benchmarks for US open models, and has invested 20 billion to acquire the team of the startup Groq.
A Salesforce report demonstrates the practical effectiveness of automation: their agents reduced a complex software migration from 231 days to just two weeks.
Recent research highlights two critical failure modes: search agents tend to confirm their own biases rather than objectively exploring the web, and excessive training to make chatbots "helpful" drastically reduces their ability to simulate natural human behavior.

Operational tools and frameworks:

Ellf AI: a specialized platform to empower programming agents in the rapid development of complex NLP solutions.
Hermes Desktop: an open source application released by Nous Research that allows running custom agents entirely locally.
Pure Python MCP Server: a Model Context Protocol server written in Python to provide agents with direct access to project files without depending on complex frameworks.
LoCoMo Memory: a local memory system designed to integrate with Claude Code and Cursor, capable of retrieving context with latencies under 70 milliseconds.
LangGraph for Sales: an advanced framework to create agentic workflows capable of qualifying leads and updating CRMs in total autonomy.
Qdrant TurboQuant: a new vector quantization system that drastically reduces the size of search data while keeping the original geometry intact, fundamental for scaling vector databases.
Roboflow Offline: a practical solution for deploying computer vision models locally, ensuring minimal latency and strict respect for the privacy of acquired data.
TextGrad: a framework that implements textual autograd mechanics for optimizing code and structured reasoning directly on LLMs.
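The general idea behind vector quantization, mentioned in the Qdrant TurboQuant entry, can be illustrated with plain scalar quantization: each float is mapped to one byte, and the reconstruction error stays bounded. This is explicitly not TurboQuant's algorithm, whose details are not given here; it is only a minimal sketch of the size versus accuracy trade-off, with hypothetical helper names.

```python
def quantize(vector: list[float], levels: int = 255) -> tuple[bytes, float, float]:
    """Map each component to an integer code in 0..levels (one byte each).
    Returns the codes plus the offset and scale needed to reconstruct."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = bytes(min(levels, round((x - lo) / scale)) for x in vector)
    return codes, lo, scale


def dequantize(codes: bytes, lo: float, scale: float) -> list[float]:
    """Approximate reconstruction of the original vector."""
    return [lo + code * scale for code in codes]


if __name__ == "__main__":
    original = [0.12, -0.80, 0.45, 0.99, -0.33]
    codes, lo, scale = quantize(original)
    restored = dequantize(codes, lo, scale)
    worst = max(abs(a - b) for a, b in zip(original, restored))
    # One byte per component instead of four for a 32-bit float.
    print(f"{len(codes)} bytes, worst-case error {worst:.4f}")
```

Each component is rounded to the nearest code, so the error per component is at most half the step size, which is why distances between vectors are approximately preserved.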

Lavora Meglio con l'Intelligenza Artificiale

Fabrizio Mazzei
