Understanding AI-native cloud: from microservices to model-serving 29 Dec 2025, 12:12 pm

Cloud computing has fundamentally transformed the way enterprises operate. Initially built for more basic, everyday computing tasks, the cloud has expanded its capabilities exponentially with the advent of new technologies such as machine learning and analytics.

But AI — particularly generative AI and the emerging class of AI agents — presents all-new challenges for cloud architectures. It is resource-hungry, demands ultra-low latency, and requires new compute pathways and data access. These capabilities can’t simply be bolted on to existing cloud infrastructures.

Simply put, AI has upended the traditional cloud computing paradigm, leading to a new category of infrastructure: AI-native cloud.

Understanding AI-native cloud

AI-native cloud, or cloud-native AI, is still a new concept, but it is broadly understood as an extension of cloud native. It is infrastructure built with AI and data as cornerstones, allowing forward-thinking enterprises to infuse AI into their operations, strategies, analysis, and decision-making processes from the very start.

Differences between AI-native and traditional cloud models

Cloud computing has become integral to business operations, helping enterprises scale and adopt new technologies. In recent years, many organizations have shifted to a ‘cloud native’ approach, meaning they are building and running apps directly in the cloud to take full advantage of its benefits and capabilities. Many of today’s modern applications live in public, private, and hybrid clouds.

According to the Cloud Native Computing Foundation (CNCF), cloud native approaches incorporate containers, service meshes, microservices, immutable infrastructure, and declarative APIs. “These techniques enable loosely coupled systems that are resilient, manageable, and observable,” CNCF explains.

5 things you need to know about AI-native cloud

  1. AI is the core technology: In a traditional cloud, AI is an add-on. In an AI-native cloud, every layer — from storage to networking — is designed to handle the high-throughput, low-latency demands of large models.
  2. GPU-first orchestration: AI-native clouds prioritize GPUs and TPUs. This requires advanced orchestration tools like Kubernetes for AI to manage distributed training and inference economics.
  3. The vector foundation: Data modernization is the price of entry. AI-native clouds rely on vector databases to provide long-term memory for AI models, allowing them to access proprietary enterprise data in real time without hallucinating.
  4. Rise of neoclouds: 2026 will see the rise of specialized neocloud providers (like CoreWeave or Lambda) offering GPU-centric infrastructure that hyperscalers often struggle to match in raw performance and cost.
  5. From AIOps to agenticops: The goal isn’t just a faster system; it’s a self-operating one. AI-native cloud allows agentic AI to autonomously manage network traffic, resolve IT tickets, and optimize cloud spend.

AI-native cloud is an evolution of this strategy, applying cloud-native patterns and principles to build and deploy scalable, repeatable AI apps and workloads. This can help devs and builders overcome key challenges and limitations when it comes to building, running, launching, and monitoring AI workloads with traditional infrastructures.

The challenges with AI in the cloud

The cloud is an evolution of legacy infrastructures, but it was largely built with software-as-a-service (SaaS) and other as-a-service models in mind. In this setting, AI, ML, and advanced analytics become just another workload, as opposed to a core, critical component.

But AI is much more demanding than traditional workloads, and running it in the cloud can lead to higher computing costs, data bottlenecks, hampered performance, and other critical issues.

Generative AI, in particular, requires the following:

  • Specialized hardware and significant computational power
  • Infrastructure that is scalable and flexible
  • Massive and diverse datasets for iterative training
  • High-performance storage with high bandwidth, high throughput, and low-latency access to data

AI data needs are significant and continue to escalate as systems become more complex; data must be processed, handled, managed, transferred, and analyzed rapidly and accurately to ensure the success of AI projects. Distributed computing, parallelism (splitting AI tasks across multiple CPUs or GPUs), ongoing training and iteration, and efficient data handling are essential — but traditional cloud infrastructures can struggle to keep up.

Existing infrastructure simply lacks the flexibility demanded by more intense, complex AI and ML workflows. It can also fragment the user experience, forcing devs and builders to move back and forth between numerous interfaces instead of working in a single, unified plane.

Essential components of AI-native cloud

Rather than the traditional “lift and shift” cloud migration strategy — where apps and workloads are quickly moved to the cloud “as-is” without redesign — AI-native cloud requires a fundamental redesign and rewiring of infrastructures for a clean slate.

This refactoring involves many of the key principles of cloud-native builds, but in a way that supports the development of AI applications. It requires the following:

  • Microservices architecture
  • Containerized packaging and orchestration
  • Continuous integration/continuous delivery (CI/CD) DevOps practices
  • Observability tools
  • Dedicated data storage
  • Managed services and cloud-native products (like Kubernetes, Terraform, or OpenTelemetry)
  • More complex infrastructures like vector databases

Data modernization is critical for AI; systems require real-time data flow from data lakes, lakehouses, or other stores; the ability to connect data and provide context for models; and clear rules for how data is used and managed.
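
To make the “connect data and provide context” point concrete, here is a minimal, self-contained sketch of the retrieval pattern that vector databases implement: content is embedded as vectors, the closest matches to a query are retrieved, and the results are handed to a model as context. The embed() function below is a hypothetical stand-in (it is not semantically meaningful) for whatever embedding model or vector database client an enterprise actually uses.

import numpy as np

# Hypothetical, non-semantic embedding used only to make the sketch runnable;
# a real system would call an embedding model or a vector database client.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

documents = [
    "Q3 churn rose 4% in the EMEA region.",
    "The new onboarding flow launched in October.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve_context(query: str, top_k: int = 1) -> list[str]:
    q = embed(query)
    # Rank stored documents by cosine similarity to the query vector.
    def score(vec: np.ndarray) -> float:
        return float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
    ranked = sorted(index, key=lambda item: score(item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# The retrieved passages would be prepended to the model prompt as context.
print(retrieve_context("How did churn trend last quarter?"))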

AI workloads must be built in from the start, with training, iteration, deployment, monitoring, and version control capabilities all part of the initial cloud setup. This allows models to be managed just like any other service.

AI-native cloud infrastructures must also support continuous AI evolution. Enterprises can incorporate AIOps, MLOps, and FinOps practices to support efficiency, flexibility, scalability, and reliability. Monitoring tools can flag issues with models (like drift, or performance degradation over time), and security and governance guardrails can support encryption, identity verification, regulatory compliance, and other safety measures.

According to CNCF, AI-native cloud infrastructures can use the cloud’s underlying computing network (CPUs, GPUs, or Google’s TPUs) and storage capabilities to accelerate AI performance and reduce costs.

Dedicated, built-in orchestration tools can do the following:

  • Automate model delivery via CI/CD pipelines
  • Enable distributed training
  • Support scalable data science to automate ML
  • Provide infrastructure for model serving
  • Facilitate data storage via vector databases and other data architectures
  • Enhance model, LLM, and workload observability

The benefits of AI-native cloud and business implications

There are numerous benefits when AI is built in from the start, including the following:

  • Automation of routine tasks
  • Real-time data processing and analytics
  • Predictive insights and predictive maintenance
  • Supply chain management
  • Resource optimization
  • Operational efficiency and scalability
  • Hyper-personalization at scale for tailored services and products
  • Continuous learning, iteration, and improvement through ongoing feedback loops

Ultimately, AI-native cloud allows enterprises to embed AI from day one, unlocking automation, real-time intelligence, and predictive insights to support efficiency, scalability, and personalized experiences.

Paths to the AI-native cloud

As with any technology, there is no one-size-fits-all approach to AI-native cloud infrastructures.

IT consultancy firm Forrester identifies five “paths” to the AI-native cloud that align with key stakeholders including business leaders, technologists, data scientists, and governance teams. These include:

The open-source AI ecosystem

The cloud embedded Kubernetes into enterprise IT, and what started out as an open-source container orchestration system has evolved into a “flexible, multilayered platform with AI at the forefront,” according to Forrester.

The IT firm identifies different domains in open-source AI cloud, including model-as-a-service, and predicts that devs will shift from local compute to distributed Kubernetes clusters, and from notebooks to pipelines. This “enables direct access to open-source AI innovation.”

AI-centric neo-PaaS

Cloud platform-as-a-service (PaaS) streamlined cloud adoption. Now, Kubernetes-based PaaS provides access to semifinished or prebuilt platforms that abstract away “much or all” of the underlying infrastructure, according to Forrester. This supports integration with existing data science workflows (as well as public cloud platforms) and allows for flexible self-service AI development.

Public cloud platform-managed AI services

Public clouds have taken a distinctly enterprise approach, bringing AI “out of specialist circles into the core of enterprise IT,” Forrester notes. Initial custom models have evolved into widely used platforms including Microsoft Azure AI Foundry, Amazon Bedrock, Google Vertex, and others. These provided early, easy entry points for exploration, and now serve as the core of many AI-native cloud strategies, appealing to technologists, data scientists, and business teams.

AI infrastructure cloud platforms (neocloud)

AI cloud platforms, or neoclouds, offer infrastructure that minimizes the use of CPU-based cloud tools (or eliminates it altogether). This approach can be particularly appealing for AI startups and enterprises with “aggressive AI programs,” according to Forrester, and is also a draw for enterprises with strong and growing data science programs.

Data/AI cloud platforms

Data infrastructure providers like Databricks and Snowflake have been using cloud infrastructures from leading providers to hone their own offerings. This has positioned them to provide first-party gen AI tools for model building, fine-tuning, and deployment. This draws on the power of public cloud platforms while insulating customers from those complex infrastructures. This “data/AI pure play” is attractive to enterprises looking to more closely align their data scientists and AI devs with business units, Forrester notes.

Ultimately, when pursuing AI-native cloud options, Forrester advises the following:

  • Start with your primary cloud vendor: Evaluate their AI services and develop a technology roadmap before switching to another provider. Consider adding new vendors if they “dangle a must-have AI capability” your enterprise can’t afford to wait for. Also, tap your provider’s AI training to grow skills throughout the enterprise.
  • Resist the urge of “premature” production deployments: Projects can go awry without sufficient reversal plans, so adopt AI governance that assesses model risk in the context of a particular use case.
  • Learn from your AI initiatives: Take stock of what you’ve done and assess whether your technology needs a refresh or an “outright replacement,” and generalize lessons learned to share across the business.
  • Scale AI-native cloud incrementally based on success in specific domains: Early adoption focused on recommendation and information retrieval and synthesis; internal productivity-boosting apps have since proved advantageous. Start with strategy and prove that the technology can work in a particular area and be translated elsewhere.
  • Take advantage of open-source AI: Managed services platforms like AWS Bedrock, Azure OpenAI, Google Vertex, and others were early entrants in the AI space, but they also offer various open-source opportunities that enterprises of different sizes can customize to their particular needs.

Conclusion

AI-native cloud represents a whole new design philosophy for forward-thinking enterprises. The limits of traditional cloud architectures are becoming increasingly clear, and tomorrow’s complex AI systems can’t be treated as “just another workload.” Next-gen AI-native cloud infrastructures put AI at the core and allow systems to be managed, governed, and improved just like any other mission-critical service.


React2Shell: Anatomy of a max-severity flaw that sent shockwaves through the web 29 Dec 2025, 3:03 am

The React 19 library for building application interfaces was hit with a remote code execution vulnerability, React2Shell, about a month ago. As researchers delve deeper into the bug, however, a larger picture is gradually emerging.

The vulnerability enables unauthenticated remote code execution through React Server Components, allowing attackers to execute arbitrary code on affected servers via a crafted request. In other words, a foundational web framework feature quietly became an initial access vector.

What followed was a familiar but increasingly compressed sequence. Within hours of disclosure, multiple security firms confirmed active exploitation in the wild. Google’s Threat Intelligence Group (GTIG) and AWS both reported real-world abuse, collapsing the already-thin gap between vulnerability awareness and compromise.

“React2Shell is another reminder of how fast exploitation timelines have become,” said Nathaniel Jones, field CISO at Darktrace. “The CVE drops, a proof-of-concept is circulating, and within hours you’re already seeing real exploitation attempts.”

That speed matters because React Server Components are not a niche feature. They are embedded into default React and Next.js deployments across enterprise environments, meaning organizations inherited this risk simply by adopting mainstream tooling.

Different reports add new signals

While researchers agreed on the root cause, multiple individual reports have emerged, sharpening the overall picture.

For instance, early analysis by cybersecurity firm Wiz demonstrated how easily an unauthenticated input can traverse the React Server Components pipeline and reach dangerous execution paths, even in clean, default deployments. Unit 42 has expanded on this by validating exploit reliability across environments and emphasizing the minimal variation attackers needed to succeed.

Google and AWS have added operational context by confirming exploitation by multiple threat categories, including state-aligned actors, shortly after disclosure. That validation moved React2Shell out of the “potentially exploitable” category and into a confirmed active risk.

A report from Huntress has shifted focus by documenting post-exploitation behavior. Rather than simple proof-of-concept shells, attackers were observed deploying backdoors and tunneling tools, signaling that React2Shell was already being used as a durable access vector rather than a transient opportunistic hit, the report noted.

However, not all findings amplified urgency. Patrowl’s controlled testing showed that some early exposure estimates were inflated due to version-based scanning and noisy detection logic.

Taken together, the research painted a clearer, more mature picture within days (not weeks) of disclosure.

What the research quickly agreed on

Across early reports from Wiz, Palo Alto Networks’ Unit 42, Google, AWS, and others, there was strong alignment on the core mechanics of React2Shell. Researchers independently confirmed that the flaw lives inside React’s server-side rendering pipeline and stems from unsafe deserialization in the protocol used to transmit component data between client and server.

Multiple teams confirmed that exploitation does not depend on custom application logic. Applications generated using standard tools were vulnerable by default, and downstream frameworks such as Next.js inherited the issue rather than introducing it independently. That consensus reframed React2Shell from a “developer mistake” narrative into a framework-level failure with systemic reach.

This was the inflection point. If secure-by-design assumptions no longer hold at the framework layer, the defensive model shifts from “find misconfigurations” to “assume exposure.”

Speed-to-exploit as a defining characteristic

One theme that emerged consistently across reports was how little time defenders had to react. Jones said Darktrace’s own honeypot was exploited in under two minutes after exposure, strongly suggesting attackers had automated scanning and exploitation workflows ready before public disclosure. “Threat actors already had scripts scanning for the vulnerability, checking for exposed servers, and firing exploits without any humans in the loop,” he said.

Deepwatch’s Frankie Sclafani framed this behavior as structural rather than opportunistic. The rapid mobilization of multiple China-linked groups, he noted, reflected an ecosystem optimized for immediate action. In that model, speed-to-exploit is not a secondary metric but a primary measure of operational readiness. “When a critical vulnerability like React2Shell is disclosed, these actors seem to execute pre-planned strategies to establish persistence before patching occurs,” he said.

This matters because it undercuts traditional patch-response assumptions. Even well-resourced enterprises rarely patch and redeploy critical systems within hours, creating an exposure window that attackers now reliably expect.

What exploitation looked like in practice

Almost immediately after the December 3 public disclosure of React2Shell, active exploitation was observed by multiple defenders. Within hours, automated scanners and attacker tools probed internet-facing React/Next.js services for the flaw.

Threat intelligence teams confirmed that China-nexus state-aligned clusters, including Earth Lumia and Jackpot Panda, were among the early actors leveraging the defect to gain server access and deploy follow-on tooling. Beyond state-linked activity, reports from Unit 42 and Huntress detailed campaigns deploying Linux backdoors, reverse proxy tunnels, cryptomining kits, and botnet implants against exposed targets, a sign that both espionage-focused and financially motivated groups are capitalizing on the bug.

Data from Wiz and other responders indicates that dozens of distinct intrusion efforts have been tied to React2Shell exploitation, with compromised systems ranging across sectors and regions. Despite these confirmed attacks and public exploit code circulating, many vulnerable deployments remain unpatched, keeping the window for further exploitation wide open.

The lesson React2Shell leaves behind

React2Shell is ultimately less about React than about the security debt accumulating inside modern abstractions. As frameworks take on more server-side responsibility, their internal trust boundaries become enterprise attack surfaces overnight.

The research community mapped this vulnerability quickly and thoroughly. Attackers moved even faster. For defenders, the takeaway is not just to patch, but to reassess what “default safe” really means in an ecosystem where exploitation is automated, immediate, and indifferent to intent.

React2Shell is rated critical, carrying a CVSS score of 10.0, reflecting its unauthenticated remote code execution impact and broad exposure across default React Server Components deployments. React maintainers and downstream frameworks such as Next.js have released patches, and researchers broadly agree that affected packages should be updated immediately.

Beyond patching, they warn that teams should assume exploitation attempts may already be underway. Recommendations consistently emphasize validating actual exposure rather than relying on version checks alone, and actively hunting for post-exploitation behavior such as unexpected child processes, outbound tunneling traffic, or newly deployed backdoors. The message across disclosures is clear: React2Shell is not a “patch when convenient” flaw, and the window for passive response has already closed.
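
As one illustrative starting point for that kind of hunting (not drawn from any of the cited reports), the sketch below uses the psutil library to flag unexpected child processes spawned by Node.js server processes, the sort of behavior that could indicate a web shell or tunneling tool. The set of “expected” child names is an assumption you would tune for your own environment.

import psutil

# Names we assume a typical Next.js/React server legitimately spawns;
# anything else spawned by a node process is worth a closer look.
EXPECTED_CHILDREN = {"node", "npm", "sh"}

def suspicious_children():
    findings = []
    for proc in psutil.process_iter(["pid", "name"]):
        try:
            if proc.info["name"] != "node":
                continue
            for child in proc.children(recursive=True):
                if child.name() not in EXPECTED_CHILDREN:
                    findings.append((proc.info["pid"], child.pid, child.name()))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return findings

for parent_pid, child_pid, name in suspicious_children():
    print(f"node pid {parent_pid} spawned unexpected child {name} (pid {child_pid})")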

The article first appeared on CSO.


AI’s trust tax for developers 29 Dec 2025, 1:00 am

Andrej Karpathy is one of the few people in this industry who has earned the right to be listened to without a filter. As a founding member of OpenAI and the former director of AI at Tesla, he sits at the summit of AI and its possibilities. In a recent post, he shared a view that is equally inspiring and terrifying: “I could be 10X more powerful if I just properly string together what has become available over the last ~year,” Karpathy wrote. “And a failure to claim the boost feels decidedly like [a] skill issue.”

If you aren’t ten times faster today than you were in 2023, Karpathy implies, the problem isn’t the tools. The problem is you. Which seems both right…and very wrong. After all, the raw potential for leverage in the current generation of LLM tools is staggering. But his entire argument hinges on a single adverb that does an awful lot of heavy lifting:

“Properly.”

In the enterprise, where code lives for decades, not days, that word “properly” is easy to say but very hard to achieve. The reality on the ground, backed by a growing mountain of data, suggests that for most developers, the “skill issue” isn’t a failure to prompt effectively. It’s a failure to verify rigorously. AI speed is free, but trust is incredibly expensive.

A vibes-based productivity trap

In reality, AI speed only seems to be free. Earlier this year, for example, METR (Model Evaluation and Threat Research) ran a randomized controlled trial that gave experienced open source developers tasks to complete. Half used AI tools; half didn’t. The developers using AI were convinced the LLMs had accelerated their development speed by 20%. But reality bites: The AI-assisted group was, on average, 19% slower.

That’s a nearly 40-point gap between perception and reality. Ouch.

How does this happen? As I recently wrote, we are increasingly relying on “vibes-based evaluation” (a phrase coined by Simon Willison). The code looks right. It appears instantly. But then you hit the “last mile” problem. The generated code uses a deprecated library. It hallucinates a parameter. It introduces a subtle race condition.

Karpathy can induce serious FOMO with statements like this: “People who aren’t keeping up even over the last 30 days already have a deprecated worldview on this topic.” Well, maybe, but as fast as AI is changing, some things remain stubbornly the same. Like quality control. AI coding assistants are not primarily productivity tools; they are liability generators that you pay for with verification. You can pay the tax upfront (rigorous code review, testing, threat modeling), or you can pay it later (incidents, data breaches, and refactoring). But you’re going to pay sooner or later.

Right now, too many teams think they’re evading the tax, but they’re not. Not really. Veracode’s GenAI Code Security Report found that 45% of AI-generated code samples introduced security issues on OWASP’s top 10 list. Think about that.

Nearly half the time you accept an AI suggestion without a rigorous audit, you are potentially injecting a critical vulnerability (SQL injection, XSS, broken access control) into your codebase. The report puts it bluntly: “Congrats on the speed, enjoy the breach.” As Microsoft developer advocate Marlene Mhangami puts it, “The bottleneck is still shipping code that you can maintain and feel confident about.”

In other words, with AI we’re accumulating vulnerable code at a rate manual security reviews cannot possibly match. This confirms the “productivity paradox” that SonarSource has been warning about. Their thesis is simple: Faster code generation inevitably leads to faster accumulation of bugs, complexity, and debt, unless you invest aggressively in quality gates. As the SonarSource report argues, we’re building “write-only” codebases: systems so voluminous and complex, generated by non-deterministic agents, that no human can fully understand them.

We increasingly trade long-term maintainability for short-term output. It’s the software equivalent of a sugar high.

Redefining the skills

So, is Karpathy wrong? No. When he says he can be ten times more powerful, he’s right. It might not be ten times, but the performance gains savvy developers get from AI are real, or at least have the potential to be. Even so, the skill he possesses isn’t just the ability to string together tools.

Karpathy has the deep internalized knowledge of what good software looks like, which allows him to filter the noise. He knows when the AI is likely to be right and when it is likely to be hallucinating. But he’s an outlier on this, bringing us back to that pesky word “properly.”

Hence, the real skill issue of 2026 isn’t prompt engineering. It’s verification engineering. If you want to claim the boost Karpathy is talking about, you need to shift your focus from code creation to code critique, as it were:

  • Verification is the new coding. Your value is no longer defined by lines of code written, but by how effectively you can validate the machine’s output.
  • “Golden paths” are mandatory. As I’ve written, you cannot allow AI to be a free-for-all. You need golden paths: standardized, secured templates. Don’t ask the LLM to write a database connector; ask it to implement the interface from your secure platform library.
  • Design the security architecture yourself. You can’t just tell an LLM to “make this secure.” The high-level thinking you embed in your threat modeling is the one thing the AI still can’t do reliably.

“Properly stringing together” the available tools doesn’t just mean connecting an IDE to a chatbot. It means thinking about AI systematically rather than optimistically. It means wrapping those LLMs in a harness of linting, static application security testing (SAST), dynamic application security testing (DAST), and automated regression testing.
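
What that harness can look like in its simplest form is sketched below. The tool choices (ruff for linting, bandit for SAST, pytest for regression tests) are illustrative assumptions, stand-ins for whatever linters, scanners, and test suites a team already runs; the point is only that AI-generated changes pass through the same gate as human-written ones.

import subprocess
import sys

# Illustrative verification gate: every check must pass before a change
# (AI-generated or not) is allowed to merge. Tool choices are assumptions;
# DAST is omitted here because it runs against a deployed application.
CHECKS = [
    ["ruff", "check", "."],        # linting
    ["bandit", "-r", "src"],       # static security analysis (SAST)
    ["pytest", "-q"],              # automated regression tests
]

def run_gate() -> int:
    for cmd in CHECKS:
        print("Running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("Verification gate failed at:", " ".join(cmd))
            return result.returncode
    print("All verification checks passed.")
    return 0

if __name__ == "__main__":
    sys.exit(run_gate())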

The developers who will actually be ten times more powerful next year aren’t the ones who trust the AI blindly. They are the ones who treat AI like a brilliant but very junior intern: capable of flashes of genius, but requiring constant supervision to prevent them from deleting the production database.

The skill issue is real. But the skill isn’t speed. The skill is control.


4 New Year’s resolutions for devops success 29 Dec 2025, 1:00 am

It has been a dramatic and challenging year for developers and engineers working in devops organizations. More companies are using AI and automation for both development and IT operations, including for writing requirements, maintaining documentation, and vibe coding. Responsibilities have also increased, as organizations expect devops teams to improve data quality, automate AI agent testing, and drive operational resiliency.

AI is driving new business expectations and technical capabilities, and devops engineers must keep pace with the speed of innovation. At the same time, many organizations are laying off white-collar workers, including more than 120,000 tech layoffs in 2025.

Devops teams are looking for ways to reduce stress and ensure team members remain positive through all the challenges. At a recent event I hosted on how digital trailblazers reduce stress, speakers suggested several stress reduction mechanisms, including limiting work in progress, bringing humor into the day, and building supportive relationships.

As we head into the new year, now is also a good time for engineers and developers to set goals for 2026. I asked tech experts what New Year’s resolutions they would recommend for devops teams and professionals.

1. Fully embrace AI-enabled software development

Developers and automation engineers have had their world rocked over the last two years, with the emergence of AI copilots, code generators, and vibe coding. Developers typically spend time deepening their knowledge of coding languages and broadening their skills to work across different cloud architectures. In 2026, more of this time should be dedicated to learning AI-enabled software development.

“Develop a growth mindset that AI models are not good or bad, but rather a new nondeterministic paradigm in software that can both create new issues and new opportunities,” says Matthew Makai, VP of developer relations at DigitalOcean. “It’s on devops engineers and teams to adapt to how software is created, deployed, and operated.”

Concrete suggestions for this resolution involve shifting both mindset and activities:

  • Makai suggests automating code reviews for security issues and technical defects, given the rise in AI coding tools that generate significantly more code and can transfer technical debt across the codebase.
  • Nic Benders, chief technical strategist at New Relic, says everyone needs to gain experience with AI coding tools. “For those of us who have been around a while, think of vibe coding as the Perl of today. Go find an itch, then have fun knocking out a quick tool to scratch it.”
  • John Capobianco, head of developer relations at Selector, suggests devops teams should strive to embrace vibe-ops. “We can take the principles and the approach that certain software engineers are using with AI to augment software development in vibe-ops and apply those principles, much like devops to net-devops and devops to vibe-ops, getting AI involved in our pipelines and our workflows.”
  • Robin Macfarlane, president and CEO of RRMac Associates, suggests engineers begin to rethink their primary role not as code developers but as code orchestrators, whether working on mainframes or in distributed computing. “This New Year, resolve to learn the programming language you want AI to code in, resolve to do your own troubleshooting, and become the developer who teaches AI instead of the other way around.”

Nikhil Mungel, director of AI R&D at Cribl, says the real AI skill is learning to review, challenge, and improve AI-generated work by spotting subtle bugs, security gaps, performance issues, and incorrect assumptions. “Devops engineers who pair frequent AI use with strong review judgment will move faster and deliver more reliable systems than those who simply accept AI suggestions at face value.”

Mungel recommends that devops engineers commit to the following practices:

  • Tracing the agent decision graph, not just API calls.
  • Building AI-aware security observability around OWASP LLM Top 10 and MCP risks.
  • Capturing AI-specific lineage and incidents in CI/CD and ops runbooks.

Resolution: Develop the skills required to use AI for solving development and engineering challenges.

2. Strengthen knowledge of outcome-based, resilient operations

While developers focus on AI capabilities, operational engineers should target resolutions focused on resiliency. The more autonomous systems are in responding to and recovering from issues, the fewer priority incidents devops teams will have to manage, which likely means fewer instances where teams have to join bridge calls in the middle of the night.

A good place to start is improving observability across APIs, applications, and automations.

“Developers should adopt an AI-first, prevention-first mindset, using observability and AIops to move from reactive fixes to proactive detection and prevention of issues,” says Alok Uniyal, SVP and head of process consulting at Infosys. “Strengthen your expertise in self-healing systems and platform reliability, where AI-driven root-cause analysis and autonomous remediation will increasingly define how organizations meet demanding SLAs.”

As more businesses become data-driven organizations and invest in AI as part of their future of work strategy, another place to start building resiliency is in dataops and data pipelines.

“In 2026, devops teams should get serious about understanding the systems they automate, especially the data layer,” says Alejandro Duarte, developer relations engineer at MariaDB. “Too many outages still come from pipelines that treat databases as black boxes. Understanding multi-storage-engine capabilities, analytical and AI workload support, native replication, and robust high availability features will make the difference between restful weekends and late-night firefights.”

At the infrastructure layer, engineers have historically focused on redundancy, auto-scaling, and disaster recovery. Now, engineers should consider incorporating AI agents to improve resiliency and performance.

“For devops engineers, the resolution shouldn’t be about learning another framework, but about mastering the new operating model—AI-driven self-healing infrastructure,” says Simon Margolis, associate CTO AI and ML at SADA. “Your focus must shift from writing imperative scripts to creating robust observability and feedback loops that can enable an AI agent to truly take action. This means investing in skills that help you define intent and outcomes—not steps—which is the only way to unlock true operational efficiency and leadership growth.”

Rather than learning new AI tools, experts suggest reviewing opportunities to develop new AI capabilities within the platforms already used by the organization.

“A sound resolution for the new year is to stop trying to beat the old thing into some new AI solution and start using AI to augment and improve what we already have,” says Brett Smith, distinguished software engineer at SAS. “We need to finally stop chasing the ‘I can solve this with AI’ hype and start focusing on ‘How can AI help me solve this better, faster, cheaper?’”

Resolution: Shift the operating mindset from problem detection, resolution, and root-cause analysis to resilient, self-healing operations.

3. Learn new technology disciplines

It’s one thing to learn a new product or technology, and it’s a whole other level of growth to learn a new discipline. If you’re an application developer, one new area that requires more attention is understanding accessibility requirements and testing methodologies for improving applications for people with disabilities.

“Integrating accessibility into the devops pipeline should be a top resolution, with accessibility tests running alongside security and unit tests in CI as automated testing and AI coding tools mature,” says Navin Thadani, CEO of Evinced. “As AI accelerates development, failing to fix accessibility issues early will only cause teams to generate inaccessible code faster, making shift-left accessibility essential. Engineers should think hard about keeping accessibility in the loop, so the promise of AI-driven coding doesn’t leave inclusion behind.”

Data scientists, architects, and system engineers should also consider learning more about the Model Context Protocol (MCP), which standardizes how AI agents connect to tools and enterprise data sources. One place to start is learning the requirements and steps to configure a secure MCP server.

“Devops should focus on mastering MCP, which is set to create an entirely new app development pipeline in 2026,” says Rishi Bhargava, co-founder of Descope. “While it’s still early days for production-ready AI agents, MCP has already seen widespread adoption. Those who learn to build and authenticate MCP-enabled applications now securely will gain a major competitive edge as agentic systems mature over the next six months.”

Resolution: Embrace being a lifelong learner: Study trends and dig into new technologies that are required for compliance or that drive innovation.

4. Develop transformation leadership skills

In my book, Digital Trailblazer, I wrote about the need for transformation leaders, what I call digital trailblazers, “who can lead teams, evolve sustainable ways of working, develop technologies as competitive differentiators, and deliver business outcomes.”

Some may aspire to CTO roles, while others should consider leadership career paths in devops. For engineers, there is tremendous value in developing communication skills and business acumen.

Yaad Oren, managing director of SAP Labs U.S. and global head of research and innovation at SAP, says leadership skills matter just as much as technical fundamentals. “Focus on clear communication with colleagues and customers, and clear instructions with AI agents. Those who combine continuous learning with strong alignment and shared ownership will be ready to lead the next chapter of IT operations.”

For engineers ready to step up into leadership roles but concerned about taking on direct reports, consider mentoring others to build skills and confidence.

“There is high-potential talent everywhere, so aside from learning technical skills, I would challenge devops engineers to also take the time to mentor a junior engineer in 2026,” says Austin Spires, senior director of developer enablement at Fastly. “Guiding engineers early in their career, whether on hard skills like security or soft skills like communication and stakeholder management, is fulfilling and allows them to grow into one of your best colleagues.”

Another option, if you don’t want to manage people, is to take on a leadership role on a strategic initiative. In a complex job market, having agile program leadership skills can open up new opportunities.

Christine Rogers, people and operations leader at Sisense, says the traditional job description is dying. Skills, not titles, will define the workforce, she says. “By 2026, organizations will shift to skills-based models, where employees are hired and promoted based on verifiable capabilities and adaptability, often demonstrated through real projects, not polished resumes.”

Resolution: Find an avenue to develop leadership confidence, even if it’s not at work. There are leadership opportunities at nonprofits, local government committees, and even in following personal interests.

Happy New Year, everyone!


High severity flaw in MongoDB could allow memory leakage 26 Dec 2025, 12:12 pm

Document database vendor MongoDB has advised customers to update immediately following the discovery of a flaw that could allow unauthenticated users to read uninitialized heap memory.

Designated CVE-2025-14847, the bug stems from mismatched length fields in zlib-compressed protocol headers and could allow an attacker to execute arbitrary code and potentially seize control of a device.

The flaw affects the following MongoDB and MongoDB Server versions:

  • MongoDB 8.2.0 through 8.2.3
  • MongoDB 8.0.0 through 8.0.16
  • MongoDB 7.0.0 through 7.0.26
  • MongoDB 6.0.0 through 6.0.26
  • MongoDB 5.0.0 through 5.0.31
  • MongoDB 4.4.0 through 4.4.29
  • All MongoDB Server v4.2 versions
  • All MongoDB Server v4.0 versions
  • All MongoDB Server v3.6 versions

In its advisory, MongoDB “strongly suggested” that users upgrade immediately to the patched versions of the software: MongoDB 8.2.3, 8.0.17, 7.0.28, 6.0.27, 5.0.32, or 4.4.30.

However, it said, “if you cannot upgrade immediately, disable zlib compression on the MongoDB Server by starting mongod or mongos with a networkMessageCompressors or a net.compression.compressors option that explicitly omits zlib.”
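
For reference, that mitigation can be applied either on the command line or in the mongod configuration file. The compressor list below (snappy and zstd, with zlib omitted) is one reasonable choice, not the only one; adjust it to match whatever compressors your deployment already negotiates.

# Command-line form: start mongod with zlib explicitly omitted.
mongod --networkMessageCompressors "snappy,zstd"

# Equivalent mongod.conf (YAML) form:
net:
  compression:
    compressors: snappy,zstd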

MongoDB, one of the most popular NoSQL document databases for developers, says it currently has more than 62,000 customers worldwide, including 70% of the Fortune 100.


Reader picks: The most popular Python stories of 2025 26 Dec 2025, 1:00 am

Python 3.14 was the star of the show in 2025, bringing official support for free-threaded builds, a new all-in-one installation manager for Windows, and subtler perks like the new template strings feature. Other great updates this year included a growing toolkit of Rust-backed Python tools, several new options for packaging and distributing Python applications, and a sweet little trove of third-party libraries for parallel processing in Python. Here’s our list of the 10 best and most-read stories for Python developers in 2025. Enjoy!

What is Python? Powerful, intuitive programming
Start here, with a top-down view of what makes Python a versatile powerhouse for modern software development, from data science and machine learning to web development and systems automation.

The best new features and fixes in Python 3.14
Released in October 2025, the latest edition of Python makes free-threaded Python an officially supported feature, adds experimental JIT powers, and brings new tools for managing Python versions.

Get started with the new Python Installation Manager
The newest versions of Python on Microsoft Windows come packaged with this powerful all-in-one tool for installing, updating, and managing multiple editions of Python on the same system.

How to use template strings in Python 3.14
One of Python 3.14’s most powerful new features delivers a whole new mechanism for formatting data in strings, more programmable and powerful than the existing “f-string” formatting system.

PyApp: An easy way to package Python apps as executables
This Rust-powered utility brings to life a long-standing dream in the Python world: It turns hard-to-package Python programs into self-contained click-to-runs.

The best Python libraries for parallel processing
Python’s getting better at doing more than one thing at once, and that’s thanks to its “no-GIL” edition. But these third-party libraries give you advanced tools for distributing Python workloads across cores, processors, and multiple machines.

Amp your Python superpowers with ‘uv run’
Astral’s uv utility lets you set up and run Python packages with one command, no setup, no fuss, and nothing to clean up when you’re done.

3 Python web frameworks for beautiful front ends
Write Python code on the back end and generate good-looking HTML/CSS/JavaScript-driven front ends, automatically. Here are three ways to Python-code your way to beautiful front ends.

How to boost Python program performance with Zig
The emerging Zig language, making a name as a safer alternative to C, can also be coupled closely with Python—the better to create Python libraries that run at machine-native speed.

PythoC: A new way to generate C code from Python
This new project lets you use Python as a kind of high-level macro system to generate C-equivalent code that can run as standalone programs, and with some unique memory safety features you won’t find in C.


A small language model blueprint for automation in IT and HR 25 Dec 2025, 1:00 am

Large language models (LLMs) have grabbed the world’s attention for their seemingly magical ability to instantaneously sift through endless data, generate responses, and even create visual content from simple prompts. But their “small” counterparts aren’t far behind. And as questions swirl about whether AI can actually generate a meaningful return on investment (ROI), organizations should take notice. Because, as it turns out, small language models (SLMs), which use far fewer parameters, compute resources, and energy than large language models to perform specific tasks, have been shown to be just as effective as their much larger counterparts.

In a world where companies have invested ungodly amounts of money in AI and questioned the returns, SLMs are proving to be an ROI savior. Ultimately, SLM-enabled agentic AI delivers the best of both SLMs and LLMs together, including higher employee satisfaction and retention, improved productivity, and lower costs. And given a report from Gartner predicting that over 40% of agentic AI projects will be canceled by the end of 2027 due to complexities and rapid evolutions that often lead enterprises down the wrong path, SLMs can be an important tool in any CIO’s toolbox.

Take information technology (IT) and human resources (HR) functions for example. In IT, SLMs can drive autonomous and accurate resolutions, workflow orchestration, and knowledge access. And for HR, they’re enabling personalized employee support, streamlining onboarding, and handling routine inquiries with privacy and precision. In both cases, SLMs are enabling users to “chat” with complex enterprise systems the same way they would a human representative.

Given a well-trained SLM, users can simply write a Slack or Microsoft Teams message to the AI agent (“I can’t connect to my VPN,” or “I need to refresh my laptop,” or “I need proof of employment for a mortgage application”), and the agent will automatically resolve the issue. What’s more, the responses will be personalized based on user profiles and behaviors, and the support will be proactive, anticipating when issues might occur.

Understanding SLMs

So, what exactly is an SLM? It’s a relatively ill-defined term, but generally it refers to a language model with somewhere between one billion and 40 billion parameters, versus 70 billion to hundreds of billions for LLMs. Many SLMs are also fully open source, giving you access to their weights, biases, and training code.

There are also SLMs that are “open-weight” only, meaning you get access to model weights with restrictions. This is important because a key benefit with SLMs is the ability to fine-tune or customize the model so you can ground it in the nuance of a particular domain. For example, you can use internal chats, support tickets, and Slack messages to create a system for answering customer questions. The fine-tuning process helps to increase the accuracy and relevance of the responses.

Agentic AI will leverage SLMs and LLMs

It’s understandable to want to use state-of-the-art models for agentic AI. Consider that the latest frontier models score highly on math, software development, and medical reasoning, just to name a few categories. Yet the question every CIO should be asking is: Do we really need that much firepower in our organization? For many enterprise use cases, the answer is no.

And even though they are small, don’t underestimate them. Their small size means they have lower latency, which is critical for real-time processing. SLMs can also operate on small form factors, like edge devices or other resource-constrained environments. 

Another advantage with SLMs is that they are particularly effective with handling tasks like calling tools, API interactions, or routing. This is just what agentic AI was meant to do: carry out actions. Sophisticated LLMs, on the other hand, may be slower, engage in overly reasoned handling of tasks, and consume large amounts of tokens.

In IT and HR environments, the balance among speed, accuracy, and resource efficiency for both employees and IT or HR teams matters. For employees, agentic assistants built on SLMs provide fast, conversational help to solve problems faster. For IT and HR teams, SLMs reduce the burden of repetitive tasks by automating ticket handling, routing, and approvals, freeing staff to focus on higher-value strategic work. Furthermore, SLMs also can provide substantial cost savings as these models use relatively smaller levels of energy, memory, and compute power. Their efficiency can prove enormously beneficial when using cloud platforms. 

Where SLMs fall short

Granted, SLMs are not silver bullets either. There are certainly cases where you need a sophisticated LLM, such as for highly complex multi-step processes. A hybrid architecture — where SLMs handle the majority of operational interactions and LLMs are reserved for advanced reasoning or escalations — allows IT and HR teams to optimize both performance and cost. For this, a system can leverage observability and evaluations to dynamically decide when to use an SLM or LLM. Or, if an SLM fails to get a good response, the next step could then be an LLM. 
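
A minimal sketch of that fallback pattern might look like the following. The call_slm, call_llm, and looks_good functions are hypothetical placeholders for whatever model endpoints and evaluation checks an organization actually runs; the point is the routing logic, not the specific APIs.

# Hypothetical hybrid router: try the cheap, fast SLM first and escalate to
# an LLM only when the small model's answer fails a quality check.

def call_slm(prompt: str) -> str:
    # Placeholder for a call to a small, fine-tuned model endpoint.
    return f"[SLM draft answer to: {prompt}]"

def call_llm(prompt: str) -> str:
    # Placeholder for a call to a larger frontier model endpoint.
    return f"[LLM answer to: {prompt}]"

def looks_good(answer: str) -> bool:
    # Placeholder evaluation: in practice this could be an automated eval,
    # a confidence score, or a policy check from your observability stack.
    return len(answer) > 20 and "I don't know" not in answer

def answer(prompt: str, escalate: bool = True) -> str:
    draft = call_slm(prompt)
    if looks_good(draft):
        return draft
    # Escalate to the LLM only when the SLM response fails evaluation.
    return call_llm(prompt) if escalate else draft

print(answer("I need proof of employment for a mortgage application"))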

SLMs are emerging as the most practical approach to achieving ROI with agentic AI. By pairing SLMs with selective use of LLMs, organizations can create balanced, cost-effective architectures that scale across both IT and HR, delivering measurable results and a faster path to value. With SLMs, less is more.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.


Microsoft is not rewriting Windows in Rust 24 Dec 2025, 7:15 am

A job posting by a Microsoft engineer sparked excitement about a project “to eliminate every line of C and C++ from Microsoft by 2030”, replacing it with Rust — but alas for fans of the memory-safe programming language, it turns out this is a personal goal, not a corporate one, and Rust isn’t necessarily even the final target.

Microsoft Distinguished Engineer Galen Hunt posted about his ambitious goal on LinkedIn four days ago, provoking a wave of excitement and concern.

Now he’s been forced to clarify: “My team’s project is a research project. We are building tech to make migration from language to language possible,” he wrote in an update to his LinkedIn post. His intent, he said, was to find like-minded engineers, “not to set a new strategy for Windows 11+ or to imply that Rust is an endpoint.”

Hunt’s project is to investigate how AI can be used to assist in the translation of code from one language to another at scale. “Our North Star is ‘1 engineer, 1 month, 1 million lines of code’,” he wrote.

He’s recruiting an engineer to help build the infrastructure to do that, demonstrating the technology using Rust as the target language and C and C++ as the source.

The successful candidate will join the Future of Scalable Software Engineering team in Microsoft’s CoreAI group, building static analysis and machine learning tools for AI-assisted translation and migration.

Pressure to ditch C and C++ in favor of memory-safe languages such as Rust comes right from the top, with research by Google and Microsoft showing that around 70 percent of all security vulnerabilities in software are caused by memory safety issues.

However, using AI to rewrite code, even in a memory-safe language, may not make things more secure: AI-generated code typically contains more issues than code written by humans, according to research by CodeRabbit.

That’s not stopping some of the biggest software developers from pushing ahead with AI-powered software development, though. Already, AI writes 30% of Microsoft’s new code, Microsoft CEO Satya Nadella said in April.


Get started with Python’s new native JIT 24 Dec 2025, 1:00 am

JITing, or “just-in-time” compilation, can make relatively slow interpreted languages much faster. Until recently, JITting was available for Python only in the form of specialized third-party libraries, like Numba, or alternate versions of the Python interpreter, like PyPy.

A native JIT compiler has been added to Python over its last few releases. At first it didn’t provide any significant speedup. But with Python 3.15 (still in alpha but available for use now), the core Python development team has bolstered the native JIT to the point where it’s now showing significant performance gains for certain kinds of programs.

Speedups from the JIT range widely, depending on the operation. Some programs show dramatic performance improvements, others none at all. But the work put into the JIT is beginning to pay off, and users can start taking advantage of it if they’re willing to experiment.

Activating the Python JIT

By default, the native Python JIT is disabled. It’s still considered an experimental feature, so it has to be manually enabled.

To enable the JIT, you set the PYTHON_JIT environment variable, either for the shell session Python is running in, or persistently as part of your user environment options. When the Python interpreter starts, it checks its runtime environment for the variable PYTHON_JIT. If PYTHON_JIT is unset or set to anything but 1, the JIT is off. If it’s set to 1, the JIT is enabled.

It’s probably not a good idea to enable PYTHON_JIT as a persistent option. If you’re doing this with a user environment where you’re only running Python with the JIT enabled, it might be useful. But for the most part, you’ll want to set PYTHON_JIT manually — for instance, as part of a shell script to configure the environment.
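
One quick way to compare both modes is to run the same script twice while toggling PYTHON_JIT in the subprocess environment, as in the sketch below. The script name is a placeholder for whatever you want to measure.

import os
import subprocess
import sys

# Run the same benchmark script with the JIT off and on by toggling the
# PYTHON_JIT environment variable described above. "mandelbrot.py" is a
# placeholder for your own script.
for jit_setting in ("0", "1"):
    env = {**os.environ, "PYTHON_JIT": jit_setting}
    print(f"--- PYTHON_JIT={jit_setting} ---")
    subprocess.run([sys.executable, "mandelbrot.py"], env=env)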

Verifying the JIT is working

For versions of Python with the JIT (Python 3.13 and above), the sys module in the standard library has a new namespace, sys._jit. Inside it are three utilities for inspecting the state of the JIT, all of which return either True or False. The three utilities:

  • sys._jit.is_available(): Lets you know if the current build of Python has the JIT. Most binary builds of Python shipped will now have the JIT available, except the “free-threaded” or “no-GIL” builds of Python.
  • sys._jit.is_enabled(): Lets you know if the JIT is currently enabled. It does not tell you if running code is currently being JITted, however.
  • sys._jit.is_active(): Lets you know if the topmost Python stack frame is currently executing JITted code. However, this is not a reliable way to tell if your program is using the JIT, because you may end up executing this check in a “cold” (non-JITted) path. It’s best to stick to performance measurements to see if the JIT is having any effect.

For the most part, you will want to use sys._jit.is_enabled() to determine if the JIT is available and running, as it gives you the most useful information.

Python code enhanced by the JIT

Because the JIT is in its early stages, its behavior is still somewhat opaque. There’s no end-user instrumentation for it yet, so there’s no way to gather statistics about how the JIT handles a given piece of code. The only real way to assess the JIT’s performance is to benchmark your code with and without the JIT.

Here’s an example of a program that demonstrates pretty consistent speedups with the JIT enabled. It’s a rudimentary version of the Mandelbrot fractal:

from time import perf_counter
import sys

print ("JIT enabled:", sys._jit.is_enabled())

WIDTH = 80
HEIGHT = 40
X_MIN, X_MAX = -2.0, 1.0
Y_MIN, Y_MAX = -1.0, 1.0
ITERS = 500

YM = (Y_MAX - Y_MIN)
XM = (X_MAX - X_MIN)

def iter(c):
    z = 0j
    for _ in range(ITERS):
        if abs(z) > 2.0:
            return False
        z = z ** 2 + c
    return True

def generate():
    start = perf_counter()
    output = []

    for y in range(HEIGHT):
        cy = Y_MIN + (y / HEIGHT) * YM
        for x in range(WIDTH):
            cx = X_MIN + (x / WIDTH) * XM
            c = complex(cx, cy)
            output.append("#" if iter(c) else ".")
        output.append("\n")
    print ("Time:", perf_counter()-start)
    return output

print("".join(generate()))

When the program starts running, it lets you know if the JIT is enabled and then produces a plot of the fractal to the terminal along with the time taken to compute it.

With the JIT enabled, there’s a fairly consistent 20% speedup between runs. If the performance boost isn’t obvious, try changing the value of ITERS to a higher number. This forces the program to do more work, so it should produce a more obvious speedup.

Here’s a negative example — a simple recursively implemented Fibonacci sequence. As of Python 3.15a3 it shows no discernible JIT speedup:

import sys
print ("JIT enabled:", sys._jit.is_enabled())
from time import perf_counter

def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

def main():
    start = perf_counter()
    result = fib(36)
    print(perf_counter() - start)

main()

Why this isn’t faster when JITted isn’t clear. For instance, you might be inclined to think recursion makes the JIT less effective, but a non-recursive version of the algorithm doesn’t show any speedup either.

Using the experimental Python JIT

Because the JIT is still considered experimental, it’s worth approaching it in the same spirit as the “free-threaded” or “no-GIL” builds of Python also now being shipped. You can conduct your own experiments with the JIT to see if it provides any payoff for certain tasks, but you’ll always want to be careful about using it in any production scenario. What’s more, each alpha and beta revision of Python going forward may change the behavior of the JIT. What was once performant might not be in the future, or vice versa!


AI power tools: 6 ways to supercharge your terminal 24 Dec 2025, 1:00 am

The command line has always been the bedrock of the developer’s world. Since time immemorial, the CLI has been a static place defined by the REPL (read-evaluate-print loop). But now modern AI tools are changing that.

The CLI tells you in spartan terms what is happening with your program, and it does exactly what you tell it to. The lack of frivolity and handholding is both the command line’s power and its one major drawback. Now, a new class of AI tools seeks to preserve the power of the CLI while upgrading it with a more human-friendly interface.

These tools re-envision the REPL as a reason-evaluate loop. Instead of telling your operating system what to do, you just give it a goal and set it loose. Rather than reading the outputs, you can have them analyzed with AI precision. For the lover of the CLI—and everyone else who programs—the AI-powered terminal is a new and fertile landscape.

Gemini CLI

Gemini CLI is an exceptionally strong agent that lets you run AI shell commands. Able to analyze complex project layouts, view outputs, and undertake complex, multipart goals, Gemini CLI isn’t flawless, but it warms the command-line enthusiast’s heart.

A screenshot of the Google Gemini CLI.
Google’s Gemini comes to the command line.

Matthew Tyson

Gemini CLI recently added in-prompt interactivity support, like running vi inside the agent. This lets you avoid dropping out of the AI (or launching a new window) to do things like edit a file or run a long, involved git command. The AI doesn’t retain awareness during your interactions (you can use Ctrl-f to shift focus back to it), but it does observe the outcome when you are done, and may take appropriate actions such as running unit tests after closing vi.

Copilot is rumored to have better Git integration, but I’ve found Gemini performs just fine with git commands.

Like every other AI coding assistant, Gemini CLI can get confused, spin in circles, and spawn regressions, but the actual framing and prompt console are among the best. It feels fairly stable and solid. It does require some adjustments, such as being unable to navigate the file system (e.g., cd /foo/bar) because you’re in the agent’s prompt and not a true shell.

GitHub Copilot CLI

Copilot’s CLI is just as solid as Gemini’s. It handled complex tasks (like “start a new app that lets you visit endpoints that say hello in different languages”) without a hitch. But it’s just as nice to be able to do simple things quickly (like asking, “what process is listening on port 8080?”) without having to refresh your memory of the right system command.

A screenshot of the GitHub Copilot CLI.
The ubiquitous Copilot VS Code extension, but for the terminal environment.

Matthew Tyson

There are still drawbacks, of course, and even simple things can go awry. For example, if the process listening on 8080 was managed by systemd, Copilot would issue a simple kill command rather than stopping the service with systemctl.

Copilot CLI’s ?? is a nice idea, letting you provide a goal to be turned into a prompt—?? find the largest file in this directory yields find . -type f -exec du -h {} + 2>/dev/null | sort -rh | head -10—but I found the normal prompt worked just as well.

I noticed at times that Copilot seemed to choke and hang (or take inordinately long to complete) on larger steps, such as Creating Next.js project (Esc to cancel · 653 B).

In general, I did not find much distinction between Gemini and Copilot’s CLIs; both are top-shelf. That’s what you would expect from the flagship AI terminal tools from Google and Microsoft. The best choice likely comes down to which ecosystem and company you prefer.

Ollama

Ollama is the most empowering CLI in this bunch. It lets you install and run pre-built, targeted models on your local machine. This puts you in charge of everything, eliminates network calls, and discards any reliance on third-party cloud providers (although Ollama recently added cloud providers to its bag of tricks).

A screenshot of the Ollama CLI.
The DIY AI engine.

Matthew Tyson

Ollama isn’t an agent itself but is the engine that powers many of them. It’s “Docker for LLMs”—a simple command-line tool that lets you download, manage, and run powerful open source models like Llama 3 and Mistral directly on your own machine. You run ollama pull llama3 and then ollama run llama3 "..." to chat. (Programmers will especially appreciate CodeLlama.)

Incidentally, if you are not in a headless environment (on Windows, for example), Ollama will install a simple GUI for managing and interacting with installed models (both local and cloud).

Ollama’s killer feature is privacy and offline access. Since the models run entirely locally, none of your prompts or code ever leaves your machine. It’s perfect for working on sensitive projects or in secure environments.

Ollama is an AI server, which gives you an API so that other tools (like Aider, OpenCode, or NPC Shell) can use your local models instead of paying for a cloud provider. The Ollama chat agent doesn’t compete with interactive CLIs like Gemini, Copilot, and Warp (see below); it’s more of a straight REPL.
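
As a rough sketch of what that looks like, the snippet below sends a single prompt to a local Ollama server over its REST API. It assumes Ollama is running on its default port (11434) and that the llama3 model has already been pulled:

import json
from urllib import request

# Assumes a local Ollama server on the default port with the llama3 model pulled.
payload = {
    "model": "llama3",
    "prompt": "Explain what a REPL is in one sentence.",
    "stream": False,  # ask for one JSON object instead of a token stream
}
req = request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])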

The big trade-off is performance. You are limited by your own hardware, and running the larger models requires powerful (preferably Nvidia) GPUs. The choice comes down to power versus privacy: You get total control and security, but you’re responsible for bringing the horsepower. (And, in case you don’t know, fancy GPUs are expensive—even provisioning a decent one on the cloud can cost hundreds of dollars per month.)

Aider

Aider is a “pair-programming” tool that can use various providers as the AI back end, including a locally running instance of Ollama (with its variety of LLM choices). Typically, you would connect to an OpenRouter account to provide access to any number of LLMs, including free-tier ones.

A screenshot of the Aider CLI.
The agentic layer.

Matthew Tyson

Once connected, you tell Aider what model you want to use when launching it; e.g., aider --model ollama_chat/llama3.2:3b. That will launch an interactive prompt relying on the model for its brains. But Aider gives you agentic power and will take action for you, not just provide informative responses.

Aider tries to maintain a contextual understanding of your filesystem, the project files, and what you are working on. It is also designed to understand Git: it suggests that you init a git project, commits as you go, and provides sensible commit messages. The core capability is highly influenced by the LLM engine, which you provide.

Aider is something like using Ollama but at a higher level. It is controlled by the developer; provides a great abstraction layer with multiple model options; and layers on a good deal of ability to take action. (It took me some wrangling with the Python package installations to get everything working in Aider, but I have bad pip karma.)

Aider is something like Roo Code, but for the terminal, adding project-awareness for any number of models. If you give it a good model engine, it will do almost everything that the Gemini or Copilot CLI does, but with more flexibility. The biggest drawback compared to those tools is probably having to do more manual asset management (like using the /add command to bring files into context).

AI Shell

Built by the folks at Builder.io, AI Shell focuses on creating effective shell commands from your prompts. Compared to the Gemini and Copilot CLIs, it’s more of a quick-and-easy utility tool, something to keep the terminal’s power handy without having to type out commands.

A screenshot of AI Shell.
The natural-language commander.

Matthew Tyson

AI Shell will take your desired goal (e.g., “$ ai find the process using the most memory right now and kill it”) and offer working shell commands in response. It will then ask if you want to run it, edit it, copy, or cancel the command. This makes AI Shell a simple place to drop into, as needed, from the normal command prompt. You just type “ai” followed by whatever you are trying to do.

Although it’s a handy tool, the current version of AI Shell can only use an OpenAI API, which is a significant drawback. There is no way to run AI Shell in a free tier, since OpenAI no longer offers free API access.

Warp

Warp started life as a full-featured terminal app. Its killer feature is that it gives you all the text and control niceties in a cross-platform, portable setup. Unlike the Gemini and Copilot CLI tools, which are agents that run inside an existing shell, Warp is a full-fledged, standalone GUI application with AI integrated at its core.

A screenshot of the Warp CLI.
The terminal app, reimagined with AI.

Matthew Tyson

Warp is a Rust-based, modern terminal that completely reimagines the user experience, moving away from the traditional text stream to a more structured, app-like interface.

Warp’s AI is not a separate prompt but is directly integrated with the input block. It has two basic modes: The first is to type # followed by a natural language query (e.g., “# find all files over 10 megs in this dir”), which Warp AI will translate into the correct command.

The second mode is the more complex, multistep agent mode (“define a cat-related non-blocking endpoint using netty”), which you enter with Ctrl-space.

One interesting feature is Warp Workflows: parameterized commands that you can save and share. You can ask the AI to generate a workflow for a complex task (like a multistage git rebase) and then supply it with arguments at runtime.

The main drawback for some CLI purists is that Warp is not a traditional CLI. It’s a block-based editor, which treats inputs and outputs as distinct chunks. That can take some getting used to—though some find it an improvement. In this regard, Warp breaks compatibility with many traditional terminal multiplexers like tmux/screen. Also, its AI features are tied to user accounts and a cloud back end, which likely raises privacy and offline-usability concerns for some developers.

All that said, Warp is a compelling AI terminal offering, especially if you’re looking for something different in your CLI. Aside from its AI facet, Warp is somewhere between a conventional shell (like Bash) and a GUI.

Conclusion

If you currently don’t like using a shell, these tools will make your life much easier. You will be able to do many of the things that previously were painful enough to make you think, “there must be a better way.” Now there is, and you can monitor processes, sniff TCP packets, and manage perms like a pro.

If you, like me, do like the shell, then these tools will make the experience even better. They give you superpowers, allowing you to romp more freely across the machine. If you tend (like I do) to do much of your coding from the command line, checking out these tools is an obvious move.

Each tool has its own idiosyncrasies of installation, dependencies, model access, and key management. A bit of wrestling at first is normal—which most command-line jockeys won’t mind.

(image/jpeg; 22.45 MB)

Deno adds tool to run NPM and JSR binaries 23 Dec 2025, 5:16 pm

Deno 2.6, the latest version of the TypeScript, JavaScript, and WebAssembly runtime, adds a tool, called dx, to run binaries from NPM and JSR (JavaScript Registry) packages.

The update to the Node.js rival was announced December 10; installation instructions can be found at docs.deno.com. Current users can upgrade by running the deno upgrade command in their terminal.

In Deno 2.6, dx is an equivalent to the npx command. With dx, users should find it easier to run package binaries in a familiar fashion, according to Deno producer Deno Land. Developers can enjoy the convenience of npx while leveraging Deno’s robust security model and performance optimizations, Deno Land said.

Also featured in Deno 2.6 is more granular control over permissions, with --ignore-read and --ignore-env flags for selectively ignoring certain file reads or environment variable access. Instead of throwing a NotCapable error, users can direct Deno to return a NotFound error and undefined, respectively.

Deno 2.6 also integrates tsgo, an experimental type checker for TypeScript written in Go. This type checker is billed as being significantly faster than the previous implementation, which was written in TypeScript.

Other new capabilities and improvements in Deno 2.6:

  • For dependency management, developers can control the minimum age of dependencies, ensuring that a project only uses dependencies that have been vetted. This helps reduce the risk of using newly published packages that may contain malware or breaking changes shortly after release.
  • A deno audit subcommand helps identify security vulnerabilities in dependencies by checking the GitHub CVE database. This command scans and generates a report for both JSR and NPM packages.
  • The --lockfile-only flag for deno install allows developers to update a lockfile without downloading or installing the actual packages. This is particularly useful in continuous integration environments where users want to verify dependency changes without modifying their node_modules or cache.
  • The deno approve-scripts subcommand replaces the deno install --allow-scripts flag, enabling more ergonomic and granular control over which packages can run lifecycle scripts.
  • Deno’s Node.js compatibility layer continues to mature in Deno 2.6, with improvements across file operations, cryptography, process management, and database APIs, according to Deno Land.

(image/jpeg; 14.91 MB)

Rust vision group seeks enumeration of language design goals 23 Dec 2025, 2:55 pm

To help the Rust language continue scaling across domains and usage levels, the Rust Vision Doc group recommends enumerating the design goals for evolving the language while also improving the crates package system.

These suggestions were made in a December 19 blog post titled, “What do people love about Rust?” The group made the following specific recommendations:

  • Enumerate and describe Rust design goals and integrate them into processes, helping to ensure these are observed by future language designers and the broader ecosystem.
  • Double down on extensibility, introducing the ability for crates to influence the development experience and the compilation pipeline.
  • Help users to navigate the crates.io ecosystem and enable smoother interop.

In seeking to explain developers’ strong loyalty to Rust, the vision doc group found that, based on interviews of Rust users, developers love Rust for its balance of virtues including reliability, efficiency, low-level control, supportive tooling, and extensibility. Additionally, one of the most powerful aspects of Rust cited by developers is the way that its type system allows modeling aspects of the application domain. This prevents bugs and makes it easier to get started with Rust, the Rust vision doc group said.

The group said that each of these attributes was necessary for versatility across domains. However, when taken too far, or when other attributes are missing, they can become an obstacle, the group noted. One example cited was Rust’s powerful type system, which allows modeling the application domain and prevents bugs but sometimes feels more complex than the problem itself. Another example cited was async Rust, which has fueled a huge jump in using Rust to build network systems but feels “altogether more difficult” than sync Rust. A third obstacle, the group said, was the wealth of crates on crates.io, which is a key enabler but also presents a “tyranny of choice” that becomes overwhelming. Ways are needed to help users navigate the crates.io ecosystem.

The group recommended creating an RFC that defines the goals sought as work is done on Rust. The RFC should cover the experience of using Rust in total (language, tools, and libraries). “This RFC could be authored by the proposed User Research team, though it’s not clear who should accept it—perhaps the User Research team itself, or perhaps the leadership council,” the group said.

(image/jpeg; 7.49 MB)

WhatsApp API worked exactly as promised, and stole everything 23 Dec 2025, 3:38 am

Security researchers have uncovered a malicious npm package that poses as a legitimate WhatsApp Web API library while quietly stealing messages, credentials, and contact data from developer environments.

The package, identified as “lotusbail,” operates as a trojanized wrapper around a genuine WhatsApp client library and had accumulated more than 50k downloads by the time it was flagged by Koi Security.

“With over 56000 downloads and functional code that actually works as advertised, it is the kind of dependency developers install without a second thought,” Koi researchers said in a blog post. “The package has been available on npm for 6 months and is still live at the time of writing.”

Stolen data was encrypted and exfiltrated to attacker-controlled infrastructure, reducing the likelihood of detection by network monitoring tools. Even more concerning for enterprises is the fact that Lotusbail abuses WhatsApp’s multi-device pairing to maintain persistence on compromised accounts even after the package is removed.

A legitimate API wrapped in a malicious proxy

According to the researchers, lotusbail initially didn’t appear to be anything more than a helpful fork of the legitimate “@whiskeysockets/baileys” library used for interacting with WhatsApp via WebSockets. Developers could install it, send messages, receive messages, and never notice anything wrong.

Further probing, however, revealed an issue.

The package wrapped the legitimate WhatsApp WebSocket client in a malicious proxy layer that transparently duplicated every operation, including the ones involving sensitive data. During authentication, the wrapper captured session tokens and keys. Every message flowing through the application was intercepted, logged, and prepared for covert transmission to attacker-controlled infrastructure.

Additionally, the stolen information was protected en route. Rather than sending credentials and messages in plaintext, the malware employs a custom RSA encryption layer and multiple obfuscation strategies, making detection by network monitoring tools harder and allowing exfiltration to proceed under the radar.

“The exfiltration server URL is buried in encrypted configuration strings, hidden inside compressed payloads,” the researchers noted. “The malware uses four layers of obfuscation: Unicode variable manipulation, LZString compression, Base-91 encoding, and AES encryption. The server location isn’t hardcoded anywhere visible.”

Backdoor sticks around even after package removal

Koi said the most significant component of the attack was its persistence. WhatsApp allows users to link multiple devices to a single account through a pairing process involving an 8-character code. The malicious lotusbail package hijacked this mechanism by embedding a hardcoded pairing code that effectively added the attacker’s device as a trusted endpoint on the user’s WhatsApp account.

Even if developers or organizations later uninstalled the package, the attacker’s linked device remained connected. This allowed the attack to persist until the WhatsApp user manually unlinked all devices from the settings panel.

Persistent access allows the attackers to continue reading messages, harvesting contacts, sending messages on behalf of victims, and downloading media long after the initial exposure.

What must developers and defenders do?

Koi’s disclosure noted that traditional safeguards based on reputation metrics, metadata checks, or static scanning fail when malicious logic mimics legitimate behavior.

“The malware hides in the gap between ‘this code works’ and ‘this code does only what it claims’,” the researchers said, adding that such supply-chain threats require monitoring package behavior at runtime rather than relying on static checks alone. They recommended looking for (or relying on tools that can detect) warning signs such as custom RSA encryption routines and dozens of embedded anti-debugging mechanisms in the malicious code.

The package remains available on npm, with its most recent update published just five days ago. GitHub, which has owned npm since 2020, did not immediately respond to CSO’s request for comment.

(image/jpeg; 1.18 MB)

When is an AI agent not really an agent? 23 Dec 2025, 1:00 am

If you were around for the first big wave of cloud adoption, you’ll remember how quickly the term cloud was pasted on everything. Anything with an IP address and a data center suddenly became a cloud. Vendors rebranded hosted services, managed infrastructure, and even traditional outsourcing as cloud computing. Many enterprises convinced themselves they had modernized simply because the language on the slides had changed. Years later, they discovered the truth: They hadn’t transformed their architecture; they had just renamed their technical debt.

That era of “cloudwashing” had real consequences. Organizations spent billions on what they believed were cloud-native transformations, only to end up with rigid architectures, high operational overhead, and little of the promised agility. The cost was not just financial; it was strategic. Enterprises that misread the moment lost time they could never recover.

We are now repeating the pattern with agentic AI, this time faster.

What ‘agentic’ is supposed to mean

If you believe today’s marketing, everything is an “AI agent.” A basic workflow worker? An agent. A single large language model (LLM) behind a thin UI wrapper? An agent. A smarter chatbot with a few tools integrated? Definitely an agent. The issue isn’t that these systems are useless. Many are valuable. The problem is that calling almost anything an agent blurs an important architectural and risk distinction.

In a technical sense, an AI agent should exhibit four basic characteristics:

  • Be able to pursue a goal with a degree of autonomy, not merely follow a rigid, prescripted flow
  • Be capable of multistep behavior, meaning it plans a sequence of actions, executes them, and adjusts along the way
  • Adapt to feedback and changing conditions rather than failing outright on the first unexpected input
  • Be able to act, not just chat, by invoking tools, calling APIs, and interacting with systems in ways that change state

If you have a system that simply routes user prompts to an LLM and then passes the output to a fixed workflow or a handful of hardcoded APIs, it could be useful automation. However, calling it an agentic AI platform misrepresents both its capabilities and its risks. From an architecture and governance perspective, that distinction matters a lot.

When hype becomes misrepresentation

Not every vendor using the word agent is acting in bad faith. Many are simply caught in the hype cycle. Marketing language is always aspirational to some degree, but there’s a point where optimism crosses into misrepresentation. If a vendor knows its system is mainly a deterministic workflow plus LLM calls but markets it as an autonomous, goal-seeking agent, buyers are misled not just about branding but also about the system’s actual behavior and risk.

That type of misrepresentation creates very real consequences. Executives may assume they are buying capabilities that can operate with minimal human oversight when, in reality, they are procuring brittle systems that will require substantial supervision and rework. Boards may approve investments on the belief that they are leaping ahead in AI maturity, when they are really just building another layer of technical and operational debt. Risk, compliance, and security teams may under-specify controls because they misunderstand what the system can and cannot do.

Whether or not this crosses the legal threshold for fraud, treat it as a fraud-level governance problem. The risk to the enterprise is similar: misallocated capital, misaligned strategy, and unanticipated exposure.

Signs of ‘agentwashing’

In practice, agentwashing tends to follow a few recognizable patterns. Be wary when you realize that a vendor cannot explain, in clear technical language, how their agents decide what to do next. They talk vaguely about “reasoning” and “autonomy,” but when pressed, everything boils down to prompt templates and orchestration scripts.

Take note if the architecture relies on a single LLM call with minimal glue code wrapped around it, especially if the slides imply a dynamic society of cooperating agents planning, delegating, and adapting in real time. If you strip away the branding, does it resemble traditional workflow automation combined with stochastic text generation?

Listen carefully for promises of “fully autonomous” processes that still require humans to monitor, approve, and correct most critical steps. There is nothing wrong with keeping humans in the loop—it’s essential in most enterprises. However, misleading language can suggest a false sense of autonomy.

These gaps between story and reality are not cosmetic. They directly affect how you design controls, structure teams, and measure success or failure.

Be laser-focused on specifics

At the time, we did not challenge cloudwashing aggressively enough. Too many boards and leadership teams accepted labels in place of architecture. Today, agentic AI will have an even greater impact on core business processes, regulatory scrutiny, and complex security and safety implications. It also carries significantly higher long-term costs if the architecture is wrong.

This time around, enterprises need to be much more disciplined.

First, name the behavior. Call it agentwashing when a product labeled as agentic is merely orchestration, an LLM, and some scripts. The language you use internally will shape how seriously people treat the issue.

Second, demand evidence instead of demos. Polished demos are easy to fake, but architecture diagrams, evaluation methods, failure modes, and documented limitations are harder to counterfeit. If a vendor can’t clearly explain how their agents reason, plan, act, and recover, that should raise suspicion.

Third, tie vendor claims directly to measurable outcomes and capabilities. That means contracts and success criteria should be framed around quantifiable improvements in specific workflows, explicit autonomy levels, error rates, and governance boundaries, rather than vague goals like “autonomous AI.”

Finally, reward vendors that are precise and honest about the technology’s actual state. Some of the most credible solutions in the market today are intentionally not fully agentic. They might be supervised automation with narrow use cases and clear guardrails. That is perfectly acceptable and, in many cases, preferable, as long as everyone is clear about what is being deployed.

Agentwashing is a red flag

Whether regulators eventually decide that certain forms of agentwashing meet the legal definition of fraud remains an open question. Enterprises do not need to wait for that answer.

From a governance, risk, or architectural perspective, treat agentwashing as a serious red flag. Scrutinize it with the same rigor you would apply to financial representations. Challenge it early, before it becomes embedded in your strategic road map. Refuse to fund it without technical proof and clear alignment with business outcomes.

The most important financial lessons of the cloud era usually traced back to cloudwashing during those initial implementations. We’re on a similar trajectory with agentic AI, but the potential blast radius is larger. As with cloud conversions, the enterprises that have the most success with agentic AI will insist, from the start, on technical and ethical honesty from vendors and internal staff.

This time around, it’s even more important to know what you’re buying.

(image/jpeg; 1.57 MB)

Stop letting ‘urgent’ derail delivery. Manage interruptions proactively 22 Dec 2025, 8:00 pm

As engineers and managers, we all have been interrupted by those unplanned, time-sensitive requests (or tasks) that arrive outside normal planning cadences. An “urgent” Slack, a last-minute requirement or an exec ask is enough to nuke your standard agile rituals. Apart from randomizing your sprint, it causes thrash for existing projects and leads to developer burnout. This is even more critical now in the AI-accelerated landscape, where overall volatility has increased with improved developer productivity. Randomizations are no longer edge cases; they are the norm.

Google’s DORA 2025 report found that “AI’s primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.” Teams that are not equipped to manage the increased volatility end up in chaos and their engineers pay the price. The fix isn’t heroics; rather, it is simple strategies that must be applied consistently and managed head-on. 

Recognize the pitfalls and avoid them!

Existing team-level mechanisms like mid-sprint checkpoints give teams the opportunity to “course correct”; however, many external randomizations arrive with an immediacy those mechanisms can’t absorb. The result is preempted work, fragmented attention, and increased delivery risk. Let’s look at how some existing team practices fail:

  • We’ll cross that bridge when we get there. I have often seen teams shoot themselves in the foot by planning to use 100% capacity in their regular planning cycles, only to scramble when they need some triage bandwidth. This leaves no runway for immediate triage when external randomizations land mid-cycle.
  • The squeaky wheel gets the grease. Another common pitfall is that the loudest voice wins by default. Randomizations arrive through inconsistent channels like emails, chat pings, hallway conversations, etc. Sometimes I have seen the loudest voice use all available channels at the same time! Just because someone’s the loudest does not mean their request is the top priority.
  • A self-fulfilling prophecy. Treating everything as “urgent” or as a randomization also dilutes the concept. We must understand that backlog reshuffling (say, during team planning sessions), planned handoffs, routine context switches, and the like do not require teams to pivot abruptly and should not be considered randomizations.

Here are a few ideas on how to avoid these pitfalls:

  • Reserve dedicated triage bandwidth: Teams must be deliberate about randomizations and should consider managing external ones as a swim lane with dedicated capacity. Teams that experience variable demand should reserve 5–10% of capacity as a buffer, which can be tuned monthly.
  • Streamline intake: Teams need not spend their time reconciling competing narratives across different channels; instead, they should create a single intake channel backed by a lightweight form (e.g., Jira tickets). The form should capture all the information needed for triage: the change or feature needed, the impact, the affected customers, and the owner.
  • Determine priority: There are several ways to determine the priority of tasks. For our team, the Eisenhower Matrix turned out to be the most effective at identifying priorities. It classifies work by urgency (time sensitivity) and importance (business/customer impact), making prioritization decisions straightforward. Items that are both urgent and important (“Do now”) are immediately scheduled, while everything else gets deferral treatment (see the sketch after this list).
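
Here is a minimal sketch of that Eisenhower-style triage in code. The urgency and importance flags and the quadrant labels are illustrative placeholders, not a prescribed rubric:

# Illustrative Eisenhower-style triage: urgency = time sensitivity,
# importance = business/customer impact.
def triage(urgent: bool, important: bool) -> str:
    if urgent and important:
        return "Do now (pull into the triage swim lane)"
    if important:
        return "Schedule (next planning cycle)"
    if urgent:
        return "Delegate or time-box"
    return "Defer or decline"

requests = [
    {"title": "Customer-facing outage workaround", "urgent": True, "important": True},
    {"title": "Exec asks for a nice-to-have report", "urgent": True, "important": False},
]
for r in requests:
    print(r["title"], "->", triage(r["urgent"], r["important"]))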

How can this be operationalized sustainably?

The above ideas form a baseline for how to process randomizations as they come in. However, teams often fail to follow these practices consistently. The ideas below will help teams make this baseline repeatable and sustainable:

Make it intentional (cultural shift)

Let’s ensure we understand that randomizations are part of serving evolving business priorities; they are not noise. Teams benefit from a mindset shift where randomizations are not seen as friction to eliminate but as signals to be handled with intent.

A few years back, our team’s monthly retrospectives found job satisfaction nosediving for a few months, until we identified its correlation with an increase in randomizations (and the corresponding thrash). I invited an agile coach to discuss the issue, and we ultimately recognized our cultural and mechanism gaps. With that mindset shift, the team was able to resolve the concerns by intentionally formalizing the randomization management flow: Intake → Triage → Prioritize → Execute (rinse and repeat). Where needed, promptly communicate to leadership about changes to existing commitments.

Be frugal with time (bounded execution)

Even well-triaged items can spiral into open-ended investigations and implementations that the team cannot afford. How do we manage that? Time-box it. Just a simple “we’ll execute for two days, then regroup” goes a long way in avoiding rabbit-holes.

The randomization is for the team to manage, not for an individual. Teams should plan for handoffs as a normal part of supporting randomizations. Handoffs prevent bottlenecks, reduce burnout, and keep the rest of the team moving. Well-defined stopping points, an assumptions log, reproduction steps, and spike summaries all make handoffs easier.

Escalate early

In cases where there are disagreements on priority, teams should not delay asking for leadership help. For instance, Stakeholder B comes in with a higher-priority ask, but Stakeholder A does not agree to have their existing task deprioritized. That does not mean the team needs to complete both. I have seen such delays lead to quiet stretching, slipped dates, and avoidable burnout. The goal is not to push problems upward, but to enable timely decisions so that the team works on business priorities. A formal escalation mechanism on our team reduced the percentage of unplanned work per sprint by around 40% when we implemented it.

Instrument, review and improve

Without making it a heavy lift, teams should capture and periodically review health metrics. For our team, the percentage of unplanned work, interrupts per sprint, mean time to triage, and a periodic sentiment survey helped a lot. Teams should review these within their existing mechanisms (e.g., sprint retrospectives) for trend analysis and adjustments.

Thankfully, a good part of this measurement and tracking can now be automated with AI agents. Teams can use a “sprint companion” that can help classify intake, compute metrics, summarize retrospectives and prompt follow-ups to keep consistent discipline.
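
As a rough illustration, the sketch below computes a few of those health metrics from a hypothetical export of one sprint’s tickets; the field names and structure are assumptions, not a standard schema:

from datetime import timedelta

# Hypothetical ticket export for one sprint: planned flag, story points,
# and time from intake to triage for unplanned items.
tickets = [
    {"planned": True,  "points": 5, "triage_delay": None},
    {"planned": False, "points": 3, "triage_delay": timedelta(hours=4)},
    {"planned": False, "points": 2, "triage_delay": timedelta(hours=30)},
]

unplanned = [t for t in tickets if not t["planned"]]
pct_unplanned = 100 * sum(t["points"] for t in unplanned) / sum(t["points"] for t in tickets)
mean_triage_hours = sum(
    t["triage_delay"].total_seconds() / 3600 for t in unplanned
) / len(unplanned)

print(f"% unplanned work: {pct_unplanned:.0f}%")
print(f"Interrupts per sprint: {len(unplanned)}")
print(f"Mean time to triage: {mean_triage_hours:.1f} hours")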

Final thoughts

When teams treat randomizations as a managed class of work, interrupts can be handled in hours, avoiding multi-day churn! It helps transform chaos into clarity, protects delivery, reduces burnout and builds trust with stakeholders. I have seen this firsthand in our teams, and I encourage you to make it part of your playbook.

This article is published as part of the Foundry Expert Contributor Network.
Want to join?

(image/jpeg; 2.76 MB)

Microsoft previews C++ code editing tools for GitHub Copilot 22 Dec 2025, 3:02 pm

Microsoft is providing early access to C++ code editing tools for GitHub Copilot via the Visual Studio 2026 Insiders channel. These C++ tools allow GitHub Copilot to go beyond file searches and unlock greater context-aware refactoring that enables changes across multiple files and sections, according to Microsoft.

The public preview was announced December 16, with the blog post also offering instructions on getting started with the tools. The C++ code editing tools for Copilot had been made available in a private preview on November 12.

Microsoft said the C++ code editing tools offer rich context for any symbol in a project, enabling Copilot agent mode to view all references across a code base, understand metadata such as type, scope, and declaration, visualize class inheritance hierarchies, and trace function call chains. These capabilities help Copilot accomplish complex C++ editing tasks with greater accuracy and speed.

Future plans call for expanding the C++ editing tools support to other GitHub Copilot surfaces, such as Visual Studio Code, to further empower agent-driven edits for C++. Additionally, Microsoft seeks feedback on how to improve the C++ tools experience. Users can report problems or suggest improvements through the Visual Studio feedback icon.

(image/jpeg; 8.37 MB)

Cursor owner Anysphere agrees to buy Graphite code review tool 22 Dec 2025, 7:16 am

Anysphere, the developer of AI coding assistant Cursor, is adding some code review and debugging skills to its portfolio with the acquisition of Graphite, TechCrunch reported Friday.

The output of AI coding tools often requires extensive debugging, something the company sought to address with new code review capabilities in Cursor 2.0.

Through the acquisition, Anysphere will be able to add new features such as Graphite’s “stacked pull requests,” which enable developers to work simultaneously on multiple dependent changes.

(image/jpeg; 6.37 MB)

Building AI agents the safe way 22 Dec 2025, 1:00 am

If you want to know what is actually happening in generative AI—not what vendors pretend in their press releases, but what developers are actually building—Datasette founder Simon Willison gets you pretty close to ground truth. As Willison has been cataloguing for years on his blog, we keep making the same key mistake building with AI as we did in the web 2.0 era: We treat data and instructions as if they are the same thing. That mistake used to give us SQL injection. Now it gives us prompt injection, data exfiltration, and agents that happily (confidently!) do the wrong thing at scale.

Based on Willison’s field notes, here is why your AI agent strategy is probably a security nightmare, and how to fix it with some boring, necessary engineering.

Prompt injection is the new SQL injection

Willison wrote about a talk he gave in October on running Claude Code “dangerously.” It’s a perfect case study in why agents are both thrilling and terrifying. He describes the productivity boost of “YOLO mode,” then pivots to why you should fear it: Prompt injection remains “an incredibly common vulnerability.” Indeed, prompt injection is the SQL injection of our day. Willison has been banging the drum on what he calls the lethal trifecta of agent vulnerability. If your system has these three things, you are exposed:

  • Access to private data (email, docs, customer records)
  • Access to untrusted content (the web, incoming emails, logs)
  • The ability to act on that data (sending emails, executing code)

This is not theoretical. It’s not even exotic. If your agent can read a file, scrape a web page, open a ticket, send an email, call a webhook, or push a commit, you have created an automation system that is vulnerable to instruction injection through any untrusted input channel. You can call it “prompt injection” or “indirect prompt injection” or “confused deputy.” The name doesn’t matter. The shape does.
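
One way to make that shape concrete is to audit an agent’s declared capabilities before it ships. The sketch below is purely illustrative; the capability tags are made up for the example, not drawn from any agent framework:

# Illustrative capability audit for the lethal trifecta. The tags are
# hypothetical labels, not a standard taxonomy.
PRIVATE_DATA = {"read_email", "query_crm", "read_docs"}
UNTRUSTED_INPUT = {"browse_web", "read_inbound_email", "read_logs"}
EXTERNAL_ACTIONS = {"send_email", "call_webhook", "run_code", "push_commit"}

def has_lethal_trifecta(capabilities: set) -> bool:
    # True if the agent combines private data, untrusted content, and the
    # ability to act -- the combination described above.
    return (
        bool(capabilities & PRIVATE_DATA)
        and bool(capabilities & UNTRUSTED_INPUT)
        and bool(capabilities & EXTERNAL_ACTIONS)
    )

agent_capabilities = {"read_email", "browse_web", "send_email"}
if has_lethal_trifecta(agent_capabilities):
    print("All three trifecta legs present: isolate, sandbox, or drop a capability.")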

This is where using AI to detect AI attacks starts to look like magical thinking. The security community has been warning for a year that many proposed defenses fail under adaptive attack. One June 2025 paper puts it bluntly: When researchers tune attacks to the defense, they bypass a pile of recent approaches with success rates above 90% in many cases.

In other words, we’re currently building autonomous systems that are essentially confused deputies waiting to happen. The enterprise fix isn’t better prompts. It’s network isolation. It’s sandboxing. It’s assuming the model is already compromised.

It is, in short, the same old security we used to focus on before AI distracted us from proper hygiene.

Context as a bug, not a feature

There is a lazy assumption in developer circles that more context is better. We cheer when Google (Gemini) or Anthropic (Claude) announces a two-million token window because it means we can stuff an entire codebase into the prompt.

Awesome, right? Well, no.

As I’ve written before, context is not magic memory; it’s a dependency. Every token you add to the context window increases the surface area for confusion, hallucination, and injection attacks. Willison notes that context is not free; it is a vector for poisoning.

The emerging best practice is better architecture, not bigger prompts. Think scoped tools, contexts that are small and explicit, isolated workspaces, and persistent state that lives somewhere designed for persistent state. Context discipline, in other words, means we build systems that aggressively prune what the model sees. In this way, we treat tokens as necessary but dangerous to store in bulk.

Memory is a database problem (again)

Willison calls this “context offloading,” and it’s similar to an argument I keep making: AI memory is just data engineering. For Willison, context offloading is the process of moving state out of the unpredictable prompt and into durable storage. Too many teams are doing this via “vibes,” throwing JSON blobs into a vector store and calling it memory. Notice what happens when we combine these threads:

  • Willison says context is not free, so you must offload state.
  • Offloading state means you are building a memory store (often a vector store, sometimes a hybrid store, sometimes a relational database with embeddings and metadata).
  • That store becomes both the agent’s brain and the attacker’s prize.

Most teams are currently bolting memory onto agents the way early web apps bolted SQL onto forms: quickly, optimistically, and with roughly the same level of input sanitization (not much). That is why I keep insisting memory is just another database problem. Databases have decades of scar tissue, such as least privilege, row-level access controls, auditing, encryption, retention policies, backup and restore, data provenance, and governance.

Agents need the same scar tissue.

Also, remember that memory is not just “What did we talk about last time?” It is identity, permissions, workflow state, tool traces, and a durable record of what the system did and why. As I noted recently, if you can’t replay the memory state to debug why your agent hallucinated, you don’t have a system; you have a casino.
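
As a minimal sketch of what that scar tissue might look like in practice, here is a replayable, provenance-aware memory log built on SQLite. The schema and field names are illustrative, not a reference design:

import sqlite3
from datetime import datetime, timezone

# Illustrative agent-memory log: every entry records who/what/why so the
# session can be replayed and audited later. The schema is a sketch.
conn = sqlite3.connect("agent_memory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS memory_events (
        id INTEGER PRIMARY KEY,
        session_id TEXT NOT NULL,
        actor TEXT NOT NULL,          -- 'user', 'model', or a tool name
        kind TEXT NOT NULL,           -- 'message', 'tool_call', 'tool_result'
        content TEXT NOT NULL,
        provenance TEXT NOT NULL,     -- where the content came from
        created_at TEXT NOT NULL
    )
""")

def record(session_id, actor, kind, content, provenance):
    conn.execute(
        "INSERT INTO memory_events "
        "(session_id, actor, kind, content, provenance, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (session_id, actor, kind, content, provenance,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

record("s1", "model", "tool_call", "fetch https://example.com", "planner step 3")

# Replay a session in order, to debug why the agent did what it did.
for row in conn.execute(
    "SELECT actor, kind, content, provenance FROM memory_events "
    "WHERE session_id = ? ORDER BY id", ("s1",)
):
    print(row)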

Making ‘vibes’ pay

Willison is often caricatured as an AI optimist because he genuinely loves using these tools to write code. But he distinguishes between “vibe coding” (letting the AI write scripts and hoping they work) and “vibe engineering.” The difference? Engineering.

In his “JustHTML” project, Willison didn’t just let the LLM spit out code. He wrapped the AI in a harness of tests, benchmarks, and constraints. He used the AI to generate the implementation, but he used code to verify the behavior.

This tracks with the recent METR study that showed that developers using AI tools often took longer to complete tasks because they spent so much time debugging the AI’s mistakes. This is, in part, because of the phenomenon I’ve called out where AI-driven code is “almost right,” and that “almost” takes a disproportionate amount of time to fix.

The takeaway for the enterprise is clear: AI doesn’t replace the loop of “write, test, debug.” It just accelerates the “write” part, which means you need to double down on the “test” part.

The boring path forward

The easy days of “wrap an API and ship it” are over, if they were ever real at all. We are moving from the demo phase to the industrial phase of AI, which means that developers need to focus on evals (unit tests, etc.) as the real work. According to Hamel Husain, you should be spending 60% of your time on evaluations. Developers also need to spend much more time getting their architecture right and not simply honing their prompting skills.

The irony is that the “most pressing issues” in genAI are not new. They’re old. We’re relearning software engineering and security fundamentals in a world where the compiler occasionally makes things up, your code can be socially engineered through a markdown file, and your application’s “state” is a bag of tokens.

So, yes, AI models are magical. But if you want to use them in the enterprise without inadvertently exposing your customer database, you need to stop treating them like magic and start treating them like untrusted, potentially destructive components.

Because, as Willison argues, there’s no “vibe engineering” without serious, boring, actual engineering.

(image/jpeg; 0.94 MB)

6 AI breakthroughs that will define 2026 22 Dec 2025, 1:00 am

The most significant advances in artificial intelligence next year won’t come from building larger models but from making AI systems smarter, more collaborative, and more reliable. Breakthroughs in agent interoperability, self-verification, and memory will transform AI from isolated tools into integrated systems that can handle complex, multi-step workflows. Meanwhile, open-source foundation models will break the grip of AI giants and accelerate innovation.

Here are six predictions for how AI capabilities will evolve in 2026.

Open-source models will break the hold of AI giants

By 2026, the power of foundation models will no longer be limited to a handful of companies. The biggest breakthroughs are now occurring in the post-training phase, where models are refined with specialized data. This shift will enable a wave of open-source models that can be customized and fine-tuned for specific applications. This democratization will allow nimble startups and researchers to create powerful, tailored AI solutions on a shared, open foundation—effectively breaking the monopoly and accelerating a new wave of distributed AI development.

Improvements in context windows and memory will drive agentic innovation

With improvements in foundation models slowing, the next frontier is agentic AI. In 2026, the focus will be on building intelligent, integrated systems that have capabilities such as context windows and human-like memory. While new models with more parameters and better reasoning are valuable, models are still limited by their lack of working memory. Context windows and improved memory will drive the most innovation in agentic AI next year, by giving agents the persistent memory they need to learn from past actions and operate autonomously on complex, long-term goals. With these improvements, agents will move beyond the limitations of single interactions and provide continuous support.

Self-verification will start to replace human intervention

In 2026, the biggest obstacle to scaling AI agents—the buildup of errors in multi-step workflows—will be solved by self-verification. Instead of relying on human oversight for every step, AI agents will be equipped with internal feedback loops, allowing them to autonomously verify the accuracy of their own work and correct mistakes. This shift to self-aware, “auto-judging” agents will allow for complex, multi-hop workflows that are both reliable and scalable, moving them from a promising concept to a viable enterprise solution.

English will become the hottest new programming language

The single most important proving ground for AI’s reasoning capabilities is in coding. An AI’s ability to generate and execute code provides a critical bridge from the statistical, non-deterministic world of large language models to the deterministic, symbolic logic of computers. This is unlocking a new era of English language programming, where the primary skill is not knowing a specific syntax like Go or Python, but being able to clearly articulate a goal to an AI assistant. By 2026, the bottleneck in building new products will no longer be the ability to write code, but the ability to creatively shape the product itself. This shift will democratize software development, leading to a tenfold increase in the number of creators who can now build applications and do higher-value, creative work.

The AI arms race will shift from bigger models to smarter ones

The era of adding more compute and data to build ever-larger foundation models is ending. In 2025, we hit a wall with established scaling laws like the Chinchilla formula. The industry is running out of high-quality pre-training data, and the token horizons needed for training have become unmanageably long. That means the race to build the biggest models will finally slow down. Instead, innovation is rapidly shifting to post-training techniques, where companies are dedicating an increasing portion of their compute resources. This means the focus in 2026 won’t be on sheer size of AI models, but on refining and specializing models with techniques like reinforcement learning to make them dramatically more capable for specific tasks.

Agent interoperability will unlock the next wave of AI productivity

Today, most AI agents operate in walled gardens, unable to communicate or collaborate with agents from other platforms. That’s about to change. By 2026, the next major frontier in enterprise AI will be interoperability—the development of open standards and protocols that allow disparate AI agents to speak to one another. Just as the API economy connected different software services, an “agent economy” will allow agents from different platforms to autonomously discover, negotiate, and exchange services with one another. Solving this challenge will unlock compound efficiencies and automate complex, multi-platform workflows that are impossible today, ushering in the next wave of AI-driven productivity.

The new technical priorities for 2026

Rather than pursuing raw scale, the industry is solving the practical problems that prevent AI from working reliably in production. Self-verification eliminates error accumulation in multi-step workflows. Improved memory transforms one-off interactions into continuous partnerships.

Advances like these mark a maturation of the field. The organizations that can best capitalize on them will recognize that the era of “bigger is better” has given way to an era of “smarter is essential.” Technical progress in AI isn’t slowing, it’s getting more sophisticated.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

(image/jpeg; 3.15 MB)

8 old programming languages developers won’t quit 22 Dec 2025, 1:00 am

The computer revolution has always been driven by the new and the next. The hype-mongers have trained us to assume that the latest iteration of ideas will be the next great leap forward. Some, though, are quietly stepping off the hype train. Whereas the steady stream of new programming languages once attracted all the attention, lately it’s more common to find older languages like Ada and C reclaiming their top spots in the popular language indexes. Yes, these rankings are far from perfect, but they’re a good litmus test of the respect some senior (even ancient) programming languages still command.

It’s also not just a fad. Unlike the nostalgia-driven fashion trends that bring back granny dresses or horn-rimmed glasses, there are sound, practical reasons why an older language might be the best solution for a problem.

For one thing, rewriting old code in some shiny new language often introduces more bugs than it fixes. The logic in software doesn’t wear out or rot over time. So why toss away perfectly debugged code just so we can slurp up the latest syntactic sugar? Sure, the hipsters in their cool startups might laugh, but they’ll burn through their seed round in a few quarters, anyway. Meanwhile, the megacorps keep paying real dividends on their piles of old code. Now who’s smarter?

Sticking with older languages doesn’t mean burying our heads in the sand and refusing to adopt modern principles. Many old languages have been updated with newer versions that add modern features. They add a fresh coat of paint by letting you do things like, say, create object-oriented code.

The steady devotion of teams building new versions of old languages means developers don’t need to chase the latest trend or rewrite our code to conform to some language hipster’s fever dream. We can keep our dusty decks running, even while replacing punch-card terminals with our favorite new editors and IDEs.

Here are eight older languages that are still hard at work in the trenches of modern software development.

COBOL

COBOL is the canonical example of a language that seems like it ought to be long gone, but lives on inside countless blue-chip companies. Banks, insurance companies, and similar entities rely on COBOL for much of their business logic. COBOL’s syntax dates to 1959, but there have been serious updates. COBOL-2002 delivered object-oriented extensions, and COBOL-2023 updated its handling of common database transactions. GnuCOBOL brings COBOL into the open source fold, and IDEs like Visual COBOL and isCOBOL make it easy to double-check whether you’re using COBOL’s ancient syntax correctly.

Perl

Python has replaced Perl for many basic jobs, like writing system glue code. But for some coders, nothing beats the concise and powerful syntax of one of the original scripting languages. Python is just too wordy, they say. The Comprehensive Perl Archive Network (CPAN) is a huge repository of more than 220,000 modules that make handling many common programming chores a snap. In recent months, Perl has surged in the Tiobe rankings, hitting number 10 in September 2025. Of course, this number is in part based on search queries for Perl-related books and other products listed on Amazon. The language rankings use search queries as a proxy for interest in the language itself.

Ada

Development on Ada began in the 1970s, when the US Department of Defense set out to create one standard computer language to unify its huge collection of software projects. It was never wildly popular in the open market, but Ada continues to have a big following in the defense industries, where it controls critical systems. The language has also been updated over the years to add better support for features like object-oriented code in 1995, and contract-based programming in 2012, among others. The current standard, called Ada 2022, embraces new structures for stable, bug-free parallel operations.

Fortran

Fortran dates to 1953, when IBM decided it wanted to write software in a more natural way approximating mathematical formulae instead of native machine code. It’s often called the first higher-level language. Today, Fortran remains popular in hard sciences that need to churn through lots of numerical computations like weather forecasts or simulations of fluid dynamics. More modern versions have added object-oriented extensions (2003) and submodules (2008). There are open source versions like GNU Fortran and companies like Intel continue to support their own internal version of the language.

C, C++, etc.

While C itself might not top the list of popular programming languages, that may be because its acolytes are split between variants like plain C, C++, C#, or Objective C. And, if you’re just talking about syntax, some languages like Java are also pretty close to C. With that said, there are significant differences under the hood, and the code is generally not interoperable between C variants. But if this list is meant to honor programming languages that won’t quit, we must note the popularity of the C syntax, which sails on (and on) in so many similar forms.

Visual Basic

The first version of BASIC (Beginner’s All-purpose Symbolic Instruction Code) was designed to teach school children the magic of for loops and GOSUB (go to subroutine) commands. Microsoft understood that many businesses needed an intuitive way to inject business logic into simple applications. Business users didn’t need to write majestic apps with thousands of classes split into dozens of microservices; they just needed some simple code that would clean up data mistakes or address common use cases. Microsoft created Visual Basic to fill that niche, and today many businesses and small-scale applications continue on in the trenches. VB is still one of the simplest ways to add just a bit of intelligence to a simple application. A few loops and if-then-else statements, just like in the 1960s, but this time backed by the power of the cloud and cloud-hosted services like databases and large language models. That’s still a powerful combination, which is probably why Visual Basic still ranks on the popular language charts.

Pascal

Created by Niklaus Wirth as a teaching language in 1971, Pascal went on to become one of the first great typed languages. But only specific implementations really won over the world. Some old programmers still get teary-eyed when they think about the speed of Turbo Pascal while waiting for some endless React build cycle to finish. Pascal lives on today in many forms, both open source and proprietary. The most prominent version may be Delphi’s compiler, which can target all the major platforms. The impatient among us will love the fact that this old language still comes with the original advertising copy promising that Delphi can “Build apps 5x faster.”

Python

Python is one of the newest languages in this list, with its first public release in 1991. But many die-hard Python developers are forced to maintain older versions of the language. Each new version introduces just enough breaking changes to cause old Python code to fail in some way if you try to run it with the new version. It’s common for developers to set up virtual environments to lock in ancient versions of Python and common libraries. Some of my machines have three or four venvs—like time capsules that let me revisit the time before Covid, or Barack Obama, or even the Y2K bug craze. While Python is relatively young compared to the other languages on this list, the same spirit of devotion to the past lives on in the hearts and minds of Python developers tirelessly supporting old code.

(image/jpeg; 0.18 MB)
