Back to the future: The most popular JavaScript stories and themes of 2025 2 Jan 2026, 1:00 am

Artificial intelligence and its promise to revolutionize programming—and possibly overthrow human sovereignty—is a central story of the post-Covid world. But for JavaScript developers, it is only one of the forces contending for center stage.

Among the notable trends in 2025 was the server side’s growing power, importance, and stability. This was partially driven by the universal expansion of front-end frameworks into full-stack ones. It also reflected the maturation of various general-purpose and use-case-specific server-side tools.

Alongside this was a move toward simplicity in JavaScript programming, perhaps best typified by tools like HTMX and Hotwire. The elaborate complex of solutions driven by web requirements (UX, performance, state management, etc.) met an articulate counterpoint in new tools that maximize developer experience.

The big question about AI is how it will redefine the applications we build, and not just how we build them. Currently, AI still seems to be just another feature, rather than a radical upsetting of the tech industry apple cart.

Rain or snow, disruption or iteration, JavaScript developers have one of the best seats on the roller coaster ride of software innovation. Here’s a look at some of the most important moments that defined our last year.

7 stories that defined JavaScript development in 2025

If 2025 was a journey, these were the landmarks. Our most popular JavaScript features and tutorials this year tracked with the three major shifts we felt on the ground: thirst for simplicity, solidification of the server, and the accelerating adoption of AI.

HTMX and Alpine.js: How to combine two great, lean front ends
JavaScript developers started the year looking for a way to simplify without sacrificing power. Many found a solution in combining HTMX with Alpine.js.

ECMAScript 2025: The best new features in JavaScript
The annual JavaScript language update focused on performance and precision. From the lazy evaluation of the new Iterator object to the AI-ready Float16Array, ECMAScript 2025 showed JavaScript is still a living language, evolving to meet modern demands.

10 JavaScript concepts you need to succeed with Node
As JavaScript’s server side evolves toward stability and power, understanding the runtime becomes non-negotiable. This knowledge isn’t just about syntax but the mechanics of the engine. From the nuances of the event loop to the proper handling of streams and buffers, mastering core JavaScript concepts is the difference between writing Node code and architecting scalable, high-performance systems.

Intro to Nitro: The server engine built for modern JavaScript
Do you ever wonder how modern meta-frameworks run effortlessly on everything from a traditional Node server to a Cloudflare Edge Worker? The answer is Nitro. More than just an HTTP server, Nitro is a universal deployment engine. By abstracting away the runtime, Nitro delivers a massive leap in server-side portability, effectively coalescing the fractured landscape of deployment targets into a single, unified interface.

Intro to Nest.js: Server-side JavaScript development on Node
While Nitro handles the runtime, Nest handles the architecture. Nest has emerged as the gold standard for serious, scalable back-end engineering. By moving beyond the “assemble it yourself” mode of Express middleware, and toward a structured development platform, Nest empowers teams to build large-scale apps in JavaScript.

Comparing Angular, React, Vue, and Svelte: What you need to know
The so-called framework wars have gradually evolved into something else. “Framework collaboration” is far less sensational but rings true over the past few years. All the major frameworks (and many less prominent ones) have attained feature parity, mainly by openly influencing and inspiring each other. Choosing a framework is still a meaningful decision, but the difficulty now is in making the best choice among good ones.

Just say no to JavaScript
Lest you think I am a complete JavaScript fanboy, I offer this popular critique by fellow InfoWorld columnist Nick Hodges. Here, he takes aim at JavaScript and sings the praises of TypeScript, while speculating as to why more developers have not yet taken the leap.


Enterprise Spotlight: Setting the 2026 IT agenda 1 Jan 2026, 2:00 am

IT leaders are setting their operations strategies for 2026 with an eye toward agility, flexibility, and tangible business results. 

Download the January 2026 issue of the Enterprise Spotlight from the editors of CIO, Computerworld, CSO, InfoWorld, and Network World and learn about the trends and technologies that will drive the IT agenda in the year ahead.


What’s next for Azure containers? 1 Jan 2026, 1:00 am

The second part of Azure CTO Mark Russinovich’s “Azure Innovations” Ignite 2025 presentation covered software and a deeper look at the platforms he expects developers will use to build cloud-native applications.

Azure was born as a platform-as-a-service (PaaS) environment, providing the plumbing for your applications so you didn’t have to think about infrastructure; it was all automated, hidden behind APIs, and configured through a web portal. Over the years, things have evolved: Azure now supports virtual infrastructure, offers a command line for managing applications, and provides its own infrastructure-as-code (IaC) development tools and language.

Despite all this, the vision of a serverless Azure has been a key driver for many of its innovations, from the Functions on-demand compute platform, to the massive data environment of Fabric, to the hosted, scalable orchestration platform that underpins Azure Container Instances for microservices. This vision is key to many of the new tools and services Russinovich talked about, delivering a platform that allows developers to concentrate on code.

That approach doesn’t stop Microsoft from working on new hardware and infrastructure features for Azure; they remain essential for many workloads and are key to supporting the new cloud-native models. It’s important to understand what underlies the abstractions we’re using, as it defines the limits of what we can do with code.

Serverless containers at scale

One of the key serverless technologies in Azure is Azure Container Instances. ACI is perhaps best thought of as a way to get many of the benefits of Kubernetes without having to manage and run a Kubernetes environment. It hosts and manages containers for you, handling scaling and container life cycles. In his infrastructure presentation, Russinovich talked about how new direct virtualization tools made it possible to give ACI-hosted containers access to Azure hardware such as GPUs.
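To make the developer experience concrete, here is a minimal sketch of creating an ACI container group from JavaScript using the Azure SDK (the @azure/identity and @azure/arm-containerinstance packages). The resource group, names, region, image, and sizing below are illustrative assumptions, not anything Russinovich showed:

import { DefaultAzureCredential } from "@azure/identity";
import { ContainerInstanceManagementClient } from "@azure/arm-containerinstance";

const client = new ContainerInstanceManagementClient(
  new DefaultAzureCredential(),
  process.env.AZURE_SUBSCRIPTION_ID
);

// ACI handles placement, scaling, and life cycle; we only describe the container.
await client.containerGroups.beginCreateOrUpdateAndWait("demo-rg", "hello-aci", {
  location: "eastus",
  osType: "Linux",
  restartPolicy: "Always",
  containers: [
    {
      name: "hello",
      image: "mcr.microsoft.com/azuredocs/aci-helloworld",
      resources: { requests: { cpu: 1, memoryInGB: 1.5 } },
    },
  ],
});

The point of the sketch is what is missing: there is no cluster, node pool, or orchestrator to manage, just a description of the container you want to run.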

Microsoft is making a big bet on ACI, using it to host many elements of key services across Azure and Microsoft 365. These include Excel’s Python support, the Copilot Actions agents, and Azure’s deployment and automation services, with many more under development or in the middle of migrating to the platform. Russinovich describes it as “the plan of record for Microsoft’s internal infrastructure.”

ACI development isn’t only happening at the infrastructure level. It’s also happening in the orchestration services that manage containers. One key new feature is a tool called NGroups, which lets you define fleets of a standard container image that can be scaled up and burst out as needed. This model supports the service’s rapid-scaling standby pools, which can be deployed in seconds, with customization applied as needed.

With ACI needing to support multitenant operations, there’s a requirement for fair managed resource sharing between containers. Otherwise it would be easy for a hostile container to quickly take all the resources on a server. However, there’s still a need for containers within a subscription to be able to share resources as necessary, a model that Russinovich calls “resource oversubscription.”

This is related to a new feature that builds on the direct virtualization capabilities being added to Azure: stretchable instances. Here you can define the minimum and maximum for CPU and memory and adjust as load changes. Where traditionally containers have scaled out, stretchable instances can also scale up and down within the available headroom on a server.

Improving container networking with managed Cilium

Container networking, another area I’ve touched on in the past, is getting upgrades, with improvements to Azure’s support for eBPF and specifically for the Cilium network observability and security tools. Extended Berkeley Packet Filters let you put probes and rules down into the kernel securely without affecting operations, on both Linux and Windows. It’s a powerful way of managing networking in Kubernetes, where Cilium has become an important component of the security stack.

Until now, even though Azure has had deep eBPF support, you’ve had to bring your own eBPF tools and manage them yourself, which does require expertise to run at scale. Not everyone is a Kubernetes platform engineer, and with tools like AKS providing a managed environment for cloud-native applications, having a managed eBPF environment is an important upgrade. The new Azure Managed Cilium tool provides a quick way of getting that benefit in your applications, using it for host routing and significantly reducing the overhead that comes with iptables-based networking.

You’ll see the biggest improvements in pod-to-pod routing with small message sizes. This shouldn’t be a surprise: the smaller the message, the bigger the routing overhead using iptables. Understanding how this can affect your applications can help you design better messaging, and where small messages get delivered three times faster, it’s worth optimizing applications to take advantage of these performance boosts.

With Cilium integrated into Azure’s AKS, it now becomes the default way to manage container networking on a pod host (38% faster than a bring-your-own install), working as part of the familiar Advanced Container Networking Services tools. On top of that, Microsoft will ensure your Cilium instance is up to date and will provide support that a bring-your-own instance won’t get.

Even though you are unlikely to interact directly with Azure’s hardware, many of the platform innovations Russinovich talks about depend on the infrastructure changes he discussed in a previous Ignite session, especially on things like the network accelerator in Azure Boost.

This underpins upgrades to Azure Container Storage, working with both local NVMe storage and remote storage using Azure’s storage services. One upgrade here is a distributed cache that allows a Kubernetes cluster to share data using local storage rather than downloading it to every pod every time it’s needed—an increasingly common issue for applications that spin up new pods and nodes to handle inferencing. Using the cache, a download that might take minutes is now a local file access that takes seconds.

Securing containers at an OS level

It’s important to remember that Azure (and other hyperscalers) isn’t in the business of giving users their own servers; its model uses virtual machines and multiple tenants to get as much use out of its hardware as possible. That approach demands a deep focus on security, hardening images and using isolation to separate virtual infrastructures. In the serverless container world, especially with the new direct virtualization features, we need to lock down even more than in a VM, as our ACI-hosted containers are now sharing the same host OS.

Declarative policies let Azure lock down container features to reduce the risk of compromised container images affecting other users. At the same time, it’s working to secure the underlying host OS, which for ACI is Linux. SELinux allows Microsoft to lock that image down, providing an immutable host OS. However, those SELinux policies don’t cross the boundary into containers, leaving their userspace vulnerable.

Microsoft has been adding new capabilities to Linux that can verify the code running in a container. This new feature, Integrity Policy Enforcement, is now part of what Microsoft calls OS Guard, along with another new feature: dm-verity. Device-mapper-verity is a way to provide a distributed hash of the containers in a registry and the layers that go into composing a container, from the OS image all the way up to your binaries. This allows you to sign all the components of a container and use OS Guard to block containers that aren’t signed and trusted.

Delivering secure hot patches

Having a policy-driven approach to security helps quickly remediate issues. If, say, a common container layer has a vulnerability, you can build and verify a patch layer and deploy it quickly. There’s no need to patch everything in the container, only the relevant components. Microsoft has been doing this for OS features for some time now as part of its internal Project Copacetic, and it’s extending the process to common runtimes and libraries, building patches with updated packages for tools like Python.

As this approach is open source, Microsoft is working to upstream dm-verity into the Linux kernel. You can think of it as a way to deploy hot fixes to containers between building new immutable images, quickly replacing problematic code and keeping your applications running while you build, test, and verify your next release. Russinovich describes it as rolling out “a hot fix in a few hours instead of days.”

Providing the tools needed to secure application delivery is only part of Microsoft’s move to defining containers as the standard package for Azure applications. Providing better ways to scale fleets of containers is another key requirement, as is improved networking. Russinovich’s focus on containers makes sense, as they allow you to wrap all the required components of a service and securely run it at scale.

With new software services building on improvements to Azure’s infrastructure, it’s clear that both sides of the Azure platform are working together to deliver the big picture, one where we write code, package it, and (beyond some basic policy-driven configuration) let Azure do the rest of the work for us. This isn’t something Microsoft will deliver overnight, but it’s a future that’s well on its way—one we need to get ready to use.


Critical vulnerability in IBM API Connect could allow authentication bypass 31 Dec 2025, 5:49 pm

IBM is urging customers to quickly patch a critical vulnerability in its API Connect platform that could allow remote attackers to bypass authentication.

The company describes API Connect as a full lifecycle application programming interface (API) gateway used “to create, test, manage, secure, analyze, and socialize APIs.”

IBM particularly touts it as a way to “unlock the potential of agentic AI” by providing a central point of control for access to AI services via APIs. The platform also includes API Agent, which automates tasks across the API lifecycle using AI.

A key component is a customizable self-service portal that allows developers to easily onboard themselves, and to discover and consume multiple types of APIs, including SOAP, REST, events, AsyncAPIs, GraphQL, and others.

The flaw, tracked as CVE-2025-13915, affects IBM API Connect versions 10.0.8.0 through 10.0.8.5, and version 10.0.11.0, and could give unauthorized access to the exposed applications, with no user interaction required.

An architectural assumption is broken

“CVE-2025-13915 is not best understood as a security bug,” said Sanchit Vir Gogia, chief analyst at Greyhound Research. “It is better understood as a moment where a long standing architectural assumption finally breaks in the open. The assumption is simple and deeply embedded in enterprise design: If traffic passes through the API gateway, identity has been enforced and trust has been established. This vulnerability proves that assumption can fail completely.”

He noted that the classification of the weakness, which maps to CWE-305, is important because it rules out a whole class of what he called comforting explanations. “This is not stolen credentials. It is not role misconfiguration. It is not a permissions mistake,” he said. “The authentication enforcement itself can be circumvented.”

When that happens, he explained, downstream services do not simply face elevated risk, they lose the foundation on which their access decisions were built because they do not revalidate identity. They were never designed to; they inherit trust.

“Once enforcement fails upstream, inherited trust becomes unearned trust, and the exposure propagates silently,” he said. “This class of vulnerability aligns with automation, broad scanning, and opportunistic probing rather than careful targeting.”

Interim fixes provided

IBM said that the issue was discovered during internal testing, and it has provided interim fixes for each affected version of the software, with individual update details for VMware, OCP/CP4I, and Kubernetes.

The only mitigation suggested for the flaw, according to IBM’s security bulletin, is this: “Customers unable to install the interim fix should disable self-service sign-up on their Developer Portal if enabled, which will help minimize their exposure to this vulnerability.”

The company also notes in its installation instructions for the fixes that the image overrides described in the document must be removed when upgrading to the next release or fixpack.

This, said Gogia, further elevates the risk. “That is not a cosmetic detail,” he noted. “Management planes define configuration truth, lifecycle control, and operational authority across the platform. When remediation touches this layer, the vulnerability sits close to the control core, not at an isolated gateway edge. That raises both blast radius and remediation risk.”

This is because errors in these areas can turn into prolonged exposure or service instability. “[Image overrides] also introduce a governance hazard: Image overrides create shadow state; if they are not explicitly removed later, they persist quietly,” he pointed out. “Over time, they drift out of visibility, ownership, and audit scope. This is how temporary fixes turn into long term risk.”

Most valuable outcome: Learning

He added that the operational challenges involved in remediation are not so much in knowing what has to be done, but in doing it fast enough without breaking the business. And, he said, API governance now needs to include up to date inventories of APIs, their versions, dependencies, and exposure points, as well as monitoring of behavior.

“The most valuable outcome here is not closure,” Gogia observed. “It is learning. Enterprises should ask what would have happened if this flaw had been exploited quietly for weeks. Which services would have trusted the gateway implicitly? Which logs would have shown abnormal behavior? Which teams would have noticed first? Those answers reveal whether trust assumptions are visible or invisible. Organizations that stop at patching will miss a rare opportunity to strengthen resilience before the next control plane failure arrives.”


What is cloud computing? From infrastructure to autonomous, agentic-driven ecosystems 31 Dec 2025, 4:21 am

Cloud computing continues to be the platform of choice for large applications and a driver of innovation in enterprise technology. Gartner forecasts that spending on public cloud services alone will reach $1.42 trillion in current U.S. dollars, driven by AI workloads and enterprise modernization.

Driving this growth are the rise of AI and machine learning on the cloud, adoption of edge computing, the maturation of serverless computing, the emergence of multicloud strategies, improved security and privacy, and more sustainable cloud practices.

What is cloud computing?

While often used broadly, the term cloud computing is defined as an abstraction of compute, storage, and network infrastructure assembled as a platform on which applications and systems are deployed quickly and scaled on the fly.

Most cloud customers consume public cloud computing services over the internet, which are hosted in large, remote data centers maintained by cloud providers. The most common type of cloud computing, SaaS (software as a service), delivers prebuilt applications to the browsers of customers who pay per seat or by usage, exemplified by such popular apps as Salesforce, Google Docs, or Microsoft Teams.

5 top trends in cloud computing

  1. Agentic cloud ecosystems: The shift from AI as a tool to AI as an autonomous operator within cloud environments.
  2. Sovereign and localized clouds: Meeting strict national data residency and digital sovereignty laws.
  3. Specialized AI hardware access: Navigating the GPU capacity crunch through reserved instances and boutique AI clouds.
  4. Integrated greenOps: Merging cost optimization with mandatory carbon-footprint reporting.
  5. Industry-specific walled gardens: The maturation of vertical clouds into highly regulated, precompliant environments for finance and healthcare.

Next in line is IaaS (infrastructure as a service), which offers vast, virtualized compute, storage, and network infrastructure upon which customers build their own applications, often with the aid of providers’ API-accessible services.

When people refer to “the cloud” today, they most often mean the big IaaS providers: AWS (Amazon Web Services), Google Cloud Platform, or Microsoft Azure. All three have become ecosystems of services that go way beyond infrastructure and include developer tools, serverless computing, machine learning services and APIs, data warehouses, and thousands of other services. With both SaaS and IaaS, a key benefit is agility. Customers gain new capabilities almost instantly without the capital investment in hardware or software on-premises — and they can instantly scale the cloud resources they consume up or down as needed.

According to Foundry’s Cloud Computing Study, 2025, enterprises are moving to the cloud to improve security and/or governance, increase scalability, accelerate adoption of artificial intelligence, machine learning, and other new technologies, replace on-premises legacy technology, improve employee productivity, and ensure disaster recovery and business continuity.

Hyperscalers now dominate cloud services

The largest cloud service providers are often described as hyperscalers, due to their capability to provide large-scale data centers across the globe. Hyperscalers typically offer a wide range of cloud services, including IaaS, PaaS, SaaS, and more.

As mentioned above, notable hyperscalers include Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure. They offer the following capabilities.

  • Scalability: Hyperscalers can handle massive workloads and scale resources up or down quickly.
  • Cost-effectiveness: Hyperscalers often offer competitive pricing and economies of scale.
  • Global reach: Hyperscalers operate data centers around the world, providing low-latency access to customers in different regions.
  • Innovation: Hyperscalers are at the forefront of cloud innovation, offering new services and features.

Challenges of working with hyperscalers

  • Vendor lock-in: Relying heavily on a single hyperscaler can create vendor lock-in, making it difficult to switch to another provider and exposing you to large egress fees if you do move.
  • Complexity: Hyperscalers offer a vast array of services, which can be overwhelming for some customers.
  • Security concerns: Because hyperscalers handle sensitive data, security is a major concern.

AI, agents, and the sovereign cloud

The AI-enabled enterprise has moved beyond simple chatbots. The focus has shifted to agentic workflows — autonomous systems that reside in the cloud and possess the authority to execute business processes, manage cloud spend, and self-patch security vulnerabilities without human intervention.

The shift to agentic infrastructure

Cloud providers are no longer just selling compute. They are selling inference-as-a-service. Modern cloud budgets are now dominated by the high cost of specialized GPU clusters (such as Nvidia’s Blackwell architecture). This has led to the rise of boutique AI clouds that compete with hyperscalers by offering bare-metal access to the latest silicon specifically for model training and fine-tuning.

Data sovereignty and private AI

A major shift in late 2025 is the move away from public AI models for sensitive data. Organizations are increasingly using retrieval-augmented generation (RAG) within walled-garden environments. This ensures that a company’s proprietary data never leaves its own cloud instance to train a provider’s base model.

Furthermore, sovereign AI has become a requirement for global operations. Governments now demand that the AI models processing their citizens’ data be hosted on infrastructure that is owned, operated, and governed within their own borders.

The challenges of ghost AI

Just as shadow IT plagued the 2010s, ghost AI—unauthorized AI agents running on corporate cloud accounts — has become a primary security risk. Managing these autonomous entities requires a new layer of AI governance, where the cloud provider automatically audits the intent and permissions of every running agent to prevent runaway costs or data leaks.

Cloud computing definitions

In 2011, NIST posted a PDF that divided cloud computing into three “service models” — SaaS, IaaS, and PaaS (platform as a service) — the latter being a controlled environment within which customers develop and run applications. These three categories have largely stood the test of time, although most PaaS solutions now are made available as services within IaaS ecosystems rather than as dedicated PaaS clouds.

Two evolutionary trends stand out since NIST’s threefold definition. One is the long and growing list of subcategories within SaaS, IaaS, and PaaS, some of which blur the lines between categories. The other is the explosion of API-accessible services available in the cloud, particularly within IaaS ecosystems. The cloud has become a crucible of innovation where many emerging technologies appear first as services, a big attraction for business customers who understand the potential competitive advantages of early adoption.

SaaS (software as a service) definition

This type of cloud computing delivers applications over the internet, typically with a browser-based user interface. Today, most software companies offer their wares via SaaS — if not exclusively, then at least as an option.

The most popular SaaS applications for business are Google Workspace (formerly G Suite) and Microsoft 365 (formerly Office 365). Most enterprise applications, including giant ERP suites from Oracle and SAP, come in both SaaS and on-premises versions. SaaS applications typically offer extensive configuration options as well as development environments that enable customers to code their own modifications and additions. They also enable data integration with on-prem applications.

IaaS (infrastructure as a service) definition

At a basic level, IaaS cloud providers offer virtualized compute, storage, and networking over the internet on a pay-per-use basis. Think of it as a data center maintained by someone else, remotely, but with a software layer that virtualizes all those resources and automates customers’ ability to allocate them with little trouble.

But that’s just the basics. The full array of services offered by the major public IaaS providers is staggering: highly scalable databases, virtual private networks, big data analytics, AI and machine learning services, application platforms, developer tools, devops tools, and so on. Amazon Web Services was the first IaaS provider and remains the leader, followed by Microsoft Azure, Google Cloud Platform, IBM Cloud, and Oracle Cloud.

PaaS (platform as a service) definition

PaaS provides sets of services and workflows that specifically target developers, who can use shared tools, processes, and APIs to accelerate the development, testing, and deployment of applications. Salesforce’s Heroku and Salesforce Platform (formerly Force.com) are popular public cloud PaaS offerings; Cloud Foundry and Red Hat’s OpenShift can be deployed on premises or accessed through the major public clouds. For enterprises, PaaS can ensure that developers have ready access to resources, follow certain processes, and use only a specific array of services, while operators maintain the underlying infrastructure.

FaaS (function as a service) definition

FaaS, the original and most basic version of serverless computing, adds another layer of abstraction to PaaS, so that developers are insulated from everything in the stack below their code. Instead of futzing with virtual servers, containers, and application runtimes, developers upload narrowly functional blocks of code, and set them to be triggered by a certain event (such as a form submission or uploaded file). All of the major clouds offer FaaS on top of IaaS: AWS Lambda, Azure Functions, Google Cloud Functions, and IBM Cloud Functions. A special benefit of FaaS applications is that they consume no IaaS resources until an event occurs, reducing pay-per-use fees.
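As a concrete illustration, here is a minimal sketch of a FaaS handler written as a Node.js function in the AWS Lambda style; the event shape and response format are illustrative assumptions:

// Runs only when an event (an HTTP request, a file upload, etc.) triggers it.
export const handler = async (event) => {
  const name = event.name ?? "world";
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Hello, ${name}!` }),
  };
};

The platform provisions compute only for the duration of each invocation, which is why no IaaS resources are consumed between events.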

Private cloud definition

A private cloud downsizes the technologies used to run IaaS public clouds into software that can be deployed and operated in a customer’s data center. As with a public cloud, internal customers can provision their own virtual resources to build, test, and run applications, with metering to charge back departments for resource consumption. For administrators, the private cloud amounts to the ultimate in data center automation, minimizing manual provisioning and management.

VMware remains a force in the private cloud software market, but the acquisition by Broadcom has created confusion and raised concerns among some customers about potential changes in pricing, licensing, and support. This could lead some organizations to explore alternative solutions.

OpenStack continues to be a popular open-source choice for building private clouds. It offers a flexible and customizable platform that can be tailored to specific needs. However, OpenStack can be complex to deploy and manage, and it may require significant expertise to maintain.

Kubernetes, a container orchestration platform that has gained significant traction in recent years, is often used in conjunction with other technologies like OpenStack to build cloud-native applications. Red Hat OpenShift is a comprehensive cloud platform based on Kubernetes that provides a managed experience for deploying and managing container-based applications.

Many cloud providers offer their own cloud-native platforms and tools, such as AWS Outposts, Azure Stack, and Google Cloud Anthos.

Common factors to consider when evaluating private cloud platforms include the following:

  1. Pricing: The initial cost of deployment and ongoing maintenance costs.
  2. Complexity: The level of technical expertise needed to manage the platform.
  3. Flexibility: The ability to customize the platform to meet specific needs.
  4. Vendor lock-in: The degree to which the organization is tied to a particular vendor.
  5. Security: The security features and capabilities of the platform.
  6. Scalability: The capability to expand the platform to meet future needs.

Hybrid cloud definition

A hybrid cloud is the integration of a private cloud with a public cloud. At its most developed, the hybrid cloud involves creating parallel environments in which applications can move easily between private and public clouds. In other instances, databases may stay in the customer data center and integrate with public cloud applications — or virtualized data center workloads may be replicated to the cloud during times of peak demand. The types of integrations between private and public clouds vary widely, but they must be extensive to earn a hybrid cloud designation.

Public APIs (application programming interfaces) definition

Just as SaaS delivers applications to users over the internet, public APIs offer developers application functionality that can be accessed programmatically. For example, in building web applications, developers often tap into the Google Maps API to provide driving directions; to integrate with social media, developers may call upon APIs maintained by Twitter, Facebook, or LinkedIn. Twilio has built a successful business delivering telephony and messaging services via public APIs. Ultimately, any business can provision its own public APIs to enable customers to consume data or access application functionality.
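In practice, consuming a public API is usually just an authenticated HTTP call. Here is a minimal JavaScript sketch; the endpoint, parameters, and response fields are hypothetical placeholders rather than any specific provider’s API:

// Fetch driving directions from a hypothetical public API.
async function getDrivingDirections(origin, destination) {
  const url = `https://api.example.com/v1/directions?origin=${encodeURIComponent(origin)}&destination=${encodeURIComponent(destination)}`;
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.API_KEY}` }, // most public APIs require a key
  });
  if (!response.ok) throw new Error(`API error: ${response.status}`);
  return response.json();
}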

iPaaS (integration platform as a service) definition

Data integration is a key issue for any sizeable company, but particularly for those that adopt SaaS at scale. iPaaS providers typically offer prebuilt connectors for sharing data among popular SaaS applications and on-premises enterprise applications, though providers may focus more or less on business-to-business and e-commerce integrations, cloud integrations, or traditional SOA-style integrations. iPaaS offerings in the cloud from such providers as Dell Boomi, Informatica, MuleSoft, and SnapLogic also let users implement data mapping, transformations, and workflows as part of the integration-building process.

IDaaS (identity as a service) definition

The most difficult security issue related to cloud computing is managing user identity and its associated rights and permissions across data centers and public cloud sites. IDaaS providers maintain cloud-based user profiles that authenticate users and enable access to resources or applications based on security policies, user groups, and individual privileges. The ability to integrate with various directory services (Active Directory, LDAP, etc.) and provide single sign-on across business-oriented SaaS applications is essential.

Leaders in IDaaS include Microsoft, IBM, Google, Oracle, Okta, Capgemini, Junio Corporation, OneLogin, and JumpCloud.

Collaboration platforms

Collaboration solutions such as Slack and Microsoft Teams have become vital messaging platforms that enable groups to communicate and work together effectively. Basically, these solutions are relatively simple SaaS applications that support chat-style messaging along with file sharing and audio or video communication. Most offer APIs to facilitate integrations with other systems and enable third-party developers to create and share add-ins that augment functionality.
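Those integration APIs are typically as simple as an HTTP endpoint. Here is a minimal sketch of posting a message to a Slack-style incoming webhook from JavaScript; the webhook URL is a placeholder you would generate in the platform’s admin console:

// Post a message to a chat channel via an incoming webhook.
async function notifyChannel(text) {
  const response = await fetch(process.env.WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) throw new Error(`Webhook call failed: ${response.status}`);
}

// Usage: await notifyChannel("Nightly build finished: all tests passed.");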

Vertical clouds

Key providers in such industries as financial services, healthcare, retail, life sciences, and manufacturing provide PaaS clouds to enable customers to build vertical applications that tap into industry-specific, API-accessible services. Vertical clouds can dramatically reduce the time to market for vertical applications and accelerate domain-specific B2B integrations. Most vertical clouds are built with the intent of nurturing partner ecosystems.

Other cloud computing considerations

The most widely accepted definition of cloud computing means that you run your workloads on someone else’s servers, but this is not the same as outsourcing. Virtual cloud resources and even SaaS applications must be configured and maintained by the customer. Consider these factors when planning a cloud initiative.

Cloud computing security considerations

Objections to the public cloud generally begin with cloud security, although the major public clouds have proven themselves much less susceptible to attack than the average enterprise data center.

Of greater concern is the integration of security policy and identity management between customers and public cloud providers. In addition, government regulation may forbid customers from allowing sensitive data off-premises. Other concerns include the risk of outages and the long-term operational costs of public cloud services.

Multicloud management considerations

To enhance their operational efficiency, reduce costs, and improve security, many companies are increasingly turning to multicloud strategies. By distributing workloads across multiple cloud providers, organizations can avoid vendor lock-in, optimize costs, and leverage the best-of-breed services offered by different providers.

This multicloud approach also improves performance and reliability by minimizing downtime and optimizing latency. Additionally, multicloud strategies strengthen security by diversifying the attack surface and facilitating compliance with industry regulations. Finally, by replicating critical workloads across multiple regions and providers, companies can establish robust disaster recovery and business continuity plans, ensuring minimal disruption in the event of catastrophic failures.

The bar to qualify as a multicloud adopter is low: A customer just needs to use more than one public cloud service. However, depending on the number and variety of cloud services involved, managing multiple clouds can become complex from both a cost optimization and a technology perspective.

In some cases, customers subscribe to multiple cloud services simply to avoid dependence on a single provider. A more sophisticated approach is to select public clouds based on the unique services they offer and, in some cases, integrate them. For example, developers might want to use Google’s Vertex AI Studio on Google Cloud Platform to build AI-driven applications, but prefer Jenkins hosted on the CloudBees platform for continuous integration.

To control costs and reduce management overhead, some customers opt for cloud management platforms (CMPs) and/or cloud service brokers (CSBs), which let you manage multiple clouds as if they were one cloud. The problem is that these solutions tend to limit customers to such common-denominator services as storage and compute, ignoring the panoply of services that make each cloud unique.

Edge computing considerations

You often see edge computing incorrectly described as an alternative to cloud computing. Edge computing is about moving compute to local devices in a highly distributed system, typically as a layer around a cloud computing core. There is typically a cloud involved to orchestrate all of the devices and take in their data, then analyze it or otherwise act on it. 

To the cloud and back – why repatriation is real

While the public cloud offers scalability and flexibility, some enterprises are opting to return to on-premises infrastructure due to rising costs, data security concerns, performance issues, vendor lock-in, and regulatory compliance challenges. On-premises infrastructure provides greater control, customization, and potential cost savings in certain scenarios, leading some technology decision-makers to consider repatriation. However, a hybrid cloud approach, combining public and private cloud, often offers the best balance of benefits.

More specific reasons to repatriate include the following:

  • Unanticipated costs, such as data transfer fees, storage charges, and egress fees, can quickly escalate, especially for large-scale cloud deployments.  
  • Inaccurate resource provisioning or underutilization can lead to higher-than-expected costs.
  • Stricter data privacy regulations require organizations to store and process data within specific geographic boundaries.  
  • For highly sensitive data, companies may prefer to maintain greater control over security measures and access permissions. 
  • On-premises infrastructure can offer lower latency, particularly for applications requiring real-time processing or high-performance computing.  
  • Overreliance on a single cloud provider can limit flexibility and increase costs. Repatriation allows organizations to diversify their infrastructure and reduce vendor dependency.  
  • Industries with stringent compliance requirements may find it easier to meet standards with on-premises infrastructure.  
  • On-premises environments offer greater control over hardware, software, and network configurations, allowing for customized solutions.  

Benefits of cloud computing

The cloud’s main appeal is to reduce the time to market of applications that need to scale dynamically. Increasingly, however, developers are drawn to the cloud by the abundance of advanced new services that can be incorporated into applications, from machine learning to internet of things (IoT) connectivity.

Although businesses sometimes migrate legacy applications to the cloud to reduce data center resource requirements, the real benefits accrue to new applications that take advantage of cloud services and “cloud native” attributes. The latter include microservices architecture, Linux containers to enhance application portability, and container management solutions such as Kubernetes that orchestrate container-based services. Cloud-native approaches and solutions can be part of either public or private clouds and help enable highly efficient devops workflows.

Cloud computing, be it public or private or hybrid or multicloud, has become the platform of choice for large applications, particularly customer-facing ones that need to change frequently or scale dynamically. More significantly, the major public clouds now lead the way in enterprise technology development, debuting new advances before they appear anywhere else. Workload by workload, enterprises are opting for the cloud, where an endless parade of exciting new technologies invite innovative use.

SaaS has its roots in the ASP (application service provider) trend of the early 2000s, when providers would run applications for business customers in the provider’s data center, with dedicated instances for each customer. The ASP model was a spectacular failure because it quickly became impossible for providers to maintain so many separate instances, particularly as customers demanded customizations and updates.

Salesforce is widely considered the first company to launch a highly successful SaaS application using multitenancy — a defining characteristic of the SaaS model. Rather than each Salesforce customer getting its own application instance, customers who subscribe to the company’s salesforce automation software share a single, large, dynamically scaled instance of an application (like tenants sharing an apartment building), while storing their data in separate, secure repositories on the SaaS provider’s servers. Fixes can be rolled out behind the scenes with zero downtime and customers can receive UX or functionality improvements as they become available.
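A minimal sketch of what that multitenancy looks like in application code, assuming a PostgreSQL-style client and illustrative table and column names: every tenant shares the same application instance and schema, and isolation comes from scoping each query to the caller’s tenant.

// All tenants share one app instance; rows are isolated by tenant_id.
async function getAccounts(db, tenantId) {
  return db.query("SELECT * FROM accounts WHERE tenant_id = $1", [tenantId]);
}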


Intro to Hotwire: HTML over the wire 31 Dec 2025, 1:00 am

If you’ve been watching the JavaScript landscape for a while, you’ve likely noticed the trend toward simplicity in web application development. An aspect of this trend is leveraging HTML, REST, and HATEOAS (hypermedia as the engine of application state) to do as much work as possible. In this article, we’ll look at Hotwire, a collection of tools for building single-page-style applications using HTML over the wire.

Hotwire is a creative take on front-end web development. It’s also quite popular, with more than 33,000 stars on GitHub and 493,000 weekly NPM downloads as of this writing.

Hotwire: An alternative to HTMX

Hotwire is built on similar principles to HTMX and offers an alternative approach to using HTML to drive the web. Both projects strive to eliminate boilerplate JavaScript and let developers do more with simple markup. Both embrace HATEOAS and the original form of REST. The central insight here is that application markup can contain both the state (or data) and the structure of how data is to be displayed. This makes it possible to sidestep the unnecessary logistics of marshaling JSON at both ends.

This concept isn’t new—in fact, it is the essence of representational state transfer (REST). Instead of converting to a special data format (JSON) on the server, then sending that over to the client where it is converted for the UI (HTML), you can just have the server send the HTML.
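To make that concrete, here is a minimal Express sketch of “HTML over the wire”: the endpoint returns a ready-to-render fragment rather than JSON for the client to transform. The route and markup are illustrative:

import express from "express";

const app = express();

// Respond with an HTML fragment; the client swaps it into the page as-is.
app.get("/berries/thimbleberry/notes", (req, res) => {
  res.send(`
    <ul id="notes">
      <li>Found a large patch by the creek.</li>
      <li>The berries are very fragile.</li>
    </ul>
  `);
});

app.listen(3000);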

Technologies like HTMX and Hotwire streamline the process, making it palatable for developers and users who are acclimated to the endless micro-interactions spawned by Ajax.

Hotwire has three primary JavaScript components, but we are mainly interested in the first two:

  • Turbo: Allows for fine-grained control of page updates.
  • Stimulus: A concise library for client-side interactivity.
  • Native: A library for creating iOS- and Android-native apps from Turbo and Stimulus.

In this article, we will look at Turbo and Stimulus. Turbo has several components that make interactivity with HTML more powerful:

  • Turbo Drive avoids full page reloads for links and form submits.
  • Turbo Frames lets you define areas of the UI that can be loaded independently (including lazy loading).
  • Turbo Streams allows for arbitrarily updating specific page segments (using WebSockets, server-sent events, or a form response).

Turbo Drive: Merging pages, not loading pages

In standard HTML, when you load a page, it completely obliterates the existing content and paints all the content anew as it arrives from the server. This is incredibly inefficient and makes for a bad user experience. Turbo Drive takes a different approach: you drop in a JavaScript include that intercepts link clicks and form submissions and merges the incoming page contents instead of reloading them.

Think of merging like diffing the current page with the incoming page. The header information is updated rather than being wholesale replaced. Modern Turbo even “morphs” the <head> and <body> elements, providing a much smoother transition. (For obvious reasons, this approach is especially effective for page reloads.)

All you have to do is include the Turbo script in your page, for example as an ES module (the CDN URL here is illustrative):

<script type="module">
  import "https://cdn.skypack.dev/@hotwired/turbo";
</script>
It is also important to point out that browser actions like back, forward, and reload all work normally. Merging is a low-cost, low-risk way of improving page navigation and reloads in web pages.

Turbo Frames: Granular UI development

The basic idea in Frames is to decompose the layout of a web page into elements. You then update these frames piecemeal, and only as needed. The overall effect is like using JSON responses to drive reactive updates to the UI, but in this case we are using HTML fragments.

Take this page as an example:

<body>
  <nav>
    Links that change the entire page
  </nav>

  <turbo-frame id="berry_description">
    <h1>Thimbleberry (Rubus parviflorus)</h1>
    <p>A delicate, native berry with large, soft leaves.</p>
    <a href="/berries/thimbleberry/edit">Edit this description</a>
  </turbo-frame>

  <turbo-frame id="field_notes">
    <a href="/berries/thimbleberry/notes/edit">Edit field notes</a>
    <div id="notes_list">
      <div id="note_1">Found a large patch by the creek.</div>
      <div id="note_2">The berries are very fragile.</div>
    </div>
  </turbo-frame>
  ...
</body>

Here we have a top navigation pane with links that will affect the entire page (usable with Turbo Drive). Then there are two interior <turbo-frame> elements that can be modified in place, without an entire page reload.

The <turbo-frame> elements capture events within them. So, when you click the link to edit the field notes, the server can respond with a matching chunk that provides an editable form:

<turbo-frame id="field_notes">
  <h2>Field Notes</h2>
  <div id="notes_list">
    <div id="note_1">Found a large patch by the creek.</div>
    <div id="note_2">The berries are very fragile.</div>
  </div>
  <form action="/berries/thimbleberry/notes" method="post">
    <textarea name="content"></textarea>
    <input type="submit" value="Save note">
  </form>
</turbo-frame>

This chunk would be rendered as a live form. The user can make updates and submit the new data, and the server would reply with a new fragment containing the updated frame:

<turbo-frame id="field_notes">
  <h2>Field Notes</h2>
  <div id="notes_list">
    <div id="note_1">Found a large patch by the creek.</div>
    <div id="note_2">The berries are very fragile.</div>
    <div id="note_3">Just saw a bear!</div>
  </div>
</turbo-frame>

Turbo takes the ID on the arriving frame content and ensures it replaces the same frame on the page (so it is essential that the server puts the correct ID on the fragments it sends). Turbo is smart enough to extract and place only the relevant fragment, even if an entire page is received from the server.

Turbo Streams: Compound updates

Turbo Drive is a simple and effective mechanism for handling basic server interactions. Sometimes, we need more powerful updates that interact with multiple portions of the page, or that are triggered from the server side. For that, Turbo has Streams.

The basic idea is that the server sends a stream of fragments, each with the ID of the part of the UI that will change, along with the content needed for the change. For example, we might have a stream of updates for our wilderness log:

<turbo-stream action="append" target="notes_list">
  <template>
    <div id="note_3">Just saw a bear!</div>
  </template>
</turbo-stream>

<turbo-stream action="update" target="notes_count">
  <template>3 notes</template>
</turbo-stream>

<turbo-stream action="replace" target="new_note_form">
  <template>
    <form action="/berries/thimbleberry/notes" method="post">
      <textarea name="content"></textarea>
      <input type="submit" value="Save note">
    </form>
  </template>
</turbo-stream>
Here, we are using streams instead of frames to handle the notes update. The idea is that each section that needs updating (the new note, the note counter, and the live form section) receives its content as a stream item. Notice that each stream item has an “action” and a “target” to describe what will happen.

Streams can target multiple elements by using the targets attribute (notice the plural here) with a CSS selector to identify the elements that will be affected.

Turbo will automatically handle responses from the server (like for a form response) that contain a collection of <turbo-stream> elements, placing them correctly into the UI. This will handle many multi-change requirements. Notice also that in this case, when you are using streams, you don’t need to use a <turbo-frame>. In fact, mixing the two is not recommended. As a rule of thumb, you should use frames for simplicity whenever you can, and upgrade to streams (and dispense with frames) only when you need to.

Reusability

A key benefit to both Turbo Frames and Turbo Streams is being able to reuse the server-side templates that render UI elements both initially and for updates. You simply decompose your server-side template (like RoR templates or Thymeleaf or Kotlin DSL or Pug—whatever tool you are using) into the same chunks the UI needs. Then you can just use them to render both the initial and ongoing states of those chunks.

For example, here’s a simple Pug template that could be used as part of the whole page or to generate update chunks:

turbo-frame#field_notes

  h2 Field Notes

  //- 1. The List: Iterates over the 'notes' array
  div#notes_list
    each note in notes
      div(id=`note_${note.id}`)= note.content

  //-
    2. The Form: On submission, this fragment is re-rendered
    by the server, which includes a fresh, empty form.

  form(action="/berries/thimbleberry/notes", method="post")
    
    div
      label(for="note_content") Add a new note:
    
    div
      //- We just need the 'name' attribute for the server
      textarea(id="note_content", name="content")
    
    div
      input(type="submit", value="Save note")

Server push

It’s also possible to provide background streams of events using the <turbo-stream-source> element:

<turbo-stream-source src="wss://example.com/notes"></turbo-stream-source>

This element automatically connects to a back-end API for SSE or WebSocket updates. These broadcast updates would have the same structure as before:

<turbo-stream action="append" target="notes_list">
  <template>
    <div id="note_4">Heard something large moving in the brush.</div>
  </template>
</turbo-stream>
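If you manage your own WebSocket or SSE connection instead, you can hand incoming messages to Turbo yourself. Here is a minimal sketch, assuming the renderStreamMessage helper exported by @hotwired/turbo and an illustrative socket URL:

import { renderStreamMessage } from "@hotwired/turbo";

// Open a WebSocket and let Turbo apply whatever <turbo-stream> markup arrives.
const socket = new WebSocket("wss://example.com/notes");

socket.addEventListener("message", (event) => {
  renderStreamMessage(event.data);
});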


  

Client-side magic with Stimulus

HTMX is sometimes paired with Alpine.js, with the latter giving you fancier front-end interactivity like accordions, drag-and-drop functionality, and so forth. In Hotwire, Stimulus serves the same purpose.

In Stimulus, you use HTML attributes to connect elements to “controllers,” which are chunks of JavaScript functionality. For example, if we wanted to provide a clipboard copy button, we could do something like this:

<div data-controller="clipboard">
  <h2 data-clipboard-target="source">Thimbleberry (Rubus parviflorus)</h2>
  <p>A delicate, native berry with large, soft leaves.</p>
  <button data-action="click->clipboard#copy">
    <span data-clipboard-target="feedback">Copy Name</span>
  </button>
</div>
Notice the data-controller attribute. That links the element to the clipboard controller. Stimulus uses a filename convention, and in this case, the file would be clipboard_controller.js, with contents something like this:

import { Controller } from "@hotwired/stimulus"

export default class extends Controller {

  // Connects to data-clipboard-target="source" 
  // and data-clipboard-target="feedback"
  static targets = [ "source", "feedback" ]

  // Runs when data-action="click->clipboard#copy" is triggered
  copy() {
    // 1. Get text from the "source" target
    const textToCopy = this.sourceTarget.textContent
    
    // 2. Use the browser's clipboard API
    navigator.clipboard.writeText(textToCopy)

    // 3. Update the "feedback" target to tell the user
    this.feedbackTarget.textContent = "Copied!"

    // 4. (Optional) Reset the button after 2 seconds
    setTimeout(() => {
      this.feedbackTarget.textContent = "Copy Name"
    }, 2000)
  }
}

The static targets member provides those elements to the controller, based on the data-clipboard-target attributes in the markup. The controller then uses simple JavaScript to perform the clipboard copy and show a timed message in the UI.

The basic idea is that you keep your JavaScript nicely isolated in small controllers that are linked into the markup as needed. This lets you add whatever extra client-side magic is needed to enhance the server-side work, in a manageable way.

Conclusion

The beauty of Hotwire is in doing most of what you need with a very small footprint. It does 80% of the work with 20% of the effort. Hotwire doesn’t have the extravagant power of a full-blown framework like React or a full-stack option like Next, but it gives you most of what you’ll need for most development scenarios. Hotwire also works with any back end built on typical server-side technologies.


Nvidia licenses Groq’s inferencing chip tech and hires its leaders 30 Dec 2025, 7:22 am

Nvidia has licensed intellectual property from inferencing chip designer Groq, and hired away some of its senior executives, but stopped short of an outright acquisition.

“We’ve taken a non-exclusive license to Groq’s IP and have hired engineering talent from Groq’s team to join us in our mission to provide world-leading accelerated computing technology,” an Nvidia spokesman said Tuesday, via email. But, he said, “We haven’t acquired Groq.”

Groq designs and sells chips optimized for AI inferencing. These chips, which Groq calls language processing units (LPUs), are lower-powered, lower-priced devices than the GPUs Nvidia designs and sells, which these days are primarily used for training AI models. As the AI market matures, and usage shifts from the creation of AI tools to their use, demand for devices optimized for inferencing is likely to grow.

The company also rents out its chips, operating an inferencing-as-a-service business called GroqCloud.

Groq itself announced the deal and the executive moves on Dec. 24, saying “it has entered into a non-exclusive licensing agreement with Nvidia for Groq’s inference technology” and that, as part of the agreement, “Jonathan Ross, Groq’s Founder, Sunny Madra, Groq’s President, and other members of the Groq team will join Nvidia to help advance and scale the licensed technology.”

The deal could be worth as much as $20 billion, TechCrunch reported.

A way out of the memory squeeze?

There’s tension throughout the supply chain for chips used in AI applications, leading Nvidia’s CFO to report in its last earnings call that some of its chips are “sold out” or “fully utilized.” One of the contributing factors identified by analysts is a shortage of high-bandwidth memory. Finding ways to make their AI operations less dependent on scarce memory chips is becoming a key objective for AI vendors and enterprise buyers alike.

A significant difference between Groq’s chip designs and Nvidia’s is the type of memory each uses. Nvidia’s fastest chips are designed to work with high-bandwidth memory, the price of which — like that of other fast memory technologies — is soaring due to limited production capacity and rising demand in AI-related applications. Groq, meanwhile, integrates static RAM into its chip designs. It says SRAM is faster and less power-hungry than the dynamic RAM used by competing chip technologies — and another advantage is that it’s not (yet) as scarce as the high-bandwidth memory or DDR5 DRAM used elsewhere. Licensing Groq’s technology opens the way for Nvidia to diversify its memory sourcing.

Not an acquisition

By structuring its relationship with Groq as an IP licensing deal, and hiring the engineers it is most interested in rather than buying their employer, Nvidia avoids taking on the GroqCloud service business just as it is reportedly stepping back from its own service business, DGX Cloud, and restructuring it as an internal engineering service. It could also escape much of the antitrust scrutiny that would have accompanied a full-on acquisition.

Nvidia did not respond to questions about the names and roles of the former Groq executives it has hired.

However, Groq’s founder, Jonathan Ross, reports on his LinkedIn profile that he is now chief software architect at Nvidia, while that of Groq’s former president, Sunny Madra, says he is now Nvidia’s VP of hardware.

What’s left of Groq will be run by Simon Edwards, formerly CFO at sales automation software vendor Conga. He joined Groq as CFO just three months ago.

This article first appeared on Network World.


2026: The year we stop trusting any single cloud 30 Dec 2025, 1:00 am

For more than a decade, many considered cloud outages a theoretical risk, something to address on a whiteboard and then quietly deprioritize during cost cuts. In 2025, this risk became real. A major Google Cloud outage in June caused hours-long disruptions to popular consumer and enterprise services, with ripple effects into providers that depend on Google’s infrastructure. Microsoft 365 and Outlook also faced code failures and notable outages, as did collaboration platforms like Slack and Zoom. Even security platforms and enterprise backbones suffered extended downtime.

None of these incidents, individually, was apocalyptic. Collectively, they changed the tone in the boardroom. Executives who once saw cloud resilience as an IT talking point suddenly realized that a configuration change in someone else’s platform could derail support queues, warehouse operations, and customer interactions in one stroke.

Relying on one provider is risky

The real story is not that cloud platforms failed; it’s that enterprises quietly allowed those platforms to become single points of failure for entire business models. In 2025, many organizations discovered that their digital transformation had traded physical single points of failure for logical ones in the form of a single region, a single provider, or even a single managed database. When a hyperscaler region had trouble, companies learned the hard way that “highly available within a region” is not the same as “business resilient.”

What caught even seasoned teams off guard was the hidden dependency chain. Organizations that thought they were cloud-agnostic because they used a SaaS provider discovered that the SaaS itself was entirely dependent on a single cloud region. When that region faltered, so did the SaaS—and by extension, the business. This is why 2026 will be the year where dependence itself, not just uptime numbers, becomes a primary design concern.

Resilience gets its own budget line

Every downturn and major outage reshapes budgets. The 2025 incidents are doing that right now. I’m seeing CIOs and CFOs move away from the idea that resilience is something you squeeze in if there’s leftover budget after cost optimization. Instead, resilience is getting explicit funding, with line items for multiregion architectures, modernized backup and restore, and cross-cloud or hybrid continuity strategies.

This is a shift in mindset as much as in money. We once justified resilience in terms of compliance or technical best practices. In 2026, we’ll look for direct revenue protection and risk reduction, often backed by concrete numbers from the 2025 outages: lost transactions, missed service-level agreements, overtime for remediation, and reputational damage. Once those numbers are quantified, resilience stops being a nice to have and becomes a board-sanctioned business control.

Relocation is back

For years, enterprises talked about cloud portability and avoiding lock-in. They then deeply embedded themselves in proprietary services for speed and convenience. 2026 is when many of those same organizations will take a second look and start moving selected workloads and data into more portable, resilient architectures. That does not mean a mass exodus from the major clouds; it means being far more deliberate about which workloads live where and why.

Expect targeted workload shifts: moving critical customer-facing systems from single-region to multi-region or cross-cloud setups, re-architecting data platforms with replicated storage and active-active databases (two live instances, both serving traffic and each able to take over for the other), and relocating some systems to private or colocation environments based on risk. Systems whose failure could significantly halt revenue or operations will have their placement and dependencies reassessed.

Redundancy stops being a luxury

In the early cloud days, active-active architectures across regions—or worse, across providers—were viewed as exotic and expensive. In 2026, for selected tiers of applications and data, they will be considered baseline engineering hygiene. The outages of 2025 demonstrated that running “hot–warm” with manual failover often means you are functionally down for hours when you can least afford it.

The response will include more active-active patterns: stateless services across regions managed globally, multi-region data stores with conflict resolution, and messaging layers resilient to provider issues. Enterprises will adopt chaos engineering and failure testing as ongoing practices, requiring continuous resilience proof beyond disaster recovery records.
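
To make the shift away from manual failover concrete, here is a minimal Python sketch of an automated, health-based choice between two regional endpoints; the URLs, endpoint paths, and timeout are hypothetical placeholders, not a recommendation for any particular provider.

    import requests  # pip install requests

    # Hypothetical health endpoints for the same service deployed in two regions.
    REGION_ENDPOINTS = [
        "https://api.us-east.example.com/healthz",
        "https://api.eu-west.example.com/healthz",
    ]

    def pick_healthy_endpoint(timeout_seconds: float = 2.0) -> str:
        """Return the first region whose health check responds, else raise."""
        for url in REGION_ENDPOINTS:
            try:
                response = requests.get(url, timeout=timeout_seconds)
                if response.status_code == 200:
                    return url
            except requests.RequestException:
                continue  # Region unreachable; try the next one.
        raise RuntimeError("No healthy region available")

    if __name__ == "__main__":
        print("Routing traffic via:", pick_healthy_endpoint())

In production, this logic typically lives in a global load balancer or service mesh rather than in application code, but the principle is the same: health is checked continuously, and failover happens without a human on a bridge call. The same probe can double as a chaos-engineering check, killing one region in a test environment and verifying that traffic shifts automatically.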

Rethinking third-party services

One of the more uncomfortable lessons from 2025 was that indirect cloud dependence can hurt just as much as direct dependence. Several SaaS and platform providers marketed themselves as simplifying complexity and insulating customers from cloud details, yet internally ran everything in a single cloud, sometimes a single region. When their underlying cloud experienced issues, customers found they had no visibility, no leverage, and no alternative.

In 2026, smart enterprises will start asking their vendors the hard questions. Which regions and providers do you use? Do you have a tested failover strategy across regions or providers? What happens to my data and SLAs if your primary cloud has a regional incident? Many will diversify not just across hyperscalers, but across SaaS and managed services, deliberately avoiding over-concentration on any provider that cannot demonstrate meaningful redundancy.

Embracing resilience in 2026

If 2025 was the wake-up call, 2026 will be the year to act with discipline. That starts with an honest dependency inventory: not just which clouds you use directly, but which clouds and regions sit beneath your SaaS, security, networking, and operations tools. From there, you can classify systems by business criticality and map appropriate resilience patterns to each class, reserving the most expensive mechanisms, such as cross-region active-active, for systems where downtime is truly existential.

Equally important is organizational change. Resilience is not only an architectural problem; it is an operations, finance, and governance problem. In 2026, the enterprises that succeed will be the ones that align architecture, site reliability engineering, security, and finance around a shared goal: reduce single points of failure in both technology and vendors, validate failover and recovery as rigorously as new features, and treat cloud dependence as a managed business risk rather than a hidden assumption. The cloud is not going away, nor should it, but our blind trust in any single piece of it must stop.


How to build RAG at scale 30 Dec 2025, 1:00 am

Retrieval-augmented generation (RAG) has quickly become the enterprise default for grounding generative AI in internal knowledge. It promises less hallucination, more accuracy, and a way to unlock value from decades of documents, policies, tickets, and institutional memory. Yet while nearly every enterprise can build a proof of concept, very few can run RAG reliably in production.

This gap has nothing to do with model quality. It is a systems architecture problem. RAG breaks at scale because organizations treat it like a feature of large language models (LLMs) rather than a platform discipline. The real challenges emerge not in prompting or model selection, but in ingestion, retrieval optimization, metadata management, versioning, indexing, evaluation, and long-term governance. Knowledge is messy, constantly changing, and often contradictory. Without architectural rigor, RAG becomes brittle, inconsistent, and expensive.

RAG at scale demands treating knowledge as a living system

Prototype RAG pipelines are deceptively simple: embed documents, store them in a vector database, retrieve top-k results, and pass them to an LLM. This works until the first moment the system encounters real enterprise behavior: new versions of policies, stale documents that remain indexed for months, conflicting data in multiple repositories, and knowledge scattered across wikis, PDFs, spreadsheets, APIs, ticketing systems, and Slack threads.

When organizations scale RAG, ingestion becomes the foundation. Documents must be normalized, cleaned, and chunked with consistent heuristics. They must be version-controlled and assigned metadata that reflects their source, freshness, purpose, and authority. Failure at this layer is the root cause of most hallucinations. Models generate confidently incorrect answers because the retrieval layer returns ambiguous or outdated knowledge.
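
As an illustration of what “consistent heuristics” can mean in practice, here is a minimal Python sketch of chunking a document and attaching the kind of metadata described above; the field names, chunk size, and example document are illustrative assumptions, not a prescribed schema.

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class Chunk:
        text: str
        # Illustrative metadata: source, version, freshness, and authority.
        metadata: dict = field(default_factory=dict)

    def chunk_document(text: str, source: str, version: str,
                       last_reviewed: date, authority: str,
                       max_chars: int = 800) -> list[Chunk]:
        """Split a document into fixed-size chunks, tagging each with provenance."""
        chunks = []
        for start in range(0, len(text), max_chars):
            chunks.append(Chunk(
                text=text[start:start + max_chars],
                metadata={
                    "source": source,
                    "version": version,
                    "last_reviewed": last_reviewed.isoformat(),
                    "authority": authority,  # e.g. "official policy" vs. "wiki note"
                },
            ))
        return chunks

    # Example: a policy document chunked with version and freshness metadata.
    chunks = chunk_document("...full policy text...", source="policies/refunds.pdf",
                            version="2025-11", last_reviewed=date(2025, 11, 3),
                            authority="official policy")

When every team uses the same chunking and metadata rules, stale or low-authority content can be filtered or down-weighted at retrieval time rather than silently reaching the model.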

Knowledge, unlike code, does not naturally converge. It drifts, forks, and accumulates inconsistencies. RAG makes this drift visible and forces enterprises to modernize knowledge architecture in a way they’ve ignored for decades.

Retrieval optimization is where RAG succeeds or fails

Most organizations assume that once documents are embedded, retrieval “just works.” In practice, retrieval quality determines RAG quality far more than the LLM does. As vector stores scale to millions of embeddings, similarity search becomes noisy, imprecise, and slow. Many retrieved chunks are thematically similar but semantically irrelevant.

The solution is not more embeddings; it is a better retrieval strategy. Large-scale RAG requires hybrid search that blends semantic vectors with keyword search, BM25, metadata filtering, graph traversal, and domain-specific rules. Enterprises also need multi-tier architectures that use caches for common queries, mid-tier vector search for semantic grounding, and cold storage or legacy data sets for long-tail knowledge.
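
A toy Python example of the hybrid idea, with made-up embeddings, a naive keyword score, and an arbitrary weighting, might look like the sketch below; real systems would use a proper BM25 implementation and an approximate-nearest-neighbor index rather than brute force.

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def keyword_overlap(query: str, text: str) -> float:
        q, t = set(query.lower().split()), set(text.lower().split())
        return len(q & t) / max(len(q), 1)

    def hybrid_score(query: str, query_vec: np.ndarray,
                     doc_text: str, doc_vec: np.ndarray, doc_meta: dict,
                     semantic_weight: float = 0.7) -> float:
        """Blend semantic similarity with keyword overlap, after a metadata filter."""
        if doc_meta.get("status") == "deprecated":  # metadata filtering
            return 0.0
        semantic = cosine(query_vec, doc_vec)
        lexical = keyword_overlap(query, doc_text)
        return semantic_weight * semantic + (1 - semantic_weight) * lexical

    # Dummy 4-dimensional embeddings stand in for real model output.
    score = hybrid_score("refund policy for disputes", np.array([0.1, 0.9, 0.2, 0.4]),
                         "Refund disputes are resolved within 30 days.",
                         np.array([0.2, 0.8, 0.1, 0.5]), {"status": "current"})

The useful property is that the blend, the filters, and the weights can all be tuned per query type, which is exactly the “search engine” behavior the next paragraph describes.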

The retrieval layer must behave more like a search engine than a vector lookup. It should choose retrieval methods dynamically, based on the nature of the question, the user’s role, the sensitivity of the data, and the context required for correctness. This is where enterprises often underestimate the complexity. Retrieval becomes its own engineering sub-discipline, on par with devops and data engineering.

Reasoning, grounding, and validation protect answers from drift

Even perfect retrieval does not guarantee a correct answer. LLMs may ignore context, blend retrieved content with prior knowledge, interpolate missing details, or generate fluent but incorrect interpretations of policy text. Production RAG requires explicit grounding instructions, standardized prompt templates, and validation layers that inspect generated answers before returning them to users.

Prompts must be version-controlled and tested like software. Answers must include citations with explicit traceability. In compliance-heavy domains, many organizations route answers through a secondary LLM or rule-based engine that verifies factual grounding, detects hallucination patterns, and enforces safety policies.

Without a structure for grounding and validation, retrieval is only optional input, not a constraint on model behavior.
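
As a rough illustration of grounding plus a lightweight validation pass, here is a sketch that builds a citation-bearing prompt and rejects answers that cite nothing; the prompt wording, the [n] citation format, and the call_llm placeholder are assumptions for illustration only, not a standard API.

    import re

    def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
        """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
        sources = "\n".join(f"[{i + 1}] {c['text']}" for i, c in enumerate(chunks))
        return (
            "Answer ONLY from the numbered sources below. "
            "Cite every claim as [n]. If the sources do not contain the answer, "
            "say you don't know.\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
        )

    def validate_answer(answer: str, num_chunks: int) -> bool:
        """Reject answers with no citations or citations to nonexistent sources."""
        cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
        return bool(cited) and all(1 <= n <= num_chunks for n in cited)

    # call_llm() is a placeholder for whatever model client the platform uses.
    # answer = call_llm(build_grounded_prompt(question, retrieved_chunks))
    # if not validate_answer(answer, len(retrieved_chunks)):
    #     answer = "I couldn't find a grounded answer in the documentation."

In compliance-heavy settings, the validation step would typically be far stricter, cross-checking cited text against the answer or routing it through a secondary model, but even a simple gate like this turns retrieval from optional input into an enforced constraint.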

A blueprint for enterprise-scale RAG

Enterprises that succeed with RAG rely on a layered architecture. The system works not because any single layer is perfect, but because each layer isolates complexity, makes change manageable, and keeps the system observable.

Below is the reference architecture that has emerged through large-scale deployments across fintech, SaaS, telecom, healthcare, and global retail. It illustrates how ingestion, retrieval, reasoning, and agentic automation fit into a coherent platform.

To understand how these concerns fit together, it helps to visualize RAG not as a pipeline but as a vertically integrated stack, one that moves from raw knowledge to agentic decision-making:

[Figure: The RAG stack, from raw knowledge ingestion at the bottom to agentic decision-making at the top. Image credit: Foundry]

This layered model is more than an architectural diagram: it represents a set of responsibilities. Each layer must be observable, governed, and optimized independently. When ingestion improves, retrieval quality improves. When retrieval matures, reasoning becomes more reliable. When reasoning stabilizes, agentic orchestration becomes safe enough to trust with automation.

The mistake most enterprises make is collapsing these layers into a single pipeline. That decision works for demos but fails under real-world demands.

Agentic RAG is the next step toward adaptive AI systems

Once the foundational layers are stable, organizations can introduce agentic capabilities. Agents can reformulate queries, request additional context, validate retrieved content against known constraints, escalate when confidence is low, or call APIs to augment missing information. Instead of retrieving once, they iterate through the steps: sense, retrieve, reason, act, and verify.
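
That loop can be sketched in a few lines of Python; the retrieve, reason, and act functions and the confidence threshold are placeholders standing in for real platform components, not a definitive implementation.

    def agentic_answer(question: str, retrieve, reason, act,
                       max_iterations: int = 3, min_confidence: float = 0.8):
        """Sense -> retrieve -> reason -> act -> verify, repeated until confident."""
        query = question                       # sense: start from the user's question
        for _ in range(max_iterations):
            context = retrieve(query)          # retrieve: hybrid search, filters, etc.
            answer, confidence = reason(question, context)  # reason: grounded generation
            if confidence >= min_confidence:   # verify: accept only confident answers
                return answer
            # act: reformulate the query or request more context, then try again
            query = act(question, context, answer)
        return None                            # escalate to a human when still unsure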

This is what differentiates RAG demos from AI-native systems. Static retrieval struggles with ambiguity or incomplete information. Agentic RAG systems overcome those limitations because they adapt dynamically.

The shift to agents does not eliminate the need for architecture, it strengthens it. Agents rely on retrieval quality, grounding, and validation. Without these, they amplify errors rather than correct them.

Where RAG fails in the enterprise

Despite strong early enthusiasm, most enterprises confront the same problems. Retrieval latency climbs as indexes grow. Embeddings drift out of sync with source documents. Different teams use different chunking strategies, producing wildly inconsistent results. Storage and LLM token costs balloon. Policies and regulations change, but documents are not re-ingested promptly. And because most organizations lack retrieval observability, failures are hard to diagnose, leading teams to mistrust the system.

These failures all trace back to the absence of a platform mindset. RAG is not something each team implements on its own. It is a shared capability that demands consistency, governance, and clear ownership.

A case study in scalable RAG architecture

A global financial services company attempted to use RAG to support its customer-dispute resolution process. The initial system struggled: retrieval returned outdated versions of policies, latency spiked during peak hours, and agents in the call center received inconsistent answers from the model. Compliance teams raised concerns when the model’s explanations diverged from the authoritative documentation.

The organization re-architected the system using a layered model. They implemented hybrid retrieval strategies that blended semantic and keyword search, introduced strict versioning and metadata policies, standardized chunking across teams, and deployed retrieval observability dashboards that exposed cases where documents contradicted each other. They also added an agent that automatically rewrote unclear user queries and requested additional context when initial retrieval was insufficient.

The results were dramatic. Retrieval precision tripled, hallucination rates dropped sharply, and dispute resolution teams reported significantly higher trust in the system. What changed was not the model but the architecture surrounding it.

Retrieval is the key

RAG is often discussed as a clever technique for grounding LLMs, but in practice it becomes a large-scale architecture project that forces organizations to confront decades of knowledge debt. Retrieval, not generation, is the core constraint. Chunking, metadata, and versioning matter as much as embeddings and prompts. Agentic orchestration is not a futuristic add-on, but the key to handling ambiguous, multi-step queries. And without governance and observability, enterprises cannot trust RAG systems in mission-critical workflows.

Enterprises that treat RAG as a durable platform rather than a prototype will build AI systems that scale with their knowledge, evolve with their business, and provide transparency, reliability, and measurable value. Those who treat RAG as a tool will continue to ship demos, not products.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.


Understanding AI-native cloud: from microservices to model-serving 29 Dec 2025, 12:12 pm

Cloud computing has fundamentally transformed the way enterprises operate. Initially built for basic, everyday computing tasks, the cloud has expanded its capabilities exponentially with the advent of new technologies such as machine learning and analytics.

But AI — particularly generative AI and the emerging class of AI agents — presents all-new challenges for cloud architectures. It is resource-hungry, demands ultra-low latency, and requires new compute pathways and data access patterns. These capabilities can’t simply be bolted onto existing cloud infrastructures.

Simply put, AI has upended the traditional cloud computing paradigm, leading to a new category of infrastructure: AI-native cloud.

Understanding AI-native cloud

AI-native cloud, or cloud-native AI, is still a new concept, but it is broadly understood as an extension of cloud native. It is infrastructure built with AI and data as cornerstones, allowing forward-thinking enterprises to infuse AI into their operations, strategies, analysis, and decision-making processes from the very start.

Differences between AI-native and traditional cloud models

Cloud computing has become integral to business operations, helping enterprises scale and adopt new technologies. In recent years, many organizations have shifted to a ‘cloud native’ approach, meaning they are building and running apps directly in the cloud to take full advantage of its benefits and capabilities. Many of today’s modern applications live in public, private, and hybrid clouds.

According to the Cloud Native Computing Foundation (CNCF), cloud native approaches incorporate containers, service meshes, microservices, immutable infrastructure, and declarative APIs. “These techniques enable loosely coupled systems that are resilient, manageable, and observable,” CNCF explains.

5 things you need to know about AI-native cloud

  1. AI is the core technology: In a traditional cloud, AI is an add-on. In an AI-native cloud, every layer, from storage to networking, is designed to handle the high-throughput, low-latency demands of large models.
  2. GPU-first orchestration: AI-native clouds prioritize GPUs and TPUs. This requires advanced orchestration tools such as Kubernetes to manage distributed training and inference economically (see the sketch after this list).
  3. The vector foundation: Data modernization is the price of entry. AI-native clouds rely on vector databases to provide long-term memory for AI models, allowing them to access proprietary enterprise data in real time without hallucinating.
  4. Rise of neoclouds: 2026 will see the rise of specialized neocloud providers (like CoreWeave or Lambda) offering GPU-centric infrastructure that hyperscalers often struggle to match in raw performance and cost.
  5. From AIOps to agenticops: The goal isn’t just a faster system; it’s a self-operating one. AI-native cloud allows agentic AI to autonomously manage network traffic, resolve IT tickets, and optimize cloud spend.
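
For a sense of what “GPU-first orchestration” looks like from code (item 2 above), here is a minimal sketch using the Kubernetes Python client to declare a pod that requests a single GPU; the client library (pip install kubernetes), the container image, and the namespace are assumptions, and real platforms would add scheduling policy, quotas, and monitoring on top.

    from kubernetes import client, config  # pip install kubernetes

    def build_gpu_pod(name: str = "trainer", image: str = "example.com/train:latest"):
        """Declare a pod that asks the scheduler for one NVIDIA GPU."""
        container = client.V1Container(
            name=name,
            image=image,  # placeholder image
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"}  # GPU request served by the device plugin
            ),
        )
        return client.V1Pod(
            metadata=client.V1ObjectMeta(name=name),
            spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
        )

    if __name__ == "__main__":
        config.load_kube_config()  # assumes a configured kubeconfig
        pod = build_gpu_pod()
        client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)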

AI-native cloud is an evolution of this strategy, applying cloud-native patterns and principles to build and deploy scalable, repeatable AI apps and workloads. This can help devs and builders overcome key challenges and limitations when it comes to building, running, launching, and monitoring AI workloads with traditional infrastructures.

The challenges with AI in the cloud

The cloud is an evolution of legacy infrastructures, but it was largely built with software-as-a-service (SaaS) and other as-a-service models in mind. In this setting, AI, ML, and advanced analytics become just another workload, as opposed to a core, critical component.

But AI is much more demanding than traditional workloads; run on conventional cloud infrastructure, it can lead to higher computing costs, data bottlenecks, hampered performance, and other critical issues.

Generative AI, in particular, requires the following:

  • Specialized hardware and significant computational power
  • Infrastructure that is scalable and flexible
  • Massive and diverse datasets for iterative training
  • High-performance storage with high bandwidth and throughput, plus low-latency access to data

AI data needs are significant and continue to escalate as systems become more complex; data must be processed, handled, managed, transferred, and analyzed rapidly and accurately to ensure the success of AI projects. Distributed computing, parallelism (splitting AI tasks across multiple CPUs or GPUs), ongoing training and iteration, and efficient data handling are essential — but traditional cloud infrastructures can struggle to keep up.

Existing infrastructure simply lacks the flexibility demanded by more intense, complex AI and ML workflows. It can also fragment the user experience, forcing devs and builders to move back and forth between numerous interfaces instead of working from a unified control plane.

Essential components of AI-native cloud

Rather than the traditional “lift and shift” cloud migration strategy — where apps and workloads are quickly moved to the cloud “as-is” without redesign — AI-native cloud requires a fundamental redesign and rewiring of infrastructures for a clean slate.

This refactoring involves many of the key principles of cloud-native builds, but in a way that supports the development of AI applications. It requires the following:

  • Microservices architecture
  • Containerized packaging and orchestration
  • Continuous integration/continuous delivery (CI/CD) DevOps practices
  • Observability tools
  • Dedicated data storage
  • Managed services and cloud-native products (like Kubernetes, Terraform, or OpenTelemetry)
  • More complex infrastructures like vector databases

Data modernization is critical for AI; systems require data flowing in real time from data lakes, lakehouses, or other stores; the ability to connect data and provide context for models; and clear rules for how to use and manage data.

AI workloads must be built in from the start, with training, iteration, deployment, monitoring, and version control capabilities all part of the initial cloud setup. This allows models to be managed just like any other service.

AI-native cloud infrastructures must also support continuous AI evolution. Enterprises can incorporate AIOps, MLOps, and FinOps practices to support efficiency, flexibility, scalability, and reliability. Monitoring tools can flag issues with models (like drift, or performance degradation over time), and security and governance guardrails can support encryption, identity verification, regulatory compliance, and other safety measures.
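
As a simple illustration of the kind of drift check such monitoring tools perform, the sketch below compares a feature’s recent values against a training-time baseline; the use of a two-sample Kolmogorov-Smirnov test, the p-value threshold, and the synthetic data are illustrative choices, not a standard.

    import numpy as np
    from scipy.stats import ks_2samp  # pip install scipy

    def feature_drifted(baseline: np.ndarray, recent: np.ndarray,
                        p_threshold: float = 0.01) -> bool:
        """Flag drift when recent values are unlikely to come from the baseline distribution."""
        _, p_value = ks_2samp(baseline, recent)
        return p_value < p_threshold

    # Example with synthetic data: the recent window has shifted upward.
    rng = np.random.default_rng(0)
    baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time feature values
    recent = rng.normal(loc=0.6, scale=1.0, size=1_000)    # production window
    print("Drift detected:", feature_drifted(baseline, recent))

A drift flag like this would typically feed an alerting or retraining pipeline rather than being run by hand, which is exactly the MLOps loop described above.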

According to CNCF, AI-native cloud infrastructures can use the cloud’s underlying computing network (CPUs, GPUs, or Google’s TPUs) and storage capabilities to accelerate AI performance and reduce costs.

Dedicated, built-in orchestration tools can do the following:

  • Automate model delivery via CI/CD pipelines
  • Enable distributed training
  • Support scalable data science to automate ML
  • Provide infrastructure for model serving
  • Facilitate data storage via vector databases and other data architectures
  • Enhance model, LLM, and workload observability

The benefits of AI-native cloud and business implications

There are numerous benefits when AI is built in from the start, including the following:

  • Automation of routine tasks
  • Real-time data processing and analytics
  • Predictive insights and predictive maintenance
  • Supply chain management
  • Resource optimization
  • Operational efficiency and scalability
  • Hyper-personalization at scale for tailored services and products
  • Continuous learning, iteration and improvement through ongoing feedback loops.

Ultimately, AI-native cloud allows enterprises to embed AI from day one, unlocking automation, real-time intelligence, and predictive insights to support efficiency, scalability, and personalized experiences.

Paths to the AI-native cloud

Like any technology, there is no one-size-fits-all for AI-native cloud infrastructures.

Research and advisory firm Forrester identifies five “paths” to the AI-native cloud that align with key stakeholders, including business leaders, technologists, data scientists, and governance teams. They are:

The open-source AI ecosystem

The cloud embedded Kubernetes into enterprise IT, and what started out as an open-source container orchestration system has evolved into a “flexible, multilayered platform with AI at the forefront,” according to Forrester.

The firm identifies different domains in open-source AI cloud, including model-as-a-service, and predicts that devs will shift from local compute to distributed Kubernetes clusters, and from notebooks to pipelines. This “enables direct access to open-source AI innovation.”

AI-centric neo-PaaS

Cloud platform-as-a-service (PaaS) streamlined cloud adoption. Now, Kubernetes-based PaaS provides access to semifinished or prebuilt platforms that abstract away “much or all” of the underlying infrastructure, according to Forrester. This supports integration with existing data science workflows (as well as public cloud platforms) and allows for flexible self-service AI development.

Public cloud platform-managed AI services

Public clouds have taken a distinctly enterprise approach, bringing AI “out of specialist circles into the core of enterprise IT,” Forrester notes. Initial custom models have evolved into widely used platforms including Microsoft Azure AI Foundry, Amazon Bedrock, Google Vertex, and others. These provided early, easy entry points for exploration, and now serve as the core of many AI-native cloud strategies, appealing to technologists, data scientists, and business teams.

AI infrastructure cloud platforms (neocloud)

AI cloud platforms, or neoclouds, are providing platforms that minimize the use of CPU-based cloud tools (or eliminate it altogether). This approach can be particularly appealing for AI startups and enterprises with “aggressive AI programs,” according to Forrester, and is also a draw for enterprises with strong and growing data science programs.

Data/AI cloud platforms

Data infrastructure providers like Databricks and Snowflake have been using cloud infrastructures from leading providers to hone their own offerings. This has positioned them to provide first-party gen AI tools for model building, fine-tuning, and deployment. This draws on the power of public cloud platforms while insulating customers from those complex infrastructures. This “data/AI pure play” is attractive to enterprises looking to more closely align their data scientists and AI devs with business units, Forrester notes.

Ultimately, when pursuing AI-native cloud options, Forrester advises the following:

  • Start with your primary cloud vendor: Evaluate their AI services and develop a technology roadmap before switching to another provider. Consider adding new vendors if they “dangle a must-have AI capability” your enterprise can’t afford to wait for. Also, tap your provider’s AI training to grow skills throughout the enterprise.
  • Resist the urge of “premature” production deployments: Projects can go awry without sufficient reversal plans, so adopt AI governance that assesses model risk in the context of a particular use case.
  • Learn from your AI initiatives: Take stock of what you’ve done and assess whether your technology needs a refresh or an “outright replacement,” and generalize lessons learned to share across the business.
  • Scale AI-native cloud incrementally based on success in specific domains: Early adoption focused on recommendation and information retrieval and synthesis; internal productivity-boosting apps have since proved advantageous. Start with strategy and prove that the technology can work in a particular area and be translated elsewhere.
  • Take advantage of open-source AI: Managed services platforms like AWS Bedrock, Azure OpenAI, Google Vertex, and others were early entrants in the AI space, but they also offer various open-source opportunities that enterprises of different sizes can customize to their particular needs.

Conclusion

AI-native cloud represents a whole new design philosophy for forward-thinking enterprises. The limits of traditional cloud architectures are becoming increasingly clear, and tomorrow’s complex AI systems can’t be treated as “just another workload.” Next-gen AI-native cloud infrastructures put AI at the core and allow systems to be managed, governed, and improved just like any other mission-critical service.


React2Shell: Anatomy of a max-severity flaw that sent shockwaves through the web 29 Dec 2025, 3:03 am

The React 19 library for building application interfaces was hit with a remote code execution vulnerability, dubbed React2Shell, about a month ago. As researchers delve deeper into the bug, a larger picture is gradually coming into focus.

The vulnerability enables unauthenticated remote code execution through React Server Components, allowing attackers to execute arbitrary code on affected servers via a crafted request. In other words, a foundational web framework feature quietly became an initial access vector.

What followed was a familiar but increasingly compressed sequence. Within hours of disclosure, multiple security firms confirmed active exploitation in the wild. Google’s Threat Intelligence Group (GTIG) and AWS both reported real-world abuse, collapsing the already-thin gap between vulnerability awareness and compromise.

“React2Shell is another reminder of how fast exploitation timelines have become,” said Nathaniel Jones, field CISO at Darktrace. “The CVE drops, a proof-of-concept is circulating, and within hours you’re already seeing real exploitation attempts.”

That speed matters because React Server Components are not a niche feature. They are embedded into default React and Next.js deployments across enterprise environments, meaning organizations inherited this risk simply by adopting mainstream tooling.

Different reports add new signals

While researchers agreed on the root cause, multiple individual reports have emerged, sharpening the overall picture.

For instance, early analysis by cybersecurity firm Wiz demonstrated how easily an unauthenticated input can traverse the React Server Components pipeline and reach dangerous execution paths, even in clean, default deployments. Unit 42 has expanded on this by validating exploit reliability across environments and emphasizing the minimal variation attackers needed to succeed.

Google and AWS have added operational context by confirming exploitation by multiple threat categories, including state-aligned actors, shortly after disclosure. That validation moved React2Shell out of the “potentially exploitable” category and into a confirmed active risk.

A report from Huntress has shifted focus by documenting post-exploitation behavior. Rather than simple proof-of-concept shells, attackers were observed deploying backdoors and tunneling tools, signaling that React2Shell was already being used as a durable access vector rather than a transient opportunistic hit, the report noted.

However, not all findings amplified urgency. Patrowl’s controlled testing showed that some early exposure estimates were inflated due to version-based scanning and noisy detection logic.

Taken together, the research painted a clearer, more mature picture within days (not weeks) of disclosure.

What the research quickly agreed on

Across early reports from Wiz, Palo Alto Networks’ Unit 42, Google, AWS, and others, there was strong alignment on the core mechanics of React2Shell. Researchers independently confirmed that the flaw lives inside React’s server-side rendering pipeline and stems from unsafe deserialization in the protocol used to transmit component data between client and server.

Multiple teams confirmed that exploitation does not depend on custom application logic. Applications generated using standard tools were vulnerable by default, and downstream frameworks such as Next.js inherited the issue rather than introducing it independently. That consensus reframed React2Shell from a “developer mistake” narrative into a framework-level failure with systemic reach.

This was the inflection point. If secure-by-design assumptions no longer hold at the framework layer, the defensive model shifts from “find misconfigurations” to “assume exposure.”

Speed-to-exploit as a defining characteristic

One theme that emerged consistently across reports was how little time defenders had to react. Jones said Darktrace’s own honeypot was exploited in under two minutes after exposure, strongly suggesting attackers had automated scanning and exploitation workflows ready before public disclosure. “Threat actors already had scripts scanning for the vulnerability, checking for exposed servers, and firing exploits without any humans in the loop,” he said.

Deepwatch’s Frankie Sclafani framed this behavior as structural rather than opportunistic. The rapid mobilization of multiple China-linked groups, he noted, reflected an ecosystem optimized for immediate action. In that model, speed-to-exploit is not a secondary metric but a primary measure of operational readiness. “When a critical vulnerability like React2Shell is disclosed, these actors seem to execute pre-planned strategies to establish persistence before patching occurs,” he said.

This matters because it undercuts traditional patch-response assumptions. Even well-resourced enterprises rarely patch and redeploy critical systems within hours, creating an exposure window that attackers now reliably expect.

What exploitation looked like in practice

Almost immediately after the December 3 public disclosure of React2Shell, active exploitation was observed by multiple defenders. Within hours, automated scanners and attacker tools probed internet-facing React/Next.js services for the flaw.

Threat intelligence teams confirmed that China-nexus state-aligned clusters, including Earth Lumia and Jackpot Panda, were among the early actors leveraging the defect to gain server access and deploy follow-on tooling. Beyond state-linked activity, reports from Unit 42 and Huntress detailed campaigns deploying Linux backdoors, reverse proxy tunnels, cryptomining kits, and botnet implants against exposed targets, a sign that both espionage and financially motivated groups were capitalizing on the bug.

Data from Wiz and other responders indicates that dozens of distinct intrusion efforts have been tied to React2Shell exploitation, with compromised systems ranging across sectors and regions. Despite these confirmed attacks and public exploit code circulating, many vulnerable deployments remain unpatched, keeping the window for further exploitation wide open.

The lesson React2Shell leaves behind

React2Shell is ultimately less about React than about the security debt accumulating inside modern abstractions. As frameworks take on more server-side responsibility, their internal trust boundaries become enterprise attack surfaces overnight.

The research community mapped this vulnerability quickly and thoroughly. Attackers moved even faster. For defenders, the takeaway is not just to patch, but to reassess what “default safe” really means in an ecosystem where exploitation is automated, immediate, and indifferent to intent.

React2Shell is rated critical, carrying a CVSS score of 10.0, reflecting its unauthenticated remote code execution impact and broad exposure across default React Server Components deployments. React maintainers and downstream frameworks such as Next.js have released patches, and researchers broadly agree that affected packages should be updated immediately.

Beyond patching, they warn that teams should assume exploitation attempts may already be underway. Recommendations consistently emphasize validating actual exposure rather than relying on version checks alone, and actively hunting for post-exploitation behavior such as unexpected child processes, outbound tunneling traffic, or newly deployed backdoors. The message across disclosures is clear: React2Shell is not a “patch when convenient” flaw, and the window for passive response has already closed.
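
As a rough, illustrative starting point for that kind of hunting, the Python sketch below uses the psutil library to flag shell-like children spawned by Node.js server processes, one common post-exploitation signal; the process names and the assumption that such children are suspicious are simplifications, and real detection belongs in an EDR or SIEM pipeline rather than a script.

    import psutil  # pip install psutil

    SERVER_NAMES = {"node", "next-server"}                # assumed React/Next.js server process names
    SUSPICIOUS_CHILDREN = {"sh", "bash", "curl", "wget"}  # shells/downloaders a renderer shouldn't spawn

    def find_suspicious_children():
        """Yield (parent, child) pairs where a JS server process spawned a shell-like child."""
        for proc in psutil.process_iter(attrs=["pid", "name"]):
            if proc.info["name"] in SERVER_NAMES:
                for child in proc.children(recursive=True):
                    try:
                        if child.name() in SUSPICIOUS_CHILDREN:
                            yield proc.info, {"pid": child.pid, "name": child.name(),
                                              "cmdline": child.cmdline()}
                    except (psutil.NoSuchProcess, psutil.AccessDenied):
                        continue  # child exited or is inaccessible; skip it

    if __name__ == "__main__":
        for parent, child in find_suspicious_children():
            print(f"Suspicious child of {parent['name']} (pid {parent['pid']}): {child}")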

The article first appeared on CSO.


AI’s trust tax for developers 29 Dec 2025, 1:00 am

Andrej Karpathy is one of the few people in this industry who has earned the right to be listened to without a filter. As a founding member of OpenAI and the former director of AI at Tesla, he sits at the summit of AI and its possibilities. In a recent post, he shared a view that is equally inspiring and terrifying: “I could be 10X more powerful if I just properly string together what has become available over the last ~year,” Karpathy wrote. “And a failure to claim the boost feels decidedly like [a] skill issue.”

If you aren’t ten times faster today than you were in 2023, Karpathy implies that the problem isn’t the tools. The problem is you. Which seems both right…and very wrong. After all, the raw potential for leverage in the current generation of LLM tools is staggering. But his entire argument hinges on a single adverb that does an awful lot of heavy lifting:

“Properly.”

In the enterprise, where code lives for decades, not days, that word “properly” is easy to say but very hard to achieve. The reality on the ground, backed by a growing mountain of data, suggests that for most developers, the “skill issue” isn’t a failure to prompt effectively. It’s a failure to verify rigorously. AI speed is free, but trust is incredibly expensive.

A vibes-based productivity trap

In reality, AI speed only seems to be free. Earlier this year, for example, METR (Model Evaluation and Threat Research) ran a randomized controlled trial that gave experienced open source developers tasks to complete. Half used AI tools; half didn’t. The developers using AI were convinced the LLMs had accelerated their development speed by 20%. But reality bites: The AI-assisted group was, on average, 19% slower.

That’s a nearly 40-point gap between perception and reality. Ouch.

How does this happen? As I recently wrote, we are increasingly relying on “vibes-based evaluation” (a phrase coined by Simon Willison). The code looks right. It appears instantly. But then you hit the “last mile” problem. The generated code uses a deprecated library. It hallucinates a parameter. It introduces a subtle race condition.

Karpathy can induce serious FOMO with statements like this: “People who aren’t keeping up even over the last 30 days already have a deprecated worldview on this topic.” Well, maybe, but as fast as AI is changing, some things remain stubbornly the same. Like quality control. AI coding assistants are not primarily productivity tools; they are liability generators that you pay for with verification. You can pay the tax upfront (rigorous code review, testing, threat modeling), or you can pay it later (incidents, data breaches, and refactoring). But you’re going to pay sooner or later.

Right now, too many teams think they’re evading the tax, but they’re not. Not really. Veracode’s GenAI Code Security Report found that 45% of AI-generated code samples introduced security issues on OWASP’s top 10 list. Think about that.

Nearly half the time you accept an AI suggestion without a rigorous audit, you are potentially injecting a critical vulnerability (SQL injection, XSS, broken access control) into your codebase. The report puts it bluntly: “Congrats on the speed, enjoy the breach.” As Microsoft developer advocate Marlene Mhangami puts it, “The bottleneck is still shipping code that you can maintain and feel confident about.”

In other words, with AI we’re accumulating vulnerable code at a rate manual security reviews cannot possibly match. This confirms the “productivity paradox” that SonarSource has been warning about. Their thesis is simple: Faster code generation inevitably leads to faster accumulation of bugs, complexity, and debt, unless you invest aggressively in quality gates. As the SonarSource report argues, we’re building “write-only” codebases: systems so voluminous and complex, generated by non-deterministic agents, that no human can fully understand them.

We increasingly trade long-term maintainability for short-term output. It’s the software equivalent of a sugar high.

Redefining the skills

So, is Karpathy wrong? No. When he says he can be ten times more powerful, he’s right. It might not be exactly ten times, but the performance gains savvy developers get from AI are real, or at least have the potential to be. Even so, the skill he possesses isn’t just the ability to string together tools.

Karpathy has the deep internalized knowledge of what good software looks like, which allows him to filter the noise. He knows when the AI is likely to be right and when it is likely to be hallucinating. But he’s an outlier on this, bringing us back to that pesky word “properly.”

Hence, the real skill issue of 2026 isn’t prompt engineering. It’s verification engineering. If you want to claim the boost Karpathy is talking about, you need to shift your focus from code creation to code critique, as it were:

  • Verification is the new coding. Your value is no longer defined by lines of code written, but by how effectively you can validate the machine’s output.
  • “Golden paths” are mandatory. As I’ve written, you cannot allow AI to be a free-for-all. You need golden paths: standardized, secured templates. Don’t ask the LLM to write a database connector; ask it to implement the interface from your secure platform library.
  • Design the security architecture yourself. You can’t just tell an LLM to “make this secure.” The high-level thinking you embed in your threat modeling is the one thing the AI still can’t do reliably.

“Properly stringing together” the available tools doesn’t just mean connecting an IDE to a chatbot. It means thinking about AI systematically rather than optimistically. It means wrapping those LLMs in a harness of linting, static application security testing (SAST), dynamic application security testing (DAST), and automated regression testing.
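
As one hedged example of what such a harness can look like, the Python sketch below gates AI-generated changes behind a linter and the regression test suite before they can be merged; the choice of Ruff and pytest is an assumption for illustration, and most teams would add SAST/DAST stages and run this in CI rather than locally.

    import subprocess
    import sys

    def run(step_name: str, command: list[str]) -> bool:
        """Run one verification step and report whether it passed."""
        result = subprocess.run(command, capture_output=True, text=True)
        print(f"{step_name}: {'PASS' if result.returncode == 0 else 'FAIL'}")
        if result.returncode != 0:
            print(result.stdout, result.stderr, sep="\n")
        return result.returncode == 0

    def verify_generated_code(path: str = ".") -> bool:
        """Minimal verification gate: lint, then run the regression tests."""
        checks = [
            ("lint (ruff)", ["ruff", "check", path]),    # assumes Ruff is installed
            ("tests (pytest)", ["pytest", "-q", path]),  # assumes a pytest suite exists
        ]
        return all(run(name, cmd) for name, cmd in checks)

    if __name__ == "__main__":
        sys.exit(0 if verify_generated_code() else 1)

The point is not the specific tools but the posture: nothing an AI produces reaches the main branch until it has cleared the same gates a junior engineer’s code would.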

The developers who will actually be ten times more powerful next year aren’t the ones who trust the AI blindly. They are the ones who treat AI like a brilliant but very junior intern: capable of flashes of genius, but requiring constant supervision to prevent them from deleting the production database.

The skill issue is real. But the skill isn’t speed. The skill is control.


4 New Year’s resolutions for devops success 29 Dec 2025, 1:00 am

It has been a dramatic and challenging year for developers and engineers working in devops organizations. More companies are using AI and automation for both development and IT operations, including for writing requirements, maintaining documentation, and vibe coding. Responsibilities have also increased, as organizations expect devops teams to improve data quality, automate AI agent testing, and drive operational resiliency.

AI is driving new business expectations and technical capabilities, and devops engineers must keep pace with the speed of innovation. At the same time, many organizations are laying off white-collar workers, including more than 120,000 tech layoffs in 2025.

Devops teams are looking for ways to reduce stress and ensure team members remain positive through all the challenges. At a recent event I hosted on how digital trailblazers reduce stress, speakers suggested several stress reduction mechanisms, including limiting work in progress, bringing humor into the day, and building supportive relationships.

As we head into the new year, now is also a good time for engineers and developers to set goals for 2026. I asked tech experts what New Year’s resolutions they would recommend for devops teams and professionals.

1. Fully embrace AI-enabled software development

Developers and automation engineers have had their world rocked over the last two years, with the emergence of AI copilots, code generators, and vibe coding. Developers typically spend time deepening their knowledge of coding languages and broadening their skills to work across different cloud architectures. In 2026, more of this time should be dedicated to learning AI-enabled software development.

“Develop a growth mindset that AI models are not good or bad, but rather a new nondeterministic paradigm in software that can both create new issues and new opportunities,” says Matthew Makai, VP of developer relations at DigitalOcean. “It’s on devops engineers and teams to adapt to how software is created, deployed, and operated.”

Concrete suggestions for this resolution involve shifting both mindset and activities:

  • Makai suggests automating code reviews for security issues and technical defects, given the rise in AI coding tools that generate significantly more code and can transfer technical debt across the codebase.
  • Nic Benders, chief technical strategist at New Relic, says everyone needs to gain experience with AI coding tools. “For those of us who have been around a while, think of vibe coding as the Perl of today. Go find an itch, then have fun knocking out a quick tool to scratch it.”
  • John Capobianco, head of developer relations at Selector, suggests devops teams should strive to embrace vibe-ops. “We can take the principles and the approach that certain software engineers are using with AI to augment software development in vibe-ops and apply those principles, much like devops to net-devops and devops to vibe-ops, getting AI involved in our pipelines and our workflows.”
  • Robin Macfarlane, president and CEO of RRMac Associates, suggests engineers begin to rethink their primary role not as code developers but as code orchestrators, whether working on mainframes or in distributed computing. “This New Year, resolve to learn the programming language you want AI to code in, resolve to do your own troubleshooting, and become the developer who teaches AI instead of the other way around.”

Nikhil Mungel, director of AI R&D at Cribl, says the real AI skill is learning to review, challenge, and improve AI-generated work by spotting subtle bugs, security gaps, performance issues, and incorrect assumptions. “Devops engineers who pair frequent AI use with strong review judgment will move faster and deliver more reliable systems than those who simply accept AI suggestions at face value.”

Mungel recommends that devops engineers commit to the following practices:

  • Tracing the agent decision graph, not just API calls (see the sketch after this list).
  • Building AI-aware security observability around OWASP LLM Top 10 and MCP risks.
  • Capturing AI-specific lineage and incidents in CI/CD and ops runbooks.
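
A minimal Python sketch of the first practice, tracing an agent’s decisions rather than only its API calls, might look like the following; the JSON-lines trace file, the trace decorator, and the toy routing logic are illustrative assumptions, and production teams would ship these events to their observability platform instead of a local file.

    import functools
    import json
    import time

    TRACE_FILE = "agent_trace.jsonl"  # placeholder destination; use your observability pipeline

    def trace(step: str):
        """Decorator that records each agent decision as a structured event."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                result = func(*args, **kwargs)
                event = {
                    "timestamp": time.time(),
                    "step": step,                      # e.g. "choose_tool", "call_api"
                    "inputs": [repr(a) for a in args],
                    "output": repr(result),
                }
                with open(TRACE_FILE, "a") as f:
                    f.write(json.dumps(event) + "\n")
                return result
            return wrapper
        return decorator

    @trace("choose_tool")
    def choose_tool(question: str) -> str:
        # Toy decision logic standing in for an LLM-driven router.
        return "ticket_lookup" if "ticket" in question.lower() else "knowledge_search"

    print(choose_tool("What is the status of ticket 4312?"))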

Resolution: Develop the skills required to use AI for solving development and engineering challenges.

2. Strengthen knowledge of outcome-based, resilient operations

While developers focus on AI capabilities, operational engineers should target resolutions focused on resiliency. The more autonomous systems are in responding to and recovering from issues, the fewer priority incidents devops teams will have to manage, which likely means fewer instances where teams have to join bridge calls in the middle of the night.

A good place to start is improving observability across APIs, applications, and automations.

“Developers should adopt an AI-first, prevention-first mindset, using observability and AIops to move from reactive fixes to proactive detection and prevention of issues,” says Alok Uniyal, SVP and head of process consulting at Infosys. “Strengthen your expertise in self-healing systems and platform reliability, where AI-driven root-cause analysis and autonomous remediation will increasingly define how organizations meet demanding SLAs.”

As more businesses become data-driven organizations and invest in AI as part of their future of work strategy, another place to start building resiliency is in dataops and data pipelines.

“In 2026, devops teams should get serious about understanding the systems they automate, especially the data layer,” says Alejandro Duarte, developer relations engineer at MariaDB. “Too many outages still come from pipelines that treat databases as black boxes. Understanding multi-storage-engine capabilities, analytical and AI workload support, native replication, and robust high availability features will make the difference between restful weekends and late-night firefights.”

At the infrastructure layer, engineers have historically focused on redundancy, auto-scaling, and disaster recovery. Now, engineers should consider incorporating AI agents to improve resiliency and performance.

“For devops engineers, the resolution shouldn’t be about learning another framework, but about mastering the new operating model—AI-driven self-healing infrastructure,” says Simon Margolis, associate CTO AI and ML at SADA. “Your focus must shift from writing imperative scripts to creating robust observability and feedback loops that can enable an AI agent to truly take action. This means investing in skills that help you define intent and outcomes—not steps—which is the only way to unlock true operational efficiency and leadership growth.”

Rather than learning new AI tools, experts suggest reviewing opportunities to develop new AI capabilities within the platforms already used by the organization.

“A sound resolution for the new year is to stop trying to beat the old thing into some new AI solution and start using AI to augment and improve what we already have,” says Brett Smith, distinguished software engineer at SAS. “We need to finally stop chasing the ‘I can solve this with AI’ hype and start focusing on ‘How can AI help me solve this better, faster, cheaper?’”

Resolution: Shift the operating mindset from problem detection, resolution, and root-cause analysis to resilient, self-healing operations.

3. Learn new technology disciplines

It’s one thing to learn a new product or technology, and it’s a whole other level of growth to learn a new discipline. If you’re an application developer, one new area that requires more attention is understanding accessibility requirements and testing methodologies for improving applications for people with disabilities.

“Integrating accessibility into the devops pipeline should be a top resolution, with accessibility tests running alongside security and unit tests in CI as automated testing and AI coding tools mature,” says Navin Thadani, CEO of Evinced. “As AI accelerates development, failing to fix accessibility issues early will only cause teams to generate inaccessible code faster, making shift-left accessibility essential. Engineers should think hard about keeping accessibility in the loop, so the promise of AI-driven coding doesn’t leave inclusion behind.”

Data scientists, architects, and system engineers should also consider learning more about the Model Context Protocol for AI agent-to-agent communications. One place to start is learning the requirements and steps to configure a secure MCP server.

“Devops should focus on mastering MCP, which is set to create an entirely new app development pipeline in 2026,” says Rishi Bhargava, co-founder of Descope. “While it’s still early days for production-ready AI agents, MCP has already seen widespread adoption. Those who learn to build and authenticate MCP-enabled applications now securely will gain a major competitive edge as agentic systems mature over the next six months.”

Resolution: Embrace being a lifelong learner: Study trends and dig into new technologies that are required for compliance or that drive innovation.

4. Develop transformation leadership skills

In my book, Digital Trailblazer, I wrote about the need for transformation leaders, what I call digital trailblazers, “who can lead teams, evolve sustainable ways of working, develop technologies as competitive differentiators, and deliver business outcomes.”

Some may aspire to CTO roles, while others should consider leadership career paths in devops. For engineers, there is tremendous value in developing communication skills and business acumen.

Yaad Oren, managing director of SAP Labs U.S. and global head of research and innovation at SAP, says leadership skills matter just as much as technical fundamentals. “Focus on clear communication with colleagues and customers, and clear instructions with AI agents. Those who combine continuous learning with strong alignment and shared ownership will be ready to lead the next chapter of IT operations.”

For engineers ready to step up into leadership roles but concerned about taking on direct reports, consider mentoring others to build skills and confidence.

“There is high-potential talent everywhere, so aside from learning technical skills, I would challenge devops engineers to also take the time to mentor a junior engineer in 2026,” says Austin Spires, senior director of developer enablement at Fastly. “Guiding engineers early in their career, whether on hard skills like security or soft skills like communication and stakeholder management, is fulfilling and allows them to grow into one of your best colleagues.”

Another option, if you don’t want to manage people, is to take on a leadership role on a strategic initiative. In a complex job market, having agile program leadership skills can open up new opportunities.

Christine Rogers, people and operations leader at Sisense, says the traditional job description is dying. Skills, not titles, will define the workforce, she says. “By 2026, organizations will shift to skills-based models, where employees are hired and promoted based on verifiable capabilities and adaptability, often demonstrated through real projects, not polished resumes.”

Resolution: Find an avenue to develop leadership confidence, even if it’s not at work. There are leadership opportunities at nonprofits, local government committees, and even in following personal interests.

Happy New Year, everyone!


High severity flaw in MongoDB could allow memory leakage 26 Dec 2025, 12:12 pm

Document database vendor MongoDB has advised customers to update immediately following the discovery of a flaw that could allow unauthenticated users to read uninitialized heap memory.

Designated CVE-2025-14847, the bug stems from mismatched length fields in zlib-compressed protocol headers and could allow an attacker to execute arbitrary code and potentially seize control of a device.

The flaw affects the following MongoDB and MongoDB Server versions:

  • MongoDB 8.2.0 through 8.2.3
  • MongoDB 8.0.0 through 8.0.16
  • MongoDB 7.0.0 through 7.0.26
  • MongoDB 6.0.0 through 6.0.26
  • MongoDB 5.0.0 through 5.0.31
  • MongoDB 4.4.0 through 4.4.29
  • All MongoDB Server v4.2 versions
  • All MongoDB Server v4.0 versions
  • All MongoDB Server v3.6 versions

In its advisory, MongoDB “strongly suggested” that users upgrade immediately to the patched versions of the software: MongoDB 8.2.3, 8.0.17, 7.0.28, 6.0.27, 5.0.32, or 4.4.30.

However, it said, “if you cannot upgrade immediately, disable zlib compression on the MongoDB Server by starting mongod or mongos with a networkMessageCompressors or a net.compression.compressors option that explicitly omits zlib.”
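
For example, a minimal sketch of that mitigation, assuming a standalone mongod and using only the options named in the advisory, would restrict the compressor list so that zlib is omitted, either on the command line or in the YAML configuration file:

    # Command-line form: allow only snappy and zstd, omitting zlib
    mongod --networkMessageCompressors "snappy,zstd"

    # Equivalent setting in the mongod.conf YAML file
    net:
      compression:
        compressors: snappy,zstd

Because compression is negotiated per connection, removing zlib from the server-side list should prevent clients from negotiating it even if their drivers still request it.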

MongoDB, one of the most popular NoSQL document databases for developers, says it currently has more than 62,000 customers worldwide, including 70% of the Fortune 100.


Reader picks: The most popular Python stories of 2025 26 Dec 2025, 1:00 am

Python 3.14 was the star of the show in 2025, bringing official support for free-threaded builds, a new all-in-one installation manager for Windows, and subtler perks like the new template strings feature. Other great updates this year included a growing toolkit of Rust-backed Python tools, several new options for packaging and distributing Python applications, and a sweet little trove of third-party libraries for parallel processing in Python. Here’s our list of the 10 best and most-read stories for Python developers in 2025. Enjoy!

What is Python? Powerful, intuitive programming
Start here, with a top-down view of what makes Python a versatile powerhouse for modern software development, from data science and machine learning to web development and systems automation.

The best new features and fixes in Python 3.14
Released in October 2025, the latest edition of Python makes free-threaded Python an officially supported feature, adds experimental JIT powers, and brings new tools for managing Python versions.

Get started with the new Python Installation Manager
The newest versions of Python on Microsoft Windows come packaged with this powerful all-in-one tool for installing, updating, and managing multiple editions of Python on the same system.

How to use template strings in Python 3.14
One of Python 3.14’s most powerful new features delivers a whole new mechanism for formatting data in strings, more programmable and powerful than the existing “f-string” formatting system.

PyApp: An easy way to package Python apps as executables
This Rust-powered utility brings to life a long-standing dream in the Python world: It turns hard-to-package Python programs into self-contained click-to-runs.

The best Python libraries for parallel processing
Python’s getting better at doing more than one thing at once, and that’s thanks to its “no-GIL” edition. But these third-party libraries give you advanced tools for distributing Python workloads across cores, processors, and multiple machines.

Amp your Python superpowers with ‘uv run’
Astral’s uv utility lets you set up and run Python packages with one command, no setup, no fuss, and nothing to clean up when you’re done.

3 Python web frameworks for beautiful front ends
Write Python code on the back end and generate good-looking HTML/CSS/JavaScript-driven front ends, automatically. Here are three ways to Python-code your way to beautiful front ends.

How to boost Python program performance with Zig
The emerging Zig language, making a name as a safer alternative to C, can also be coupled closely with Python—the better to create Python libraries that run at machine-native speed.

PythoC: A new way to generate C code from Python
This new project lets you use Python as a kind of high-level macro system to generate C-equivalent code that can run as standalone programs, and with some unique memory safety features you won’t find in C.


A small language model blueprint for automation in IT and HR 25 Dec 2025, 1:00 am

Large language models (LLMs) have grabbed the world’s attention for their seemingly magical ability to instantaneously sift through endless data, generate responses, and even create visual content from simple prompts. But their “small” counterparts aren’t far behind. As questions swirl about whether AI can actually generate meaningful returns on investment (ROI), organizations should take notice: Small language models (SLMs), which use far fewer parameters, less compute, and less energy than LLMs to perform specific tasks, have been shown to be just as effective as their much larger counterparts.

In a world where companies have invested ungodly amounts of money in AI and questioned the returns, SLMs are proving to be an ROI savior. Ultimately, SLM-enabled agentic AI delivers the best of SLMs and LLMs together — including higher employee satisfaction and retention, improved productivity, and lower costs. And given a Gartner report predicting that over 40% of agentic AI projects will be cancelled by the end of 2027 due to complexities and rapid evolutions that often lead enterprises down the wrong path, SLMs can be an important tool in any CIO’s tool chest.

Take information technology (IT) and human resources (HR) functions for example. In IT, SLMs can drive autonomous and accurate resolutions, workflow orchestration, and knowledge access. And for HR, they’re enabling personalized employee support, streamlining onboarding, and handling routine inquiries with privacy and precision. In both cases, SLMs are enabling users to “chat” with complex enterprise systems the same way they would a human representative.

Given a well-trained SLM, users can simply write a Slack or Microsoft Teams message to the AI agent (“I can’t connect to my VPN,” or “I need to refresh my laptop,” or “I need proof of employment for a mortgage application”), and the agent will automatically resolve the issue. What’s more, the responses will be personalized based on user profiles and behaviors and the support will be proactive and anticipatory of when issues might occur.

Understanding SLMs

So, what exactly is an SLM? It’s a relatively ill-defined term, but generally it is a language model with somewhere between one billion and 40 billion parameters, versus 70 billion to hundreds of billions for LLMs. Many SLMs are also released as open source, giving you access to their weights, biases, and training code.

There are also SLMs that are “open-weight” only, meaning you get access to model weights with restrictions. This is important because a key benefit with SLMs is the ability to fine-tune or customize the model so you can ground it in the nuance of a particular domain. For example, you can use internal chats, support tickets, and Slack messages to create a system for answering customer questions. The fine-tuning process helps to increase the accuracy and relevance of the responses.

Agentic AI will leverage SLMs and LLMs

It’s understandable to want to use state-of-the-art models for agentic AI. Consider that the latest frontier models score highly on math, software development, and medical reasoning, to name just a few categories. Yet the question every CIO should be asking is: Do we really need that much firepower in our organization? For many enterprise use cases, the answer is no.

And even though SLMs are small, don’t underestimate them. Their small size means lower latency, which is critical for real-time processing. SLMs can also run on small form factors, like edge devices, and in other resource-constrained environments.

Another advantage of SLMs is that they are particularly effective at handling tasks like tool calls, API interactions, and routing, which is just what agentic AI is meant to do: carry out actions. Sophisticated LLMs, on the other hand, may be slower, overthink straightforward tasks, and consume large numbers of tokens.

In IT and HR environments, the balance among speed, accuracy, and resource efficiency matters for both employees and IT or HR teams. For employees, agentic assistants built on SLMs provide fast, conversational help that solves problems sooner. For IT and HR teams, SLMs reduce the burden of repetitive work by automating ticket handling, routing, and approvals, freeing staff to focus on higher-value strategic work. SLMs also provide substantial cost savings, since they use far less energy, memory, and compute power; that efficiency can prove enormously beneficial when using cloud platforms.

Where SLMs fall short

Granted, SLMs are not silver bullets either. There are certainly cases where you need a sophisticated LLM, such as for highly complex, multi-step processes. A hybrid architecture — where SLMs handle the majority of operational interactions and LLMs are reserved for advanced reasoning or escalations — allows IT and HR teams to optimize both performance and cost. Such a system can lean on observability and evaluations to decide dynamically when to use an SLM or an LLM. Or, if an SLM fails to produce a good response, the request can be escalated to an LLM, as in the sketch below.
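
To make the hybrid pattern concrete, here’s a minimal Python sketch of SLM-first routing with LLM escalation. The model clients, the evaluator, and the confidence threshold are hypothetical placeholders standing in for whatever models and observability tooling you actually deploy:

CONFIDENCE_THRESHOLD = 0.75  # assumed cutoff for accepting an SLM answer

def handle_request(prompt, slm, llm, evaluator):
    """Try the small model first; escalate to the large model if needed."""
    draft = slm.generate(prompt)            # fast, cheap first pass
    score = evaluator.score(prompt, draft)  # evaluation/observability step
    if score >= CONFIDENCE_THRESHOLD:
        return {"model": "slm", "answer": draft}
    # Low-confidence or complex requests escalate to the LLM.
    return {"model": "llm", "answer": llm.generate(prompt)}

In practice, the evaluator could be anything from a simple rules check to an LLM-as-judge scoring pass; the point is that the expensive model only sees the traffic the small one can’t handle.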

SLMs are emerging as the most practical approach to achieving ROI with agentic AI. By pairing SLMs with selective use of LLMs, organizations can create balanced, cost-effective architectures that scale across both IT and HR, delivering measurable results and a faster path to value. With SLMs, less is more.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.


Microsoft is not rewriting Windows in Rust 24 Dec 2025, 7:15 am

A job posting by a Microsoft engineer sparked excitement about a project “to eliminate every line of C and C++ from Microsoft by 2030”, replacing it with Rust — but alas for fans of the memory-safe programming language, it turns out this is a personal goal, not a corporate one, and Rust isn’t necessarily even the final target.

Microsoft Distinguished Engineer Galen Hunt posted about his ambitious goal on LinkedIn four days ago, provoking a wave of excitement and concern.

Now he’s been forced to clarify: “My team’s project is a research project. We are building tech to make migration from language to language possible,” he wrote in an update to his LinkedIn post. His intent, he said, was to find like-minded engineers, “not to set a new strategy for Windows 11+ or to imply that Rust is an endpoint.”

Hunt’s project is to investigate how AI can be used to assist in the translation of code from one language to another at scale. “Our North Star is ‘1 engineer, 1 month, 1 million lines of code’,” he wrote.

He’s recruiting an engineer to help build the infrastructure to do that, demonstrating the technology using Rust as the target language and C and C++ as the source.

The successful candidate will join the Future of Scalable Software Engineering team in Microsoft’s CoreAI group, building static analysis and machine learning tools for AI-assisted translation and migration.

Pressure to ditch C and C++ in favor of memory-safe languages such as Rust comes right from the top, with research by Google and Microsoft showing that around 70 percent of all security vulnerabilities in software are caused by memory safety issues.

However, using AI to rewrite code, even in a memory-safe language, may not make things more secure: AI-generated code typically contains more issues than code written by humans, according to research by CodeRabbit.

That’s not stopping some of the biggest software developers from pushing ahead with AI-powered software development, though. Already, AI writes 30% of Microsoft’s new code, Microsoft CEO Satya Nadella said in April.


Get started with Python’s new native JIT 24 Dec 2025, 1:00 am

JITting, or “just-in-time” compilation, can make relatively slow interpreted languages much faster. Until recently, JITting was available for Python only in the form of specialized third-party libraries, like Numba, or alternate versions of the Python interpreter, like PyPy.

A native JIT compiler has been added to Python over its last few releases. At first it didn’t provide any significant speedup, but with Python 3.15 (still in alpha but available for use now), the core Python development team has bolstered the native JIT to the point where it now shows significant performance gains for certain kinds of programs.

Speedups from the JIT vary widely, depending on the operation. Some programs show dramatic performance improvements; others show none at all. But the work put into the JIT is beginning to pay off, and users can start taking advantage of it if they’re willing to experiment.

Activating the Python JIT

By default, the native Python JIT is disabled. It’s still considered an experimental feature, so it has to be manually enabled.

To enable the JIT, you set the PYTHON_JIT environment variable, either for the shell session Python is running in, or persistently as part of your user environment options. When the Python interpreter starts, it checks its runtime environment for the variable PYTHON_JIT. If PYTHON_JIT is unset or set to anything but 1, the JIT is off. If it’s set to 1, the JIT is enabled.

It’s probably not a good idea to enable PYTHON_JIT as a persistent option. That might make sense in a user environment where every Python run is meant to use the JIT, but for the most part you’ll want to set PYTHON_JIT manually — for instance, in a shell script that configures the environment, or per run as in the sketch below.
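
As an alternative to a shell script, here’s a minimal Python sketch of the same idea: launch the script you want to benchmark in a child interpreter with PYTHON_JIT=1 set only for that process, leaving your persistent environment untouched (the script name is a hypothetical placeholder):

import os
import subprocess
import sys

# Enable the JIT only for the child interpreter, not for the current shell.
env = dict(os.environ, PYTHON_JIT="1")

# "mandelbrot.py" is a placeholder for whatever script you want to time.
subprocess.run([sys.executable, "mandelbrot.py"], env=env, check=True)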

Verifying the JIT is working

For versions of Python with the JIT (Python 3.13 and above), the sys module in the standard library has a new namespace, sys._jit. Inside it are three utilities for inspecting the state of the JIT, all of which return either True or False. The three utilities:

  • sys._jit.is_available(): Lets you know if the current build of Python has the JIT. Most binary builds of Python now ship with the JIT available, except the “free-threaded” or “no-GIL” builds of Python.
  • sys._jit.is_enabled(): Lets you know if the JIT is currently enabled. It does not tell you if running code is currently being JITted, however.
  • sys._jit.is_active(): Lets you know if the topmost Python stack frame is currently executing JITted code. However, this is not a reliable way to tell if your program is using the JIT, because you may end up executing this check in a “cold” (non-JITted) path. It’s best to stick to performance measurements to see if the JIT is having any effect.

For the most part, you will want to use sys._jit.is_enabled() to determine if the JIT is available and running, as it gives you the most useful information.
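
Putting those checks together, here’s a small sketch that reports the JIT’s status at startup; the getattr guard is only defensive, since sys._jit exists only on builds that include the JIT machinery:

import sys

# sys._jit is provisional and only present on JIT-capable builds (3.13+).
jit = getattr(sys, "_jit", None)

if jit is None or not jit.is_available():
    print("This build of Python does not include the JIT.")
elif not jit.is_enabled():
    print("JIT available but disabled; relaunch with PYTHON_JIT=1.")
else:
    print("JIT enabled.")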

Python code enhanced by the JIT

Because the JIT is in its early stages, its behavior is still somewhat opaque. There’s no end-user instrumentation for it yet, so there’s no way to gather statistics about how the JIT handles a given piece of code. The only real way to assess the JIT’s performance is to benchmark your code with and without the JIT.

Here’s an example of a program that demonstrates pretty consistent speedups with the JIT enabled. It’s a rudimentary version of the Mandelbrot fractal:

from time import perf_counter
import sys

print("JIT enabled:", sys._jit.is_enabled())

# Plot dimensions and the region of the complex plane to sample
WIDTH = 80
HEIGHT = 40
X_MIN, X_MAX = -2.0, 1.0
Y_MIN, Y_MAX = -1.0, 1.0
ITERS = 500

YM = (Y_MAX - Y_MIN)
XM = (X_MAX - X_MIN)

def iter(c):
    # Return True if c appears to be in the Mandelbrot set, i.e., z never
    # escapes the radius-2 circle within ITERS iterations.
    z = 0j
    for _ in range(ITERS):
        if abs(z) > 2.0:
            return False
        z = z ** 2 + c
    return True

def generate():
    start = perf_counter()
    output = []

    for y in range(HEIGHT):
        cy = Y_MIN + (y / HEIGHT) * YM
        for x in range(WIDTH):
            cx = X_MIN + (x / WIDTH) * XM
            c = complex(cx, cy)
            output.append("#" if iter(c) else ".")
        output.append("\n")
    print("Time:", perf_counter() - start)
    return output

print("".join(generate()))

When the program starts running, it lets you know if the JIT is enabled and then produces a plot of the fractal to the terminal along with the time taken to compute it.

With the JIT enabled, there’s a fairly consistent 20% speedup between runs. If the performance boost isn’t obvious, try changing the value of ITERS to a higher number. This forces the program to do more work, so it should produce a more obvious speedup.

Here’s a negative example — a simple recursively implemented Fibonacci sequence. As of Python 3.15a3 it shows no discernible JIT speedup:

import sys
from time import perf_counter

print("JIT enabled:", sys._jit.is_enabled())

def fib(n):
    # Naive recursion: lots of Python-level function calls, very little math.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

def main():
    start = perf_counter()
    result = fib(36)
    print(perf_counter() - start)

main()

Why this isn’t faster when JITted isn’t clear. You might be inclined to think that recursion makes the JIT less effective, but a non-recursive version of the algorithm shows no speedup either.

Using the experimental Python JIT

Because the JIT is still considered experimental, it’s worth approaching it in the same spirit as the “free-threaded” or “no-GIL” builds of Python also now being shipped. You can conduct your own experiments with the JIT to see if it provides any payoff for certain tasks, but you’ll always want to be careful about using it in any production scenario. What’s more, each alpha and beta revision of Python going forward may change the behavior of the JIT. What was once performant might not be in the future, or vice versa!


AI power tools: 6 ways to supercharge your terminal 24 Dec 2025, 1:00 am

The command line has always been the bedrock of the developer’s world. Since time immemorial, the CLI has been a static place defined by the REPL (read-evaluate-print loop). But modern AI tools are changing that.

The CLI tells you in spartan terms what is happening with your program, and it does exactly what you tell it to. The lack of frivolity and handholding is both the command line’s power and its one major drawback. Now, a new class of AI tools seeks to preserve that power while upgrading the CLI with a more human-friendly interface.

These tools re-envision the REPL as a reason-evaluate loop. Instead of telling your operating system what to do, you just give it a goal and set it loose. Rather than reading the outputs, you can have them analyzed with AI precision. For the lover of the CLI—and everyone else who programs—the AI-powered terminal is a new and fertile landscape.

Gemini CLI

Gemini CLI is an exceptionally strong agent that lets you run AI shell commands. Able to analyze complex project layouts, view outputs, and undertake complex, multipart goals, Gemini CLI isn’t flawless, but it warms the command line-enthusiast’s heart.

Screenshot: Google’s Gemini comes to the command line. (Image: Matthew Tyson)

Gemini CLI recently added in-prompt interactivity support, like running vi inside the agent. This lets you avoid dropping out of the AI (or launching a new window) to do things like edit a file or run a long, involved git command. The AI doesn’t retain awareness during your interactions (you can use Ctrl-f to shift focus back to it), but it does observe the outcome when you are done, and may take appropriate actions such as running unit tests after closing vi.

Copilot is rumored to have better Git integration, but I’ve found Gemini performs just fine with git commands.

Like every other AI coding assistant, Gemini CLI can get confused, spin in circles, and spawn regressions, but the actual framing and prompt console are among the best. It feels fairly stable and solid. It does require some adjustments, such as being unable to navigate the file system (e.g., cd /foo/bar) because you’re in the agent’s prompt and not a true shell.

GitHub Copilot CLI

Copilot’s CLI is just as solid as Gemini’s. It handled complex tasks (like “start a new app that lets you visit endpoints that say hello in different languages”) without a hitch. But it’s just as nice to be able to do simple things quickly (like asking, “what process is listening on port 8080?”) without having to refresh system memory.

Screenshot: The ubiquitous Copilot VS Code extension, but for the terminal environment. (Image: Matthew Tyson)

There are still drawbacks, of course, and even simple things can go awry. For example, if the process listening on 8080 was managed by systemctl, Copilot would issue a simple kill command rather than stopping the service through systemd.

Copilot CLI’s ?? is a nice idea, letting you provide a goal to be turned into a shell command — ?? find the largest file in this directory yields find . -type f -exec du -h {} + 2>/dev/null | sort -rh | head -10 — but I found the normal prompt worked just as well.

I noticed at times that Copilot seemed to choke and hang (or take inordinately long to complete) on larger steps, such as Creating Next.js project (Esc to cancel · 653 B).

In general, I did not find much distinction between Gemini and Copilot’s CLIs; both are top-shelf. That’s what you would expect from the flagship AI terminal tools from Google and Microsoft. The best choice likely comes down to which ecosystem and company you prefer.

Ollama

Ollama is the most empowering CLI in this bunch. It lets you install and run pre-built, targeted models on your local machine. This puts you in charge of everything, eliminates network calls, and discards any reliance on third-party cloud providers (although Ollama recently added cloud providers to its bag of tricks).

Screenshot: Ollama, the DIY AI engine. (Image: Matthew Tyson)

Ollama isn’t an agent itself but is the engine that powers many of them. It’s “Docker for LLMs”—a simple command-line tool that lets you download, manage, and run powerful open source models like Llama 3 and Mistral directly on your own machine. You run ollama pull llama3 and then ollama run llama3 "..." to chat. (Programmers will especially appreciate CodeLlama.)

Incidentally, if you are not in a headless environment (on Windows, for example), Ollama also installs a simple GUI for managing and interacting with installed models (both local and cloud).

Ollama’s killer feature is privacy and offline access. Since the models run entirely locally, none of your prompts or code ever leaves your machine. It’s perfect for working on sensitive projects or in secure environments.

Ollama is an AI server, which gives you an API so that other tools (like Aider, OpenCode, or NPC Shell) can use your local models instead of paying for a cloud provider. The Ollama chat agent doesn’t compete with interactive CLIs like Gemini, Copilot, and Warp (see below); it’s more of a straight REPL.
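
To illustrate that server role, here’s a minimal Python sketch that calls a locally running Ollama instance over its HTTP API, assuming the default port (11434) and a model you’ve already pulled; the prompt is just an example:

import json
import urllib.request

def ask_local_model(prompt, model="llama3"):
    # Ollama listens on localhost:11434 by default; with streaming disabled,
    # /api/generate returns a single JSON object containing the response text.
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_model("Explain what a REPL is in one sentence."))

Nothing here leaves your machine, which is exactly the privacy argument for running models locally.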

The big trade-off is performance. You are limited by your own hardware, and running the larger models requires powerful (preferably Nvidia) GPUs. The choice comes down to power versus privacy: You get total control and security, but you’re responsible for bringing the horsepower. (And, in case you don’t know, fancy GPUs are expensive—even provisioning a decent one on the cloud can cost hundreds of dollars per month.)

Aider

Aider is a “pair-programming” tool that can use various providers as the AI back end, including a locally running instance of Ollama (with its variety of LLM choices). Typically, you would connect to an OpenRouter account to provide access to any number of LLMs, including free-tier ones.

Screenshot: Aider, the agentic layer. (Image: Matthew Tyson)

Once connected, you tell Aider what model you want to use when launching it; e.g., aider --model ollama_chat/llama3.2:3b. That will launch an interactive prompt relying on the model for its brains. But Aider gives you agentic power and will take action for you, not just provide informative responses.

Aider tries to maintain a contextual understanding of your filesystem, the project files, and what you are working on. It’s also designed to understand git: It will suggest initializing a git repo, commit as you go, and write sensible commit messages. Its core capability is heavily influenced by the LLM engine you provide.

Aider is something like using Ollama but at a higher level. It is controlled by the developer; provides a great abstraction layer with multiple model options; and layers on a good deal of ability to take action. (It took me some wrangling with the Python package installations to get everything working in Aider, but I have bad pip karma.)

Think of Aider as something like Roo Code, but for the terminal, adding project-awareness for any number of models. If you give it a good model engine, it will do almost everything that the Gemini or Copilot CLI does, but with more flexibility. The biggest drawback compared to those tools is probably having to do more manual asset management (like using the /add command to bring files into context).

AI Shell

Built by the folks at Builder.io, AI Shell focuses on creating effective shell commands from your prompts. Compared to the Gemini and Copilot CLIs, it’s more of a quick-and-easy utility tool; something to keep the terminal’s power handy without having to type out commands.

Screenshot: AI Shell, the natural-language commander. (Image: Matthew Tyson)

AI Shell will take your desired goal (e.g., “$ ai find the process using the most memory right now and kill it”) and offer working shell commands in response. It will then ask if you want to run it, edit it, copy, or cancel the command. This makes AI Shell a simple place to drop into, as needed, from the normal command prompt. You just type “ai” followed by whatever you are trying to do.

Although it’s a handy tool, the current version of AI Shell can only use an OpenAI API, which is a significant drawback. There is no way to run AI Shell in a free tier, since OpenAI no longer offers free API access.

Warp

Warp started life as a full-featured terminal app. Its killer feature is that it gives you all the text and control niceties in a cross-platform, portable setup. Unlike the Gemini and Copilot CLI tools, which are agents that run inside an existing shell, Warp is a full-fledged, standalone GUI application with AI integrated at its core.

Screenshot: Warp, the terminal app reimagined with AI. (Image: Matthew Tyson)

Warp is a Rust-based, modern terminal that completely reimagines the user experience, moving away from the traditional text stream to a more structured, app-like interface.

Warp’s AI is not a separate prompt but is directly integrated with the input block. It has two basic modes: The first is to type # followed by a natural language query (e.g., “# find all files over 10 megs in this dir”), which Warp AI will translate into the correct command.

The second mode is the more complex, multistep agent mode (“define a cat-related non-blocking endpoint using netty”), which you enter with Ctrl-space.

An interesting feature is Warp Workflows: parameterized commands that you can save and share. You can ask the AI to generate a workflow for a complex task (like a multistage git rebase) and then supply it with arguments at runtime.

The main drawback for some CLI purists is that Warp is not a traditional CLI. It’s a block-based editor, which treats inputs and outputs as distinct chunks. That can take some getting used to—though some find it an improvement. In this regard, Warp breaks compatibility with many traditional terminal multiplexers like tmux/screen. Also, its AI features are tied to user accounts and a cloud back end, which likely raises privacy and offline-usability concerns for some developers.

All that said, Warp is a compelling AI terminal offering, especially if you’re looking for something different in your CLI. Aside from its AI facet, Warp is somewhere between a conventional shell (like Bash) and a GUI.

Conclusion

If you currently don’t like using a shell, these tools will make your life much easier. You will be able to do many of the things that previously were painful enough to make you think, “there must be a better way.” Now there is, and you can monitor processes, sniff TCP packets, and manage perms like a pro.

If you, like me, do like the shell, then these tools will make the experience even better. They give you superpowers, allowing you to romp more freely across the machine. If you tend (like I do) to do much of your coding from the command line, checking out these tools is an obvious move.

Each tool has its own idiosyncrasies of installation, dependencies, model access, and key management. A bit of wrestling at first is normal—which most command-line jockeys won’t mind.


Deno adds tool to run NPM and JSR binaries 23 Dec 2025, 5:16 pm

Deno 2.6, the latest version of the TypeScript, JavaScript, and WebAssembly runtime, adds a tool, called dx, to run binaries from NPM and JSR (JavaScript Registry) packages.

The update to the Node.js rival was announced December 10; installation instructions can be found at docs.deno.com. Current users can upgrade by running the deno upgrade command in their terminal.

In Deno 2.6, dx is the equivalent of the npx command. With dx, users should find it easier to run package binaries in a familiar fashion, according to Deno producer Deno Land. Developers can enjoy the convenience of npx while leveraging Deno’s robust security model and performance optimizations, Deno Land said.

Also featured in Deno 2.6 is more granular control over permissions, with --ignore-read and --ignore-env flags for selectively ignoring certain file reads or environment variable access. Instead of throwing a NotCapable error, Deno can be directed to return a NotFound error (for ignored file reads) or undefined (for ignored environment variables).

Deno 2.6 also integrates tsgo, an experimental type checker for TypeScript written in Go. This type checker is billed as being significantly faster than the previous implementation, which was written in TypeScript.

Other new capabilities and improvements in Deno 2.6:

  • For dependency management, developers can control the minimum age of dependencies, ensuring that a project only uses dependencies that have been vetted. This helps reduce the risk of using newly published packages that may contain malware or breaking changes shortly after release.
  • A deno audit subcommand helps identify security vulnerabilities in dependencies by checking the GitHub CVE database. This command scans and generates a report for both JSR and NPM packages.
  • The --lockfile-only flag for deno install allows developers to update a lockfile without downloading or installing the actual packages. This is particularly useful in continuous integration environments where users want to verify dependency changes without modifying their node_modules or cache.
  • deno approve-scripts replaces the deno install --allow-scripts flag, enabling more ergonomic and granular control over which packages are allowed to run install scripts.
  • Deno’s Node.js compatibility layer continues to mature in Deno 2.6, with improvements across file operations, cryptography, process management, and database APIs, according to Deno Land.

