Teradata unveils enterprise AgentStack to push AI agents into production 28 Jan 2026, 1:17 am

Teradata has expanded the agent-building capabilities it launched last year into a full-blown toolkit, which it says will help enterprises address the challenge of moving AI agents beyond pilots into production-grade deployments.

Branded as Enterprise AgentStack, the expanded toolkit layers AgentEngine and AgentOps onto Teradata’s existing Agent Builder, which includes a user interface for building agents with the help of third-party frameworks such as LangGraph, and a context intelligence capability.

While AgentEngine is an execution environment for deploying agents across hybrid infrastructures, AgentOps is a unified interface for centralized discovery, monitoring, and lifecycle management of agents across a given enterprise.

The AgentEngine is a critical piece of Enterprise AgentStack as it sits between agent design and real-world operations, said Stephanie Walter, practice leader for AI stack at HyperFRAME Research.

“Without an execution engine, enterprises often rely on custom glue code to coordinate agents. The Agent Engine standardizes execution behavior and gives enterprises a way to understand agent performance, reliability, and risk at scale,” Walter said, adding that AgentEngine-like capabilities are what enterprises need for moving agents or agentic systems into production.

However, analysts say Teradata’s approach to enterprise agent adoption differs markedly from that of rivals such as Databricks and Snowflake.

While Snowflake has been leaning on its Cortex and Native App Framework to let enterprises build AI-powered applications and agents closer to governed data, Databricks has been focusing on agent workflows through Mosaic AI, emphasizing model development, orchestration, and evaluation tied to its lakehouse architecture, Robert Kramer, principal analyst at Moor Insights and Strategy, said.

Seconding Kramer, Walter pointed out that Teradata’s differentiation lies in positioning Enterprise AgentStack as a vendor-agnostic execution and operations layer designed to work across hybrid environments, rather than anchoring agents tightly to a single cloud or data platform.

That positioning can be attributed to Teradata’s reliance on third-party frameworks such as Karini.ai, Flowise, CrewAI, and LangGraph, which give enterprises and their developers flexibility to evolve their agent architectures over time without being locked onto platforms from Snowflake and Databricks that tend to optimize for end-to-end control within their own environments, Walter added.

However, the analyst cautioned that, although Enterprise AgentStack’s architecture aligns well with enterprise needs, its litmus test will be to continue maintaining deep integrations with third-party frameworks.

“Customers will want to see concrete evidence of AgentStack supporting complex, long-running, multi-agent deployments in production,” Walter said.

Kramer, too, pointed out that enterprises and developers should try to understand the depth of usability before implementation.

“They need to check how easy it is to apply policies consistently, run evaluations after changes, trace failures end-to-end, and integrate with existing security and compliance tools. Openness only works if it doesn’t shift complexity back onto the customer,” Kramer said.

Enterprise AgentStack is expected to be made available in private preview on the cloud and on-prem between April and June this year.


Is code a cow path? 28 Jan 2026, 1:00 am

When the automobile was first invented, many looked like—and were indeed called—horseless carriages.

Early websites for newspapers were laid out just like the paper versions. They still are to some extent.

Our computers have “desktops” and “files”—just like an office from the 1950s.

It is even said that the width of our railroad tracks is a direct result of the width between the wheels of a Roman chariot (though that claim is somewhat dubious).

This phenomenon—using new technology in the old technology way—is often called “paving the cow paths.” Cows are not known for understanding that the shortest distance between two points is a straight line, and it doesn’t always make sense to put pavement down where they wear tracks in the fields.

This notion was formalized by the great Peter Drucker, who said “There is surely nothing quite so useless as doing with great efficiency what should not be done at all.” 

All of this got me thinking about AI writing all of our code now. 

Is code necessary?

We developers spend years honing our craft. We read books about clean code, write blogs about the proper way to structure code, and tell ourselves, rightly, that code is meant to be read and maintained as much as it is to be written.

AI coding could change all of that. Your coding agent doesn’t need to see all that great stuff that we humans do. Comments, good variable names, cleanly constructed classes, and the like are all things we do for humans. Shoot, code itself is a human construct, a prop we created to make it easier to reason about the software we design and build. 

I was recently using Claude Code to build an application, and I insisted that he (I can’t help but think of Claude Code as a person) code against interfaces and not implementations, that he design everything with small classes that do one thing, etc. I wanted the code Claude created to be what we developers always shoot for—well-written, easy to maintain, and decoupled. You know the drill. 

And then it occurred to me—are we all merely paving cow paths? Should agentic AI be concerned with the same things we humans care about when constructing software? Claude wrote comments all over the place—that was for me, not for him. He wrote the code the way that I wanted him to. Does he have a better idea about how to make the software work?

For that matter, who needs code anyway? It’s not inconceivable that coding agents will eventually just render machine code—i.e., they could compile your native language directly into a binary. (That’s one way to end the language wars!)

Right now we have the process of writing code, reviewing it, compiling it, and running it. We’ve added an extra layer—explaining our intentions to an agent that translates them into code. If that is a cow path—and the more I think about it, the more it does seem a rather indirect way to get from point A to point B—then what will be the most direct way to get from here to there?

The straightest path to software

Every day, our coding agents get better. The better they get, the more we’ll trust them, and the less we’ll need to review their code before committing it. Someday, we might expect, agents will review the code that agents write. What happens to code when humans eventually don’t even read what the agents write anymore? Will code even matter at that point?

Will we write unit tests—or have our agents write unit tests—only for our benefit? Will coding agents even need tests? It’s not hard to imagine a future where agents just test their output automatically, or build things that just work without testing because they can “see” what the outcome of the tests would be. 

Ask yourself this: When is the last time that you checked the output of your compiler? Can you even understand the output of your compiler? Some of you can, sure. But be honest, most of you can’t. 

Maybe AI will come up with a way of designing software based on our human language inputs that is more direct and to the point—a way that we haven’t conceived of yet. Code may stop being the primary representation of software.

Maybe code will become something that, as Peter Drucker put it, should not be done at all.


CPython vs. PyPy: Which Python runtime has the better JIT? 28 Jan 2026, 1:00 am

PyPy, an alternative runtime for Python, uses a specially created JIT compiler to yield potentially massive speedups over CPython, the conventional Python runtime.

But PyPy’s exemplary performance has often come at the cost of compatibility with the rest of the Python ecosystem, particularly C extensions. And while those issues are improving, the PyPy runtime itself often lags in keeping up to date with the latest Python releases.

Meanwhile, the most recent releases of CPython include the first iterations of a JIT compiler native to CPython. The long-term promise is better performance, and in some workloads you can already see significant improvements. CPython also has a new alternative build that eliminates the GIL to allow fully free-threaded operation—another avenue for significant performance gains.

Could CPython be on track to displace PyPy as the go-to runtime for performance? We ran PyPy and the latest JIT-enabled and no-GIL CPython builds side by side on the same benchmarks, with intriguing results.

PyPy still kills it at raw math

CPython has always performed poorly in simple numerical operations, due to all the indirection and abstraction required. There’s no such thing in CPython as a primitive, machine-level integer, for instance.

As a result, CPython tends to perform quite poorly on benchmarks like this one:


def transform(n: int):
    q = 0
    for x in range(0, n * 500):
        q += x
    return q


def main():
    return [transform(x) for x in range(1000)]

main()

On a Ryzen 5 3600 with six cores, Python 3.14 takes about 9 seconds to run this benchmark. But PyPy chews through it in around 0.2 seconds.

This also isn’t the kind of workload that benefits from Python’s JIT, at least not yet. With the JIT enabled in 3.14, the time drops only slightly, to around 8 seconds.

But what happens if we use a multi-threaded version of the same code, and throw the no-GIL version of Python at it?


from concurrent.futures import ThreadPoolExecutor


def transform(n: int):
    q = 0
    for x in range(0, n * 500):
        q += x
    return q


def main():
    result = []
    with ThreadPoolExecutor() as pool:
        for x in range(1000):
            result.append(pool.submit(transform, x))
    return [_.result() for _ in result]

main()

The difference is dramatic, to say the least. The no-GIL build of Python 3.14 completes this job in 1.7 seconds. Still not the sub-second results of PyPy, but a big enough jump to make using threads and no-GIL worth it.

What about PyPy and threading? Ironically, running the multithreaded version on PyPy slows it down drastically, with the job taking around 2.1 seconds to run. Blame that on PyPy still having a GIL-like locking mechanism, and therefore no full parallelism across threads. Its JIT compilation is best exploited by running everything in a single thread.

If you’re wondering if swapping a process pool for a thread pool would help, the answer is, not really. A process pool version of the above does speed things up a bit—1.3 seconds on PyPy—but process pools and multiprocessing on PyPy are not as optimized as they are in CPython.
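
For reference, here is a minimal sketch of what that process pool variant might look like (an assumed reconstruction; the article doesn’t reproduce it). Each worker process runs its own interpreter, so the GIL doesn’t serialize the math, but arguments and results have to be pickled between processes:


from concurrent.futures import ProcessPoolExecutor


def transform(n: int):
    q = 0
    for x in range(0, n * 500):
        q += x
    return q


def main():
    # map() farms the inputs out to worker processes and gathers the results
    with ProcessPoolExecutor() as pool:
        return list(pool.map(transform, range(1000)))


if __name__ == "__main__":  # guard needed when workers are spawned as fresh processes
    main()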

To recap, here is the breakdown for “vanilla” Python 3.14 running the single-threaded script:

  • No JIT, GIL: 9 seconds
  • With JIT, GIL: 8 seconds
  • No JIT, no-GIL: 9.5 seconds

The no-GIL build is still slightly slower than the regular build for single-threaded operations. The JIT helps a little here, but not much.

Now, consider the same breakdown for Python 3.14 and a process pool:

  • No JIT, GIL: 1.75 seconds
  • With JIT, GIL: 1.5 seconds
  • No JIT, no-GIL: 2 seconds

How about for Python 3.14, using other forms of the script?

  • Threaded version with no-GIL: 1.7 seconds
  • Multiprocessing version with GIL: 2.3 seconds
  • Multiprocessing version with GIL and JIT: 2.4 seconds
  • Multiprocessing version with no-GIL: 2.1 seconds

And here’s a summary of how PyPy fares:

  • Single-threaded script: 0.2 seconds
  • Multithreaded script: 2.1 seconds
  • Multiprocessing script: 1.3 seconds

The n-body problem

Another common math-heavy benchmark vanilla Python is notoriously bad at is the “n-body” benchmark. This is also the kind of problem that’s hard to speed up by using parallel computation. It is possible, just not simple, so the easiest implementations are single-threaded.
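
To give a sense of the workload, here is a stripped-down sketch of the kind of inner loop an n-body simulation runs (a simplified 2D illustration, not the actual benchmark script): pure-Python floating-point math and container lookups in tightly nested loops, exactly the territory where a tracing JIT shines:


import math


def advance(bodies: list[dict], dt: float, steps: int) -> None:
    # Update velocities from pairwise gravitational attraction, then positions.
    for _ in range(steps):
        for i in range(len(bodies)):
            for j in range(i + 1, len(bodies)):
                b1, b2 = bodies[i], bodies[j]
                dx = b1["x"] - b2["x"]
                dy = b1["y"] - b2["y"]
                dist2 = dx * dx + dy * dy
                mag = dt / (dist2 * math.sqrt(dist2))
                b1["vx"] -= dx * b2["mass"] * mag
                b1["vy"] -= dy * b2["mass"] * mag
                b2["vx"] += dx * b1["mass"] * mag
                b2["vy"] += dy * b1["mass"] * mag
        for b in bodies:
            b["x"] += dt * b["vx"]
            b["y"] += dt * b["vy"]


# Two toy bodies; the standard benchmark models the Sun and the outer planets.
sun = {"x": 0.0, "y": 0.0, "vx": 0.0, "vy": 0.0, "mass": 1.0}
planet = {"x": 1.0, "y": 0.0, "vx": 0.0, "vy": 1.0, "mass": 1e-3}
advance([sun, planet], dt=0.001, steps=100_000)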

If I run the n-body benchmark for 1,000,000 repetitions, I get the following results:

  • Python 3.14, no JIT: 7.1 seconds
  • Python 3.14, JIT: 5.7 seconds
  • Python 3.15a4, no JIT: 7.6 seconds
  • Python 3.15a4, JIT: 4.2 seconds

That’s an impressive showing for the JIT-capable editions of Python. But then we see that PyPy chews through the same benchmark in 0.7 seconds—as-is.

Computing pi

Sometimes even PyPy struggles with math-heavy Python programs. Consider this naive implementation to calculate digits of pi. This is another example of a task that can’t be parallelized much, if at all, so we’re using a single-threaded test.
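
The implementation linked above isn’t reproduced here, but a “naive” pure-Python pi calculation typically looks something like this sketch (my own illustration using Machin’s formula with big-integer arithmetic, not the exact script benchmarked). Most of the time goes into arbitrary-precision integer division rather than interpreted bytecode, which is likely why PyPy’s JIT has little to latch onto here:


def arctan_inv(x: int, digits: int) -> int:
    # arctan(1/x) scaled by 10**(digits + 10), summed term by term.
    scale = 10 ** (digits + 10)  # extra guard digits absorb rounding error
    term = scale // x
    total = term
    n = 1
    while term:
        term //= x * x
        total += -(term // (2 * n + 1)) if n % 2 else term // (2 * n + 1)
        n += 1
    return total


def pi_digits(digits: int) -> str:
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inv(5, digits) - 4 * arctan_inv(239, digits)
    return str(pi_scaled // 10 ** 10)  # drop the guard digits


print(pi_digits(20_000)[:10])  # 3141592653...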

When run for 20,000 digits, here’s what came out:

  • Python 3.14, no JIT: 13.6 seconds
  • Python 3.14, JIT: 13.5 seconds
  • Python 3.15, no JIT: 13.7 seconds
  • Python 3.15, JIT: 13.5 seconds
  • PyPy: 19.1 seconds

It’s uncommon, but hardly impossible, for PyPy’s performance to be worse than regular Python’s. What’s surprising is to see it happen in a scenario where you’d expect PyPy to excel.

CPython is getting competitive for other kinds of work

Another benchmark I’ve used often with Python is a variant of the Google n-gram benchmark, which processes a multi-megabyte CSV file and generates some statistics about it. That makes it more I/O-bound than the previous benchmarks, which were more CPU-bound, but it still yields useful information about the speed of the runtime.

I’ve written three incarnations of this benchmark: single-threaded, multi-threaded, and multi-process. Here’s the single-threaded version:


import collections
import time
import gc
import sys
try:
    print ("JIT enabled:", sys._jit.is_enabled())
except Exception:
    ...

def main():
    line: str
    fields: list[str]
    sum_by_key: dict = {}

    start = time.time()

    # The listing is truncated at this point in the original; what follows is a
    # plausible reconstruction of the single-threaded aggregation loop.
    with open("ngrams.tsv", encoding="utf-8", buffering=2**24) as f:
        for line in f:
            fields = line.split("\t")
            # Assumes a tab-separated layout: key in the first column,
            # integer count in the third.
            sum_by_key[fields[0]] = sum_by_key.get(fields[0], 0) + int(fields[2])

    print("Elapsed:", time.time() - start)


main()

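The multi-threaded and multi-process incarnations aren’t shown, but roughly speaking, the threaded variant splits the input into chunks and aggregates them in parallel, along these lines (a sketch under the same assumed file layout, not the exact script). On the free-threaded build the chunks really do run concurrently; on a GIL build they take turns:


import collections
import time
from concurrent.futures import ThreadPoolExecutor


def aggregate(lines: list[str]) -> collections.Counter:
    counts: collections.Counter = collections.Counter()
    for line in lines:
        fields = line.split("\t")
        counts[fields[0]] += int(fields[2])  # same assumed column layout as above
    return counts


def main():
    start = time.time()
    with open("ngrams.tsv", encoding="utf-8") as f:
        lines = f.readlines()

    # Split the file into eight chunks and aggregate each in its own thread.
    size = len(lines) // 8 + 1
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]

    totals: collections.Counter = collections.Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(aggregate, chunks):
            totals.update(partial)

    print("Elapsed:", time.time() - start)


main()
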
Here’s how Python 3.14 handles this benchmark with different versions of the script:

  • Single-threaded, GIL: 4.2 seconds
  • Single-threaded, JIT, GIL: 3.7 seconds
  • Multi-threaded, no-GIL: 1.05 seconds
  • Multi-processing, GIL: 2.42 seconds
  • Multi-processing, JIT, GIL: 2.4 seconds
  • Multi-processing, no-GIL: 2.1 seconds

And here’s the same picture with PyPy:

  • Single-threaded: 2.75 seconds
  • Multi-threaded: 14.3 seconds (not a typo!)
  • Multi-processing: 8.7 seconds

In other words, for this scenario, the CPython no-GIL multithreaded version beats even PyPy at its most optimal. As yet, there is no build of CPython that enables the JIT and uses free threading, but such a version is not far away and could easily change the picture even further.

Conclusion

In sum, PyPy running the most basic, unoptimized version of a math-heavy script still outperforms CPython. But CPython gets drastic relative improvements from using free-threading and even multiprocessing, where possible.

While PyPy cannot take advantage of those built-in features, its base speed is fast enough that using threading or multiprocessing for some jobs isn’t really required. For instance, the n-body problem is hard to parallelize well, and computing pi can hardly be parallelized at all, so it’s a boon to be able to run single-threaded versions of those algorithms fast.

What stands out most from these tests is that PyPy’s benefits are not universal, or even consistent. They vary widely depending on the scenario. Even within the same program, there can be a tremendous variety of scenarios. Some programs can run tremendously fast with PyPy, but it’s not easy to tell in advance which ones. The only way to know is to benchmark your application.

Something else to note is that one of the major avenues toward better performance and parallelism for Python generally—free-threading—isn’t currently available for PyPy. Multiprocessing doesn’t work well in PyPy either, due to it having a much slower data serialization mechanism between processes than CPython does.

As fast as PyPy can be, the benchmarks here show the benefits of true parallelism with threads in some scenarios. PyPy’s developers might find a way to implement that in time, but it’s unlikely they’d be able to do it by directly repurposing what CPython already has, given how different PyPy and CPython are under the hood.


Gemini Flash model gets visual reasoning capability 27 Jan 2026, 7:20 pm

Google has added an Agentic Vision capability to its Gemini 3 Flash model, which the company said combines visual reasoning with code execution to ground answers in visual evidence. The capability fundamentally changes how AI models process images, according to Google.

Introduced January 27, Agentic Vision is available via the Gemini API in the Google AI Studio development tool and Vertex AI, as well as in the Gemini app.

Agentic Vision in Gemini Flash converts image understanding from a static act into an agentic process, Google said. By combining visual reasoning and code execution, the model formulates plans to zoom in, inspect, and manipulate images step by step. Until now, multimodal models typically processed the world in a single, static glance. If they missed a small detail—like a serial number or a distant sign—they were forced to guess, Google said. By contrast, Agentic Vision turns image understanding into an active investigation, introducing an agentic “think, act, observe” loop into image-understanding tasks, the company said.

Agentic Vision allows the model to interact with its environment by annotating images. Instead of just describing what it sees, Gemini 3 Flash can execute code to draw directly on the canvas to ground its reasoning. Agentic Vision can also parse high-density tables and execute Python code to visualize findings. Future plans for Agentic Vision include adding more implicit code-driven behaviors, equipping Gemini models with more tools, and delivering the capability in more model sizes, extending it beyond Flash.


OpenSilver 3.3 runs Blazor components inside XAML apps 27 Jan 2026, 2:37 pm

Userware has released OpenSilver 3.3, an update to the open-source framework for building cross-platform applications using C# and XAML. OpenSilver 3.3 lets Blazor components for web development run directly inside XAML applications, streamlining the process of running these components.

Userware unveiled OpenSilver 3.3 on January 27. OpenSilver SDKs for Microsoft’s Visual Studio and Visual Studio Code can be downloaded from opensilver.net.

With the Blazor boost in OpenSilver 3.3, Blazor components run directly inside an XAML visual tree, sharing the same DOM and the same runtime. Developers can drop a MudBlazor data grid, a DevExpress rich text editor, or any Blazor component directly into their XAML application without requiring JavaScript bridges or interop wrappers, according to Userware. Because OpenSilver runs on WebAssembly for browsers and .NET MAUI Hybrid for native apps, the same code deploys to Web, iOS, Android, Windows, macOS, and Linux.

The company did warn, though, that Razor code embedded inside XAML will currently show errors at design time but will compile and run correctly. Workarounds include wrapping the Razor code in CDATA, using separate .razor files, or filtering to “Build Only” errors.

Open source OpenSilver is a replacement for Microsoft Silverlight, a rich Internet application framework that was discontinued in 2021 and is no longer supported. For developers maintaining a Silverlight or Windows Presentation Foundation app, Blazor integration offers a way to modernize incrementally. Users can identify controls that need updating, such as an old data grid or a basic text editor, and replace them with modern Blazor equivalents.


    Anthropic integrates third‑party apps into Claude, reshaping enterprise AI workflows 27 Jan 2026, 4:47 am

    Anthropic has added a new capability inside its generative AI-based chatbot Claude that will allow users to directly access applications inside the Claude interface.

    The new capability, termed interactive apps, is based on a new extension — MCP Apps — of the open source Model Context Protocol (MCP).

    MCP Apps, first proposed in November and subsequently developed with the help of OpenAI’s Apps SDK, expands MCP’s capabilities to allow tools to be included as interactive UI components directly inside the chat window as part of a conversation, instead of just text or query results.

    “Claude already connects to your tools and takes actions on your behalf. Now those tools show up right in the conversation, so you can see what’s happening and collaborate in real time,” the company wrote in a blog post.

    Currently, the list of interactive apps is limited to Amplitude, Asana, Box, Canva, Clay, Figma, Hex, monday.com, and Slack. Agentforce 360 from Salesforce will be added soon, the company said, adding that the list will be expanded to include other applications.

    Easier integration into workflows

    Claude’s evolution from a chatbot into an integrated execution environment for applications is expected to help enterprises move agentic AI systems toward broader, production-scale deployments, analysts say.

    The ability to access applications directly within Claude’s interface lowers integration friction, making it simpler for enterprises to deploy Claude as an agentic system across workflows, said Akshat Tyagi, associate practice leader at HFS Research.

    “Most pilots fail in production because agents are unpredictable, hard to govern, and difficult to integrate into real workflows. Claude’s interactive apps change that,” Tyagi noted.

    For enterprise developers, the reduction in integration complexity could also translate into faster iteration cycles and higher productivity, according to Forrester Principal Analyst Charlie Dai.

    The approach provides a more straightforward path to building multi-step, output-producing workflows without the need for extensive setup or custom plumbing, Dai said.

    Targeting more productivity

    According to Tyagi, productivity gains will not be limited to developers; business teams stand to benefit as well, since the integration of multiple applications within Claude’s interface means teams don’t need to move between systems, copy outputs, or translate AI responses into actions.

    MCP Apps and Anthropic’s broader approach to productivity underscore a widening architectural split in the AI landscape, according to Avasant research director Chandrika Dutt, even as vendors pursue the same underlying goal of boosting productivity by embedding agents directly into active workflows.

    While Anthropic and OpenAI are building models in which applications run inside the AI interface, other big tech vendors, including Microsoft and Google, are focused on embedding AI directly into their productivity suites, such as Microsoft 365 Copilot and Google Gemini Workspace, Dutt said.

    “These strategies represent two paths toward a similar end state. As enterprise demand for agent-driven execution grows, in a separate layer, it is likely that these approaches will converge, with Microsoft and Google eventually supporting more interactive, app-level execution models as well,” Dutt added.

    Further, the analyst pointed out that Claude’s new interactive apps will also facilitate easier governance and trust — key facets of scaling agentic AI systems in an enterprise.

    “Claude operates on the same live screen, data, and configuration the user is viewing, allowing users to see exactly what changes are made, where they are applied, and how they affect tasks, files, or design elements in real time, without having to cross-check across different tools,” Dutt said.

    Increased burden of due diligence

    However, analysts cautioned that the unified nature of the interface may exponentially increase the risk from a security standpoint, especially as more applications are added.

    That is in part because running the UIs of these interactive applications means running code that enterprises didn’t write themselves, which will increase the burden of due diligence before connecting to an interactive application, said Abhishek Sengupta, practice director at Everest Group.

    MCP Apps itself, though, offers several security features: UIs are sandboxed, enterprises can review all templates before rendering, and messages between the application server and their Claude client can be audited.

    The interactive app feature is currently available to all paid subscribers of Claude, including Pro, Max, Team, and Enterprise subscribers.

    It is expected to be added to Claude Cowork soon, the company said.


    Alibaba’s Qwen3-Max-Thinking expands enterprise AI model choices 27 Jan 2026, 3:25 am

    Alibaba Cloud’s latest AI model, Qwen3-Max-Thinking, is staking a claim as one of the world’s most advanced reasoning engines after posting benchmark results that are competitive with leading models from Google and OpenAI.

    In a blog post, Alibaba said the model was trained using expanded capacity and large-scale computing resources, including reinforcement learning, which led to improvements in factual accuracy, reasoning, instruction following, alignment with human preferences, and agent-style capabilities.

    “On 19 established benchmarks, it demonstrates performance comparable to leading models such as GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro,” the company said.

    Alibaba said it has added two key upgrades to Qwen3-Max-Thinking: adaptive tool use that lets the model retrieve information or run code as needed, and test-time scaling techniques that it says deliver stronger reasoning performance than Google’s Gemini 3 Pro on selected benchmarks.

    Analysts offered a cautious take on the announcement. Benchmark results evaluate performance under specific conditions, “but enterprise IT leaders may be deploying foundation models across various use cases under different IT environments,” said Lian Jye Su, chief analyst at Omdia.

    “As such, while Qwen models have shown themselves to be legitimate alternatives to Western mainstream models, their performance still needs to be evaluated in domain-specific tasks, along with their adaptability and customization,” Su said. “It is also critical to assess scalability and efficiency when these models run on Alibaba Cloud infrastructure, which operates differently from Google Cloud Platform and Azure.”

    More options for vendor diversification

    The launch of Qwen3-Max-Thinking is likely to add momentum to AI model diversification strategies within enterprises.

    “Now that Qwen models have demonstrated themselves as legit alternatives to Western models, CIOs should consider them when evaluating pricing models, licensing terms, and the total cost of ownership of their AI projects,” Su said.  “Running on Alibaba Cloud, the cost of ownership is likely more efficient, especially in the Asia Pacific, which is great news for global companies looking to make inroads into the Chinese market or China-friendly markets.”

    Competitive reasoning scores from Qwen models expand the pool of viable suppliers, making diversification more attractive, according to Charlie Dai, principal analyst at Forrester.

    “For CIOs managing digital sovereignty and cost efficiency, strong alternatives change the strategic equation, and rising model parity increases the viability of mixed portfolios that balance sovereignty, compliance, and innovation speed,” Dai said.

    Others said benchmark momentum is also influencing how CIOs think about multi-model strategies.

    “These benchmarks are a good yardstick not just to monitor performance, but also to assess which companies are serious and consistent in investing in foundation model capabilities and adoption,” said Neil Shah, VP for research at Counterpoint Research. “This is shaping how CIOs look at diversifying to multi-model strategies to avoid putting all their eggs in one basket, while weighing performance, cost efficiency, and geopolitical headwinds.”

    That said, CIOs will need to consider the availability of these models outside of APAC alongside other factors such as export controls and compliance with local regulations.

    “The bigger question is how CIOs adopt US versus non-US models based on AI use cases,” Shah said. “Where reliability and compliance are critical, enterprises, especially in Western markets, will favor proprietary US models, while highly capable Chinese models may be used for non-critical workloads.”

    More governance and compliance challenges

    Geopolitical tensions are adding another layer of complexity for enterprises evaluating models such as Qwen3-Max-Thinking. According to Dai, this requires closer scrutiny of operational details, particularly around system logs, model update mechanisms, and how data moves across borders.

    He added that enterprise evaluations should go beyond performance testing to include red-team exercises, strict isolation of sensitive data, and alignment with internal risk and compliance frameworks.

    “Enterprises evaluating Alibaba Cloud-hosted models need to scrutinize how AI safety controls, data isolation, and auditability are implemented in practice, not just on paper,” Su said. “While most cloud providers now offer in-region or on-premise deployments to address sovereignty rules, CIOs still need to assess whether those controls meet internal risk thresholds, particularly when sensitive IP or regulated data is involved.”


    The private cloud returns for AI workloads 27 Jan 2026, 1:00 am

    A North American manufacturer spent most of 2024 and early 2025 doing what many innovative enterprises did: aggressively standardizing on the public cloud by using data lakes, analytics, CI/CD, and even a good chunk of ERP integration. The board liked the narrative because it sounded like simplification, and simplification sounded like savings. Then generative AI arrived, not as a lab toy but as a mandate. “Put copilots everywhere,” leadership said. “Start with maintenance, then procurement, then the call center, then engineering change orders.”

    The first pilot went live quickly using a managed model endpoint and a retrieval layer in the same public cloud region as their data platform. It worked and everyone cheered. Then invoices started arriving. Token usage, vector storage, accelerated compute, egress for integration flows, premium logging, premium guardrails. Meanwhile, a series of cloud service disruptions forced the team into uncomfortable conversations about blast radius, dependency chains, and what “high availability” really means when your application is a tapestry of managed services.

    The final straw wasn’t just cost or downtime; it was proximity. The most valuable AI use cases were those closest to people who build and fix things. Those people lived near manufacturing plants with strict network boundaries, latency constraints, and operational rhythms that don’t tolerate “the provider is investigating.” Within six months, the company began shifting its AI inference and retrieval workloads to a private cloud located near its factories, while keeping model training bursts in the public cloud when it made sense. It wasn’t a retreat. It was a rebalancing.

    AI changed the math

    For a decade, private cloud was often framed as a stepping-stone or, worse, a polite way to describe legacy virtualization with a portal. In 2026, AI is forcing a more serious reappraisal. Not because public cloud suddenly stopped working, but because the workload profile of AI is different from the workload profile of “move my app server and my database.”

    AI workloads are spiky, GPU-hungry, and brutally sensitive to inefficient architecture. They also tend to multiply. A single assistant becomes dozens of specialized agents. A single model becomes an ensemble. A single department becomes every department. AI spreads because the marginal utility of another use case is high, but the marginal cost can be even higher if you don’t control the fundamentals.

    Enterprises are noticing that the promise of elasticity is not the same thing as cost control. Yes, public cloud can scale on demand. But AI often scales and stays scaled because the business immediately learns to depend on it. Once a copilot is embedded into an intake workflow, a quality inspection process, or a claims pipeline, turning it off is not a realistic lever. That’s when predictable capacity, amortized over time, becomes financially attractive again.

    Cost is no longer a rounding error

    AI economics are exposing a gap between what people think the cloud costs and what the cloud actually costs. When you run traditional systems, you can hide inefficiencies behind reserved instances, right-sizing tools, and a few architectural tweaks. With AI, waste has sharp edges. Overprovision GPUs and you burn money. Underprovision and your users experience delays that make the system feel broken. Keep everything in a premium managed stack, and you may pay for convenience forever with little ability to negotiate the unit economics.

    Private clouds are attractive here for a simple reason: Enterprises can choose where to standardize and where to differentiate. They can invest in a consistent GPU platform for inference, cache frequently used embeddings locally, and reduce the constant tax of per-request pricing. They can still use public cloud for experimentation and burst training, but they don’t have to treat every inference call like a metered microtransaction.

    Outages are changing risk discussions

    Most enterprises know complex systems fail. The outages in 2025 did not show that cloud is unreliable, but they did reveal that relying on many interconnected services leads to correlated failure. When your AI experience depends on identity services, model endpoints, vector databases, event streaming, observability pipelines, and network interconnects, your uptime is the product of many moving parts. The more composable the architecture, the more failure points.

    Private cloud won’t magically eliminate outages, but it does shrink the dependency surface area and give teams more control over change management. Enterprises that run AI close to core processes often prefer controlled upgrades, conservative patching windows, and the ability to isolate failures to a smaller domain. That’s not nostalgia; it’s operational maturity.

    Proximity matters

    The most important driver I’m seeing in 2026 is the desire to keep AI systems close to the processes and people who use them. That means low-latency access to operational data, tight integration with Internet of Things and edge environments, and governance that aligns with how work actually happens. A chatbot in a browser is easy. An AI system that helps a technician diagnose a machine in real time on a constrained network is a different game.

    There’s also a data gravity issue that rarely receives the attention it deserves. AI systems don’t just read data; they generate it. Feedback loops, human ratings, exception handling, and audit trails become first-class assets. Keeping those loops close to the business domains that own them reduces friction and improves accountability. When AI becomes a daily instrument panel for the enterprise, architecture must serve the operators, not just the developers.

    Five steps for private cloud AI

    First, treat unit economics as a design requirement, not a postmortem. Model the cost per transaction, per employee, or per workflow step, and decide which are fixed costs and which are variable, because AI that works but is unaffordable at scale is just a demo with better lighting.

    Second, design for resilience by reducing dependency chains and clarifying failure domains. A private cloud can help, but only if you deliberately choose fewer, more reliable components, build sensible fallbacks, and test degraded modes so the business can keep moving when a component fails.

    Third, plan for data locality and the feedback loop as carefully as you plan for compute. Your retrieval layer, embedding life cycle, fine-tuning data sets, and audit logs will become strategic assets; place them where you can govern, secure, and access them with minimal friction across the teams that improve the system.

    Fourth, treat GPUs and accelerators as a shared enterprise platform with precise scheduling, quotas, and chargeback policies. If you don’t operationalize accelerator capacity, it will be captured by the teams who are the loudest but not necessarily the most critical. The resulting chaos will appear to be a technology problem when it’s really a governance problem.

    Fifth, make security and compliance practical for builders, not performative for documents. That means identity boundaries that align with real roles, automated policy enforcement in pipelines, strong isolation for sensitive workloads, and a risk management approach that recognizes that AI is software but also something new: software that talks, recommends, and occasionally hallucinates.


    What is the future for MySQL? 27 Jan 2026, 1:00 am

    In May of 2025, MySQL celebrated its 30th anniversary. Not many technology projects go strong for three decades, let alone at the level of use that MySQL enjoys. MySQL is listed at #2 on the DB-Engines ranking, and it is listed as the most deployed relational database by technology installation tracker 6sense.

    Yet for all its use, MySQL is seen as taking a back seat to PostgreSQL. Checking the Stack Overflow Developer Survey for 2025, 55.6% of developers use PostgreSQL compared to 40.5% that use MySQL. And when you look at the most admired technologies, PostgreSQL is at 46.5% while MySQL languishes at 20.5%. Whereas developers clearly think highly of PostgreSQL, they do not view MySQL as positively.

    Both databases are excellent options. PostgreSQL is a reliable, scalable, and functionality-rich database, but it can be beyond the needs of simple application projects. MySQL is fast to deploy, easy to use, and both scalable and effective when implemented in the right way. But PostgreSQL has fans and supporters where MySQL does not.

    This is not a question of age. PostgreSQL is older than MySQL, as development work started in 1986, though the first version of PostgreSQL wasn’t released until 1995. What is different is that the open source community is committed to PostgreSQL and celebrates the development and diversity taking place. The sheer number of companies and contributors around PostgreSQL makes it easier to adopt.

    In comparison, the MySQL community is … quiet. Although Oracle has been a great steward for MySQL since the company acquired Sun in 2010, the open source MySQL Community Edition has received less love and attention than the paid MySQL Enterprise Edition or cloud versions, at least in terms of adding innovative new features.

    For example, while Oracle’s MySQL HeatWave boasts innovations like vector search, which is essential for AI projects, MySQL Community Edition lacks this capability. Although MySQL Community Edition can store vector data, it cannot perform an index-based search or approximate nearest neighbour search for that data.

    A shock to the community

    In other open source communities, we have seen a “big shock” that led to change. For example, when Redis changed its software license to be “source available,” the community created Valkey as an alternative. When HashiCorp changed its license for Terraform, it led to the creation of OpenTofu. These projects joined open source foundations and saw an increase in the number of companies that provided contributions, support, and maintenance around the code.

    Having avoided such a big shock to the system, the MySQL community has been in stasis for years, continuing with the status quo. Yet in an industry where technology companies are like sharks, always moving forward to avoid death at the hands of the competition, this stasis is detrimental to the community and to the project as a whole.

    However, a big shock may have finally arrived. The loss of many Oracle staffers has impacted the speed of MySQL development. Looking at the number of bug fixes released in each quarterly update, the number of issues fixed has dropped to a third of what it was previously. Compared to Q1 2025 (65 fixes) and Q2 2025 (again, 65 fixes), MySQL 8.4.7 saw just 21 bug fixes released. While straight bug numbers are not a perfectly representative metric, the drop itself does show how much emphasis has been taken off MySQL.

    In response, companies that are behind MySQL are coming together. Rather than continuing with things as they are, these companies recognize that developing a future path for MySQL is essential. What this will lead to will depend on decisions outside the community. Will this act as a spur for a fork of MySQL that has community support, similar to PostgreSQL? Or will this lead to MySQL moving away from the control of a single vendor, as has been the case since it was founded?

    Whatever happens, MySQL as an open source database is still a valid and viable option for developers today. MySQL has a huge community around it, and there is a lot of passion about what the future holds for the database. The challenge is how to direct that passion and get MySQL back to where it should be. MySQL is a great database that makes it easy to implement and run applications, and it is a useful option where PostgreSQL is not a good fit or is overkill for an application deployment.

    Now is the time to get involved in the events being organized by the MySQL community, to join the Foundation for MySQL Slack channel, to help build that future for the community as a whole, and to get excited about the future of MySQL again.

    New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.


    From devops to CTO: 8 things to start doing now 27 Jan 2026, 1:00 am

    I was promoted to CTO in my late twenties, and while it is common to see young CTOs leading startups these days, it was unusual in the ‘90s. I was far less experienced back then, and still developing my business acumen. While I was a strong software developer, it wasn’t my architecture and coding skills that helped me transition to a C-level role.

    Of all the technical skills I had back then, my devops skills were the most critical. Of course, we didn’t call it devops, as the term hadn’t been invented yet. We didn’t yet have CI/CD pipelines or infrastructure-as-code capabilities. Nonetheless, I automated our builds, scripted the deployments, standardized infrastructure configurations, and monitored systems performance.

    Developing that scaffolding enabled our development teams to focus on building and testing applications while operations managed infrastructure improvements. With automation in place and a team focused on the technology, I was able to focus on higher-level tasks such as understanding customer needs, partnering with product managers, learning marketing objectives, and learning about sales operations. When our CTO left for another opportunity, I was given the chance to step into the leadership role.

    In my book, Digital Trailblazer, I elaborate on my journey from developer to CTO and CIO. Since the book came out, many readers have asked me for advice about how to accelerate their career trajectories. In this article, I focus on how high-potential employees in devops roles—including developers and engineers—can start making moves toward a CTO role.

    Lead AI programs that deliver business value

    Studies show that a significant number of generative AI experiments fail to get deployed into production. According to the recent MIT State of AI in business report, 95% of organizations are getting zero return on their AI investments. Experimentation is an essential stage in the learning experience, especially when adopting new technologies and AI models. But the C-suite is pressuring IT departments to demonstrate better return on investment (ROI) from AI initiatives.

    Devops leaders have the opportunity to make a difference in their organization and for their careers. Lead a successful AI initiative, deploy to production, deliver business value, and share best practices for other teams to follow. Successful devops leaders don’t jump on the easy opportunities; they look for the ones that can have a significant business impact.

    Recommendation: Look for opportunities with clearly defined vision statements, active sponsors, and a dedicated team committed to the objectives. Take on the role of agile delivery leader and partner with a product manager who specifies targeted user personas, priorities, and success criteria for the AI program.

    Establish development standards for using AI tools effectively

    Another area where devops engineers can demonstrate leadership skills is by establishing standards for applying genAI tools throughout the software development lifecycle (SDLC). Advanced tools and capabilities require effective strategies to extend best practices beyond early adopters and ensure that multiple teams succeed. Some questions to consider:

    “The most relevant engineers will be the ones who treat AI as a collaborator and leadership as a craft,” says Rukmini Reddy, SVP of engineering at PagerDuty. “Resolve to deepen your automation skills, but also strengthen how you communicate, mentor, and create safety across both technical systems and human processes. Resilient operations depend just as much on how teams work together as on the automation that ships our software.”

    Recommendation: The key for devops leaders is to first find the most effective ways to apply AI in the SDLC and operations, then take a leadership role in drafting and communicating standards that teams readily adopt.

    Develop platforms teams want to use

    If you want to be recognized for promotions and greater responsibilities, a place to start is in your areas of expertise and with your team, peers, and technology leaders. However, shift your focus from getting something done to a practice leadership mindset. Develop a practice or platform your team and colleagues want to use and demonstrate its benefits to the organization.

    Devops engineers can position themselves for a leadership role by focusing on initiatives that deliver business value. Look to deliver small, incremental wins and guide solutions that help teams make continuous improvements in key areas.

    Another important area of work is reviewing platform engineering approaches that improve developer experience and creating self-service solutions. Leaders seeking recognition can also help teams adopt shift-left security and improve continuous testing practices.

    Recommendation: Don’t leave it to chance that leadership will recognize your accomplishments. Track your activities, adoption, and impacts in technology areas that deliver scalable and reusable patterns.

    Shift your mindset to tech facilitator and planner

    One of the bigger challenges for engineers when taking on larger technical responsibilities is shifting their mindset from getting work done today to deciding what work to prioritize and influencing longer-term implementation decisions. Instead of developing immediate solutions, the path to CTO requires planning architecture, establishing governance, and influencing teams to adopt self-organizing standards.

    Martin Davis, managing partner at Dunelm Associates, says to become a CTO, engineers must shift from tactical problem-solving to big-picture, longer-term strategic planning. He suggests the following three questions to evaluate platforms and technologies and shift to a more strategic mindset:

    • How will these technologies handle future expansion, both business and technology?
    • How will they adapt to changing circumstances?
    • How will they allow the addition and integration of other tools?

    “There are rarely right and wrong answers, and technology changes fast, so be pragmatic and be prepared to abandon previous decisions as circumstances change,” recommends Davis.

    Recommendation: One of the hardest mindset transitions for CTOs is shifting from being the technology expert and go-to problem-solver to becoming a leader facilitating the conversation about possible technology implementations. If you want to be a CTO, learn to take a step back to see the big picture and engage the team in recommending technology solutions.

    Develop data governance and science expertise

    Many CTOs come up the ranks as delivery leaders focused on building APIs, applications, and now AI agents. Some will have data management skills and understand architecture decisions behind data warehouses, data lakes, and data fabrics.

    But fewer CTOs have a background in data engineering, dataops, data science, and data governance. Therein lies an opportunity for devops engineers who want to become CTOs one day: Get hands-on with the challenges faced by data specialists tasked with building governed data products, which are typically composed of reusable data assets that serve multiple business needs.

    A good area to dive into is improving data quality and ensuring data is AI-ready. It’s an underappreciated function that’s key to building accurate data products and AI agents.

    Camden Swita, head of AI and ML at New Relic, says to prioritize understanding how your datasets can be used by an AI system and sussing out poor data quality. “It’s one thing for a human to recognize poor data and work around it, but AI agents are still not great at it, and using poor or outdated data will lead to undesirable outcomes. Cleaning and improving data will help address common issues like hallucinations, bad recommendations, and other issues,” says Swita.

    Recommendation: Devops engineers have many opportunities to deepen their knowledge and skills in data practices. Consider getting involved in answering some of these 10 data management questions around building trust, monitoring AI models, and improving data lineage. Also, review the 6 data risks CIOs and business leaders should be paranoid about, including intellectual property and third-party data sources.

    Extend your technology expertise across disciplines

    To ascend to a leadership role, gaining expertise in a handful of practices and technologies is insufficient. CTOs are expected to lead innovation, establish architecture patterns, oversee the full SDLC, and collaborate on and sometimes manage aspects of IT operations.

    “If devops professionals want to be considered for the role of CTO, they need to take the time to master a wide range of skills,” says Alok Uniyal, SVP and head of IT process consulting practice at Infosys. “You cannot become a CTO without understanding areas such as enterprise architecture, core software engineering and operations, fostering tech innovation, the company’s business, and technology’s role in driving business value. Showing leadership that you understand all technology workstreams at a company as well as key tech trends and innovations in the industry is critical for CTO consideration.”

    Devops professionals seeking to develop deep and broad technology knowledge recognize that it requires a commitment to lifelong learning. You can’t invest all the time needed to dive deep into every technology, take classes in each one, or wait for the right opportunities to join programs and teams where you can develop new skills. The most successful candidates find efficient ways to learn through reading, learning from peers, and finding mentors.

    Recommendation: Add learning to your sprint commitments and chronicle your best practices in a journal or blog. Writing helps with retention and adds an important CTO skill of sharing and teaching.

    Embrace experiences outside your comfort zone

    In Digital Trailblazer, I make the case that leadership requires getting out of your comfort zone and seeking experiences beyond your expertise.

    My devops career checklist includes several recommendations for embracing transformation experiences and seeking challenges that will train you to listen, question how things work today, and challenge people to think differently. For example, consider volunteering to manage an end-to-end major incident response to better understand being under pressure and finding problem root causes. That certainly will grow your appreciation of why observability is important and the value of monitoring systems.

    However, to be a CTO, the more important challenge is to lead efforts that require participation from stakeholders, customers, and business teams. Seek out opportunities to experience change leadership:

    • Lead a journey mapping exercise to document the end-user flows through a critical transaction and discover pain points.
    • Participate in a change management program and learn the practices required to accelerate end-user adoption of a new technology.
    • Go on a customer tour or spend time with operational teams to learn firsthand how well—or not well—the provided technology is working for them.

    “One of the best ways I personally achieved an uplift in the value I brought to a business came from experiencing change,” says Reggie Best, director of product management at IBM. “Within my current organization, that usually happened by changing projects or teams—gaining new experiences, developing an understanding of new technologies, and working with different people.”

    John Pettit, CTO at Promevo, says to rise from devops professional to CTO, embrace leadership opportunities, manage teams, and align with your organization’s strategic goals. “Build business acumen by understanding how technology impacts company performance. Invest in soft skills like communication, negotiation, and strategic thinking.”

    Pettit recommends that aspiring CTOs build relationships across departments, read books on digital transformation, mentor junior engineers, develop a network by attending events, and find a mentor in a non-tech C-level leadership role.

    Recommendation: The path to CTO requires spending more time with people and less time working with technology. Don’t wait for experience opportunities—seek them out and get used to being uncomfortable: it’s a key aspect of learning leadership.

    Develop a vision and deliver results

    CTOs see their roles beyond delivering technology, architecture, data, and AI capabilities. They learn the business, customers, and employees while developing executive relationships that inform their technology strategies and roadmaps.

    Martin Davis of Dunelm Associates recommends, “Think strategically, think holistically. Always look at the bigger picture and the longer term and how the decisions you make now play out as the organization builds, grows, and develops.”

    My recent research of top leadership competencies of digital leaders includes strategic thinking, value creation, influencing, and passion for making a difference. These are all competencies that aspiring CTOs develop over time by taking on more challenging assignments and focusing on collaborating with people over technical problem solving.

    Beyond strategies and roadmaps, the best CTOs are vision painters who articulate a destiny and objectives that leaders and employees embrace. They then have the leadership chops to create competitive, differentiating technical, data, and AI capabilities while reducing risks and improving security.

    You can’t control when a CTO opportunity will present itself, but if technology leadership is your goal, you can take steps to prepare. Start by changing your mindset from doing to leading, then look for opportunities to guide teams and increase collaboration with business stakeholders.


    Unplugged holes in the npm and yarn package managers could let attackers bypass defenses against Shai-Hulud 26 Jan 2026, 5:42 pm

JavaScript developers should consider moving away from the npm and yarn platforms for distributing their work because newly found holes allow threat actors to run malicious worm attacks like Shai-Hulud, says an Israeli researcher.

The warning comes from Oren Yomtov of Koi Security, who blogged Monday that he discovered six zero-day vulnerabilities in several package managers that could allow hackers to bypass defenses recommended last November, after Shai-Hulud roamed through npm and compromised over 700 packages.

    Those defenses are:

• disabling the ability to run lifecycle scripts, commands that run automatically during package installation, and
• committing lockfiles (package-lock.json, pnpm-lock.yaml, and others), which carry integrity checks, to version control (git). The lockfile records the exact version and integrity hash of every package in a dependency tree. On subsequent installs, the package manager checks incoming packages against these hashes, and if something doesn’t match, installation fails. If an attacker compromises a package and pushes a malicious version, the integrity check should catch the mismatch and block it from being installed; a minimal sketch of that check follows this list.
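To make the second defense concrete, here is a minimal sketch of how a lockfile integrity check works, assuming a v3-style package-lock.json and an already-downloaded tarball. It illustrates the mechanism only; it is not code from npm or Koi Security, and the file and package names are placeholders.

```python
import base64
import hashlib
import json

def sri_sha512(tarball_bytes: bytes) -> str:
    # npm-style lockfiles record integrity in Subresource Integrity form:
    # "sha512-" followed by the base64-encoded raw digest.
    digest = hashlib.sha512(tarball_bytes).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")

def verify_package(lockfile_path: str, package_name: str, tarball_path: str) -> None:
    with open(lockfile_path) as f:
        lock = json.load(f)
    # Assumes a lockfileVersion 3 layout, where entries live under "packages".
    expected = lock["packages"][f"node_modules/{package_name}"]["integrity"]
    with open(tarball_path, "rb") as f:
        actual = sri_sha512(f.read())
    if actual != expected:
        # This mismatch is what should block a tampered or swapped-out package.
        raise RuntimeError(f"integrity mismatch for {package_name}")

# Example call (paths are placeholders):
# verify_package("package-lock.json", "left-pad", "left-pad-1.3.0.tgz")
```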

    Those recommendations “became the standard advice everywhere from GitHub security guides to corporate policy docs” after November, says Yomtov, “because if malicious code can’t run on install, and your dependency tree is pinned, you’re covered.”

    November’s advice still valid, but more issues need addressing

    That advice is still valid, he added in an email interview.

    However, the vulnerabilities he discovered — dubbed PackageGate — that allow hackers to get around those two defenses have to be addressed by all platforms, he said.

    So far, the pnpm, vlt, and Bun platforms have addressed the bypass holes, Yomtov said, but npm and yarn haven’t. He therefore recommends that JavaScript developers use pnpm, vlt or Bun.

    He added that, in any case, JavaScript developers should keep whatever JavaScript package manager they use up to date to ensure they have the latest patches.

    GitHub statement ‘bewildering’

    Microsoft, which owns and oversees npm through GitHub, referred questions about the vulnerabilities to GitHub. It said in a statement, “We are actively working to address the new issue reported as npm actively scans for malware in the registry.” In the meantime, it urges project developers to adopt the recommendations in this blog issued after the Shai-Hulud attacks.

    The statement also notes that, last September, GitHub said it is strengthening npm’s security, including making changes to authentication and token management.

GitHub also warns that, if a package being installed through git contains a prepare script, its dependencies and devDependencies will be installed. “As we shared when the ticket was filed, this is an intentional design and works as expected. When users install a git dependency, they are trusting the entire contents of that repository, including its configuration files.”

    Yomtov found this explanation of intentional design “bewildering.”

    Not the complete picture

He says the scripts bypass vulnerability was reported through the HackerOne bug bounty program on November 26, 2025. While other JavaScript package managers accepted the reports, npm said the platform was working as intended, and that the ‘ignore-scripts’ setting should prevent the running of unapproved remote code.

    “We didn’t write this post to shame anyone,” Yomtov said in the blog. “We wrote it because the JavaScript ecosystem deserves better, and because security decisions should be based on accurate information, not assumptions about defenses that don’t hold up.

    “The standard advice, disable scripts and commit your lockfiles, is still worth following. But it’s not the complete picture,” he said. “Until PackageGate is fully addressed, organizations need to make their own informed choices about risk.”

    (image/jpeg; 0.93 MB)

    Descope introduces Agentic Identity Hub 2.0 for managing AI agents 26 Jan 2026, 5:15 pm

    Descope has announced Agentic Identity Hub 2.0, an update to its no-code identity platform for AI agents and Model Context Protocol (MCP) servers. The new release gives developers and security teams a dedicated UI and control plane to manage authorization, access control, credentials, and policies for AI agents and MCP servers, Descope said.

    Unveiled January 26, Agentic Identity Hub 2.0 lets MCP developers and AI agent builders use the platform to manage AI agents as first-class identities alongside human users, and adds OAuth 2.1 and tool-level scopes to internal and external MCP servers. In addition, users can govern agent access to MCP servers with enterprise-grade policy enforcement. Descope’s no-code identity platform is intended to help organizations build and modify identity journeys for customers, partners, AI agents, and MCP servers using visual workflows, Descope said.

    Specific capabilities of Agentic Identity Hub 2.0 include:

    • Agentic identity management and a centralized view of the agentic identities connected to an organization’s applications, APIs, and MCP servers.
    • MCP authorization, enabling MCP server developers to add protocol-compliant access control to internal and external-facing MCP servers.
    • A credential vault to manage credentials (OAuth tokens and API keys) that agents can use to access third-party systems.
    • Policy controls to help define granular authorizations to govern which AI agents can access MCP servers and which tool-level scopes can be invoked.
    • AI agent logging and auditing, providing visibility into every agent action.

    (image/jpeg; 8.51 MB)

    Kotlin-based Ktor 3.4 boosts HTTP client handling 26 Jan 2026, 12:11 pm

    JetBrains has released Ktor 3.4, an update to its Kotlin-based framework for asynchronous server-side and client-side application development. The release brings duplex streaming to the OkHttp client engine and introduces a new HttpRequestLifecycle plugin that enables the cancellation of in-flight HTTP requests when the client disconnects.

    Ktor 3.4 was announced January 23. Instructions for getting started can be found at ktor.io.

The new HttpRequestLifecycle plugin, which enables cancellation of in-flight HTTP requests when the client disconnects, is useful when there is a need to cancel a long-running or resource-intensive in-flight request. When the client disconnects, the coroutine handling the request is canceled, along with any launch or async coroutines started within the handler, and structured concurrency cleans up all resources. This feature is currently supported only for the Netty and CIO engines.

Ktor 3.4 also introduces a new API for dynamically documenting endpoints that works in tandem with a new compiler plugin. Instead of building a Swagger front end from a static file, the model is built at runtime from details embedded in the routing tree. To generate documentation, developers enable the feature through the Ktor Gradle plugin and can then add details in code via the new describe API.

    Also in Ktor 3.4, the OkHttp client engine now supports duplex streaming, enabling clients to send request body data and receive response data simultaneously, in contrast to regular HTTP calls, where the request body must be fully sent before the response begins. Duplex streaming is available for HTTP/2 connections and can be enabled using the new duplexStreamingEnabled property in OkHttpConfig.

And the Compression plugin now supports Zstd via a ktor-server-compression-zstd module. Zstd is a fast compression algorithm with high compression ratios, low compression times, and a configurable compression level.

    (image/jpeg; 3.01 MB)

    Zero-trust data governance needed to protect AI models from slop 26 Jan 2026, 8:08 am

    Organizations need to be less trustful of data given how much of it is AI-generated, according to new research from Gartner.

    As more enterprises jump on board the generative AI train — a recent Gartner survey found 84% expect to spend more on it this year — the risk grows that future large language models (LLMs) will be trained on outputs from previous models, increasing the danger of so-called model collapse.

To avoid this, Gartner recommends companies make changes to manage the risk of unverified data. These include appointing an AI governance leader to work closely with data and analytics teams; improving collaboration between departments through cross-functional groups that include representatives from cybersecurity, data, and analytics; and updating existing security and data management policies to address risks from AI-generated data.

Gartner predicts that by 2028, 50% of organizations will have adopted a zero-trust posture for data governance in response to this tidal wave of unverified AI-generated data.

    “Organizations can no longer implicitly trust data or assume it was human generated,” Gartner managing VP Wan Fui Chan said in a statement. “As AI-generated data becomes pervasive and indistinguishable from human-created data, a zero-trust posture establishing authentication and verification measures is essential to safeguard business and financial outcomes.”

    What makes matters even trickier to handle, said Chan, is that there will be different approaches to AI from governments. “Requirements may differ significantly across geographies, with some jurisdictions seeking to enforce stricter controls on AI-generated content, while others may adopt a more flexible approach,” he said.

    Perhaps the best example of how AI can cause data governance issues was when Deloitte Australia had to refund part of a government contract fee after AI-generated errors, including non-existent legal citations, were included in its final report.

    This article first appeared on CIO.

    (image/jpeg; 11.39 MB)

    Stop treating force multiplication as a side gig. Make it intentional 26 Jan 2026, 2:30 am

Most engineers have heard the phrase “10x engineer” or “force multiplier.” In practice, the highest leverage is not solo heroics but whether your presence makes everyone around you faster, safer and more effective. That is force multiplication.

    Early in my time as a Senior Software Engineer, I didn’t have a clear framework for how to excel as a senior IC and multiply my team’s impact. I spoke with more than five mentors and a dozen senior and principal engineers. I distilled the lessons into a practical checklist that I use today. I’ll share that framework and how you can apply it this week to reduce risk, raise quality and increase delivery capacity.

    What “force multiplication” really means

    When I first joined, I thought being a force multiplier meant shipping more code. It took a few incidents and a couple of painful postmortems to realize the real leverage is what you do before the outage. A “10x engineer” is someone who masters context and amplifies team impact rather than focusing on raw output. Force multiplication is about setting the technical bar, improving team processes, mentoring teammates and providing direction so the whole team moves faster and safer.

• Lead without authority. You may not have direct reports, yet you shape architecture, quality and the roadmap. Your leverage comes from artifacts, reviews and clear standards, not from title. I started by publishing a lightweight architecture template and a rollout checklist that the team could copy. That reduced ambiguity during design and cut review cycles by nearly 30 percent.

    • Establish clear standards. Work with your peers to define templates for design docs, pull requests, release plans and incident runbooks. Make those templates living artifacts in your wiki. When a new engineer joins, the first question shouldn’t be “How do we do this?”. It should be “Which template applies?”.

    • Hold a high bar with clarity. Give specific, timely feedback on code and designs. When timelines are tight, present options with explicit tradeoffs and risks and recommend a path. I learned to stop saying “we’ll figure it out later.” Instead, I started offering two plans: plan A holds the date with reduced scope and known risks; plan B holds scope with a later date. That framing helped leadership make tradeoffs without surprise.

    • Build in quality and operational excellence. Treat code quality, test coverage, observability, error budgets, release safety and rollback readiness as part of the definition of done. I wired these into code reviews and on‑call routines, so they persisted under pressure. Observability is not a feature; it’s a nonfunctional requirement that must be planned and reviewed. According to the 2023 DORA report, elite teams deploy code multiple times per day and restore service in under an hour, while low performers take weeks to recover.

    • Make learning and knowledge-sharing routine. Run short office hours, host brown‑bag sessions, capture lessons from incidents and maintain a “first 30 days” checklist for new hires. The goal is to replace ad hoc heroics with replicable practices.

    The senior engineer portfolio: How you spend your time

    To avoid burnout and cover the areas above, you need a plan. Here’s the mix I use as a planning tool. The percentages will vary by company and team, but if any bucket falls to zero, quality or culture may drift.

    • 50%: Individual project delivery. Own a meaningful slice of product or platform work end-to-end. This keeps your judgment sharp and your credibility high. Show by example that your architecture, code and testing follow best practices.
    • 20%: Reviews and feedback on architecture, code and processes. Give actionable feedback that improves maintainability, resilience and readability. Use the documented standards as examples so your comments scale beyond one pull request.
• 10%: Long-term architecture and technical debt. Maintain a living architecture document, a list of potential bottlenecks, a small set of SLIs and a technical debt register that explains business risk in plain language. Advocate for a steady capacity for debt reduction.
    • 10%: Mentoring and unblocking the team. Run office hours, help teammates navigate tricky technical issues and unblock work.
    • 10%: Learning and sharing best practices. Curate guidelines for architectures, code, rollouts and rollback plans. Encourage the team to contribute and keep the guidance current.

    Why this mix matters: culture and quality are as much your job as they are your manager’s. Managers create the environment and priorities. Senior engineers operationalize the standards that make quality and learning the default.

    7 practical steps to multiply impact

Below are the actions I took that you could adopt. Each is simple but not easy; they require discipline and a small time investment that pays back quickly.

    1. Build a weekly learning routine. Block 30 minutes on your calendar to review coding standards, architecture templates and operational excellence practices. Note gaps you see in the team. Compare against company guidance and industry best practices. Share one short tip or pattern each week. Weekly, visible improvements create momentum and motivation.

2. Schedule peer 1:1s that develop people. Meet teammates with a coaching agenda: understand their career goals and the projects they are interested in. Provide actionable feedback to help them improve in those areas of interest. After design reviews or incidents, send a one‑paragraph follow‑up that captures the lesson and links to a reference example.

    3. Grow your staff network and bring the outside in. Connect with senior and principal engineers across orgs. Ask for one lesson from a recent migration, incident or scale event. Summarize what you learned in a short note and link artifacts your team can copy. This prevents local maxima and speeds adoption.

4. Advise leadership with options and risk. For deadlines that may compromise quality, present two plans: hold the date with explicit scope tradeoffs and stated risks, or hold the scope with a date that matches capacity. Tie technical debt to user or revenue impact and propose a steady allocation to address it. Surface cultural risks like burnout with data and anecdotes.

    5. Influence the roadmap from the bottom up. Facilitate brainstorming on long-term architecture and reliability goals. Turn ideas into lightweight proposals with options and tradeoffs. Partner with product to merge user value, technical debt and reliability into a single prioritized roadmap.

    6. Raise the hiring bar and make it scalable. Invite engineers to shadow your interviews with a clear rubric. Debrief immediately, then graduate them to reverse shadow and go solo. Capture good prompts, work samples and scoring guidance so the loop is consistent and fair across candidates.

    7. Allocate time to do the multiplying work. If your team participates in sprint planning, make sure to reserve your bandwidth for these activities. Trying to do them outside your project work leads to burnout and inconsistent impact.

    One small example

    On one project, we were shipping a new feature with a tight deadline. The first rollout exposed a gap in observability: we couldn’t distinguish a configuration drift from a control plane bug. I proposed adding a small, focused set of SLIs and a release checklist that required a canary window, synthetic traffic tests and a rollback plan. I also documented the pattern in our wiki and ran a 20‑minute brownbag to walk through the checklist.

    The result: the next rollout caught the issue in the canary phase, we rolled back gracefully and the postmortem was short and constructive. More importantly, the checklist became the team norm. We reduced incident severity in the next two semesters and shortened our mean time to recovery. That is force multiplication in action: a small change in standards yielded outsized reliability gains.

    Make force multiplication your operating model

    Force multiplication isn’t a personality trait or a side gig – it’s a repeatable operating model. It’s the standards you codify, the reviews you run, the guardrails you build and the automations you ship so the team can move faster with less risk. When senior ICs make this work visible and measurable, the organization gets predictable delivery, fewer surprises and more capacity for innovation.

    Start this week. Pick one of the seven steps and turn it into a tiny experiment:

    • Publish a one‑page architecture template and a release checklist
    • Add 2–3 SLIs for a critical service and set alert thresholds
    • Draft a one‑page A/B decision memo for your next release

    Measure the impact. Track 1–2 leading indicators (e.g., canary‑phase failures caught, number of automated recoveries, review cycle time) and 1–2 lagging outcomes (e.g., incident severity, MTTR, rollback rate). Share the results in your next retro and iterate.
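As a toy illustration of the lagging outcomes named above, the sketch below computes MTTR and rollback rate from a hand-rolled incident log. All numbers and field names are made up; a real team would pull these from its incident tracker and deploy history.

```python
from datetime import datetime

# Invented incident log: when each incident was detected and recovered.
incidents = [
    {"detected": datetime(2026, 1, 5, 9, 0),   "recovered": datetime(2026, 1, 5, 9, 40)},
    {"detected": datetime(2026, 1, 19, 14, 0), "recovered": datetime(2026, 1, 19, 14, 25)},
]
deploys, rollbacks = 24, 2  # invented counts for the same period

# Mean time to recovery, in minutes, averaged over all incidents.
mttr_minutes = sum(
    (i["recovered"] - i["detected"]).total_seconds() / 60 for i in incidents
) / len(incidents)

print(f"MTTR: {mttr_minutes:.1f} minutes")
print(f"Rollback rate: {rollbacks / deploys:.0%}")
```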

    Your next move. Publish your checklist in the team wiki, run a 20‑minute brown‑bag to socialize it and put one “multiplying work” card on the board this sprint. Repeat, measure and teach.

    [Note: The views expressed in this article are my own and do not represent the views of Microsoft.]

    This article is published as part of the Foundry Expert Contributor Network.
    Want to join?

    (image/jpeg; 11.16 MB)

    Why your AI agents need a trust layer before it’s too late 26 Jan 2026, 2:00 am

    When one compromised agent brought down our entire 50-agent ML system in minutes, I realized we had a fundamental problem. We were building autonomous AI agents without the basic trust infrastructure that the internet established 40 years ago with DNS.

As a PhD researcher and IEEE Senior Member, I’ve spent the past year building what I call “DNS for AI agents” — a trust layer that finally gives autonomous AI the security foundation it desperately needs. What started as a research project to solve authentication problems in multi-tenant ML environments has evolved into a production system that’s changing how organizations deploy AI agents at scale.

    The transformation from traditional machine learning to agentic AI represents one of the most significant shifts in enterprise technology. While traditional ML pipelines require human oversight at every step — data validation, model training, deployment and monitoring — modern agentic AI systems enable autonomous orchestration of complex workflows involving multiple specialized agents. But with this autonomy comes a critical question: How do we trust these agents?

    The cascading failure that changed everything

    Let me share what happened in our production environment that crystallized this problem. We were running a multi-tenant ML operations system with 50 agents, handling everything from concept-drift detection to automated model retraining. Each agent had its own responsibility, its own credentials and its own hardcoded endpoints for communicating with other agents.

    On a Tuesday morning, a single agent was compromised due to a configuration error. Within six minutes, the entire system collapsed. Why? Because agents had no way to verify each other’s identity. The compromised agent impersonated our model deployment service, causing downstream agents to deploy corrupted models. Our monitoring agent, unable to distinguish legitimate from malicious traffic, dutifully reported everything as normal.

    This wasn’t just a technical failure — it was a trust failure. We had built an autonomous system without the fundamental mechanisms for agents to discover, authenticate and verify each other. It was like building a global network without DNS, where every connection relies on hardcoded IP addresses and blind trust.

    That incident revealed four critical gaps in how we deploy AI agents today.

    1. There’s no uniform discovery mechanism — agents rely on manual configuration and hardcoded endpoints.
    2. Cryptographic authentication between agents is virtually nonexistent.
    3. Agents can’t prove their capabilities without exposing sensitive implementation details.
    4. Governance frameworks for agent behavior are either nonexistent or impossible to enforce consistently.

    Building trust from the ground up

    The solution we developed, called Agent Name Service (ANS), takes inspiration from how the internet solved a similar problem decades ago. DNS transformed the internet by mapping human-readable names to IP addresses. ANS does something similar for AI agents, but with a crucial addition: it maps agent names to their cryptographic identity, their capabilities and their trust level.

    Here’s how it works in practice. Instead of agents communicating through hardcoded endpoints like “http://10.0.1.45:8080,” they use self-describing names like “a2a://concept-drift-detector.drift-detection.research-lab.v2.prod.” This naming convention immediately tells you the protocol (agent-to-agent), the function (drift detection), the provider (research-lab), the version (v2) and the environment (production).
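To show how much a self-describing name carries, here is a small sketch that splits the example name above into the parts just listed: protocol, agent, capability, provider, version and environment. It is not the ANS library’s actual API; the dataclass and field names are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class AgentName:
    protocol: str      # "a2a" means agent-to-agent
    agent: str         # e.g. "concept-drift-detector"
    capability: str    # e.g. "drift-detection"
    provider: str      # e.g. "research-lab"
    version: str       # e.g. "v2"
    environment: str   # e.g. "prod"

def parse_agent_name(name: str) -> AgentName:
    protocol, _, rest = name.partition("://")
    # The dotted segments follow the naming convention described above.
    agent, capability, provider, version, environment = rest.split(".")
    return AgentName(protocol, agent, capability, provider, version, environment)

print(parse_agent_name(
    "a2a://concept-drift-detector.drift-detection.research-lab.v2.prod"
))
```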

    But the real innovation lies beneath this naming layer. We built ANS on three foundational technologies that work together to create comprehensive trust.

    1. Decentralized Identifiers (DIDs) give each agent a unique, verifiable identity using W3C standards originally designed for human identity management.
    2. Zero-knowledge proofs allow agents to prove they have specific capabilities — like database access or model training permissions — without revealing how they access those resources.
    3. Policy-as-code enforcement through Open Policy Agent ensures that security rules and compliance requirements are declarative, version-controlled and automatically enforced.

    We designed ANS as a Kubernetes-native system, which was crucial for enterprise adoption. It integrates directly with Kubernetes Custom Resource Definitions, admission controllers and service mesh technologies. This means it works with the cloud-native tools organizations already use, rather than requiring a complete infrastructure overhaul.

    The technical implementation leverages what’s called a zero-trust architecture. Every agent interaction requires mutual authentication using mTLS with agent-specific certificates. Unlike traditional service mesh mTLS, which only proves service identity, ANS mTLS includes capability attestation in the certificate extensions. An agent doesn’t just prove “I am agent X” — it proves “I am agent X and I have the verified capability to retrain models.”

    From research to production reality

    The real validation came when we deployed ANS in production. The results exceeded even our optimistic expectations. Agent deployment time dropped from 2–3 days to under 30 minutes — a 90% reduction. What used to require manual configuration, security reviews, certificate provisioning and network setup now happens automatically through a GitOps pipeline.

    Even more impressive was the deployment success rate. Our traditional approach had a 65% success rate, with 35% of deployments requiring manual intervention to fix configuration errors. With ANS, we achieved 100% deployment success with automated rollback capability. Every deployment either succeeds completely or rolls back cleanly — no partial deployments, no configuration drift, no manual cleanup.

    The performance metrics tell an equally compelling story. Service response times average under 10 milliseconds, which is fast enough for real-time agent orchestration while maintaining cryptographic security. We’ve successfully tested the system with over 10,000 concurrent agents, demonstrating that it scales far beyond typical enterprise needs.

    ANS in action

    Let me share a concrete example of how this works. We have a concept-drift detection workflow that illustrates the power of trusted agent communication. When our drift detector agent notices a 15% performance degradation in a production model, it uses ANS to discover the model retrainer agent by capability — not by hardcoded address. The drift detector then proves it has the capability to trigger retraining using a zero-knowledge proof. An OPA policy validates the request against governance rules. The retrainer executes the update and a notification agent alerts the team via Slack.

    This entire workflow — discovery, authentication, authorization, execution and notification — happens in under 30 seconds. It’s 100% secure, fully audited and happens without any human intervention. Most importantly, every agent in the chain can verify the identity and capabilities of the others.

    Lessons learned and the path forward

    Building ANS taught me several lessons about deploying autonomous AI systems. First, security can’t be an afterthought. You can’t bolt trust onto an agent system later — it must be foundational. Second, standards matter. By supporting multiple agent communication protocols (Google’s A2A, Anthropic’s MCP and IBM’s ACP), we ensured ANS works across the fragmented agent ecosystem. Third, automation is non-negotiable. Manual processes simply can’t scale to the thousands of agents that enterprises will be running.

    The broader implications extend beyond just ML operations. As organizations move toward autonomous AI agents handling everything from customer service to infrastructure management, the trust problem becomes existential. An autonomous system without proper trust mechanisms is a liability, not an asset.

    We’ve seen this pattern before in technology evolution. In the early internet, we learned that security through obscurity doesn’t work. With cloud computing, we learned that perimeter security isn’t enough. Now, with agentic AI, we’re learning that autonomous systems require comprehensive trust frameworks.

    The open-source implementation we’ve released includes everything needed to deploy ANS in production: the core library, Kubernetes manifests, demo agents, OPA policies and monitoring configurations. We’ve also published the complete technical presentation from MLOps World 2025 where I demonstrated the system live.

    What this means for enterprise AI strategy

    If you’re deploying AI agents in your organization — and recent surveys suggest most enterprises are — you need to ask yourself some hard questions.

    • How do your agents authenticate with each other?
    • Can they verify capabilities without exposing credentials?
    • Do you have automated policy enforcement?
    • Can you audit agent interactions?

    If you can’t answer these questions confidently, you’re building on a foundation of trust assumptions rather than cryptographic guarantees. And as our cascading failure demonstrated, those assumptions will eventually fail you.

    The good news is that this problem is solvable. We don’t need to wait for vendors or standards bodies. The technologies exist today: DIDs for identity, zero-knowledge proofs for capability attestation, OPA for governance and Kubernetes for orchestration. What was missing was a unified framework that brings them together specifically for AI agents.

    The shift to autonomous AI is inevitable. The only question is whether we’ll build these systems with proper trust infrastructure from the start or whether we’ll wait for a major incident to force our hand. Based on my experience, I strongly recommend the former.

    The future of AI is agentic. The future of agentic AI must be secure. ANS provides the trust layer that makes both possible.

    The complete Agent Name Service implementation, including source code, deployment configurations and documentation, is available at github.com/akshaymittal143/ans-live-demo. A technical presentation demonstrating the system is available at MLOps World 2025.

    This article is published as part of the Foundry Expert Contributor Network.
    Want to join?

    (image/jpeg; 2.8 MB)

    With AI, the database matters again 26 Jan 2026, 1:00 am

    Developers have spent the past decade trying to forget databases exist. Not literally, of course. We still store petabytes. But for the average developer, the database became an implementation detail; an essential but staid utility layer we worked hard not to think about.

    We abstracted it behind object-relational mappers (ORM). We wrapped it in APIs. We stuffed semi-structured objects into columns and told ourselves it was flexible. We told ourselves that persistence was a solved problem and began to decouple everything. If you needed search, you bolted on a search system. Ditto for caching (grab a cache), documents (use a document store), relationships (add a graph database), etc. We thought we were being clever but really we were shifting complexity from the database engine into glue code, pipelines, and operational overhead.

    The architectural frailty of this approach has been laid bare by AI.

    In an AI-infused application, the database stops being a passive store of record and becomes the active boundary between a probabilistic model and your system of record. The difference between a cool demo and a mission-critical system is not usually the large language model (LLM). It is the context you can retrieve, the consistency of that context, and the speed at which you can assemble it.

    AI has made the database visible again. We are not suddenly loving SQL again, but we are realizing that AI memory is just another database problem. Your database is no longer just where data lives. It is where context gets assembled, and in AI, context is everything.

    Consistency and hallucinations

    To understand why we are struggling, we have to look at how we got here. We didn’t eliminate the database during the past 10 years. We actually multiplied it.

Modern application design taught us to route around database limits. We built caches, search clusters, stream processors, and a cacophony of purpose-built stores. We convinced ourselves that “polyglot persistence” (wiring together five different “best of breed” systems) was architectural enlightenment. In reality, it was mostly résumé-driven development that shifted complexity from the database engine to the application code.

    That worked when “eventual consistency” was acceptable. In AI, it doesn’t work. At all.

    Consider a practical retrieval-augmented generation (RAG) pipeline. It is rarely as simple as “fetch vectors.” A real enterprise AI workflow needs vector similarity search to find semantic matches, document retrieval to fetch the content, graph traversal to understand relationships like permissions or hierarchy, and time-series analysis to ensure the data is not stale. In the “bolt-on” architecture of the last decade, you implement that by composing specialized services: a vector database, a document store, a graph database, and a time-series system.

    Such a bolt-on architecture wasn’t ideal but neither was it the deal-killer that AI makes it. Every arrow in the flow is a network hop, and every hop adds serialization overhead. Making matters worse, every separate system introduces a new consistency model. You are paying a tax in complexity, latency, and inconsistency when you split what should be one logical context into six different physical systems. AI is uniquely sensitive to this tax.
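A rough sketch of what that glue code looks like appears below. Every client passed in stands for a separate physical system, and none of the class or method names refer to a real product’s API; the point is the number of hops and the number of places the answer can disagree with itself.

```python
from dataclasses import dataclass, field

@dataclass
class AssembledContext:
    documents: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    fresh: bool = False

def assemble_context(query_embedding, user_id, vector_db, doc_store, graph_db, ts_store):
    # Hop 1: semantic matches from the vector store.
    doc_ids = vector_db.similar(query_embedding, top_k=5)
    # Hop 2: fetch the content those vectors point at, which may already have
    # been updated in the document store since the vectors were indexed.
    docs = [doc_store.get(doc_id) for doc_id in doc_ids]
    # Hop 3: permissions and hierarchy live in yet another system.
    allowed = graph_db.neighbors(user_id, relation="can_read")
    # Hop 4: a staleness check against a fourth consistency model.
    fresh = all(ts_store.last_updated(d) <= vector_db.indexed_at(d) for d in doc_ids)
    # Four network hops, four serialization steps, four ways to disagree.
    return AssembledContext(documents=docs, relationships=allowed, fresh=fresh)
```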

    When a normal web app shows stale data, a user might see an old inventory count for a few seconds. When an AI agent retrieves inconsistent context (perhaps fetching a vector that points to a document that has already been updated in the relational store), it constructs a plausible narrative based on false premises. We call these hallucinations, but often the model is not making things up. It is being fed stale data by a fragmented database architecture. If your search index is “eventually consistent” with your system of record, your AI is “eventually hallucinating.”

    How about if your transactional system is the source of truth, but your vector index updates asynchronously? Well, you’ve built a time lag into your agent’s memory. If your relationship data is synced through a pipeline that can drift, your agent can “know” relationships that are no longer true. If permissions are checked in one system while content is fetched from another, you are one bug away from data leakage.

    This problem gets worse when we move from passive chatbots to active agents. We expect the next generation of AI to perform tasks, not just summarize text. But doing things requires transactions. If your agent needs to update a customer record (relational), re-index their preference profile (vector), and log the interaction (document), and your database architecture requires a saga pattern to coordinate those writes across three different systems, you have built a fragility engine. In a fragmented stack, a failure halfway through that workflow leaves your agent’s world in a corrupted state. You cannot build a reliable agent if it cannot rely on atomicity, consistency, isolation, durability (ACID) guarantees across its entire memory space.

    That is not an AI problem. It’s a basic architecture problem. As I have noted, you cannot build a reliable agent on unreliable data infrastructure. Reliability work for agents is inseparable from database work.

    Architectural constraints

    For years, developers have accepted good-enough performance because we could simply scale horizontally. If a query was slow, we added nodes. But AI workloads are computationally heavy, and the underlying physics of our data structures matters again.

    Take the simple act of reading a JSON document. In some popular document stores, the underlying binary format requires sequential field scanning. That is an O(n) operation. To find a field at the end of a large document, the engine must scan everything preceding it. That may work for a simple CMS, but it completely breaks down when you’re operating at enterprise scale (say, 100,000 requests per second during a viral event), because nanoseconds compound. An O(n) scan at that volume can waste dozens of CPU cores just parsing data. In contrast, newer binary formats use hash-indexed navigation to allow O(1) jumps to specific fields, delivering performance gains of 500 times or more for deep document traversal.
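The difference is easy to feel with a toy benchmark: a sequential scan over fields versus a hash-indexed lookup, with a plain Python dict standing in for a hash-indexed binary format. This mimics the shape of the problem rather than any particular database engine.

```python
import time

# 100,000 fields: a sequential layout must scan to find a key,
# while a hash-indexed layout jumps straight to it.
FIELDS = [(f"field_{i}", i) for i in range(100_000)]
AS_LIST = list(FIELDS)
AS_DICT = dict(FIELDS)

def scan_lookup(key):
    for k, v in AS_LIST:   # O(n): walk every preceding field
        if k == key:
            return v

start = time.perf_counter()
scan_lookup("field_99999")  # worst case: the key is at the end
scan_time = time.perf_counter() - start

start = time.perf_counter()
AS_DICT["field_99999"]      # O(1): direct jump
hash_time = time.perf_counter() - start

print(f"sequential scan: {scan_time:.6f}s, hash lookup: {hash_time:.6f}s")
```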

    This is not a micro-optimization. It is a fundamental architectural constraint. “Just nanoseconds” is what you say when you have not operated at the scale where latency kills the user experience. In an AI world where inference is already slow, your database cannot afford to add latency due to inefficient algorithmic foundations.

    Stop building infrastructure

    I am not saying there is no place for specialized tools. But the mistake we made in the last decade was assuming we could assemble five specialized systems and get something simpler than one general system. For AI, the question to ask is not “Which database has vector search?” The question is “Where does my context live, and how many consistency boundaries do I cross to assemble it?”

    If the answer is “a lot,” you are signing up to build infrastructure. You are building pipelines, retries, reconciliation logic, and monitoring for lag you will eventually treat as normal.

    The alternative is to stop treating data models as physical mandates. In the past, if we needed a graph, we copied data to a graph database. If we needed vectors, we copied data to a vector store. That copying is the root of the synchronization evil.

    The scientific approach is to treat data as having one canonical form that can be projected into whatever shape the application needs. Do you need to traverse relationships? The database should project a graph view. Do you need to search semantics? It should project a vector view. These should not be copies that require pipelines. They should be different lenses on the same single source of truth. When the record updates, every projection updates instantly.

    If you are building an AI-infused application and your plan involves maintaining multiple pipelines to keep multiple databases in sync, you are not building AI. You are building a context delivery system, and you are taking on all the operational risk that comes with it.

    Here is the decision rule I would use. Count how many consistency boundaries your agent crosses to answer a question. Count how many physical copies of the same truth exist across your stack. If either number keeps growing, your reliability work is already in trouble.

    This is why the database matters again. AI is forcing us to confront what we spent a decade abstracting away: The hard part of software is turning messy reality into a coherent, queryable representation of the world. In the age of AI, context rules. Databases, for all their supposed boringness, are still the best tool we have for delivering that context securely at scale. They allow you to delete the infrastructure that provides no business value: the ETL jobs, the synchronization pipelines, the object-relational mapping layers, and the distributed transaction coordinators. The best code is the code you do not write.

    (image/jpeg; 5.21 MB)

    16 open source projects transforming AI and machine learning 26 Jan 2026, 1:00 am

    For several decades now, the most innovative software has always emerged from the world of open source software. It’s no different with machine learning and large language models. If anything, the open source ecosystem has grown richer and more complex, because now there are open source models to complement the open source code.

    For this article, we’ve pulled together some of the most intriguing and useful projects for AI and machine learning. Many of these are foundation projects, nurturing their own niche ecology of open source plugins and extensions. Once you’ve started with the basic project, you can keep adding more parts.

Most of these projects offer demonstration code, so you can start up a running version that already tackles a basic task. Additionally, the companies that build and maintain these projects often sell a service alongside them. In some cases, they’ll deploy the code for you and save you the hassle of keeping it running. In others, they’ll sell custom add-ons and modifications. The code itself is still open, so there’s no vendor lock-in. The services simply make it easier to adopt the code by paying someone to help.

    Here are 16 open source projects that developers can use to unlock the potential in machine learning and large language models of any size—from small to large, and even extra large.

    Agent Skills

    AI coding agents are often used to tackle standard tasks like writing React components or reviewing parts of the user interface. If you are writing a coding agent, it makes sense to use vetted solutions that are focused on the task at hand. Agent Skills are pre-coded tools that your AI can deploy as needed. The result is a focused set of vetted operations capable of producing refined, useful code that stays within standard guidelines. License: MIT.

    Awesome LLM Apps

    If you are looking for good examples of agentic coding, see the Awesome LLM Apps collection. Currently, the project hosts several dozen applications that leverage some combination of RAG databases and LLMs. Some are simple, like a meme generator, while others handle deeper research like the Journalist agent. The most complex examples deploy multi-agent teams to converge upon an answer. Every application comes with working examples for experimentation, so you can learn from what’s been successful in the past. Altogether, the apps in this collection are great inspiration for your own projects. License: Apache 2.0.

    Bifrost

If your application requires access to an LLM service, and you don’t have a particular one in mind, check out Bifrost. A fast, unified gateway to more than 15 LLM providers, this OpenAI-compatible API quickly abstracts away the differences between models, including all the major ones. It includes essential features like governance, caching, budget management, and load balancing, plus guardrails to catch problems before requests are sent out to service providers, who will bill you for the time regardless. With dozens of great LLM providers constantly announcing new and better models, why limit yourself? License: Apache 2.0.
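Because the gateway is OpenAI-compatible, calling it should look roughly like the sketch below: the standard OpenAI Python client pointed at a gateway endpoint instead of a provider. The base URL, API-key handling, and model name here are placeholder assumptions, not Bifrost’s documented defaults.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local gateway endpoint
    api_key="not-a-real-key",             # the gateway holds the provider keys
)

response = client.chat.completions.create(
    model="gpt-4o-mini",                  # the gateway routes this to a provider
    messages=[{"role": "user", "content": "Summarize our outage runbook."}],
)
print(response.choices[0].message.content)
```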

    Claude Code

    If the popularity of AI coding assistants tells us anything, it’s that all developers—and not just the ones building AI apps—appreciate a little help writing and reviewing their code. Claude Code is that pair programmer. Trained on all the major programming languages, Claude Code can help you write code that is better, faster, and cleaner. It digests a codebase and then starts doing your bidding, while also making useful suggestions. Natural language commands plus some vague hand waving are all the Anthropic LLM needs to refactor, document, or even add new features to your existing code. License: Anthropic’s Commercial TOS.

    Clawdbot

    Many of the tools in this list help developers create code for other people. Clawdbot is the AI assistant for you, the person writing the code. It integrates with your desktop to control built-in tools like the camera and large applications like the browser. A multi-channel inbox accepts your commands through more than a dozen different communication channels including WhatsApp, Telegram, Slack, and Discord. A cron job adds timing. It’s the ultimate assistant for you, the ruler of your data. If AI exists to make our lives easier, why not start by organizing the applications on your desktop? License: MIT.

    Dify

    For projects that require more than just one call to an LLM, Dify could be the solution you’ve been looking for. Essentially a development environment for building complex agentic workflows, Dify stitches together LLMs, RAG databases, and other sources. It then monitors how they perform under different prompts and parameters and puts it all together in a handy dashboard, so you can iterate on the results. Developing agentic AI requires rapid experimentation, and Dify provides the environment for those experiments. License: Modified version of Apache 2.0 to exclude some commercial uses.

    Eigent

    The best way to explore the power and limitations of an agentic workflow is to deploy it yourself on your own machine, where it can solve your own problems. Eigent delivers a workforce of specialized agents for handling tasks like writing code, searching the web, and creating documents. You just wave your hands and issue instructions, and Eigent’s LLMs do their best to follow through. Many startups brag about eating their own dogfood. Eigent puts that concept on a platter, making it easy for AI developers to experience directly the abilities and failings of the LLMs they’re building. License: Apache 2.0.

    Headroom

    Programmers often think like packrats. If the data is good, why not pack in some more? This is a challenge for code that uses an LLM because these services charge by the token, and they also have a limited context window. Headroom tackles this issue with agile compression algorithms that trim away the excess, especially the extra labels and punctuation found in common formats like JSON. A big part of designing working AI applications is cost engineering, and saving tokens means saving money. License: Apache 2.0.
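Headroom’s own API isn’t reproduced here, but the idea it implements can be sketched in a few lines: strip the formatting overhead from a JSON payload before it goes to a token-billed API. Character counts below are only a rough proxy for tokens.

```python
import json

# An invented payload of the kind that gets stuffed into prompts.
record = {
    "customer_name": "Ada Lovelace",
    "customer_tier": "enterprise",
    "open_tickets": [
        {"ticket_id": 101, "ticket_status": "open", "ticket_priority": "high"},
        {"ticket_id": 102, "ticket_status": "open", "ticket_priority": "low"},
    ],
}

pretty = json.dumps(record, indent=2)                # the packrat version
compact = json.dumps(record, separators=(",", ":"))  # whitespace stripped
print(f"pretty: {len(pretty)} chars, compact: {len(compact)} chars")
```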

    Hugging Face Transformers

    When it comes to starting up a brand-new machine learning project, Hugging Face Transformers is one of the best foundations available. Transformers offers a standard format for defining how the model interacts with the world, which makes it easy to drop a new model into your working infrastructure for training or deployment. This means your model will interact nicely with all the already available tools and infrastructure, whether for text, vision, audio, video, or all of the above. Fitting into a standard paradigm makes it much easier to leverage your existing tools while focusing on the cutting edge of your research. License: Apache 2.0.
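The standard entry point is the pipeline API, which pulls a default model for a task and runs it locally. A minimal example follows; the exact score will vary with the model version downloaded.

```python
from transformers import pipeline

# Downloads a default sentiment model on first run, then executes locally.
classifier = pipeline("sentiment-analysis")
print(classifier("Dropping a new model into existing tooling was painless."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```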

    LangChain

    For agentic AI solutions that require endless iteration, LangChain is a way to organize the effort. It harnesses the work of a large collection of models and makes it easier for humans to inspect and curate the answers. When the task requires deeper thinking and planning, LangChain makes it easy to work with agents that can leverage multiple models to converge upon a solution. LangChain’s architecture includes a framework (LangGraph) for organizing easily customizable workflows with long-term memory, and a tool (LangSmith) for evaluating and improving performance. Its Deep Agents library provides teams of sub-agents, which organize problems into subsets then plan and work toward solutions. It is a proven, flexible test bed for agentic experimentation and production deployment. License: MIT.

    LlamaIndex

    Many of the early applications for LLMs are sorting through large collections of semi-structured data and providing users with useful answers to their questions. One of the fastest ways to customize a standard LLM with private data is to use LlamaIndex to ingest and index the data. This off-the-shelf tool provides data connectors that you can use to unpack and organize a large collection of documents, tables, and other data, often with just a few lines of code. The layers underneath can be tweaked or extended as the job requires, and LlamaIndex works with many of the data formats common in enterprises. License: MIT.
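A minimal sketch of that ingest-and-query flow is below, assuming the current llama_index.core package layout, a local ./data directory of documents, and an embedding/LLM provider configured separately.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # unpack files into documents
index = VectorStoreIndex.from_documents(documents)     # embed and index them
query_engine = index.as_query_engine()
print(query_engine.query("Which contracts renew this quarter?"))
```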

    Ollama

For anyone experimenting with LLMs on their laptop, Ollama is one of the simplest ways to download one or more of them and get started. Once it’s installed, your command line becomes a small version of the classic ChatGPT interface, but with the ability to pull a huge collection of models from a growing library of open source options. Just enter ollama run followed by a model name, and the model is ready to go. Some developers are using it as a back-end server for LLM results. The tool provides a stable, trustworthy interface to LLMs, something that once required quite a bit of engineering and fussing. The server simplifies all this work so you can tackle higher level chores with many of the most popular open source LLMs at your fingertips. License: MIT.
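Ollama’s back-end role comes from the local REST API it exposes, by default on port 11434. Here is a short sketch against the /api/generate endpoint; the model name assumes you have already pulled it with ollama pull.

```python
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.2",   # any model previously pulled via `ollama pull`
    "prompt": "Explain structured concurrency in one paragraph.",
    "stream": False,       # return a single JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```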

    OpenWebUI

    One of the fastest ways to put up a website with a chat interface and a dedicated RAG database is to spin up an instance of OpenWebUI. This project knits together a feature-rich front end with an open back end, so that starting up a customizable chat interface only requires pulling a few Docker containers. The project, though, is just a beginning, because it offers the opportunity to add plugins and extensions to enhance the data at each stage. Practically every part of the chain from prompt to answer can be tweaked, replaced, or improved. While some teams might be happy to set it up and be done, the advantages come from adding your own code. The project isn’t just open source itself, but a constellation of hundreds of little bits of contributed code and ancillary projects that can be very helpful. Being able to customize the pipeline and leverage the MCP protocol supports the delivery of precision solutions. License: Modified BSD designed to restrict removing OpenWebUI branding without an enterprise license.

    Sim

    The drag-and-drop canvas for Sim is meant to make it easier to experiment with agentic workflows. The tool handles the details of interacting with the various LLMs and vector databases; you just decide how to fit them together. Interfaces like Sim make the agentic experience accessible to everyone on your team, even those who don’t know how to write code. License: Apache 2.0.

Unsloth

    One of the most straightforward ways to leverage the power of foundational LLMs is to start with an open source model and fine-tune it with your own data. Unsloth does this, often faster than other solutions do. Most major open source models can be transformed with reinforcement learning. Unsloth is designed to work with most of the standard precisions and some of the largest context windows. The best answers won’t always come directly from RAG databases. Sometimes, adjusting the models is the best solution. License: Apache 2.0.

    vLLM

    One of the best ways to turn an LLM into a useful service for the rest of your code is to start it up with vLLM. The tool loads many of the available open source models from repositories like Hugging Face and then orchestrates the data flows so they keep running. That means batching the incoming prompts and managing the pipelines so the model will be a continual source of fast answers. It supports not just the CUDA architecture but also AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, Arm CPUs, and TPUs. It’s one thing to experiment with lots of models on a laptop. It’s something else entirely to deploy the model in a production environment. vLLM handles many of the endless chores that deliver better performance. License: Apache-2.0.
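The offline batch-inference side of vLLM fits in a few lines, and the same engine backs its OpenAI-compatible server. The model name below is only an example; substitute anything your hardware can hold.

```python
from vllm import LLM, SamplingParams

# Load a small instruct model and batch a prompt through it.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Write a haiku about batching prompts."], params)
for output in outputs:
    print(output.outputs[0].text)
```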

    (image/jpeg; 0.09 MB)

    GitHub Copilot SDK allows developers to build Copilot agents into apps 23 Jan 2026, 2:24 pm

    GitHub has launched a technical preview of the GitHub Copilot SDK, a tool kit for embedding the “agentic core” of the GitHub Copilot CLI into applications.

Available on GitHub, the SDK was unveiled on January 22. Initially available for Node.js/TypeScript, Python, Go, and .NET, the GitHub Copilot SDK exposes the same engine behind the GitHub Copilot CLI, a production-tested agent runtime that can be invoked programmatically. There is no need to build orchestration, according to GitHub: users define agent behavior, and Copilot handles planning, tool invocation, file edits, and more. A GitHub Copilot subscription is required to use the SDK.

Developers using the SDK can take advantage of GitHub Copilot CLI’s support for multiple AI models, custom tool definitions, Model Context Protocol (MCP) server integration, GitHub authentication, and real-time streaming. GitHub teams already have used the SDK for applications such as YouTube chapter generators, custom GUIs for agents, speech-to-command workflows to run apps, games in which players can compete with AI, and summarizing tools.

    “Think of the Copilot SDK as an execution platform that lets you reuse the same agentic loop behind the Copilot CLI, while GitHub handles authentication, model management, MCP servers, custom agents, and chat sessions plus streaming,” said GitHub’s Mario Rodriguez, chief product officer for the GitHub product team and the author of the January 21 blog post. “That means you are in control of what gets built on top of those building blocks.”

    (image/jpeg; 22.45 MB)

    Go developers meh on AI coding tools – survey 23 Jan 2026, 1:12 pm

    Most Go language developers are using AI-powered software development tools, but their satisfaction with these tools is middling, according to the 2025 Go Developer Survey. The survey also found that the vast majority of Go developers—91%—were satisfied with using the language.

    Results of the survey, which featured responses from 5,739 Go developers in September 2025, were published January 21 in the go.dev blog.

    In the survey, 55% of respondents reported being satisfied with AI-powered development tools, but this was heavily weighted towards “Somewhat satisfied” (42%) vs. “Very satisfied” (13%). Respondents were asked to tell something good they had accomplished with these tools as well as something that did not work out. A majority said that creating non-functional code was their primary problem with AI developer tools (53%), while nearly one-third (30%) lamented that even working code was of poor quality, according to the report. The most frequently cited benefits of AI coding tools, conversely, were generating unit tests, writing boilerplate code, enhanced autocompletion, refactoring, and documentation generation.

    Whereas 53% of respondents said they use AI-powered development tools daily, 29% did not use these tools at all, or only used them a few times during the past month. The most commonly used AI coding assistants were ChatGPT (45%), GitHub Copilot (31%), Claude Code (25%), Claude (23%), and Gemini (20%), the report said.

    As for the language itself, almost two-thirds of respondents were very satisfied using Go, with the overall satisfaction rate hitting 91%. Developers find tremendous value in using Go as a holistic platform, said the report. “Go is by far my favorite language; other languages feel far too complex and unhelpful,” one respondent said. “The fact that Go is comparatively small, simple, with fewer bells and whistles plays a massive role in making it such a good long-lasting foundation for building programs with it.”

    Other findings of the 2025 Go Developer Survey:

    • Command-line tools (74%) and API/RPC services (73%) were the top two types of projects respondents were building with Go. Libraries or frameworks (49%) finished third.
    • The top three frustrations the developers reported when building with Go were “Ensuring our Go code follows best practices / Go idioms” (33%), “A feature I value from another language isn’t part of Go” (28%), and “Finding trustworthy Go modules and packages” (26%). 
    • Most respondents develop on macOS (60%) or Linux (58%) and deploy to Linux-based systems (96%).
    • Visual Studio Code was the favorite code editor cited (37%), followed by GoLand/IntelliJ (28%) and Vim/NeoVim (19%).
    • The most common deployment environments for Go were Amazon Web Services (46%), company-owned servers (44%), and Google Cloud Platform (26%).

    (image/jpeg; 1.69 MB)
