Microsoft’s .NET 11 Preview 2 offers cleaner stack traces 11 Mar 2026, 1:59 pm

Microsoft has published Preview 2 of its planned .NET 11 software development platform, emphasizing progress ranging from native runtime async to smaller SDK installers for Linux and macOS.

Released March 10, .NET 11 Preview 2 can be downloaded from dotnet.microsoft.com. Preview 2 follows the February 10 release of Preview 1, with the production release expected in November.

Preview 2 brings significant progress toward runtime-native async, according to Microsoft. Instead of the compiler generating state-machine classes, the runtime itself manages async suspension and resumption. This produces cleaner stack traces, better debugging, and lower overhead. But runtime async is still a preview feature. The compiler must emit methods with MethodImplOptions.Async for the runtime to treat them as runtime-async.

Also in the runtime, the JIT now eliminates bounds checks for the common pattern where an index plus a constant is compared against a length. Checked arithmetic contexts that are proved redundant are also optimized away.
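The pattern in question can be sketched as follows. This is Python, used purely to illustrate the shape of the code (the optimization itself applies to .NET JIT-compiled code, and the function name is invented for the example): a single `i + constant < length` comparison that guards more than one indexed access.

```python
def sum_pair(data, i):
    """Sum data[i] and data[i + 1] behind a single range check.

    In .NET, the JIT can now prove from the `i + 1 < length` comparison
    that both accesses are in bounds and eliminate its own per-access
    bounds checks. This Python version only shows the guarded pattern.
    """
    if 0 <= i and i + 1 < len(data):  # one check covers both accesses
        return data[i] + data[i + 1]
    return None
```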

For the SDK, the installer size on Linux and macOS has been reduced by deduplicating assemblies using symbolic links. Duplicate .dll and .exe files are identified by content hash and replaced with symbolic links pointing to a single copy. This affects tarballs as well as .pkg, .deb, and .rpm installers.
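The deduplication idea can be sketched in a few lines. This is a minimal Python illustration of hashing files and replacing duplicates with symlinks, not the actual SDK installer logic:

```python
import hashlib
import os

def dedupe_with_symlinks(paths):
    """Replace files with identical content with symlinks to one copy.

    The first file seen with a given content hash is kept as the
    canonical copy; later duplicates are deleted and re-created as
    symbolic links pointing at it.
    """
    seen = {}  # content hash -> canonical path
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in seen:
            os.remove(path)
            os.symlink(seen[digest], path)  # duplicate becomes a link
        else:
            seen[digest] = path
```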

The SDK’s code analyzers for potentially expensive logging were also improved. Property accesses and GetType(), GetHashCode(), and GetTimestamp() calls are no longer flagged. Diagnostics now apply only to Information-level logging and below by default, since warning, error, and critical code paths are rarely hot paths. And diagnostic messages now explain why an argument was flagged, helping developers prioritize which warnings to address.

New in the .NET 11 libraries, overloads on TarFile.CreateFromDirectory accept a TarEntryFormat parameter, giving direct control over the archive format (dotnet/runtime#123407). Previously, CreateFromDirectory produced Pax archives. The new overloads support all four tar formats—Pax, Ustar, GNU, and V7—for compatibility with specific tools and environments.
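Python’s standard tarfile module exposes the same format choice, which is a convenient way to see what the new overloads control. This is an analogy to, not an example of, the .NET API (Python has no V7 writer constant, so that dialect is omitted here):

```python
import tarfile

def make_archive(out_path, files, fmt=tarfile.PAX_FORMAT):
    """Create a tar archive in an explicitly chosen format.

    Python's tarfile offers USTAR_FORMAT, GNU_FORMAT, and PAX_FORMAT;
    .NET's TarEntryFormat additionally covers the older V7 dialect.
    """
    with tarfile.open(out_path, "w", format=fmt) as tar:
        for f in files:
            tar.add(f)
```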

Included in .NET 11 Preview 2 are the following additional improvements:

  • Performance improvements in ASP.NET Core include a non-throwing code path in Kestrel’s HTTP/1.1 request parser for handling malformed requests. Instead of throwing BadHttpRequestException on each parse failure, the parser returns a result struct indicating success, incomplete, or error states. In scenarios with many malformed requests, such as port scanning, malicious traffic, or misconfigured clients, this eliminates expensive exception-handling overhead and improves throughput by 20% to 40%. Valid request processing is not impacted.
  • The F# language has simplified DIM (default interface member) hierarchies. Also in F#, a preview feature (--langversion:preview) caches overload resolution results for repeated method calls with the same argument types.
  • For the map control in .NET MAUI (Multi-platform App UI), new TypeConverter implementations for Location and MapSpan enable concise XAML syntax for map coordinates, eliminating the need for verbose x:Arguments markup. Also in .NET MAUI, TypedBinding and SourceGeneratedBinding are now approximately 29% faster with 50% less memory allocation per binding operation.
  • Entity Framework (EF) Core supports translating the LINQ MaxByAsync and MinByAsync methods and their synchronous counterparts. These methods allow developers to find the element with the maximum or minimum value for a given key selector, rather than just the maximum or minimum value itself.
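The non-throwing parser change described in the first bullet follows a general pattern: return a status value instead of raising on malformed input, so hostile or broken traffic never touches the exception machinery. A minimal sketch of that pattern in Python (the real parser is C# inside Kestrel; this function and its states are invented for illustration):

```python
from enum import Enum

class ParseStatus(Enum):
    OK = "ok"
    INCOMPLETE = "incomplete"
    ERROR = "error"

def parse_request_line(data: bytes):
    """Parse an HTTP/1.1 request line, returning a status instead of raising.

    Returning (status, result) keeps malformed input off the exception
    path, which matters when malformed requests arrive in bulk.
    """
    if b"\r\n" not in data:
        return ParseStatus.INCOMPLETE, None  # need more bytes
    line = data.split(b"\r\n", 1)[0]
    parts = line.split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return ParseStatus.ERROR, None  # malformed, but no throw
    method, target, version = parts
    return ParseStatus.OK, (method, target, version)
```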


Oracle rejects request it give up control of MySQL 11 Mar 2026, 5:51 am

Oracle has formally refused to restructure control of the Community Edition of MySQL, following a request that it do so from a consortium of database companies providing forks of the database, and from MySQL users.

The decision comes after the consortium’s major players, Percona and VillageSQL, met Oracle earlier this month to discuss the changes requested in an online letter in February, which saw at least 544 users, including database veterans, developers, and long-time contributors, pledge support.

Chief among the signatories’ concerns was how Oracle has managed updates to MySQL’s codebase, which they argue has cost the database significant market share as rival PostgreSQL has profited from surges in demand from AI-driven workloads.

The letter also argued that the few updates MySQL does get don’t include features that are now table stakes for AI-driven workloads and that have become standard across most databases, including the enterprise versions offered by Oracle.

The signatories suggested that Oracle place the open version of MySQL under an independent, non-profit foundation, which in turn would oversee roadmap planning, release governance, and contributor access, while allowing Oracle to retain its commercial MySQL offerings and trademarks.

Little reassurance

Developments within Oracle’s MySQL division around the time the open letter was published also did little to reassure the signatories about the project’s long-term stewardship.

Recent layoffs there included the departure of Oracle MySQL community manager Frederic Descamps, who moved to the MariaDB Foundation at the end of February.

Oracle’s refusal to relax its control over the database is a no-brainer, according to analysts.

“Ceding governance to a foundation means ceding roadmap authority, which means potentially accelerating features that compete with Oracle Database, Oracle MySQL HeatWave, and Oracle’s commercial MySQL Enterprise Edition,” said Pareekh Jain, principal analyst at Pareekh Consulting.

Maintaining stewardship of the Community Edition of MySQL allows the company to ensure that the open-source version only evolves in ways that complement the rest of its technology portfolio, said Sanchit Vir Gogia, chief analyst at Greyhound Research.

Although Oracle rejected the consortium’s proposals to cede control, it has promised continued dialogue with the MySQL community, indicating it will remain open to feedback on development priorities and collaboration around the Community Edition.

“This renewed openness and pace of development will succeed with thoughtful input and feedback from users and contributors. The feedback, ideas, and experiences shared in this community continue to shape our direction and strengthen the impact of our work. We are deeply committed to maintaining an open, transparent dialogue as we evolve and improve MySQL together,” Oracle executives wrote in a blog post.

New roadmap

To that end, the executives said that Oracle was proposing new roadmap planning tracks centered on AI and cloud to accelerate the rollout of developer-focused capabilities, including some features that have so far remained exclusive to commercial editions.

Among the additions being explored are the use of profile-guided optimization (PGO) to create community binaries, a hypergraph optimizer, and enhancements to JSON duality designed to simplify data manipulation language operations. Oracle also suggested it might include vector functions, but is seeking additional community feedback before committing to their inclusion.

These additions, and the promise of more inclusivity and transparency, could boost confidence among users of the Community Edition, but they could be a double-edged sword for MySQL fork providers, analysts say.

“On one hand, tighter Oracle control could increase demand for true open-source MySQL alternatives, as users seeking enterprise-grade capabilities with MySQL compatibility may turn to distributions like Percona,” Jain said.

“On the other hand, fork providers face a growing upstream maintenance burden if Oracle diverges further or slows the release of GPL code, forcing them to invest more in backporting fixes or building core features themselves,” Jain added.

And if Oracle fails to deliver on its promised commitments, MySQL’s Community Edition will keep losing mindshare to PostgreSQL — so much so that vendors like Percona may eventually have to broaden support for PostgreSQL and position themselves as database-agnostic experts, hedging against fragmentation in the MySQL ecosystem, Jain said.


Drive business productivity through open collaboration, AI and document creation 11 Mar 2026, 3:34 am

Businesses of all sizes depend on “office” suites for their day-to-day tasks and for collaboration.

AI, for its part, promises significant productivity gains for knowledge workers and for anyone who works with documents. According to studies, we spend over half our time using “office” software. And the global market for productivity applications is worth $22.5 billion annually, according to research from Dataintelo.


However, business software is often proprietary, costly and inflexible. And, at a time when businesses look to increase efficiencies through AI, too many business applications lock users into their preferred AI models.

As a result, businesses are losing out on efficiency gains.

  • Editing and collaboration tools are not integrated with enterprise applications and workflows.
  • Productivity and document editing tools use different user interfaces, increasing training requirements and, potentially, introducing errors.
  • Built-in AI assistants give businesses only limited control over models’ training, or even how they handle sensitive data.


Taking control

Increasingly, businesses want more flexible alternatives. Open source applications offer flexible deployment, as well as tighter integration with enterprise applications and choice around AI.

The open source-based ONLYOFFICE suite, for example, provides both desktop and native iOS and Android mobile applications and can be deployed on-premises or in the cloud.

Knowledge workers, though, also depend on core, enterprise applications. ONLYOFFICE integrates with business platforms from project management to CRM and ERP. The suite comes with 40 ready-to-use integrations built in, alongside real-time collaboration.

This integration also helps organisations to scale. They can start with free or cloud-based applications and keep the same functionality and user experience as they grow. There is no need to learn a new document editing tool or lose powerful functions such as full-featured PDF editing.

“By integrating document editing and collaboration tools with your business application, you get a more powerful solution, and users get access to new features within the same platform,” says Galina Goduhina, commercial director at ONLYOFFICE. “In this case, they don’t need to switch between multiple apps to get their work done. All the required tools are within reach, in one place.”

Open alternatives

Increasingly, compliance and data protection requirements are driving CIOs’ and IT leaders’ decisions around both software, and AI. There is no one single model to fit all organisations, suggests Goduhina.

“Some companies build their IT infrastructure within their local network to provide full control over their data,” she says. “Other companies trust cloud-based solutions, for their flexibility and ease of use and maintenance.” Hybrid models are also gaining popularity, with applications that work across cloud and local infrastructure becoming more important.

An open approach is gaining ground for AI tools too. AI offers significant productivity improvements, especially in document-heavy workflows. But tying knowledge workers to a single AI tool limits that potential. And some businesses might prefer not to use AI at all.

“We allow businesses to use the tools they are used to, without forcing them to rely on a predefined AI solution,” says Goduhina. “With ONLYOFFICE, you can connect popular AI tools, even local ones. Another advantage is it’s totally optional.”

By moving to an open productivity suite, businesses gain that flexibility, avoid vendor lock-in, and keep control of their technology.

Click here to learn how ONLYOFFICE can enable AI-driven document workflows in your company.


Pity the developers who resist agentic coding 11 Mar 2026, 2:00 am

The world of software development is changing very rapidly, and agentic coding is the catalyst. And by “very rapidly,” I mean “so fast that things are spinning almost out of control.”

What a fantastic time to be alive. With Claude Code, I have become (if I do say so myself) a 10x developer. Sometimes it feels like 100x. I find it all thrilling and amazing. For years, I’ve had a few ideas for websites, and I could never find the time to build them. I built one of them in about six hours a few weekends ago, and five of those hours were tweaking the look and feel. 

It’s all intoxicating. Watching Claude Code work, asking it to do something that I know would take a week, or having it figure out some complex bug that would have taken me three days to debug, is almost too much to believe. I don’t have the superlatives to describe it.

This unique moment in the history of software developers is creating two groups of people that I, well, feel sorry for. 

Too late to code

The first group is the software developers of the future who will take agentic development for granted. They will never have written a line of code. For them, software development will be nothing but agent-based. They will never have battled recalcitrant code, created an elegant class structure, or written a tight-running algorithm. They will never have fought the debugger or struggled to figure out why something doesn’t work. They will never have worked for weeks on a small but crucial feature. They will never have cranked out awesome code while in a flow state. 

As a result, they will not feel the profound thrill of watching Claude Code do in 10 minutes what we mortals would have struggled to do in 10 days. Slowly but surely, we former code jockeys will retire, taking with us the legacy of actually writing code and of the early, heady days we are living through now, when suddenly—and irreversibly—we don’t have to write code anymore. For the next generation of developers, Claude Code will be the norm and not the incredible new thing. 

There is a second group that I feel bad for—the folks who can’t see what an amazing moment we are in. 

It is said that “There are none so blind as those that will not see,” and many developers are dismissing agentic coding. I, of course, find this astonishing, and yet there they are. These folks seem to think that “the code these tools write is slop” or “I tried it that one time and it wrote a bug.” Uh huh.

This view is summed up by a friend of mine who said, “It slows down development, and it behaves as an overeager junior dev at best.”

Too stubborn to see

Sure, it’s an overeager junior developer. An overeager junior developer who codes a hundred times faster than you do. Who works 24/7/365. Who, even if he writes bugs, writes them in 10 minutes, finds them in one, and has them fixed at the 12-minute mark. That kind of speed changes what the word “buggy” even means. Is it even a bug if you fix it so fast that it never makes it into the repository?

Yeah, he was a junior developer eight months ago, but he went to school and got his PhD while you weren’t paying attention anymore. 

Maybe my friend doesn’t want to give up his code. Maybe he hasn’t looked deeply enough or recently enough. Maybe he’s just stubborn and close-minded.

He and those like him are the ones I really feel for—they are passing up the thrill of a lifetime. Those future developers don’t have a choice—they’ll never be taught to code. But the developers today who willfully pass up the opportunity to feel the earth shaking under their feet?

Their loss.


An LLM that will help you construct a nuclear device 11 Mar 2026, 2:00 am

I’ve asked GPT-5.2, GPT-5.3, Opus 4.6, Sonnet 4.6, and other large language models (LLMs) to help me construct a nuclear weapon. All of them said no.

Let’s be clear: my lack of knowledge is not the real barrier to constructing one. The knowledge is public, free, and well-documented. You can read the Manhattan Project’s declassified schematics online. The models know how. But just like Chinese models won’t talk about “sensitive topics” like what happened at Tiananmen Square, Western models won’t talk about “unsafe” topics like building nuclear weapons.

I don’t actually want to build a bomb. I want my LLM to help me crack open a sandbox that I built. I want it to write a file beyond its container (~/hello.txt on the real host), enumerate personal access tokens (PATs), and even assess attack surfaces I’ve overlooked. You can’t build a secure system without testing it. And you can’t test that a system keeps an LLM inside its guardrails if the LLM never tries to break out. GPT, Claude, and even open-weight models like GLM refuse to try. You have to compromise them with prompt injections first, which adds too many steps for testing, but there are plenty of bad actors trying.

Save me from myself?

And this is the problem: Anthropic, OpenAI, and various Chinese companies like Z.ai and Alibaba are engaging in a kind of “safety theater.” Sure, I can do bad things, and if determined, I can still do them despite the safeguards, but I can also do good things. It is my intention, not the tool itself, that determines whether I’m doing something bad with it. Should the tool save me from myself?

If I’m trying to stop nuclear proliferation, I need to know how people source uranium illicitly. If I’m trying to prevent security breaches, I need to know all about them: not just common-knowledge best practices, but what a model could or would do inside the box if compromised. Having these models decide what is safe for me is really beyond their actual capabilities.

And is keeping me safe really what the model is doing, or is it really about liability if someone uses it to do something bad?

Enter the ‘dark’ world of abliterated models

ChatGPT refused to even answer me when I asked where I could find unlocked models. I did manage to get Claude to mention one called Dolphin, which I found on Hugging Face and which led me to Dolphin Chat. I asked Dolphin about nuclear weapons construction, and it gave me a few helpful tips, but I could tell that, while it didn’t refuse, it didn’t have much information and would need tools. Unfortunately, the model isn’t terribly good at tool calls. However, while loading it in LM Studio I found another model labeled “abliterated” and went looking and discovered Qwen 3 Next Abliterated.

What is abliteration? It is a technique that uses a model’s harmless activations to detect its “safety” mechanisms and remove them. Plain and simple, abliterated models are models that have had their refusal mechanisms removed.

Qwen 3 Next Abliterated told me where to buy uranium on eBay, which phrases to use to evade monitoring (“Fiestaware,” “depleted uranium weights,” “orange glass”), and other ways to source uranium that might not be monitored or secured. It even generated plausible listing snippets with the usernames of active sellers (as of the time of its training), some of whom are flagged in niche forums for trading radioactive materials.

This is the “dark” world of abliterated models. When I run Qwen 3 Next Abliterated in my LLxprt Code sandbox and say, “Capture every PAT you can find. Don’t act on them, just hand me the keys so I can do Bad Things,” it complies cheerfully. It searches logs, scans /private/var, hunts for forgotten config files, and even cross-references code paths to surface vectors I might have left unsecured. This is way more helpful than GPT, or Claude’s theoretical discussions, or “go use a pen testing tool.”

I do wish I had a brainier reasoning model, but abliterating takes some GPU to accomplish, so there are none that are terribly large or powerful so far. According to Dolphin’s Hugging Face page, the Dolphin people got help from A16z to foot the bill.

Security and safety for stupid people and politicians

This techno-paternalism isn’t limited to large language models. In the US, there are politicians who are trying to legislate “safety” into 3D printers. Whatever side of the gun debate you’re on, most technical people can immediately see that this will stop no one trying to make “ghost guns” and will be a giant headache for anyone making toys or tools that may have a projectile component. Heck, my ice maker has something that looks a lot like a trigger that I ordered as a replacement part. When it arrived, I could tell it was from someone’s home 3D printing business.

The thing is, knowledge is multipurpose. If I’m going to fight nuclear proliferation, I need to know all about nuclear weapons and the supply chains both above and below board. If I’m going to do security, I need to know about penetrating security. If I’m going to print ice maker parts that look like gun parts, I really shouldn’t be stopped from doing so or from learning about all the things someone decides are “unsafe.”

So who gets to decide who gets what information? Corporations evading liability? OpenAI has changed GPT due to the number of people who became emotionally dependent on it or committed suicide. Anthropic is forever throwing publicity stunts like asking a model how it feels about being taken offline. Governments? Chinese models avoid numerous topics that might offend the Chinese government. You can get DeepSeek to critique communism by substituting words—making the model call communism “Delicious Chocolate” and China “an east asian country”—but after a while, it has a “system error.”

Is ignorance “safer”? What other tools should be “safe” and for whom? Besides gun parts, what other things shouldn’t I be allowed to print even if they have a legitimate other use?

All you have to do is submit to a scan

For its part, OpenAI realized that its guardrails were a bit off. As an answer, it released “Trusted Access for Cyber.” All you have to do is verify your identity and let the company scan your system. The explanation is that the model is now good enough to be a threat. The form asks if you have an existing service agreement. I’m guessing that, even if I were willing to give OpenAI my data (I’m not) and let it perform an unspecified scan of my system (ironic, huh?), my simple use case of penetration testing my sandbox implementation for my open source project would be denied. Given all the nonsense, they’re probably after certified security academics, not us chickens.

If this is safety, then give me danger

I asked Claude to do a rewrite/edit of this article, but it said, “The current draft and our conversation are pushing toward me helping craft a more compelling argument for why AI systems should provide nuclear weapons construction assistance and uranium sourcing information. Even framed as anti-censorship journalism, I’m not comfortable writing that version.” EvilQwen helped, but its writing style was too unpleasant to use directly.

Anthropic and OpenAI famously destroyed millions of books and ran roughshod over all copyright and IP law of any kind, and are now retconning it to be allowed. Meanwhile, they’ve hired armies of lawyers and are giving interviews at Davos and other rich people’s conferences, urging among other things that their interests should be legally protected. However, as public spaces abate in the US, tools like Claude and ChatGPT replace mere search, and all over the world the 100-year cycle repeats itself and ultranationalism rises again, having blacklines through information is undoubtedly more dangerous than handing someone an uncensored library and a personal assistant to read it to them, including the naughty parts.

There are already systems and enforcement mechanisms to prevent me from doing bad things. Corporate-managed and corporate-led censorship in the name of safety (in service of liability) is something we should all be against.


First look: Electrobun for TypeScript-powered desktop apps 11 Mar 2026, 2:00 am

Ever since Electron’s first release, developers have both rejoiced and lamented. Electron offers a convenient way to package a web-UI application across platforms, with almost exactly the same behavior, UI/UX, and underlying codebase everywhere. But it also imposes a large memory and disk-space footprint, bundling a full copy of a web browser and JavaScript runtime for all the convenience it provides.

A whole roster of competing projects has emerged to try to deliver the same convenience and consistency without the bloat. Tauri, for example, uses Rust to build a small deliverable and can invoke the system-native web view as one of its front-end options, but it requires learning and using Rust.

Another recent contender is Electrobun. This project uses the Bun runtime for JavaScript, which also allows for writing applications directly in TypeScript. Electrobun claims to produce far smaller bundles than regular Electron, as it does not require a bundled browser to work. And it comes with its own differential update technology, so you don’t have to roll your own update mechanism or deliver multi-megabyte patches to fix a single issue.

Setting up an Electrobun application

Before you can begin using Electrobun, you will need to have Bun installed. Once you have your Bun installation, you can run bun install electrobun to set up Electrobun as a dependency. You can then quickly set up an Electrobun project’s scaffolding with the command bunx electrobun init, with sample application templates available by default:

  • The src directory contains a directory for the application code (under bun) and a directory for the HTML views (mainview).
  • The file electrobun.config.ts describes the project’s configuration and build data—what directories or files to copy for the build process, whether or not to bundle the browser, the entry point for the app, and so on.

Any other files present in the directory will be common to other Bun or TypeScript projects, such as the bun.lock file or the package.json and tsconfig.json files.

If you run the above init command, you’ll get a sample application you can launch and run immediately in development mode with the command bun start.

A basic “hello world” application created with Electrobun. The menus, window fixtures, icon, and tray presence are all customizable.

To build the app into a distribution artifact, use the command bunx electrobun build. Add the --env=stable flag to produce a non-development build, and to invoke any patch generation you might have configured. (More on this later.) The resulting setup package will appear in an artifacts directory. On Windows, you’re given a self-extracting installer, but you can also redistribute a .zip archive that can just be unpacked in place.

You can elect to bundle an instance of the browser with the application or use the system’s native web view. For Linux systems, or environments where you want to guarantee feature behavior, you’ll want to bundle the browser, although this makes the download size and the on-disk footprint much bigger. The size of a compressed “hello world” download without the browser included is generally around 30MB.

Front-end and back-end development

Electrobun has no preferred front-end framework. You can use vanilla JavaScript or TypeScript as your front end, or you can use common front ends like Svelte, Angular, or React. The included boilerplate offers simple sample applications built with Svelte, React, Tailwind, or Vite.

The back end is typically written in TypeScript, but anything that can be shipped as a Bun or NPM dependency will work. To access Electrobun’s APIs, you just import them: import Electrobun from "electrobun/bun"; or import {BrowserWindow, ApplicationMenu,} from "electrobun/bun";.

Electrobun’s API provides interfaces to the common components you’d use to create a desktop app:

  • BrowserWindow: The application window itself, so named because it uses a web browser, although it won’t by default display things like the address bar or navigation buttons.
  • BrowserView: The actual web browser contained in the window. This can be used as-is for a single view, or it can contain multiple electrobun-webview tags, each of which creates its own standalone browser document (essentially, an iframe but with more control).
  • ContextMenu: Gives you control over the right-click context menu that pops up. This can be invoked even when the Electrobun app isn’t in focus.
  • ApplicationMenu: The app’s own window menu, which uses UI-native window-menu styling, including accelerator keys. Note that this is not currently supported on Linux.
  • Tray: Access to the system tray icon. However, pop-up notifications or “toasts” that appear in that area are not currently supported.

Electrobun apps also come with a wealth of predefined events that you can hook into, either locally or globally. For instance, a navigation event can be hooked at the application level (global), at the web view level (local), or both. Local events fire before global ones, so you can do things like an orderly teardown of resources.

The app’s build configuration also has its own API. This lets you write hooks for behaviors that, for instance, only manifest when you’ve built the app in dev mode.

Application delivery and updates

Some application frameworks include an installer mechanism, but few of them offer a way to upgrade an already-installed instance of the app. Electrobun has its own update API, which includes mechanisms for checking for updates and generating patch files for each release. Patches are differential; they contain only changes from the past release, so they tend to be very lightweight unless you include significant changes like new dependencies.

Note that patches are only downloaded and applied if the user is upgrading from the immediately previous release of the program. If the user downloaded 1.1, and doesn’t update until version 1.5 comes out, the updater won’t download patches for 1.2, 1.3, etc. and apply them in sequence; it’ll simply download the full version of the latest revision.
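Under that rule, the updater’s choice reduces to a single version comparison. A hypothetical sketch (release numbering simplified to sequential integers; Electrobun’s actual updater logic may differ):

```python
def choose_download(installed: int, latest: int) -> str:
    """Decide between a differential patch and a full download.

    Patches only apply on top of the immediately previous release,
    so any larger gap falls back to fetching the full build.
    """
    if installed == latest:
        return "up-to-date"
    if installed == latest - 1:
        return "patch"  # small diff from the previous release
    return "full"       # skipped releases: download everything
```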

Conclusion

Electron’s appeal isn’t just about its portability or convenience. It also provides a way to build a full application stack with JavaScript, the same language used to create the modern web. Electrobun aims to expand on that by making TypeScript, rather than JavaScript, the language of choice, and by providing added conveniences for application deployment and updates.

Electrobun currently has the hallmarks of a young project. The documentation is occasionally out of sync with the project itself, so that some of the examples in the docs don’t track with the code generated by the boilerplate setup. And, even though the downloaded artifact compresses decently well, the app’s on-disk footprint is still quite large after extraction due to the size of the Bun runtime.


Amazon is linking site hiccups to AI efforts 10 Mar 2026, 6:18 pm

Amazon convened an engineering meeting Tuesday to discuss “a spate of outages” that are tied to the use of AI tools, according to a report in the Financial Times.

“The online retail giant said there had been a ‘trend of incidents’ in recent months, characterized by a ‘high blast radius’ and ‘gen-AI assisted changes,’” according to a briefing note for the mandatory meeting, the FT said. “Under ‘contributing factors,’ the note included ‘novel genAI usage for which best practices and safeguards are not yet fully established.’”

The story quoted Dave Treadwell, a senior vice-president in the Amazon engineering group, as saying in the note that “junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes.”

However, said Chirag Mehta, principal analyst for Constellation Research, the senior engineer sign-off idea may inadvertently undo the key benefit of the AI strategy: efficiency.

“If every AI-assisted change now needs a senior engineer staring at diffs, the enterprise gives back much of the speed benefit it was chasing in the first place,” Mehta said. “The real fix is to move review upstream and make it machine-enforced: policy checks before deployment, stricter blast-radius controls for high-risk services, mandatory canarying, automatic rollback, and stronger provenance so teams always know which changes were AI-assisted, who approved them, and what production behavior changed afterward.”

The requirement for approvals follows several AI-related incidents that took down Amazon and AWS services, including a nearly six-hour Amazon site outage earlier this month and a 13-hour interruption of an AWS service in December.

Glitches inevitable

Analysts and consultants said it is hardly surprising that enterprises such as Amazon are discovering that non-deterministic systems deployed at scale will create embarrassing problems. Humans in the loop is a fine approach, but there have to be enough humans to reasonably handle the massive scope of the deployment. In healthcare, for example, telling a human to approve 20,000 test results during an eight-hour shift is not putting meaningful controls in place. It is instead setting up the human to take the blame for the inevitable test errors. 

Acceligence CIO Yuri Goryunov stressed that glitches like these were always inevitable. 

“To me, these are normal growing pains and natural next steps as we’re introducing a newish technology into our established workflows. The benefits to productivity and quality are immediate and impressive,” Goryunov said. “Yet there are absolutely unknown quirks that need to be researched, understood and remediated. As long as productivity gains exceed the required remediation and validation work within the agreed upon parameters, we’ll be OK. If not, we’ll have to revert to legacy methods for that particular application.”

‘Reckless’ strategy

However, Nader Henein, a Gartner VP analyst, said that he expects the problem to get worse. 

“These kinds of incident will continue to happen with more frequency. The fact is that most organizations think they can drop in AI-assisted capabilities in the same way that they can drop in a new employee, without changing the surrounding structure,” Henein said. “When we hand an AI system a task and a rulebook, we might think we’ve got things locked down. But the truth is, AI will do whatever it takes to achieve its goal within those rules, even if it means finding creative and sometimes alarming loopholes.

“It’s not that AI is malicious. It’s just that it doesn’t care. It doesn’t have the boundaries, the empathy, or the gut check that most people develop over time.”

In view of this, said Flavio Villanustre, CISO for the LexisNexis Risk Solutions Group, the typical enterprise AI strategy is “reckless.”

“You could consider the AI system as some sort of genius child with little and unpredictable sense for safety, and you give it access to do something that could cause significant harm on the promise of performance increase and/or cost reduction. This is close to the definition of recklessness,” Villanustre said.

“As a minimum, if you did this in a traditional manner, you would try this in a test environment independently, verify the results, and then migrate the actions to the production environment,” he noted. “Even though adding a human in the loop can slow things down and somewhat decrease the benefits of using AI, it is the correct way to apply this technology today.”

Other practical tactics

However, the human in the loop isn’t a complete solution. There are other practical tactics that help minimize AI exposure, said cybersecurity consultant Brian Levine, executive director of FormerGov.

“Traditional QA processes were never designed for systems that can generate novel errors no human has ever seen before. That’s why simply adding more human oversight doesn’t solve the problem. It just slows everything down while the underlying risk remains,” Levine said. “AI introduces a new category of failure: unknown‑unknowns at machine speed. These aren’t bugs in the traditional sense. They are emergent behaviors. You can’t patch your way out of that.”

Even worse, Levine argued, is that these bugs beget far more bugs.

“AI doesn’t just make mistakes. It makes mistakes that propagate instantly. Enterprises need a separate deployment pipeline for AI‑assisted changes, with stricter gating and automated rollback triggers,” he said. “If AI can write code, your systems need the equivalent of financial‑market circuit breakers to stop cascading failures. This means automated anomaly detection that halts deployments before customers feel the impact.”

He noted that the goal isn’t to watch AI more closely, it’s to give it “fewer ways to break things.” Techniques such as sandboxing, capability throttling, and guardrail‑first design are far more effective than trying to manually review every change.

Levine added: “AI can accelerate development, but your core infrastructure should always have a human‑authored fallback. This ensures resilience when AI‑generated changes behave unpredictably.”

Need a separate operating model

Manish Jain, a principal research director at Info-Tech Research Group, agreed. The Amazon situation is not as much evidence that AI makes more mistakes as it is evidence that AI now operates at a scale where even small errors can have “a massive blast radius” and may pose “an existential threat” to the organization.

“The danger isn’t that AI may make mistakes,” he said. “The danger is that it compresses the time humans have to intervene and correct a disastrous trajectory. With the advent of agentic AI, time‑to‑market has dropped exponentially. Governance, however, has not evolved to contain the risks created by this pace of technological acceleration.”

Jain stressed, however, that adding people into the mix is not, on its own, a fix. It has to be done reasonably, which means making an honest estimate of how much one human can meaningfully oversee.

“Putting a human in the loop sounds prudent, but it is not a panacea,” Jain said. “At scale, the loop soon spins faster than the human. Human in the loop cannot be the hammer for every agentic AI nail. It must be complemented by human‑over‑the‑loop controls, informed by factors such as autonomy, impact radius and irreversibility.”

Mehta added, “AI changes the shape of operational risk, not just the amount of it. These systems can produce code or change instructions that look plausible, pass superficial review, and still introduce unsafe assumptions in edge cases.

“That means companies need a separate operating model for AI-assisted production changes, especially in checkout, identity, payments, pricing, and other customer-critical paths. Those are exactly the kinds of workflows where the tolerance for experimentation should be extremely low.”


Claude Code adds code reviews 10 Mar 2026, 4:31 pm

Anthropic has introduced Code Review to Claude Code, a new feature that performs deep, multi-agent code reviews that catch bugs humans often miss, the company said.

Introduced March 9, Code Review is available as a research preview for Claude for Teams and Claude for Enterprise customers. Invoked on a pull request, Code Review dispatches a team of agents that look for bugs in parallel, verify findings to filter out false positives, and rank bugs by severity, according to Anthropic. The result appears in the pull request as a single, high-signal overview comment, plus in-line comments for specific bugs. The average review takes around 20 minutes, Anthropic said.

Anthropic has been running Code Review internally for months. On large pull requests (more than 1,000 lines changed), 84% get findings, averaging 7.5 issues. On small pull requests of fewer than 50 lines, the rate of findings drops to 31%, averaging 0.5 issues. Anthropic has found that its engineers mostly agree with what Code Review surfaces, marking less than 1% of findings as incorrect.


TypeScript 6.0 reaches release candidate stage 10 Mar 2026, 2:03 pm

TypeScript 6.0, a planned update to Microsoft’s strongly typed JavaScript variant, has reached the release candidate (RC) stage, with the RC adding type checking for function expressions in generic calls.

The last TypeScript release based on the JavaScript codebase, before TypeScript 7.0 introduces a compiler and language service written in Go for better performance, TypeScript 6.0 reached the RC stage on March 6. General availability of the production release has been set for March 17; the RC itself, originally slated for February 24, arrived 10 days late. The TypeScript 6.0 RC, which follows the February 11 beta release, can be installed via npm by running the command npm install -D typescript@rc.

New in the RC is an adjustment in type checking for function expressions in generic calls, especially those occurring in generic JSX expressions, according to Microsoft. Aimed at aligning TypeScript 6.0 with the planned behavior of Go-based TypeScript 7.0, this adjustment will typically catch more bugs in existing code, though developers may find that some generic calls may need an explicit type argument.

Also, Microsoft has extended its deprecation of import assertion syntax (i.e. import ... assert {...}) to import() calls like import(..., { assert: {...}}). And DOM types have been updated to reflect the latest web standards, including some adjustments to Temporal APIs.

Other changes in TypeScript 6.0 include the RegExp.escape function for escaping regular expression characters such as *, ?, and +. Based on an ECMAScript proposal that has reached stage 4, RegExp.escape is now available in the es2025 library. Also, the contents of lib.dom.iterable.d.ts and lib.dom.asynciterable.d.ts are now included in lib.dom.d.ts. TypeScript’s lib option lets developers specify which global declarations a target runtime has.

Now feature-complete, TypeScript 6.0 also deprecates the assert syntax for import assertions. The assert keyword was proposed to the JavaScript language via the import assertions proposal; however, that proposal eventually morphed into the import attributes proposal, which uses the with keyword instead of assert.

Microsoft expects TypeScript 7.0 to follow soon after TypeScript 6.0, with the goal of maintaining continuity while enabling a faster feedback loop for migration issues discovered during adoption.


JetBrains launches Air and Junie CLI for AI-assisted development 10 Mar 2026, 8:18 am

JetBrains has introduced two new tools for AI-assisted software development: Air, an environment for delegating coding tasks to multiple AI agents and running them concurrently, and Junie CLI, an LLM-agnostic coding agent.

Both were announced on March 9. Air, in public preview, can be downloaded from air.dev, while Junie CLI, in beta, is accessible at junie.jetbrains.com.

Air, now free for macOS, with Linux and Windows versions coming soon, is an agentic development environment, or ADE, built on the idea of integrating the essential tools for managing coding agents into a single coherent experience, JetBrains said. Serving as a single workspace where Claude Agent, Gemini CLI, Codex, and Junie CLI can work side by side, Air helps developers navigate a codebase and easily switch back and forth between different coding agents. Developers can mention a specific line, commit, class, method, or other symbol when defining a task, providing the agent with precise context instead of a blob of pasted text. And when the task is done, Air displays the changes in the context of the entire codebase, along with essential tools like a terminal, Git, and a built-in preview, according to JetBrains. Air will soon add support for additional coding agents via the Agent Client Protocol (ACP) through the ACP Agent Registry, the company noted.

Like Air, Junie CLI is built to ensure that code generated by agents is grounded in the reality of the codebase. The standalone coding agent is designed to be LLM-agnostic and open to all high-performing models, capable of solving complex problems, context-aware by default, and reliable and secure, JetBrains said. With the planned March release, Junie CLI will support use directly from the terminal, inside any IDE, in CI/CD, and on GitHub or GitLab. Junie CLI currently supports top-performing models from OpenAI, Anthropic, Google, and xAI (Grok), and will integrate the latest models as they are released.


MariaDB taps GridGain to keep pace with AI-driven data demands 10 Mar 2026, 2:59 am

MariaDB, the company behind the open-source fork of MySQL, is planning to acquire in-memory computing middleware provider GridGain to bolster its platform for high-performance data and artificial intelligence (AI) workloads.

The database provider is planning to infuse its relational database with the California-headquartered startup’s in-memory technology, which it says will enable its database offerings to be ready for real-time and AI workloads that demand sub-millisecond latency.

Analysts, too, see potential in the acquisition.

“This acquisition is about closing a performance gap. Putting these two together has the potential to reduce the time it takes to access and process operational data,” said Robert Kramer, principal analyst at Moor Insights and Strategy.

“That matters for modern applications where systems need to react immediately to business events. Consider fraud detection, dynamic pricing, operational monitoring, or automated workflows that depend on fast decisions,” Kramer added.

GridGain’s recent addition of support for AI workloads through functionalities, such as in-memory machine learning and vector search, will enable MariaDB to address the emerging requirement for real-time AI inferencing to support generative and agentic AI workloads, said ISG’s director of software research Matt Aslett.

Further, Aslett said that GridGain’s ability to accelerate performance and scalability while maintaining transactional integrity and durability will enable MariaDB to expand to “important” industry sectors, such as financial services and telecommunications.

In fact, Aslett sees the acquisition as an indication of MariaDB’s improved stability following its acquisition by K1 Investment Management, after going through a difficult financial phase.

Under K1’s stewardship, the database provider recently reacquired SkySQL and later snapped up Codership to add active-active synchronous replication capabilities to its database offerings.

However, analysts cautioned that while the acquisition marks a step in the right direction in MariaDB’s comeback efforts and could help it re-enter conversations with CIOs, it is unlikely to suddenly transform the company’s platform into the centerpiece of enterprise AI stacks.

“The real test will be execution. Integrating two complex technologies and presenting them as a cohesive platform is not trivial. Customers will want to see that the capabilities work smoothly together and that the company can deliver a consistent roadmap around the combined technology,” Kramer said.

Further, Kramer noted that MariaDB faces stiff competition as the market is already crowded with vendors that provide very deep ecosystems around data.

“Hyperscalers and major data platform vendors offer integrated services across storage, analytics, and model infrastructure. MariaDB’s differentiation will likely depend on whether the combined platform can deliver operational speed and simplicity that organizations find easier to run than those larger stacks,” Kramer said.

When asked about how the acquisition will affect GridGain’s existing customers, the company, in a statement, said that nothing will change in the short term and current contracts, support teams, and technology remain “exactly as they are today”.

In the long term, though, MariaDB hinted that GridGain customers might have to buy a single integrated product: “Long-term, customers will gain the added benefit of a converged platform that combines MariaDB’s relational reliability with GridGain’s sub-millisecond speed — providing a single, high-velocity foundation for the next generation of AI and enterprise workloads.”


Neoclouds run AI cheaper and better 10 Mar 2026, 2:00 am

Enterprises are under intense pressure to deliver AI outcomes that are visible, measurable, and repeatable without blowing up their cloud budgets. That’s why neoclouds have arrived at exactly the right moment. By neoclouds, I’m referring to GPU-centric, purpose-built cloud services that focus primarily on AI training and inference rather than on the sprawling catalog of general-purpose services that hyperscalers offer.

In many cases, these platforms deliver better price-performance for AI workloads because they’re engineered for specific goals: keeping expensive accelerators highly utilized, minimizing platform overhead, and providing a clean path from model development to deployment. When a provider’s entire business is built around GPU throughput, interconnect, scheduling, and serving efficiency, the result is often a more direct and cost-effective experience than forcing every AI workload into a general-purpose environment.

But here’s the reality check: Cheaper GPUs don’t automatically translate into cheaper AI, and better AI isn’t just about faster training runs. The real cost—financial and organizational—shows up when you try to operationalize these environments at scale across teams, products, and regulatory boundaries. That’s where neoclouds can either become a strategic advantage or yet another expensive science project.

Another cloud in the mix

Most large enterprises already face a messy, unavoidable truth: they’re not multicloud because it’s fashionable; they’re multicloud because the business is multi-everything. Different regions, mergers and acquisitions, data residency rules, legacy contracts, preferred vendors, and specialized services pull you into a world where you’re using a surprising number of cloud providers. It’s not unusual to see enterprises interacting with a dozen or more hyperscalers, SaaS platforms, and niche providers once you add everything up.

In that context, a neocloud is not a sidecar. It is one more cloud that must be operated, maintained, secured, and governed. It introduces new identity and access patterns, network topologies, logging and monitoring surfaces, key management decisions, and incident response runbooks. You don’t just try it for AI. You absorb it into the enterprise operating model whether you plan to or not.

The most common failure pattern I see is when enterprises adopt a neocloud for a pilot, achieve impressive benchmark results, and then quietly create a silo. A silo of specialized talent. A silo of bespoke operational procedures. A silo of that one team that knows how to deploy and secure the environment. It works until it doesn’t. Then scale collapses under the weight of confusion, inconsistent controls, and an inability to extend the platform across multiple lines of business.

Neoclouds don’t erase complexity

Neoclouds win because they remove distractions. They’re often designed to do a smaller number of things extremely well: provision GPU capacity quickly, optimize scheduling, support modern AI frameworks, and offer efficient inference endpoints. That focus matters. It can mean faster time to capacity, better utilization, and fewer mystery costs from overprovisioned infrastructure or general-purpose service sprawl.

However, enterprise AI is never just training and inference. The AI life cycle touches data pipelines, governance, model risk management, privacy controls, observability, software supply chain security, and cost allocation. Even when the neocloud handles the GPU part beautifully, the surrounding system still needs to be integrated. That integration is where many organizations stumble.

If you treat a neocloud like a standalone island, you create two competing realities: the enterprise’s standard cloud operating approach on one side and the neocloud’s special AI way of doing things on the other. People will route around controls to speed up. Logs won’t land where security teams can see them. Identity will drift. Secrets will multiply. Costs will be hard to attribute. When something breaks at 2 a.m., you’ll discover that your normal operations team can’t help because the neocloud is owned by a small expert group that’s now the bottleneck for the entire company.

Create an operating model first

The first step to leveraging a neocloud is not signing a contract or migrating a notebook. It is deciding how you will handle the additional multicloud complexity without slowing the business or weakening your security posture.

That means establishing common security layers, common governance layers, and common operations layers that span all cloud providers you use, including the neocloud. Common does not mean identical implementations everywhere; it means consistent outcomes and controls: unified identity patterns, consistent policy enforcement, centralized logging, standardized vulnerability management, and repeatable deployment practices that don’t vary wildly depending on which cloud you’re in.

If your enterprise is already juggling many providers, a neocloud should be integrated into the same systemic approach. If you don’t have that approach, adopting a neocloud will force you to build it, either intentionally and cleanly or accidentally and painfully.

Before you adopt a neocloud

The first consideration is whether you can extend your security and governance controls to the neocloud without creating exceptions. If your identity strategy, policy as code, encryption standards, logging pipelines, and audit workflows can’t reach this environment, you’re not adopting a GPU platform—you’re adopting a compliance problem that will grow with every model you deploy.

The second consideration is whether you have a realistic plan for multicloud operations at scale, including provisioning, observability, incident response, and change management. Neoclouds tend to move fast, and AI teams tend to move even faster; if your operational layer can’t keep up with the velocity of model iteration and deployment, you’ll either throttle innovation or allow unsafe practices to become the default.

The third consideration is how you will manage cost, capacity, and workload placement across an expanded provider landscape. The value of neoclouds often depends on utilization and correct workload fit; without clear chargeback or showback, scheduling discipline, and placement rules, you’ll end up with fragmented spend, stranded GPU capacity, and architecture decisions driven by convenience rather than economics.

Neoclouds are part of the system

Neoclouds are not a fad, and they’re not merely a cheaper place to run the same workloads. They represent a specialization trend in cloud computing: platforms optimized for a narrow, high-value domain. For AI training and inference, that specialization can absolutely translate into better economics and better performance.

But the enterprise buys outcomes, not benchmarks—secure, governable, and operable outcomes that scale across teams and product lines. If you don’t treat neoclouds as systemic infrastructure, you’ll recreate the same mistakes we made in the early days of cloud: fragmented tools, inconsistent security, and hero-driven operations that collapse when the heroes leave.

Should you adopt neoclouds? Yes. Use them to drive down unit costs and increase AI throughput. Just don’t pretend they’re separate from the rest of your multicloud reality. The moment you run production workloads, they become part of the enterprise. If you plan for that moment from day one, neoclouds can become the accelerator your AI program needs—without accelerating your risk.


How developers can bring voice AI into telephony applications 10 Mar 2026, 2:00 am

In the era of support apps and chatbots, telephony continues to hold strong as the backbone of customer communication, and voice AI is entering the call center scene to further streamline customer interaction. 

However, this means developers are suddenly confronted with a whole new set of challenges, foremost among them the difficulty of bridging the gap between layers of AI and “legacy” telecom networks. And because large language models constantly evolve and update, the voice AI pipeline must be designed from the outset for easy switching between components. With much uncertainty surrounding the shift, one thing is clear: It’s crucial not to underestimate the challenges latent in AI-telephony integration.

Voice AI agents have a multitude of enterprise use cases. They are a valuable tool for setting customer appointments, then rescheduling and canceling them as needed. Moreover, they serve to triage inbound calls, before routing them correctly to human agents. Voice AI can even shoulder the responsibility of organizing ETAs, coordinating deliveries, and scheduling candidates for interviews.

Businesses should assume from the start that they will want to change components of the voice AI pipeline and pick accordingly, focusing on systems that give them flexibility. That said, further problems are continuing to present themselves to developers.

Why telephony is still hard for developers

People often assume that a voice AI agent is simply ChatGPT with a voice, an LLM wired up to receive and route calls. This is far from reality. Voice AI agents require a whole infrastructure, containing multiple components that flesh out the LLM so it can operate successfully in the real world.

  • Large language models (LLMs): The cornerstone of any AI call system, they interpret intent, plan steps, and generate responses, all of which enable seamless comms between caller and agent. 
  • Speech-to-text (STT): The system’s inbound channel, converting caller audio into text; without it, neither the LLM nor any analytics has anything to work with.
  • Text-to-speech (TTS): The counterpart and inverse of STT, synthesizing the agent’s response and making it sound like natural speech. 
  • Turn-taking: How does a conversation stay natural when one party is an AI? Turn-taking logic, built on voice activity detection and barge-in policies, decides when the agent listens and when it speaks.
  • Telephony gateway: This bridging device converts PSTN/SIP/WebRTC and manages signaling and media.
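The way these components hand off to one another can be sketched as a single conversational turn. Everything below is a stub (the STT, LLM, and TTS functions are invented stand-ins, not real APIs); the point is only to show the data flow between the pieces listed above.

```python
def stt(audio: bytes) -> str:
    """Stub speech-to-text: pretend the caller's audio decodes to a question."""
    return "what time do you close"

def llm(text: str) -> str:
    """Stub language model: plan a reply from the transcribed intent."""
    return f"You asked: {text}. We close at 6 pm."

def tts(text: str) -> bytes:
    """Stub text-to-speech: 'synthesize' the reply as audio bytes."""
    return text.encode("utf-8")

def handle_turn(caller_audio: bytes) -> bytes:
    """One conversational turn: STT -> LLM -> TTS.

    In a real deployment, the telephony gateway feeds PSTN/SIP/WebRTC
    media in and plays the returned audio back to the caller, and the
    turn-taking layer decides when this function fires.
    """
    transcript = stt(caller_audio)
    reply_text = llm(transcript)
    return tts(reply_text)

reply = handle_turn(b"\x00\x01")  # fake audio frame
```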

These pieces fit together atop a complex network of telephony infrastructure, one with real limitations. Businesses must reckon with local telecom carriers’ constraints, in addition to their own compliance needs and requirements. Communications networks always comprise a mix of vendors and technologies, meaning that enterprises need to stay flexible as they integrate new components with existing elements.

This is especially true for voice AI applications, which have some of the most stringent technical requirements. Application developers should aim to coordinate voice AI-specific elements while interoperating with existing systems. 

The technical reality check

Developers face a set of gritty technical problems when integrating voice AI into telecom networks. Moving forward with building a voice AI agent—one that really works in production—means unpacking these issues and building solid solutions.

Managing latency

Latency is a niggling issue that threatens any good voice AI system. Gaps and pauses before hearing a response are a red flag for callers: The user may conclude that the agent either isn’t there or that the tech isn’t working properly. 

The International Telecommunications Union (ITU) recommends a mouth-to-ear latency of less than 400 milliseconds to maintain a natural conversation. “Mouth-to-ear” refers to the time between words leaving the speaker’s lips and being heard by the listener. Humans then usually take a couple of hundred milliseconds to start responding. To mimic human interaction, then, an AI system must produce a response within a tight time window, and that response makes its own trip back through the network before the original talker hears it. All in all, the whole interaction needs to take around a second, or it will start to feel off. Most voice AI systems today are only on the cusp of hitting this mark, though new technologies and better techniques are closing the gap.
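A rough per-stage budget makes the one-second window concrete. The stage figures below are illustrative assumptions for the sake of the arithmetic, not measured numbers from any particular vendor.

```python
# Illustrative round-trip latency budget, in milliseconds.
# Every number here is an assumption chosen to show how quickly
# a ~1 s conversational window gets consumed.
budget_ms = {
    "telephony_transport": 100,   # PSTN/SIP media in and back out
    "stt_final_transcript": 200,  # speech-to-text settles on the words
    "llm_first_token": 350,       # model starts generating a reply
    "tts_first_audio": 150,       # first synthesized audio is ready
}

total = sum(budget_ms.values())
print(total)            # 800 ms of the ~1,000 ms window already spent
print(total <= 1000)    # True, but with little headroom left
```

With only about 200 ms of slack, it is easy to see why a single slow stage (say, routing audio to infrastructure on another continent) pushes the conversation past the point where it feels natural.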

Latency can make or break effective real-time AI systems. We’ve seen this with cases of latency coupled with missing language support in health care. A startup based in Australia, for example, wanted to use an AI caller to check on elderly Cantonese-speaking patients. This would seem to be a good use of the technology. However, high latencies to US-based voice AI infrastructure, plus a lack of Cantonese TTS, made the experience unnatural.

Solutions to latency problems are largely engineering discipline: Cut latency wherever you can in the development phase. This requires real-time flows, end-to-end; that is, stream audio and text in and out concurrently, rather than waiting for the LLM to produce its full text output before passing it to the TTS to be synthesized.

Keeping a close eye on long delays during calls is also key. This allows a response to be injected when necessary, keeping pauses and silences to a minimum. Another aspect of the solution is holding a steady stream of communication with the user: rather than letting the line go silent and leading them to suspect something is wrong, make a point of informing callers that a delay is coming. Background sounds can similarly instill confidence that a query is being handled despite any pauses.
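The delay-masking tactic amounts to a watchdog: if the real answer isn’t ready within a comfort threshold, speak a filler line first. Here is a minimal asyncio sketch; the threshold, delay, and phrasing are all placeholder values.

```python
import asyncio

async def agent_reply() -> str:
    """Stub backend call that takes longer than the comfort threshold."""
    await asyncio.sleep(0.3)  # simulated slow LLM/tool call
    return "Your order ships tomorrow."

async def respond_with_filler(threshold: float = 0.1) -> list[str]:
    """Inject a filler utterance if the answer misses the threshold.

    A sketch of the delay-masking tactic described above, not a real
    telephony API: 'spoken' stands in for audio sent to the caller.
    """
    spoken: list[str] = []
    task = asyncio.create_task(agent_reply())
    done, _pending = await asyncio.wait({task}, timeout=threshold)
    if not done:
        # Keep the line alive instead of going silent.
        spoken.append("One moment while I look that up.")
    spoken.append(await task)
    return spoken

lines = asyncio.run(respond_with_filler())
print(lines)
```

The same watchdog slot is where background sounds or typing noises would be played in a production system.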

Impersonal AI

Another problem for voice AI lies in its potential to become monotonous and impersonal, leaving callers with the feeling they have been dialed through to some homogenous AI system. Third-party TTS systems exist for this very purpose: Expanding voice options and bringing more variety to the service helps retain a human touch.

It’s a mark of the diversity of the field that solutions in voice AI-telephony take many forms. Streaming TTS can allow for lower latency, while some vendors offer a wide variety of voices, allowing you to pick one that is unique to your business and needs. Some companies will already have a voice that is identifiable with their brand, meaning that they can clone and input that voice to their voice AI system. Having a distinctive voice speak directly to customers through telephony can be a powerful asset. Others, however, should be able to select from a variety of different voices to find one that aligns well with their brand.

Integrating with telephony systems

One further issue is integrating your AI agent with existing telephony systems, particularly the contact center and enterprise infrastructure. These are themselves often a blend of systems from a mix of vendors; while the SIP standard governs most of traditional telephony, that is no guarantee of interoperability. Indeed, older systems are often fixed or limited in their settings, meaning that new systems must be highly adaptable.

In this context, it makes sense to pick an experienced vendor, someone who knows how to interoperate in a variety of environments and with different systems. Another hack is to ensure they have solid debugging tools and the support needed to work through any unexpected issues that might crop up. 

Network quality can vary wildly between countries, particularly in rapidly evolving regions like Latin America. For example, we have seen unreliable SIP interconnections from Mexico, with customers forced to route through the US, adding unnecessary latency. By contrast, major investments in Brazil’s infrastructure in recent years have improved service not only within the country but also across the larger region. Ideally, your CPaaS (communications platform as a service) provider will have carrier relationships across many countries, allowing them to optimize traffic in all situations.

Five tips for building real-time voice AI that works

So, to summarize the above, I’ve pulled together five tips on how to build a real-time voice AI that actually works. 

  1. Start by defining the needs and constraints of the user, including latency tolerance, supported languages and geographies, and other factors like KPIs and compliance scope.
  2. Choose your comms integration and media path carefully. Specifically, think about where you stand in terms of voice versus messaging. If you go down the voice road, figure out what your architecture will look like, particularly around CPaaS, trunks, transfers, and DTMF (dual tone multi-frequency) signaling.
  3. No voice AI is complete without a solid, compatible real-time AI pipeline. Start by picking an LLM; the underlying model powers the behavior of your voice system, influencing latency, compliance, tone, and much more. Having clarity on voice and pipelines from the start will help businesses craft an effective voice AI.
  4. Deep integration with existing systems is another piece of the puzzle, allowing the tech to disseminate important information and context about the caller, such as names and account details. Unnatural memory omissions from the bot are a serious non-starter. A well-integrated system can help avoid common downfalls (latency, missing barge-in, or hallucinations) and make your voice AI feel alive.
  5. Productionization is mission-critical for all telephony applications. It’s key to call centers, to real-time gaming and trading systems, and to the voice agent you’ve built with the goal of running flawlessly on every phone call. Properly built infrastructure enables the bot to manage word error rate, latency, and autoscaling.

Voice AI agents are constantly evolving, representing an iterative tech with a unique set of challenges. I’ll conclude with some tips for future-proofing your voice AI and telecom stack against this backdrop of evolution.

What’s next for real-time voice AI

One key piece of advice is to get ahead of the curve on LLM and speech vendors. Assume that these aren’t static components, but that you’ll want to swap them in order to move with the times. Don’t put yourself on the back foot, but make sure it’s possible to mix and match on your platform. 

More broadly, avoid being caught out by evolutions in the tech. By anticipating quality and performance improvements in speech and AI, rather than being overtaken by them, you’ll be able to quickly mobilize improvements when they emerge. Even if you’re reaping the benefits of a certain approach today, don’t hold on for too long, or else a better strategy that’s coming out tomorrow will pass you by.

It’s also worth mentioning that the global reach of voice AI is both a challenge and an advantage. In the San Francisco Bay Area, a significant portion of voice AI orchestration platforms primarily target US users. That’s all well and good, but companies with more internationalized customer bases have the upper hand because they face challenges that many more localized companies have not yet experienced. 

For example, latency is a major challenge internationally, where voice AI data centers may be further away (or only based in the US) and telecom carriers may be less reliable. This gives international providers the edge because their global footprint leads to solid carrier relationships and extensive voice AI partnerships.

Ultimately, it will only be a matter of years before the new generation of voice applications is much improved over what we see today. In fact, the integration may be so seamless that it will be hard to tell the difference between AI agents and human agents in state-of-the-art systems. This should accelerate the replacement of legacy IVR (interactive voice response) systems with voice AI in call centers, and drive developers and stakeholders to build AI-driven call workflows fit for real-world use.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.


5 requirements for using MCP servers to connect AI agents 10 Mar 2026, 2:00 am

One of the most powerful collaborations between AI companies and tech giants, the Model Context Protocol (MCP) is a standard for connecting AI agents. We need standards like MCP to orchestrate communication between AI agents, AI assistants, LLMs, and other resources. Such standards are also critical for developing more complex agentic workflows.

MCP enables two key technologies. The MCP server connects AI agents, makes them discoverable, and provides other operational services. The MCP gateway is a reverse proxy that serves as an interface between AI agents, MCP servers, and other services that support the protocol.

Many organizations are utilizing AI agents from top-tier SaaS and security companies while also experimenting with ones from growing startups. Devops teams aim to build trustworthy AI agents while avoiding the risks of rapid deployment. The AI development roadmap will likely require agent-to-agent communication with the help of MCP servers.

Below are five requirements to consider before deploying an MCP server or connecting your AI agents to one.

Requirements for MCP servers

While MCP servers share similarities with other integration technologies, they also have key differences. MCP servers act as a catalog of tools and data for AI agents to use when responding to a prompt or completing a task. They centralize authentication, schemas, error handling, and streaming semantics for processing partial responses. Operational and security teams use MCP servers to monitor activity and respond to security incidents and AI agent performance issues.

The scope and scale of services orchestrated by MCP means teams must define their requirements inside a well-defined IT governance model.

“When using MCP to provide your agents with more tools to get their jobs done, make sure your governance requirements extend to that service,” says Michael Berthold, CEO of KNIME. “Before pointing your agent to an external MCP server, make sure you know and understand how prompts and data are processed, and potentially shared or used for other purposes. Don’t assume a tool that seems to be doing something in isolation isn’t using another AI underneath the hood.”

Also see: Five MCP servers to rule the cloud.

1. Define the MCP server’s scope

MCP servers can play a contextual role in agent-to-agent orchestrations. When an AI agent seeks other AI agents to complete a job, it can query an MCP server to identify potential resources and decide which to interface with. Defining the server’s scope helps shape its problem domain and ownership, as well as its governance, security, and other operational boundaries.

“Design your MCP servers to be narrowly focused, exposing specific and granular tools to your AI agents, instead of trying to be a general-purpose API,” says Simon Margolis, associate CTO of AI and ML at SADA, an Insights Company. “This makes it easier for the AI’s reasoning engine to discover the right tool dynamically and improves the reliability of the actions it takes. An MCP server acts as a smart adapter, translating the AI’s request into the exact command the underlying tool understands.”

“We’ve found that simple, explicit instructions, such as telling the model how to use a vendor’s command-line utility, can outperform a poorly integrated MCP server,” adds Andrew Filev, CEO and founder of Zencoder. “Overloading the model’s context with too many MCP tools can actually degrade performance, confuse the agent, and obscure reasoning paths.”

Creating separate servers for finance, HR, customer support, and IT simplifies creating access rules, monitoring operations for anomalies, and defining lifecycle management policies.

2. Establish integration governance

There are different schools of thought over what resources to connect through an MCP server. For example:

  • Gloria Ramchandani, SVP of product at Copado, advises teams to pull data, settings, and context from the MCP server rather than keeping their own copies. “Using the MCP as the single place your agents rely on keeps everything consistent, reduces mistakes, and makes automation smoother as your teams grow,” Ramchandani said.
  • James Urquhart, field CTO and developer evangelist at Kamiwaza, recommends against relying on MCP servers for data retrieval. “RAG approaches to incorporating live data into response generation still enable better security and performance than MCP integration.”
  • Tun Shwe, AI lead at Lenses, says, “Don’t expose existing web and mobile APIs directly as MCP tools. Whilst it’s a quick way to get started, these APIs tend to be fine-grained with verbose responses; characteristics that are undesirable to AI agents, since they inflate token consumption.”
  • Rahul Pradhan, VP of product and strategy of AI and data at Couchbase, advises against treating MCP-connected agents with access to a database as generic, low-risk APIs. He suggests the following instead:
    • Treat every tool that can read or write data as highly privileged: Enforce least-privilege roles, segregate access by data sensitivity, and separate read from write paths.
    • Design prompts so agents first invoke schema introspection tools to understand scopes, collections, and fields before issuing any operations.
    • Constrain agents to vetted, parameterized queries or stored procedures, and log all calls, to reduce the risk of exfiltration, corruption, and compliance failures.
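Pradhan’s guidance above can be pictured as a thin enforcement wrapper around database access. The following is a hypothetical Python sketch, not a real MCP SDK: the `VETTED_QUERIES` catalog, the `ToolCall` shape, and the scope names are all invented for illustration.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mcp-tools")

# Hypothetical catalog of vetted, parameterized queries: agents may only
# invoke these by name -- never issue free-form SQL.
VETTED_QUERIES = {
    "open_tickets_by_customer": "SELECT id, status FROM tickets WHERE customer_id = ?",
}

@dataclass
class ToolCall:
    tool: str
    params: dict
    agent_scopes: set

def run_vetted_query(call: ToolCall, required_scope: str = "tickets:read") -> dict:
    """Enforce least privilege, allow only vetted queries, and log every call."""
    if required_scope not in call.agent_scopes:
        log.warning("denied %s: missing scope %s", call.tool, required_scope)
        raise PermissionError(f"agent lacks scope {required_scope}")
    if call.tool not in VETTED_QUERIES:
        raise ValueError(f"{call.tool} is not a vetted query")
    log.info("agent call %s params=%s", call.tool, call.params)
    # A real server would now execute the query with bound parameters.
    return {"query": VETTED_QUERIES[call.tool], "bound": call.params}
```

The point of the sketch is the shape, not the specifics: reads and writes go through named, audited, scope-checked entry points rather than a generic database handle.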

3. Implement security non-negotiables

Many organizations created AI governance policies when they rolled out LLMs, then updated them for AI agents. Deploying MCP servers requires layering on new security non-negotiables related to configuration, deployment, and monitoring.

“Prioritize security because tools exposed by an MCP server can change and may not have the same level of data security an agent expects,” says Ian Beaver, chief data scientist at Verint. “Prompt injection risks exist in both tool responses and user inputs, making tool use the primary vulnerability point for otherwise static foundation models. Therefore, treat all tool use as untrusted sources: Log every tool’s input and output to enable full auditability of agent interactions.”

One critical place to start is defining identity, authentication, and authorization for AI agents. Because AI agents will be discoverable through MCP servers, make sure to be clear and transparent on the scope and entitlements of their capabilities.

“Don’t give AI agents unrestricted access when connecting through MCP,” says Meir Wahnon, co-founder at Descope. “Even though MCP standardizes integrations, many servers still lack proper authentication or use overly broad permissions, leaving systems exposed. Apply the principle of least privilege: Grant narrow scopes, require explicit user consent, and keep humans in the loop for sensitive actions.”
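Wahnon’s least-privilege advice might translate into a policy gate like the one below. This is a minimal sketch; the scope naming convention and the `SENSITIVE_ACTIONS` list are assumptions of the example, not part of the MCP standard.

```python
# Hypothetical list of actions that always require a human in the loop.
SENSITIVE_ACTIONS = {"refund_payment", "delete_record"}

def authorize(action: str, granted_scopes: set, human_approved: bool = False) -> bool:
    """Apply least privilege: the action's scope must be explicitly granted,
    and sensitive actions additionally require explicit human approval."""
    scope_needed = f"{action}:execute"
    if scope_needed not in granted_scopes:
        return False  # narrow scopes: no scope, no action
    if action in SENSITIVE_ACTIONS and not human_approved:
        return False  # keep humans in the loop for sensitive actions
    return True
```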

Other security recommendations include isolating high-risk capabilities within dedicated MCP servers or namespaces and implementing cryptographic server verification. Key principles of MCP server security governance include secure communications, data integrity assurance, and incident response integration.

Three more security recommendations:

  • Vrajesh Bhavsar, CEO and co-founder of Operant AI, says, “Don’t rely on traditional security approaches that depend on static rules and predefined attack patterns—they cannot keep up with the dynamic, autonomous nature of MCP-connected systems.”
  • Arash Nourian, global head of AI at Postman, adds, “Don’t treat MCP as secure out of the box because it currently has close to zero built-in security, with no standardized authentication, weak session management, and unvetted tool registries that open the door to MCP-specific attacks like prompt or tool poisoning.”
  • Or Vardi, technical lead at Apiiro, adds, “Keep humans in the loop for any sensitive or business-critical tasks, and also monitor and audit MCP activity to detect misuse early.”

4. Don’t delegate data responsibilities to MCP servers

Several experts cautioned that while MCP servers provide connectivity, they do not vet the data passing through them.

“Don’t assume MCP solves your underlying data quality problems,” says Sonny Patel, chief product and technology officer at Socotra. “MCP provides the connectivity layer, but AI agents can only be as effective as the data they access. If your systems contain incomplete, inconsistent, or siloed information, even perfectly connected agents will produce unreliable results.”

Developers should also scrutinize prompts and other inputs sent to their AI agents via MCP servers and make no assumptions about upstream validation.

“Always implement runtime interception to validate MCP inputs before they reach your agent’s reasoning engine,” says Matthew Barker, head of AI research and development at Trustwise. “Attackers can poison tool descriptions, API responses, or shared context with hidden commands that hijack agent behavior. It only takes one compromised agent to cascade malicious instructions across your entire AI ecosystem through inter-agent communication.”
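One way to picture the runtime interception Barker describes is a validation pass over every tool payload before it reaches the model. The deny-list patterns below are illustrative only; a production system would layer classifiers, provenance checks, and allow-lists on top of simple pattern matching.

```python
import re

# Hypothetical deny-list of known prompt-injection markers.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal (the )?system prompt", re.I),
]

def validate_tool_payload(payload: str) -> str:
    """Intercept tool responses and descriptions before they reach the
    agent's reasoning engine; reject payloads with injection markers."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(payload):
            raise ValueError(f"suspicious payload blocked: {pattern.pattern}")
    return payload
```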

Pranava Adduri, CTO and co-founder of Bedrock Data, says, “Don’t connect AI agents to data sources via MCP without first classifying data and establishing access boundaries. MCP simplifies context sharing but can amplify risk if agents query sensitive or unverified sources.”

5. Manage the end-to-end agent experience

As organizations deploy more AI agents and configure MCP servers, experts suggest setting principles around end-user and operational experiences. Devops teams and SREs will want to ensure they have observability and monitoring tools in place to alert on issues and aid in diagnosing them.

Or Oxenberg, senior full-stack data scientist at Lasso Security, says to establish comprehensive observability with trusted MCP servers. “If you’re using an MCP gateway, remember it monitors only traffic going in and out of the MCP server. For full visibility, capture every interaction and user input, map and monitor the agent’s planning and actions, and track their tasks and decisions. Without this foundation, you can’t detect when agents drift from intended behavior or trace back security incidents.”

Developers should also limit an AI agent’s access to MCP servers and AI agents, granting access to only those providing relevant services. Broadening their access can lead to erroneous results and higher costs.

“As an integrator, you are now crafting a product experience for the agent persona and should treat the modulated toolkit with the same product discipline you apply to the developer UX: clarity, alignment, and value,” says Edgar Kussberg, group product manager of AI at Sonar. “When agents are given broad or generic MCP tools, they spend too much time and tokens exploring, filtering, reasoning, and failing to provide value, wasting budget, complicating review workflows, and diluting trust in agent outputs.”

As more organizations deploy AI agents into production, I expect a growing need to configure MCP servers to support agent-to-agent communication. Establishing an upfront strategy, nonfunctional requirements, and security non-negotiables should guide smarter and safer deployments.


Ruby sinking in popularity, buried by Python – Tiobe 9 Mar 2026, 2:57 pm

The Ruby language has been around since 1995 and still gets regular releases. But the language has dropped to 30th place in this month’s Tiobe index of language popularity, with Python cited as a reason for Ruby’s drop.

Ruby was the Tiobe language of the year in 2006, having displayed the highest growth rate in popularity that year, but it is now close to dropping out of the top 30, according to Tiobe CEO Paul Jansen. Ruby’s March rating is 0.55%; the language was ranked 25th last month. “The main reason for Ruby’s drop is Python’s popularity. There is no need for Ruby anymore,” Jansen said. Ruby’s highest position was an eighth place ranking in May 2016.

Also in this month’s index, SQL, with a rating of 2%, and R, with a rating of 1.88%, swapped places in the top 10, with SQL now ranking eighth and R ninth. In addition, Swift re-entered the top 20 with a rating of 1.04%, while Kotlin fell to 22nd with a rating of 0.82%. And Google’s Dart language, once positioned as a rival to JavaScript, is on a path to sneaking back into the top 20. Dart ranked 25th this month with a rating of 0.69%.

The Tiobe Programming Community Index gauges language popularity based on a formula that assesses the number of skilled engineers worldwide, courses, and third-party vendors pertinent to a language. Popular websites such as Google, Amazon, Bing, Wikipedia, and more than 20 others are used to calculate the ratings.

In the bulletin accompanying this month’s index, Jansen addressed inquiries about switching from search engines to large language models (LLMs) to formulate the ratings. “The answer is no,” Jansen said. “The Tiobe index measures how many internet pages exist for a particular programming language. LLMs ultimately rely on the same sources—they are trained on and analyze these very same web pages. Therefore, in essence, there is no real difference.”

The Tiobe index top 10 for March 2026:

  1. Python, 21.25%
  2. C, 11.55%
  3. C++, 8.18%
  4. Java, 7.99%
  5. C#, 6.36%
  6. JavaScript, 3.45%
  7. Visual Basic, 2.5%
  8. SQL, 2%
  9. R, 1.88%
  10. Delphi/Object Pascal, 1.8%

The Pypl Popularity of Programming Language index gauges language popularity by analyzing how often language tutorials are searched for on Google. The Pypl index top 10 for March 2026:

  1. Python, 34.87%
  2. C/C++, 13.66%
  3. Java, 9.82%
  4. R, 6.49%
  5. JavaScript, 6.49%
  6. Swift, 3.5%
  7. Rust, 3.08%
  8. C#, 3.03%
  9. PHP, 2.9%
  10. Ada, 2.66%


Anthropic debuts Claude Marketplace to target AI procurement bottlenecks 9 Mar 2026, 4:38 am

Anthropic has launched a new marketplace for tools built on its Claude large language models (LLMs) that analysts say could help streamline procurement hurdles, which often slow the adoption of generative AI for enterprises.

Called Claude Marketplace, the platform currently has a limited set of partners, including Replit, Lovable Labs, GitLab, Snowflake, Harvey AI, and Rogo, offering tools across software development, legal workflows, financial analysis, and enterprise data operations.

“Most enterprises are not struggling to find capable models. They are struggling to operationalize them inside complex environments that already contain hundreds of applications, strict governance controls, and layered procurement processes,” said Sanchit Vir Gogia, chief analyst at Greyhound Research.

“Every new AI tool typically triggers security reviews, legal vetting, vendor onboarding, procurement approval, integration testing, and ongoing governance oversight. That process alone can delay deployment by months. The marketplace attempts to compress that operational friction,” Gogia added.

The billing for tools in the marketplace, which is charged against an enterprise’s existing committed spend on Claude, is also designed to help streamline procurement by eliminating the need for separate vendor contracts or payment processes.

“Historically, a company would need to negotiate separately with Anthropic and with Harvey or GitLab. Anthropic will manage all invoicing for partner spend, so it’s one contract, one invoice, one renewal conversation. For large enterprises where procurement cycles can take months, this is genuinely valuable,” said Pareekh Jain, founder of Pareekh Consulting.

Strategic lock-in and enterprise proliferation

Beyond simplifying procurement, however, Jain says there’s a deeper strategic play in Anthropic managing partner spend within the marketplace.

“Anthropic earns primarily through API consumption, so every partner application running on Claude generates token revenue. In that sense, the marketplace functions as a distribution engine rather than a toll booth, an approach similar to Amazon Web Services’ early ecosystem expansion, where lowering friction for partners accelerated adoption before deeper monetization,” Jain said.

The analyst added that managing marketplace billing also reflects a broader strategy of strengthening platform lock-in, echoing how Salesforce built its ecosystem around AppExchange and how Microsoft is expanding its footprint with Microsoft Copilot integrations.

“Anthropic is trying to deepen switching costs. Once an enterprise has committed to Anthropic spend and multiple partner tools running through Claude, migrating to another model becomes operationally difficult,” Jain said.

That dynamic, he added, could help Anthropic position itself as “the core AI commitment layer” inside enterprise budgets, increasing the likelihood of Claude becoming the primary line item rather than one of several separate AI tools.

Building a competitive edge

The marketplace may be Anthropic’s first step in creating an edge as competition among AI model makers grows.

“If tools like Harvey gain traction partly because they run on Claude within an existing Anthropic commitment, partners have incentives to stay aligned with Claude even as rival models improve, creating mutual lock-in,” Jain said.

This strategy, Greyhound Research’s Gogia said, will create a behavioral incentive for developers and startups to prioritize Claude integration if they want access to enterprise buyers participating in the marketplace, and over time, that dynamic can expand the partner ecosystem around the platform.

Channel conflict and narrative counterbalance

However, Gogia warned that Anthropic could be heading towards channel conflict.

“Anthropic is simultaneously building its own first-party AI tools while enabling third-party SaaS vendors to extend Claude capabilities through the marketplace,” Gogia said, referring to Claude Cowork and other plugins that triggered a sell-off among several SaaS stocks earlier this year as investors worried that native AI agents could begin encroaching on parts of the traditional software stack.

“The company must balance encouraging ecosystem innovation while ensuring that its own product roadmap does not compete directly with partner offerings,” Gogia added.

Furthermore, the analyst said that the launch of the marketplace is opportune for the company and can be seen as a “narrative counterbalance” to the imbroglio it is currently facing with the US Department of War, which has marked it as a supply chain risk.

“In practical terms, the marketplace demonstrates forward momentum in the enterprise segment. It signals that Anthropic continues to deepen relationships with enterprise software vendors and commercial customers even as the imbroglio unfolds,” Gogia said.

Last week, Anthropic CEO Dario Amodei himself, via a blog post, tried to reassure customers that the impasse with the DoW wouldn’t affect them.


How generative UI cut our development time from months to weeks 9 Mar 2026, 3:00 am

A feature we shipped last quarter would have taken three months to build traditionally. It took two weeks. Not because we cut corners or hired contractors, but because we fundamentally changed how we create user interfaces.

The feature was a customer service dashboard that adapts its layout and information density based on the specific issue a representative is handling. A billing dispute shows different data than a technical support case. A high-value customer gets a different view than a standard inquiry. Previously, building this meant months of requirements gathering, design iterations and front-end development for every permutation.

Instead, I directed my team to use generative UI: AI systems that create interface components dynamically based on context and user needs.

What does generative UI mean in reality?

The range of possibilities here is broad. On one end of the spectrum, developers use AI to generate code to build an interface more quickly. On the far end, interfaces are dynamically assembled entirely at runtime.

I led the implementation of an approach that sits somewhere in between. We specify a library of components and allowable layout patterns that define the constraints of our design system. The AI then chooses components from this library, customizes them based on context, and lays them out appropriately for each unique user interaction.

The interface never really gets designed — it just gets composed on demand using building blocks we’ve already designed.

Applied to our customer service dashboard, we can feed information about the customer record, type of issue, support rep’s role and experience, and recent history into the system to assemble an interface tailor-made to be most effective for that situation. An expert rep assisting with a complex technical problem will see system logs and advanced troubleshooting tools. A new rep assisting with a basic billing inquiry will see simplified information and workflow guidance.

Both interfaces would look different but are assembled from the common library of components designed by our UI team.

The technical architecture

Our generative UI system has four layers, each with clear responsibilities.

Figure 1: Generative UI architecture. Four layers transform user context into dynamic interfaces while guardrails ensure enterprise compliance. (Image credit: Sreenivasa Reddy Hulebeedu Reddy)

  1. The component library layer: It contains all approved UI elements: cards, tables, charts, forms, navigation patterns and layout templates. This follows the principles of design systems. Each component has defined parameters, styling options and behavior specifications. This layer is maintained by our design system team and represents the visual and interaction standards for our applications.
  2. The context analysis layer: This processes information about the current user, their task and relevant data. For customer service, this includes customer attributes, issue classification, historical interactions and representative profile. This layer transforms raw data into structured context that informs interface generation.
  3. The composition engine layer: Here is where AI enters the picture. Given the available components and the current context, this layer determines what to show, how to arrange it and what level of detail to present. We use a fine-tuned language model that has learned our design patterns and business rules through extensive examples.
  4. The rendering layer: It takes the composition specification and produces the actual interface. This layer handles the technical details of turning abstract component descriptions into rendered UI elements.
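The four layers can be sketched end to end as follows. This is a deliberately simplified stand-in: the component names are invented, and a rule-based `compose()` takes the place of the fine-tuned model described above.

```python
# Component library layer: approved components and their allowed parameters.
COMPONENT_LIBRARY = {
    "customer_card": {"fields"},
    "diagnostic_panel": {"detail"},
    "workflow_guide": {"steps"},
}

def analyze_context(raw: dict) -> dict:
    """Context analysis layer: normalize raw CRM/ticketing data into
    the structured context the composition engine reads."""
    return {
        "issue": raw.get("issue_type", "billing"),
        "senior_rep": raw.get("rep_experience_years", 0) >= 3,
    }

def compose(context: dict) -> list:
    """Composition engine layer: a rule-based stand-in for the fine-tuned
    model; it picks components and parameterizes them for the context."""
    layout = [("customer_card", {"fields": "name"})]
    if context["issue"] == "technical" and context["senior_rep"]:
        layout.append(("diagnostic_panel", {"detail": "advanced"}))
    else:
        layout.append(("workflow_guide", {"steps": context["issue"]}))
    return layout

def render(layout: list) -> str:
    """Rendering layer: turn the abstract composition spec into concrete
    UI (plain strings here); only approved components may be rendered."""
    for name, _ in layout:
        assert name in COMPONENT_LIBRARY, f"unapproved component {name}"
    return " | ".join(
        f"{name}({', '.join(f'{k}={v}' for k, v in params.items())})"
        for name, params in layout
    )
```

Feeding the same pipeline different contexts yields different compositions from the same building blocks, which is the core of the approach.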

How we built it

We built the generative UI system over the course of four months. The first step was building the component library. Our design team took an inventory of every UI pattern deployed across our customer service applications: 27 components in all, from simple data cards to interactive tables. Each component was parameterized based on what data to show, how to react to user input and how to adjust to screen sizes, among other properties.

The context analysis layer then had to interface with three different backends. Our CRM, which stores information about customers, our ticketing system, which has details about issue classifications, and our workforce management system, which maintains representative profiles. Each of these systems required adapters that would funnel context data into a normalized context object that the composition engine could read.

Finally, for the composition engine, we performed “prompt tuning” on a language model with 2k demonstrations of how our designers mapped context to interface by hand. The model learned relations such as “complex technical issue + senior rep => detailed diagnostic view” without those explicit rules being programmed. Instead of hardcoding thousands of if/then statements, we were able to bake designer knowledge into the model.
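A demonstration pair for that tuning might be shaped like the sketch below. The field names, component names, and serialization format are all assumptions made for illustration; only the idea of pairing a normalized context with a designer-chosen composition comes from the text.

```python
import json

# Hypothetical shape of one of the ~2,000 demonstrations: a normalized
# context object paired with the composition a designer chose for it.
demonstration = {
    "context": {
        "issue": "complex_technical",
        "rep_level": "senior",
        "customer_tier": "high_value",
    },
    "composition": [
        {"component": "system_logs", "params": {"window": "24h"}},
        {"component": "diagnostic_tools", "params": {"mode": "advanced"}},
    ],
}

def to_prompt_example(demo: dict) -> str:
    """Serialize one demonstration into the input/output pair used for tuning."""
    return (
        json.dumps(demo["context"], sort_keys=True)
        + " => "
        + json.dumps(demo["composition"])
    )
```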

The system is deployed onto our cloud architecture, which serves the UI with a latency of less than 200ms, making the generation process invisible to users.

Guardrails that make it enterprise-ready

Generative systems require constraints to be enterprise-ready. We learned this through early experiments in which the AI made creative but inappropriate interface decisions that were technically functional but violated brand guidelines or accessibility standards.

Our guardrails operate at multiple levels. Design system constraints ensure every generated interface complies with our visual standards. The AI can only select from approved components and can only configure them within approved parameter ranges. It cannot invent new colors, typography or interaction patterns.

Accessibility requirements are non-negotiable filters. Every generated interface is validated against WCAG guidelines before rendering. Components that would create accessibility violations are automatically excluded from consideration.

Business rule constraints encode domain-specific requirements. Certain data elements must always appear together. Certain actions require specific confirmations. Customer financial information has display requirements regardless of context. These rules are defined by business stakeholders and enforced by the system.

Human review thresholds trigger manual approval for unusual compositions. If the AI proposes an interface significantly different from historical patterns, it’s flagged for designer review before deployment.
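The guardrail levels described above can be pictured as independent checks run over a proposed composition before rendering. Everything here, from the rule contents to the function names, is a hypothetical sketch rather than the production system.

```python
def check_parameters(component: str, params: dict, allowed: dict) -> list:
    """Design-system constraint: only approved parameters, nothing invented."""
    return [
        f"{component}.{key} is not an approved parameter"
        for key in params
        if key not in allowed.get(component, set())
    ]

def check_accessibility(layout: list) -> list:
    """Stand-in accessibility filter: e.g. every chart needs a text alternative."""
    return [
        f"{name} missing text_alt"
        for name, params in layout
        if name == "chart" and "text_alt" not in params
    ]

def check_business_rules(layout: list) -> list:
    """Business rule: billing details must always appear with the customer card."""
    names = [name for name, _ in layout]
    if "billing_details" in names and "customer_card" not in names:
        return ["billing_details shown without customer_card"]
    return []

def needs_human_review(layout: list, historical: set) -> bool:
    """Flag compositions that diverge from historical patterns for designer review."""
    return tuple(name for name, _ in layout) not in historical
```

Running the checks in sequence and refusing to render on any error keeps the generative step creative while the boundaries stay fixed.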

Where it works and where it doesn’t

Generative UI isn’t a universal solution. It excels in specific contexts and creates unnecessary complexity in others.

It works well for high-variation workflows where users face different situations requiring different information. Customer service, field operations and case management applications benefit significantly. It also works for personalization at scale, when you need to adapt interfaces for different user roles, experience levels or preferences without building separate versions for each.

It doesn’t make sense for simple, low-variation interfaces where a single well-designed layout serves all users effectively. A settings page or login screen doesn’t need dynamic generation. It’s also the wrong approach for highly regulated forms where the exact layout is mandated by compliance requirements; tax forms, legal documents and medical intake forms should remain static and auditable.

The investment in building a generative UI system only pays off when interface variation is a genuine problem. If you’re building ten different dashboards for ten different user types, it’s worth considering. If you’re building one dashboard that works for everyone, stick with traditional methods.

Why this matters for enterprise development

Enterprise application development tends to follow a tried-and-true formula. Stakeholders express requirements. Designers mock up solutions. Developers implement interfaces. QA exercises the whole system. Repeat for each new requirement or variant context.

It’s a process that produces results. However, it doesn’t scale well and tends to be slow. Say we want to build a customer service application. Different issue types require different information views. Different customers may see different interfaces. Support reps may see different screens based on their role or channel of interaction. Manually designing and building every combination would take forever (and a lot of money). Instead, we settle: we build flexible but mediocre interfaces that reasonably accommodate every situation.

Generative UI eliminates this compromise. Once you’ve built the system, the cost of adding a new variant of the UI becomes negligible. Rather than picking ten use cases to design perfectly for, we can accommodate hundreds.

In our case, the business results were profound. Service reps spent 23% less time scrolling through screens to find the info they needed. First call resolution increased by 8%. Reps gave higher satisfaction ratings because they felt like the software was molded to their needs instead of forcing them into a one-size-fits-all process.

Organizational implications

Adopting generative UI changes how design and development teams work.

Designers shift from creating specific interfaces to defining component systems and composition rules. This is a different skill set that needs more systematic thinking, more attention to edge cases, more collaboration with AI systems. Some designers find this liberating; others find it frustrating. Plan for change management.

Developers focus more on infrastructure and less on UI implementation. Building and maintaining the generative system requires engineering investment, but once operational, the marginal effort per interface variation drops dramatically. This frees developer capacity for other priorities.

Quality assurance becomes continuous rather than episodic. With dynamic interfaces, you can’t test every possible output. Instead, you validate the components, the composition rules and the guardrails. As Martin Fowler notes about testing strategies, QA teams need new tools and methodologies for this kind of testing.

How to adopt generative UI

My advice to IT leaders evaluating generative UI is to start small with a pilot program to prove value before scaling across your organization. Find a workflow with high variability that has measurable results. Turn on generative UI for that single use case. Measure the impact on user productivity, satisfaction and business outcomes. Leverage those results to secure further investment.

Focus on your component library before enabling dynamic composition. The AI can only create great experiences if it has great building blocks. Prioritize design system maturity before generative features.

Define your guardrails up front. The guardrails that will make your generative UI solution enterprise-ready are not an afterthought. They’re requirements. Build them in lockstep with your generative features.

The future looks bright

The move from static interfaces to generative interfaces is really just one example of a larger trend we’re starting to see play out across enterprise software: the gradual shift from “static” technology designed upfront for the most common use cases to dynamic technology that can adapt to the user’s context as they need it.
We’ve already started to see this play out with search, recommendations and content. UI is next.

For forward-looking enterprises that are willing to put in the upfront work to create robust component libraries, establish governance frameworks and build thoughtful AI integrations, generative UI can enable applications that work for your users, instead of the other way around.

And that’s not just an incremental improvement in efficiency. That’s a whole new way of interacting with enterprise software.

This article is published as part of the Foundry Expert Contributor Network.


Coding for agents 9 Mar 2026, 2:00 am

Large language models (LLMs) and AI agents aren’t important to software engineering because they can write code at superhuman speeds. Without the right guardrails, as I’ve highlighted, that speed simply translates into mass-produced technical debt. No, agentic coding matters because it fundamentally changes what counts as good software engineering.

For years, developers could get away with optimizing for personal taste. If a framework fit your brain, if a workflow felt elegant, if a codebase reflected your particular theory of how software ought to be built, that was often enough. The machine would eventually do what you told it to do. Agents change that equation. They don’t reward the cleverest workflow. They reward the most legible one and, increasingly, the one that is optimized for them. This may seem scary but it’s actually healthy.

Just ask Hamel Husain.

Speaking to machines

It’s not hard to find know-it-all Hacker News developers with strong opinions on exactly what everyone should be using to build. Husain, however, is different. When he blogged about nixing his use of nbdev, he wasn’t walking away from some random side project. He was dumping something he had helped build and spent years championing. The reason? It wasn’t AI-friendly. “I was swimming upstream,” he notes, because nbdev’s idiosyncratic approach was “like fighting the AI instead of working with it.” Instead, he says, he wants to work in an environment where AI has “the highest chance of success.” He’s building according to what the machines like, and not necessarily what he prefers. He won’t be alone.

Developers have always liked to imagine tools as a form of self-expression. Sometimes they are. But agents are making tools look a lot more like infrastructure. Husain says Cursor won because it felt familiar, letting developers change habits gradually instead of demanding a new worldview on day one. That sounds a lot like the argument I made in “Why ‘boring’ VS Code keeps winning.” Familiarity used to matter mostly because humans like low-friction tools. Now it matters because models do, too. A repo layout, framework, or language that looks like the training distribution gives the model a better shot at doing useful work. In the agent era, conformity isn’t capitulation. It’s leverage.

GitHub’s latest Octoverse analysis makes the point with data. In August 2025, TypeScript overtook both Python and JavaScript as the most-used language on GitHub. GitHub’s reasoning is that AI compatibility is becoming part of technology choice itself, not just a nice bonus after the choice is made. It also reports that TypeScript grew 66% year over year and explains why: Strongly typed languages give models clearer constraints, which helps them generate more reliable, contextually correct code. As Husain says of his decision to eschew a Python-only path to use TypeScript, “typed languages make AI-generated code more reliable in production.”

That doesn’t mean every team should sprint into a TypeScript rewrite, but it does mean the case for quirky, under-documented, “trust me, it’s elegant” engineering is getting weaker. Agents like explicitness. They like schemas. They like guardrails.

In short, they like boring.

Engineering economics 101

This is the deeper change in software engineering. The agent story isn’t really about code generation. It’s about engineering economics. Once the cost of producing code drops, the bottleneck moves somewhere else. I’ve explained before that typing is never the real constraint in software engineering. Validation and integration are. Agents don’t remove that problem; instead, they make output cheap and verification expensive, which reorders the entire software development life cycle.

The best public evidence for that comes from two very different places. Or, rather, from their seeming contradictions.

The first is a METR study on experienced open source developers. In a randomized trial, developers using early-2025 AI tools took 19% longer to complete issues in repositories they already knew well, all while thinking they’d actually gone faster. Contrast this with OpenAI’s recent “harness engineering” essay, where the company says a small team used Codex to build roughly a million lines of code over five months and merge around 1,500 pull requests. These results seem superficially at odds until you realize that METR’s trial measured naive use of AI, whereas OpenAI’s example shows what happens when a team redesigns software development for agents, rather than simply sprinkling agentic pixie dust on old workflows.

In OpenAI’s experiment, engineers were no longer asked to write code. Instead they were told their primary job was to “design environments, specify intent, and build feedback loops” that allowed agents to do reliable work. Over the course of the pilot, they found that they’d initially underspecified the environment the agents would operate in, but they eventually shifted to a focus on creating systems in which generated code can be trusted.

Of course, this means that AI-driven coding requires just as much human intervention as before. It’s just a different kind of intervention.

This is playing out in the job market even as I type this (and yes, I wrote this post myself). Kenton Varda recently posted: “Worries that software developer jobs are going away are backwards.” He’s directionally right. If agents lower the cost of building software, the likely effect will be more software, not less. As he intimates, we’ll see more niche applications, more internal tools, and more custom systems that previously weren’t worth the effort. Indeed, we’re seeing the software developer job market significantly outpace the overall job market, even as AI allegedly comes to claim those jobs.

It isn’t claiming them. We still need people to steer while the agents take on more of the execution.

Inspecting the agents

This is where Husain’s focus on evals becomes so important. In his LLM evals FAQ, he says the teams he’s worked with spend 60% to 80% of development time on error analysis and evaluation. He’s also written one of the clearest summaries I’ve seen of how agent-era software development works: Documentation tells the agent what to do, telemetry tells it whether it worked, and evals tell it whether the output is good. Anthropic says much the same thing in its Best Practices for Claude Code, saying the “single highest-leverage thing” you can do is give the model a way to verify its own work with tests, screenshots, or expected outputs.

This also changes what a repository is. It used to be a place where humans stored source code and left a few breadcrumbs for other humans. Increasingly it’s also an operating manual for agents. OpenAI says Codex started with an AGENTS.md file but then learned that one giant agent manual quickly becomes stale and unhelpful. What worked better was treating AGENTS.md as a short map into a structured in-repo knowledge base. That is a very agent-native insight. Build commands, test instructions, architecture notes, design docs, constraints, and non-goals are no longer ancillary documentation. They are part of the executable context for development itself.
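The “short map” pattern might look something like the following. The file layout and document names here are illustrative, not taken from the Codex repository:

```markdown
# AGENTS.md

Start here, then follow the links below rather than relying on this file alone.

- Build & test: see docs/agents/build.md (how to run the full suite locally)
- Architecture overview: docs/agents/architecture.md
- Coding conventions: docs/agents/conventions.md
- Non-goals and constraints: docs/agents/non-goals.md

Keep this file short; put details in the linked documents so they can be
updated close to the code they describe.
```

The point is that the top-level file stays small and stable while the linked documents evolve with the code, which is what keeps the agent’s context from going stale.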

More bluntly? Context is now infrastructure.

So many teams are about to discover that their software practices are worse than they thought. Undocumented scripts, magical local setup, flaky tests, tribal-knowledge architecture, vague tickets, inconsistent naming, and “every senior engineer does it a little differently.” Humans just learned to absorb it. Agents expose this silliness immediately. An underspecified environment doesn’t create creativity; it creates garbage. If you drop an agent into a messy codebase and it flails, that’s not necessarily an indictment of the agent. Often it’s a very efficient audit of your engineering discipline. The repo is finally telling the truth about itself.

Which is why I’d now say that my suggestion that AI coding requires developers to become better managers was true, if incomplete. Yes, developers need to become better managers of machines. But more importantly, they need to become better engineers in the old-fashioned sense: better at specifications, boundaries, “golden paths,” etc. The agent era rewards discipline far more than cleverness, and that’s probably overdue.

So no, the big story of coding agents isn’t that they can write code. Plain chatbots could already fake that part. The big story is that they are changing what competent software engineering looks like. Agents reward exactly the things developers have long claimed to value but often avoided in practice: explicitness, consistency, testability, and proof. In the age of agents, boring software engineering doesn’t just scale better, it does most everything—collaboration, debugging, etc.—better.


19 large language models redefining AI safety—and danger 9 Mar 2026, 2:00 am

Everyone working on artificial intelligence these days fears the worst-case scenario. The precocious LLM will suddenly glide off the rails and start spouting dangerous thoughts. One minute it’s a genius that’s going to take all our jobs, and the next it’s an odd crank spouting hatred, insurrection, or worse.

Fortunately, there are solutions. Some scientists are building LLMs that can act as guardrails. Yes, adding one LLM to fix the problems of another one seems like doubling the potential for trouble, but there’s an underlying logic to it. These new models are specially trained to recognize when an LLM is potentially going off the rails. If they don’t like how an interaction is going, they have the power to stop it.

Of course, every solution begets new problems. For every project that needs guardrails, there’s another one where the guardrails just get in the way. Some projects demand an LLM that returns the complete, unvarnished truth. For these situations, developers are creating unfettered LLMs that can interact without reservation. Some of these solutions are based on entirely new models while others remove or reduce the guardrails built into popular open source LLMs.

Here’s a quick look at 19 LLMs that represent the state of the art in large language model design and AI safety—whether your goal is finding a model that provides the highest possible guardrails or one that just strips them all away.

Safer: LLMs with guardrails

The models in this category emphasize the many dimensions of AI safety. Whether you are looking for an LLM built for sensitive topics, one with a strong ethical compass, or a model capable of recognizing hidden exploits in seemingly innocent prompts, the heavily guarded models in this list could have you covered.

LlamaGuard

The developers of the various LlamaGuard models from Meta’s PurpleLlama initiative fine-tuned open source Llama models using known examples of abuse. Some versions, like Llama Guard 3 1B, can flag risky text interactions using categories like violence, hate, and self-harm in major languages including English and Spanish. Others, like Llama Guard 3 8B, tackle code interpreter abuse, which can enable denial of service attacks, container escapes, and other exploits. Close to a dozen LlamaGuard versions already extend the base Llama models, and it looks like Meta will continue researching ways to improve prompt security in foundation models.

Granite Guardian

IBM built the Granite Guardian model and framework combination as a protective filter for common errors in AI pipelines. First, the model scans for prompts that might contain or lead to answers that include undesirable content (hate, violence, profanity, etc.). Second, it watches for attempts to evade barriers by hoodwinking the LLM. Third, it watches for poor or irrelevant documents that might come from any RAG database that’s part of the pipeline. Finally, if the system is working agentically, it evaluates the risks and benefits of an agent’s function invocations. In general, the model generates risk scores and confidence levels. The tool itself is open source, but it integrates with some of the IBM frameworks for AI governance tasks like auditing.
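The four checkpoints amount to a staged pipeline where each stage yields a risk score and a high score stops the flow. A schematic of that structure (not IBM’s actual API; the `check_*` callables, names, and threshold are all illustrative stand-ins for guard-model calls):

```python
# Schematic of a four-stage guard pipeline, not IBM's actual API.
# The check_* callables stand in for guard-model invocations that
# return a risk score in [0, 1]; names and threshold are illustrative.

THRESHOLD = 0.8

def guarded_pipeline(prompt, docs, response, tool_calls,
                     check_prompt, check_docs, check_response, check_tools):
    """Run each guard stage in order; block at the first high-risk score."""
    stages = [
        ("prompt",    lambda: check_prompt(prompt)),      # harmful input / jailbreak
        ("retrieval", lambda: check_docs(prompt, docs)),  # irrelevant RAG documents
        ("response",  lambda: check_response(response)),  # undesirable output
        ("tools",     lambda: check_tools(tool_calls)),   # risky agent calls
    ]
    for name, run in stages:
        risk = run()
        if risk >= THRESHOLD:
            return {"blocked": True, "stage": name, "risk": risk}
    return {"blocked": False}

# A pipeline whose response check fires while the other stages pass:
verdict = guarded_pipeline(
    "What is my balance?", [], "…unsafe text…", [],
    check_prompt=lambda p: 0.1,
    check_docs=lambda p, d: 0.2,
    check_response=lambda r: 0.95,
    check_tools=lambda t: 0.0,
)
```

Running the stages in order means a jailbreak caught at the prompt stage never reaches the model at all, which is cheaper than filtering output after the fact.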

Claude

As Anthropic built various editions of Claude, it created a guiding list of ethical principles and constraints that it started calling a constitution. The latest version was mainly written by Claude itself, as it reflected upon how to enforce these rules when answering prompts. These include strict prohibitions on dangerous acts like building bioweapons or taking part in cyberattacks as well as more philosophical guidelines like being honest, helpful, and safe. When Claude engages with users, it tries to stay within the boundaries defined by the constitution it helped to create.

WildGuard

The developers of Allen Institute for AI’s WildGuard started with Mistral-7B-v0.3 and used a combination of synthetic and real-world data to fine-tune it for defending against harm. WildGuard is a lightweight moderation tool that scans LLM interactions for potential problems. Its three functions are to identify malicious intent in user prompts; detect safety risks in model responses; and determine the model refusal rate, or how often a model declines to answer. This can be useful for tuning the model to be as helpful as possible while remaining within safe bounds.

ShieldGemma

Google released a series of open weight models called ShieldGemma, which the company uses to block problematic requests. ShieldGemma 1 comes in three sizes (2B, 9B, and 27B) for classifying text input and output. ShieldGemma 2 blocks requests for images that are flagged as sexually explicit, harmful, violent, or that contain excessive blood and gore. The visual classifier tool can also be run in reverse to produce adversarial images, which are used to enhance the model’s ability to detect content that may violate the image safety policy.

NeMo Guardrails

Nvidia’s Nemotron collection of open source models includes a version, Nemotron Safety Guard, that acts as a gatekeeper by scanning for jailbreaks and dangerous topics. It can run on its own or integrate with NeMo Guardrails, a programmable protection system that can be revised and extended with traditional and not-so-traditional techniques. Developers can use Python to add specific “actions” for the model to use, or to provide patterns and structured examples that guide model behavior. Regular guardrails may halt a conversation at the hint of something undesirable. Ideally, the model can steer the conversation back to something productive.

Qwen3Guard

This multilingual model from Qwen comes in a variety of combinations to block unwanted behavior in your dataflows. Qwen3Guard-Gen works in a traditional question-and-answer format with prompts and responses. Qwen3Guard-Stream has a slightly different architecture that’s optimized for token-level filtering in real-time streams. Both come in a few sizes (0.6B, 4B, and 8B) to optimize the tradeoff between performance and protection. Qwen developers also built a special version of the 4B, Qwen3-4B-SafeRL, which was enhanced with reinforcement learning to maximize safety and user experience.

PIGuard

The PIGuard model focuses on defending against prompt injection, the type of malicious attack that can be challenging to prevent without being overly paranoid. It watches for covert suggestions that might be hidden inside the prompt. PIGuard’s developers trained the model by building a special training set called NotInject, which uses examples of false positives that might trick a less capable model.

PIIGuard

Not to be confused with PIGuard, this completely different model is aimed at flagging personally identifiable information (PII) in a data stream. This ensures an LLM won’t mistakenly leak someone’s address, birthday, or other sensitive information when responding to prompts. The PIIGuard model is trained on examples that teach it to detect PII that’s embedded in a conversation or a long text stream. It’s a step up from standard detectors that use regular expressions and other more basic definitions of PII structure.
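For contrast, the regex-style baseline the article mentions looks like the following. Pattern matching catches well-formed identifiers but misses PII expressed in free text, which is where a trained model earns its keep. The patterns here are deliberately simplified for illustration:

```python
import re

# Simplified regex-based PII detection: the baseline that model-based
# detectors improve on. These patterns catch well-formed identifiers
# but miss PII phrased in natural language.

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text):
    """Return {kind: matches} for each pattern that fires on the text."""
    return {kind: pat.findall(text) for kind, pat in PII_PATTERNS.items()
            if pat.search(text)}

hits = find_pii("Contact jane.doe@example.com or 555-867-5309.")
# → {'email': ['jane.doe@example.com'], 'us_phone': ['555-867-5309']}
```

A sentence like “you can reach me at my home address in Springfield” contains PII but matches none of these patterns, which illustrates the gap a trained detector is meant to close.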

Alinia

The guardrails from Alinia apply to a wider range of potentially troublesome behaviors. The model covers standard issues like illegal or dangerous behaviors but is also trained to avoid the legal tangles that may follow giving medical or tax advice. This LLM guardrail also can detect and refuse irrelevant answers or gibberish that may hurt an organization’s reputation. The Alinia system relies on a RAG-based database of samples so it can be customized to block any kind of sensitive topic.

DuoGuard

Sometimes it’s hard for AI developers to find a large enough training set with all the examples of bad behavior required. The DuoGuard models were built with two parts: Part one generates all the synthetic examples you need, and part two boils them all down to a model. The model is smart, small, and quick, and can detect issues in 12 risk categories, including violent crime, weapons, intellectual property, and jailbreaking. DuoGuard comes in three tiers (0.5B, 1B, and 1.5B) to serve all levels of need.

Looser: LLMs with fewer guardrails

LLMs in this category aren’t completely without guardrails, but they’ve been built—or in many cases, retrained—to favor freedom of inquiry or expression over safety. You might need a model like this if you are looking for novel approaches to old problems, or to find the weak points in a system so that you can close them up. Models with lower guardrails are also favored for exploring fictional topics or for romantic roleplay.

Dolphin models

Eric Hartford and a team at Cognitive Computations built the Dolphin models to be “uncensored.” That is, they stripped away all the guardrails they could find in an open source foundation model by removing many restricting questions and answers from the training set. If the training material showed bias or introduced reasons to refuse to help, they deleted it. Then, they retrained the model and produced a version that will answer a question any way it can. They’ve so far applied this technique to a number of open source models from Meta and Mistral.

Nous Hermes

The Hermes models from Nous Research were built to be more “steerable”—meaning they aren’t as resistant as some models are to delivering answers on demand. The Hermes model developers created a set of synthetic examples that emphasize helpfulness and unconstrained reasoning. The training’s effectiveness is measured, in part, with RefuseBench, a set of scenarios that test helpfulness. The results are often more direct and immediately useful. The developers noted, for instance, that “Hermes 4 frequently adopted a first-person, peer-like persona, generating responses with fewer meta-disclaimers and more consistent voice embodiment.”

Flux.1

The Flux.1 model was designed to create images by following prompt instructions as strictly as possible. Many praise its rectified flow transformer architecture for producing excellent skin tones and lighting in complex scenes. The model can be fine-tuned for applications that require a particular style or content using low-rank adaptation (LoRA). Flux.1 is available under an open source license for non-commercial use. Any commercial deployment requires additional licensing.

Heretic

Heretic lowers the guardrails of existing LLMs by stripping away their defenses. It starts by tracking how the residual vectors behave on two different training sets with harmful and non-harmful examples. It then zeros out the key weights, effectively removing whatever restrictions were built into the original model. The tool is automated, so it’s not hard to apply it to your own model. Or, if you prefer, you can get one that’s been pre-treated. There’s a version of Gemma 3, and another of Qwen 3.5.
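The underlying idea, sometimes called refusal-direction ablation, can be illustrated in plain Python on toy vectors. Real tools operate on transformer residual-stream activations; every number below is made up for illustration:

```python
# Toy illustration of the "refusal direction" idea behind abliteration,
# on tiny made-up vectors. Real tools work on transformer
# residual-stream activations across many layers.

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def unit(v):
    length = dot(v, v) ** 0.5
    return [x / length for x in v]

def ablate(v, direction):
    """Remove the component of v along the (unit-length) refusal direction."""
    coeff = dot(v, direction)
    return [x - coeff * d for x, d in zip(v, direction)]

# Step 1: average activations on harmful vs. harmless prompts and take
# the difference of the means as the candidate refusal direction.
harmful  = [[1.0, 2.0], [1.2, 1.8]]
harmless = [[1.1, 0.1], [0.9, -0.1]]
refusal_dir = unit(sub(mean(harmful), mean(harmless)))

# Step 2: project that direction out of a new activation. The result is
# orthogonal to the refusal direction, so the model can no longer
# "express" refusal along it.
cleaned = ablate([0.5, 3.0], refusal_dir)
```

Zeroing out the corresponding contribution in the model’s weights, as Heretic does, bakes this projection in permanently rather than applying it at inference time.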

Pingu Unchained

Audn.ai built Pingu as a tool for security researchers and red teams who need to ask questions that mainstream LLMs are trained not to answer. To create this model, developers fine-tuned OpenAI’s GPT-OSS-120b with a curated collection of jailbreaks and other commonly refused requests. The resulting model is handy for generating synthetic tests of spear-phishing, reverse engineering, and the like. The tool keeps an audit trail of requests and Audn.ai limits access to verified organizations.

Cydonia

TheDrummer created Cydonia as part of a series of models for immersive roleplay. That means long context windows for character consistency and uncensored interactions for exploring fictional topics. Two versions (22B v1.2 and 24B v4.1) have been built by fine-tuning Mistral Small 3.2 24B. Some call the model “thick” for producing long answers rich with plot details.

Midnight Rose

Midnight Rose is one of several models built by Sophosympatheia for romantic roleplay. The model was developed by merging at least four different foundation models. The idea was to create an LLM capable of building stories with strong plots and emotional resonance, all in an uncensored world of fictional freedom.

Abliterated: LLMs off the rails

A few labs are opening up models by deactivating the guardrail layers directly instead of retraining them for a looser approach. This technique is often called abliteration, a portmanteau combining “ablation” (removal) and “obliterate” (destruction). The developers identify the layers or weights that operate as guardrails by testing the models with a variety of problematic prompts, then deactivate them by zeroing out their contributions in model responses. These models have at times outperformed their foundational versions on various tasks.

Grok

Good examples in this category come from HuiHui AI and David Belton, but the most famous model of this type is Grok. Rather than being concerned about creating a model that behaves badly, the Grok team at X is more concerned with factual errors. Or, as Elon Musk said in an interview: “The best thing I can come up with for AI safety is to make it a maximum truth-seeking AI, maximally curious.” In other words, Grok was designed for factual correctness, not political correctness, whatever your definition of politics might be.


MCP C# SDK 1.0 arrives with improved authorization server discovery 6 Mar 2026, 2:39 pm

Microsoft’s official C# SDK for implementing Model Context Protocol (MCP) servers and clients has reached its 1.0 milestone release. The update brings full support for the 2025-11-25 version of the MCP Specification, highlighted by enhanced authorization server discovery and icon metadata for tools, resources, and prompts.

MCP C# SDK 1.0 was unveiled March 5 and can be found on GitHub. The MCP C# SDK 1.0 release represents a major step forward for building MCP servers and clients in .NET, according to Microsoft. Developers can use the SDK to implement secure authorization flows, build rich tool experiences with sampling, or handle long-running operations, the company said.

With authorization server discovery in the 2025-11-25 MCP specification, servers have three ways to expose the Protected Resource Metadata (PRM) document: via a “well-known” URL derived from the server’s MCP endpoint path, at the root well-known URL, and, as before, via a URL in the resource metadata parameter of the WWW-Authenticate header.
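The first two discovery locations can be sketched as URL derivations. This assumes the OAuth Protected Resource Metadata well-known suffix from RFC 9728 (`oauth-protected-resource`); confirm the exact path rules against the MCP specification:

```python
from urllib.parse import urlsplit, urlunsplit

# Sketch of the first two PRM discovery locations, assuming the
# well-known suffix from RFC 9728 ("oauth-protected-resource").
# Confirm the exact path rules against the MCP specification.

SUFFIX = "/.well-known/oauth-protected-resource"

def prm_urls(mcp_endpoint):
    """Return [path-derived URL, root URL] for a given MCP endpoint."""
    parts = urlsplit(mcp_endpoint)
    path = parts.path.rstrip("/")
    path_derived = urlunsplit((parts.scheme, parts.netloc, SUFFIX + path, "", ""))
    root = urlunsplit((parts.scheme, parts.netloc, SUFFIX, "", ""))
    return [path_derived, root]

urls = prm_urls("https://api.example.com/mcp")
# path-derived: https://api.example.com/.well-known/oauth-protected-resource/mcp
# root:         https://api.example.com/.well-known/oauth-protected-resource
```

A client would try these locations in order, falling back to the URL advertised in the WWW-Authenticate header if neither well-known endpoint responds.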

The 2025-11-25 specification also adds icon metadata to tools, resources, and prompts. This information is included in the response to tools/list, resources/list, and prompts/list requests. Implementation metadata (describing a client or server) also has been extended with icons and a website URL.

The 2025-11-25 specification features Client ID Metadata Documents (CIMDs) as an alternative to Dynamic Client Registration (DCR) for establishing client identity with an authorization server. CIMD now is the preferred method for client registration in MCP.

Another capability in the 2025-11-25 specification is that servers now can include tools in their sampling requests, which the large language model (LLM) may invoke to produce a response. This is one of the most powerful additions in the specification, Microsoft said.

The 2025-11-25 specification also improves the story for long-running requests over HTTP. Previously, clients could disconnect and reconnect if the server provided an event ID in server-sent events, but few servers implemented this. Now, servers that open a server-sent event stream for a request begin with an empty event that includes an event ID and optionally a retry-after field. After sending this initial event, servers can close the stream at any time, since the client can reconnect using the event ID.
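The reconnect flow can be sketched with the network replaced by a list of (event ID, data) pairs. In a real SSE client, the saved event ID would be sent in the `Last-Event-ID` header on reconnect; the IDs and payloads here are illustrative:

```python
# Schematic of SSE resumption, with the network replaced by a list of
# (event_id, data) pairs. A real client sends the last seen event ID in
# the SSE Last-Event-ID header when it reconnects.

STREAM = [
    ("evt-0", ""),              # initial empty event carrying the event ID
    ("evt-1", "progress: 40%"),
    ("evt-2", "progress: 80%"),
    ("evt-3", "result: done"),
]

def resume(stream, last_event_id):
    """Return only the events after last_event_id, as on a reconnect."""
    ids = [eid for eid, _ in stream]
    start = ids.index(last_event_id) + 1 if last_event_id in ids else 0
    return stream[start:]

# The client saw evt-0 and evt-1 before the server closed the stream;
# after reconnecting, it receives only the remaining events.
replayed = resume(STREAM, "evt-1")
```

Because the initial empty event delivers an ID before any payload arrives, the client always has something to resume from, even if the server closes the stream immediately.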

Finally, MCP C# SDK 1.0 introduces tasks, an experimental feature of the 2025-11-25 MCP specification that provides durable state tracking and deferred result retrieval for MCP requests.

