Oracle delivers semantic search without LLMs | InfoWorld
17 Apr 2026, 10:07 am
Oracle says its new Trusted Answer Search can deliver reliable results at scale in the enterprise by scouring a governed set of approved documents using vector search instead of large language models (LLMs) and retrieval-augmented generation (RAG).
Available for download or accessible through APIs, it works by having enterprises define a curated “search space” of approved reports, documents, or application endpoints paired with metadata, and then using vector-based similarity to match a user’s natural language query to the most relevant pre-approved target, said Tirthankar Lahiri, SVP of mission-critical data and AI engines at Oracle.
Instead of retrieving raw text and generating a response, as is typical in RAG systems that rely on LLMs, Trusted Answer Search’s underlying system deterministically maps the query to a specific “match document,” extracts any required parameters, and returns a structured, verifiable outcome such as a report, URL, or action, Lahiri said.
A feedback loop enables users to flag incorrect matches and specify the expected result.
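The matching step described above can be sketched in a few lines. This is a hypothetical illustration, not Oracle’s implementation: it embeds each curated target, scores a query against the search space by cosine similarity, and returns only a pre-approved match (or nothing), rather than generating text. All names and vectors are invented.

```python
# Hypothetical sketch of curated-target matching (not Oracle's code): score
# the query against every approved target and return the single best match,
# or None, instead of generating a free-form answer.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Stand-in embeddings; a real system would use an embedding model.
search_space = {
    "q3_sales_report":   [0.9, 0.1, 0.0],
    "refund_policy_doc": [0.1, 0.8, 0.2],
}

def match(query_vec, threshold=0.7):
    target, score = max(
        ((t, cosine(query_vec, v)) for t, v in search_space.items()),
        key=lambda pair: pair[1],
    )
    # Below the threshold, return nothing rather than a plausible-but-wrong answer.
    return target if score >= threshold else None

print(match([0.85, 0.15, 0.05]))  # closest curated target: q3_sales_report
```

The threshold is the deterministic safeguard: an unmatched query yields no answer at all, which is the behavior the feedback loop is meant to tune.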
Lahiri sees a growing enterprise need for more deterministic natural language query systems that eliminate inconsistent responses and provide auditability for compliance purposes.
Independent consultant David Linthicum agreed about the potential market for Trusted Answer Search.
“The buyer is any enterprise that values predictability over creativity and wants to lower operational risk, especially in regulated industries, such as finance and healthcare,” he said.
Trade-offs
That said, the approach comes with trade-offs that CIOs need to consider, according to Robert Kramer, managing partner at KramerERP. While Trusted Answer Search can reduce inference costs by avoiding heavy LLM usage, it shifts spending toward data curation, governance, and ongoing maintenance, he said.
Linthicum, too, sees enterprises adopting the technology having to spend on document curation, taxonomy design, approvals, change management, and ongoing tuning.
Scott Bickley, advisory fellow at Info-Tech Research Group, warned of the challenges of keeping curated data current.
“As the source data scales upwards to include externally sourced content such as regulatory updates or supplier certifications or market updates that are updated more frequently and where the documents may number in the many thousands, the risk increases,” he said.
“The issue comes down to the ability to provide precise answers across a massive data set, especially where documents may contradict one another across versions or when similar language appears different in regulatory contexts. The risk of being served up results that are plausible but wrong goes up,” Bickley added.
Oracle’s Lahiri, however, said some of these concerns may be mitigated by how Trusted Answer Search retrieves content.
Rather than relying solely on large volumes of static, curated documents that require constant updating, the system can treat “trusted documents” as parameterized URLs that pull in dynamically rendered content from underlying systems, according to Lahiri.
Live data sources
This enables it to generate answers from live data sources such as enterprise applications, APIs, or regularly updated web endpoints, reducing dependence on manually maintained document repositories, he said.
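The parameterized-URL idea can be made concrete with a small sketch. All identifiers and URLs below are invented for illustration; the point is only that the “trusted document” is a template filled in at query time, so the answer reflects live data rather than a static file.

```python
# Hypothetical sketch: a "trusted document" stored as a parameterized URL
# template, so the result is rendered from live data at request time rather
# than from a manually refreshed document. All names here are invented.
from string import Template

trusted_targets = {
    "quarterly_sales": Template(
        "https://reports.example.com/sales?region=$region&quarter=$quarter"
    ),
}

def resolve(target_id, **params):
    # Substitute the parameters extracted from the user's query into the
    # approved template; the underlying system serves current data.
    return trusted_targets[target_id].substitute(**params)

print(resolve("quarterly_sales", region="EMEA", quarter="Q3"))
```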
Linthicum was not fully convinced by Lahiri’s argument, agreeing only that Oracle’s approach could help reduce content churn.
“In fast-moving domains, keeping descriptions, synonyms, and mappings current still needs disciplined owners, approvals, and feedback review. It can scale to thousands of targets, but semantic overlap raises maintenance complexity,” he said.
Trusted Answer Search puts Oracle in contention with offerings from rival hyperscalers. Products such as Amazon Kendra, Azure AI Search, Vertex AI Search, and IBM Watson Discovery already support semantic search over enterprise data, often combined with access controls and hybrid retrieval techniques.
One key distinction between these offerings and Oracle’s, according to Ashish Chaturvedi, leader of executive research at HFS Research, is that the rival products typically layer generative AI capabilities on top to produce answers.
Enterprises can evaluate Trusted Answer Search by downloading a package that includes components such as vector search, an embedding model to process user queries, and APIs for integration into existing applications and user interfaces. They can also run it through APIs or built-in GUI applications, which are included in the package as two APEX-based applications, an administrator interface for managing the system and a portal for end users.
Exciting Python features are on the way 17 Apr 2026, 2:00 am
Transformative new Python features are coming in Python 3.15. In addition to lazy imports and an immutable frozendict type, the new Python release will deliver significant improvements to the native JIT compiler and introduce a more explicit agenda for how Python will support WebAssembly.
Top picks for Python readers on InfoWorld
Speed-boost your Python programs with the new lazy imports feature
Starting with Python 3.15, Python imports can work lazily, deferring the cost of loading big libraries. And you don’t have to rewrite your Python apps to use it.
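The 3.15 syntax is covered in the linked article; on today’s interpreters, the same idea is already available through `importlib.util.LazyLoader`, the deferred-loading recipe documented in the standard library. A minimal sketch:

```python
# Lazy loading on today's interpreters: importlib.util.LazyLoader defers a
# module's execution until its first attribute access (this mirrors the
# recipe in the importlib documentation).
import importlib.util
import sys

def lazy_import(name):
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)   # registers the lazy module; body not run yet
    return module

json = lazy_import("json")
print(json.dumps({"ok": True}))  # first attribute access triggers the real import
```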
How Python is getting serious about Wasm
Python is slowly but surely becoming a first-class citizen in the WebAssembly world. A new Python Enhancement Proposal, PEP 816, describes how that will happen.
Get started with Python’s new frozendict type
A new immutable dictionary type in Python 3.15 fills a long-desired niche in Python — and can be used in more places than ordinary dictionaries.
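Before 3.15, the closest standard-library analog is `types.MappingProxyType`, a read-only view. It isn’t a full substitute for the frozendict described above: a proxy is not hashable, and it still reflects later changes to the underlying dict. A quick sketch of the read-only behavior:

```python
# Closest stdlib analog today: types.MappingProxyType gives a read-only view.
# (Unlike a true frozendict, a proxy is not hashable and still reflects
# changes made to the underlying dict.)
from types import MappingProxyType

settings = {"retries": 3, "timeout_s": 30}
frozen = MappingProxyType(settings)

print(frozen["retries"])     # reads work: 3
try:
    frozen["retries"] = 5    # writes do not
except TypeError as exc:
    print("immutable view:", exc)
```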
How to use Python dataclasses
Python dataclasses work behind the scenes to make your Python classes less verbose and more powerful all at once.
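A minimal illustration of what the decorator generates: `__init__`, `__repr__`, and `__eq__` all come for free from the field declarations.

```python
# @dataclass synthesizes __init__, __repr__, and __eq__ from the annotations.
from dataclasses import dataclass, field

@dataclass(frozen=True)      # frozen=True also blocks attribute reassignment
class Point:
    x: float
    y: float
    tags: list = field(default_factory=list)  # safe mutable default

p = Point(1.0, 2.0)
print(p)                     # generated __repr__ shows each field
print(p == Point(1.0, 2.0))  # generated __eq__ compares field by field: True
```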
More good reads and Python updates elsewhere
Progress on the “Rust for CPython” project
The plan to enhance the Python interpreter by using the Rust language stirred controversy. Now it’s taking a new shape: use Rust to build components of the Python standard library.
Profiling-explorer: Spelunk data generated by Python’s profilers
Python’s built-in profilers generate reports in the opaque pstats format. This tool turns those binary blobs into interactive, explorable views.
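For reference, the raw data the tool explores can also be read programmatically with the standard `pstats` module: profile a function, then sort and print the hottest entries instead of inspecting the binary dump by hand.

```python
# Reading profiler output with the stdlib: cProfile collects the data and
# pstats.Stats sorts and formats it.
import cProfile
import io
import pstats

def work():
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(3)  # top 3 entries by cumulative time
report = out.getvalue()
print(report)
```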
The many failures that led to the LiteLLM compromise
How did a popular Python package for working with multiple LLMs turn into a vector for malware? This article reveals the many weak links that made it possible.
Slightly off-topic: Why open source contributions sit untouched for months on end
CPython has more than 2,200 open pull requests. The fix, according to this blog, isn’t adding more maintainers, but “changing how work flows through the one maintainer you have.”
When cloud giants neglect resilience 17 Apr 2026, 2:00 am
In a recent article chronicling the history of Microsoft Azure and its intensifying woes, we see a narrative that has been building throughout the industry for years. As cloud computing evolved from a buzzword to the backbone of digital infrastructure, major providers like Microsoft, Amazon, and Google have had to make compromises. Their promises of near-perfect uptime shifted from an expectation to “good enough,” influenced by economic pressures that have seen the cloud giants prioritize cost cuts and staff reductions over previously non-negotiable service reliability.
Frankly, many who follow the cloud space closely, including myself, have been warning about this situation for some time. Cloud outages are no longer rare, freak events. They are ingrained in the model as accepted collateral for the rapid growth and relentless cost-cutting that define this era of cloud computing. The story of Azure, as discussed in the referenced Register piece, is simply the latest and most prominent example of a much larger, industrywide trend.
This is not to say that cloud computing is inherently unstable or that its advantages—agility, scalability, rapid deployment—are a mirage. Enterprises aren’t abandoning the cloud. Far from it. Adoption continues at pace, even as these high-profile outages occur. The question is not whether the cloud is worth it, but rather, how much unreliability is acceptable for all that innovation and efficiency?
The price of cost optimization
If you trace the decisions of major public cloud players, a clear theme emerges. Competitive pressure from rivals translates to constant cost control, rushing services to market, shaving operational budgets, automating wherever possible, and reducing (or outright eliminating) teams of deeply experienced engineering talent who once ensured continuity and institutional knowledge. The comments from a former Azure engineer clearly illustrate how an exodus of talent, paired with an almost single-minded focus on AI and automation, is having downstream effects on the platform’s stability and support.
The irony is sharp: As cloud providers trumpet their AI prowess and machine-driven automation, the human expertise that built and reliably ran these platforms is no longer considered mission-critical. Automation isn’t a cure-all; companies still need experienced architects and operators who understand system limits, manage dependencies, handle failures, and respond deftly to unpredictable failures. Recent major outages reflect the slow but sure loss of that critically embedded human knowledge. Meanwhile, engineering decisions are increasingly made by those tasked with juggling ever-larger portfolios, new feature launches, and cost-reduction mandates, rather than contributing a methodical focus on resilience and craftsmanship.
Azure faces growing pains at scale, with tens of thousands of AI-generated lines of code created, tested, and deployed daily—sometimes by other AI agents—creating a self-reinforcing cycle of complexity and opacity. The resulting “compute crunch” puts even more strain on infrastructure, which, despite its sophistication, now handles heavier loads with fewer people providing oversight.
Outages aren’t driving users away
A natural question emerges: With reliability clearly taking a back seat, why aren’t enterprises reconsidering cloud altogether? I’ve argued for years that the game has changed. The benefits of cloud centralization, automation, and connectivity have become so fundamental to operations that the industry has quietly recalibrated its tolerance for outages. Public cloud is so deeply embedded into the business and digital operations that stepping back would mean undoing years, and often decades, of progress.
Headline-grabbing outages are dramatic but usually survivable. Disaster recovery plans, multi-region deployments, and architectural workarounds are now essentials for all major cloud-based companies. Building with failure in mind is a standard cost, not an avoidable exception. For most CIOs, the persistent risk of downtime is a manageable variable, balanced against the unmatchable benefits of cloud agility and scale.
Providers know this well, and their actions reflect it. Outages may sting a bit in the press, but the real-world consequences have yet to outweigh the benefits to companies that push further into the cloud. As such, the providers’ logic is simple: As long as customers accept outages, however grudgingly, there’s little incentive to switch to costlier, less scalable systems.
How enterprises can adapt
With outages now the price of admission, enterprises should recognize that neither staff cuts nor the blind pursuit of automation will stop anytime soon. Cloud providers may promise improvements, but their incentives will remain focused on cost control over reliability. Organizations must adapt to this new normal, but they can still make choices that reduce their risk.
First, enterprises should prioritize fault-resistant cloud architecture. Adopting multicloud and hybrid cloud strategies, while complex, reduces the technical risk associated with reliance on a single provider.
Second, it’s crucial to invest in in-house expertise that understands both the workloads and the nuances of cloud service behavior. While the providers may treat their operations talent as expendable, nothing will replace the value of an enterprise’s in-house team to independently monitor, test, and prepare for the unexpected.
Finally, enterprises must enforce strict vendor management. This means holding providers accountable for promised service-level agreements, monitoring transparency in communication and incident reporting, and leveraging contracted services to their fullest extent, especially as the cloud market matures and customer influence grows.
The era of the infallible cloud is over. As public cloud providers pursue operational efficiency and AI dominance, resilience has taken a hit, and both providers and users must adapt. The challenge for today’s enterprises is to strategically mitigate the most likely consequences before the next outage strikes.
Anthropic’s latest model is deliberately less powerful than Mythos (and that’s the point) 16 Apr 2026, 7:33 pm
Anthropic has today released a new, improved Claude model, Opus 4.7, but has deliberately built it to be less capable than the highly anticipated Claude Mythos.
Anthropic calls Opus 4.7 a “notable improvement” over Opus 4.6, offering advanced software engineering capabilities and improved vision, memory, instruction-following, and financial analysis.
However, the yet-to-be-released (and inadvertently leaked) Mythos seems to overshadow the Opus 4.7 release. Interestingly, Anthropic itself is downplaying Opus 4.7 to an extent, calling it “not as advanced” and “less broadly capable” than the Claude Mythos Preview.
The Opus upgrade also comes on the heels of the launch of Project Glasswing, Anthropic’s security initiative that uses Claude Mythos Preview to identify and fix cybersecurity vulnerabilities.
“For once in technological history, a product is being released with a marketing message that is focused more on what it does not do than on what it does,” said technology analyst Carmi Levy. “Anthropic’s messaging makes it clear that Opus 4.7 is a safer model, with capabilities that are deliberately dialed down compared to Mythos.”
‘Not fully ideal’ in some safety scenarios
Anthropic touts Opus 4.7’s “substantially better” instruction-following compared to Opus 4.6, its ability to handle complex, long-running tasks, and the “precise attention” it pays to instructions. Users report that they’re able to hand off their “hardest coding work” to the model, whose memory is better than that of prior versions. It can remember notes across long, multi-session work and apply them to new tasks, thus requiring less up-front context.
Opus 4.7 has 3x more vision capabilities than prior models, Anthropic said, accepting high-resolution images of up to 2,576 pixels. This allows the model to support multimodal tasks requiring fine visual detail, such as computer-use agents analyzing dense screenshots or extracting data from complex diagrams.
Further, the company reported that Opus 4.7 is a more effective financial analyst, producing “rigorous analyses and models” and more professional presentations.
Opus 4.7 is relatively on par with its predecessor in safety, Anthropic said, showing low rates of concerning behavior such as “deception, sycophancy, and cooperation with misuse.” However, the company pointed out, while it improves in areas like honesty and resistance to malicious prompt injection, it is “modestly weaker” than Opus 4.6 elsewhere, such as in responding to harmful prompts, and is “not fully ideal in its behavior.”
Opus 4.7 comes amidst intense anticipation of the release of Claude Mythos, a general-purpose frontier model that Anthropic calls the “best-aligned” of all the models it has trained. Interestingly, in its release blog today, the company revealed that Mythos Preview scored better than Opus 4.7 on a few major benchmarks, in some cases by more than ten percentage points.
The Mythos Preview boasted higher scores on SWE-Bench Pro and SWE-Bench Verified (agentic coding); Humanity’s Last Exam (multidisciplinary reasoning); and agentic search (BrowseComp), while the two had relatively the same scores for agentic computer use, graduate-level reasoning, and visual reasoning.
Opus 4.7 is available in all Claude products and in its API, as well as in Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens, and $25 per million output tokens.
What sets Opus 4.7 apart
Claude Opus is being branded in the industry as a “practical frontier” model, and represents Anthropic’s “most capable, intelligent, and multifaceted automation model,” said Yaz Palanichamy, senior advisory analyst at Info-Tech Research Group. Its core use cases include complex coding, deep research, and comprehensive agentic workflows.
The model’s core product differentiators have to do with how well-coordinated and composable its embedded algorithms are at scaling up various operational use case scenarios, he explained.
Claude Opus 4.7 is a “technically inclined” platform requiring a fair amount of deep personalization to fine-tune prompts and generate work outputs, he noted. It retains a strong lead over rival Google Gemini in terms of applied engineering use cases, even though Gemini 3.1 Pro has a larger context window (2M tokens versus Claude’s 1M tokens), although, he said, “certain [comparable] models do tend to converge on raw reasoning.”
The 4.7 update moves Opus beyond basic chatbot workflows, and positions it as more of “a copilot for complex, technical roles,” Levy noted. “It’s more capable than ever, and an even better copilot for knowledge workers.” At the same time, it poses less risk, making it a “carefully calculated compromise.”
He also pointed out that the Opus 4.7 release comes just two months after Opus 4.6 was introduced. That itself is “a signal of just how overheated the AI development cycle has become, and how brutally competitive the market now is.”
A guinea pig for Mythos?
Last week, Anthropic also announced Project Glasswing, which applies Mythos Preview to defensive security. The company is working with enterprises like AWS and Google, as well as with 30-plus cybersecurity organizations, on the initiative, and claims that Glasswing has already discovered “thousands” of high-severity vulnerabilities, including some in every major operating system and web browser.
Anthropic is intentionally keeping Claude Mythos Preview’s release limited, first testing new cyber safeguards on “less capable models.” This includes Opus 4.7, whose cyber capabilities are not as advanced as those in Mythos. In fact, during training, Anthropic experimented to “differentially reduce” these capabilities, the company acknowledged.
Opus 4.7 has safeguards that automatically detect and block requests that suggest “prohibited or high-risk” cybersecurity uses, Anthropic explained. Lessons learned will be applied to Mythos models.
This is “an admission of sorts that the new model is somewhat intentionally dumber than its higher-end stablemate,” Levy observed, “all in an attempt to reinforce its cyber risk detection and blocking bona fides.”
From a marketing perspective, this allows Anthropic to position Opus 4.7 as an ideal balance between capability and risk, he noted, but without all the “cybersecurity baggage” of the limited-availability higher-end model.
Mythos may very well be the “ultimate sacrificial lamb” at the root of broader Opus 4.7 mass adoption, Levy said. Even in the “increasing likelihood” that Mythos is never publicly released, it will serve as “an ideal means of glorifying Opus as the one model that strikes the ideal compromise for most enterprise decision-makers.”
Palanichamy agreed, noting that Opus 4.7 could serve as a public-facing guinea pig to live-test and fine-tune the automated cybersecurity safeguards that will ultimately “become a mandatory precursory requirement for an eventual broader release of Mythos-class frontier models.”
This article originally appeared on Computerworld.
Salesforce launches Headless 360 to support agent‑first enterprise workflows 16 Apr 2026, 2:00 am
Salesforce is packaging its developer and AI tooling, including its vibe coding environment Agentforce Vibes, into a new platform named Headless 360, designed to help enterprise teams build agent-first workflows.
The CRM software provider defines agent-first workflows as enterprise processes in which software agents, rather than human users, carry out tasks by directly invoking APIs, tools, and predefined business logic.
To support this approach, Headless 360 exposes Salesforce’s underlying data, workflows, and governance controls as APIs, MCP tools, and CLI commands, via its existing offerings, such as Data 360, Customer 360, and Agentforce, Joe Inzerillo, president of AI technology at Salesforce, said during a press briefing.
This allows agents to operate directly on the platform’s existing business logic and datasets, rather than relying on separate integrations or user interfaces, Inzerillo added.
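The shape of an “agent-first” call path can be sketched generically. This is an invented illustration, not Salesforce’s API: a governed registry of approved tools, and an agent that invokes business logic by tool name with no user interface in the loop.

```python
# Generic sketch of an agent-first call path (invented names, not
# Salesforce's API): the agent invokes governed business logic by tool
# name; anything outside the approved registry is refused.
TOOL_REGISTRY = {}

def tool(name):
    def register(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return register

@tool("create_case")
def create_case(account_id: str, subject: str) -> dict:
    # Stand-in for a platform API that applies existing business logic.
    return {"case_id": f"CASE-{account_id}-001", "subject": subject}

def agent_invoke(tool_name: str, **args):
    if tool_name not in TOOL_REGISTRY:   # governance: approved tools only
        raise PermissionError(f"{tool_name} is not an approved tool")
    return TOOL_REGISTRY[tool_name](**args)

print(agent_invoke("create_case", account_id="42", subject="Billing question"))
```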
Push to become a control layer for enterprise AI agents
Analysts, however, see Headless 360 as an effort by Salesforce to position itself as a central layer for managing agent-driven operations across different business functions in enterprises, moving from a system of record to being the system of execution.
“Salesforce knows the center of gravity is moving toward coding agents, conversational interfaces, agent harnesses, and external runtimes, so it is trying to keep Salesforce relevant as the system underneath,” said Dion Hinchcliffe, VP of the CIO practice at The Futurum Group.
With Headless 360, Hinchcliffe added, Salesforce is trying to move its positioning beyond “AI agents inside Salesforce” to framing “Salesforce as a programmable platform for agents operating across external tools, interfaces, and environments.”
Analysts warn, however, that CIOs should exercise caution before adopting Headless 360.
Scott Bickley, advisory fellow at Info-Tech Research Group, said modern data stacks can replicate much of Headless 360’s functionality with more flexibility and less vendor concentration.
There are other issues that Bickley thinks should worry CIOs: “There is no mention of cost or the underlying licensing model for this ‘headless’ experience. Are all tools included at no cost?”
“Salesforce’s MO seems to be to announce new capabilities that require SKUs. CIOs should be asking about pricing now, before building in architectural dependencies on features that might land in a premium cost tier,” Bickley cautioned.
Also, the analyst pointed out that Salesforce’s announcement is silent on SLAs for operations such as MCP tool calls, which matter materially for real-time agent workflows.
Incremental gains for developers despite broader concerns
Despite these concerns, Bickley sees some of the new Headless 360 features, although undifferentiated from the competition, as offering practical benefits for developers in their daily tasks.
The analyst was referring to newer updates, such as new MCP tools that give external coding agents full access to Salesforce’s platform, the DevOps Center MCP, the Agentforce Experience Layer, and newer governance features.
Enabling full access for external coding agents, such as Claude Code and Codex, in particular, Bickley said, helps Salesforce meet developers where they are and lets them continue using the tools of their choice.
“Historically, developers were forced into Salesforce’s proprietary toolchain that included clunky VS Code extensions, painful metadata APIs, and quirky development pipelines that required Salesforce-specific expertise. Expanding the dev environment helps alleviate this pain,” Bickley pointed out.
The other updates, according to Hinchcliffe, should help curtail developer friction by helping avoid frequent switching between development tools, expanding real-time awareness of organization data, reducing the need for custom plumbing to expose business logic, and decreasing the effort needed to move from prototype to deployment.
Focusing specifically on the new DevOps Center MCP, which is a set of AI-powered tools that enable the use of natural language across the entire DevOps lifecycle, Bickley said that it will help developers alleviate pains around CI/CD processes.
“Salesforce development pipelines are notoriously fragile with metadata dependencies, org-specific configurations, artificial limits on work items, and UI response issues, among others,” Bickley added.
Concerns around the maturity of governance capabilities
The governance tools, specifically the updates to the Testing Center, Custom Scoring Evals, Session Tracing, and the A/B Testing API, also address real gaps that enterprise development teams face, according to Hinchcliffe, especially when moving agentic workflows or applications into production.
“Salesforce is correctly identifying that enterprise agent adoption will stall unless buyers can properly measure, govern, debug, and tune agent behavior over time,” the analyst said.
However, Bickley cautioned about the efficacy of these tools, as most are in the very early stages of release. In fact, the analyst suggested that enterprises should expect to supplement them with their own evaluation frameworks for the next 12 to 18 months.
The analyst also flagged additional concerns around newer components such as the Agentforce Experience Layer, which is a new UI service that allows developers to decouple what an agent does from how it surfaces across various services and applications.
“Ironically, this adds yet another layer to contend with in the development process for what is already considered a painful development experience. Salesforce has a pattern of shipping v1 tools that work great in demos but fall short in real-world scenarios,” Bickley said.
“Development teams intending to avail themselves of these new feature sets should insist that Salesforce provide them an extended pilot and sandbox free of charge to validate the maturity level and ease of use of these new features,” Bickley added.
All the updates to Headless 360, Salesforce said, are expected to be released in phases. Generally available features include Agentforce Vibes 2.0, the DevOps Center MCP, Session Tracing, and the Agentforce Experience Layer. Features that are in early access include Custom Scoring Evals. Other features, such as the Testing Center and the Salesforce Catalog, are scheduled for rollout in May and June, respectively.
This story has been updated to correctly identify the Agentforce Experience Layer product and to remove remarks by an analyst about Headless 360’s software dependencies.
The agent tier: Rethinking runtime architecture for context-driven enterprise workflows 16 Apr 2026, 2:00 am
Most large enterprises run on deterministic software foundations. Business rules are embedded within workflows, state transitions are modeled explicitly, and escalation paths are defined up front. System behavior is specified in advance, making outcomes predictable. Meaningful scenarios are encoded as conditional branches and validated before release. For decades, this approach has delivered the reliability and control required for mission-critical operations.
This model assumes most situations can be anticipated and expressed in logic. It works well when variation is limited and conditions remain manageable. If new requirements can be added as workflow branches, the structure holds. It begins to strain when processes must respond to context — not just thresholds, but the broader circumstances of a case.
In my experience, customer onboarding in banking makes this tension visible. Onboarding sits at the intersection of digital channels, fraud detection, regulatory obligations and revenue goals. It must satisfy Know Your Customer (KYC) and Anti-Money Laundering (AML) requirements while minimizing abandonment and resisting synthetic identity attacks.
During my involvement in digital account opening initiatives at a major North American bank, cross-functional design sessions repeatedly surfaced the same trade-off. Product teams pushed to reduce friction and improve conversion while fraud teams responded to bot-driven account creation and mule schemes with additional safeguards. Compliance insisted regulatory standards be met without exception and engineering absorbed each new requirement into the orchestration framework. Individually, these decisions were rational. Collectively, they made the workflow more complex.
The underlying challenge was not a shortage of rules but expressing contextual judgment within a static branching structure. Differentiation occurred only at predefined checkpoints and information was often collected in bulk rather than adapting to known facts. Collect too little and the institution risks regulatory exposure or fraud; collect too much and abandonment rises. Attempt to encode every variation as additional branches and the workflow becomes increasingly fragile.
Adaptive scoring and contextual models can complement deterministic logic. Rather than enumerating every scenario in advance, they help determine whether additional verification is warranted or whether progression can continue with existing evidence. Deterministic workflows still enforce regulatory requirements and final state transitions; the adaptive layer informs how the system navigates toward those outcomes.
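The hybrid described above can be sketched in a few lines. The weights and thresholds here are purely illustrative: a contextual risk score decides whether to request more evidence, while a deterministic rule still enforces the non-negotiable regulatory check.

```python
# Sketch of deterministic rules plus an adaptive layer (invented weights and
# thresholds): the hard KYC rule is enforced unconditionally; the contextual
# score only decides how the workflow navigates toward the outcome.
def risk_score(signals: dict) -> float:
    # Stand-in for an adaptive model; weights are illustrative only.
    weights = {"new_device": 0.4, "mismatched_address": 0.35, "velocity_alert": 0.25}
    return sum(w for key, w in weights.items() if signals.get(key))

def next_step(case: dict) -> str:
    if not case["kyc_documents_ok"]:      # deterministic: non-negotiable rule
        return "reject"
    score = risk_score(case["signals"])   # adaptive: contextual judgment
    if score >= 0.5:
        return "request_additional_verification"
    return "proceed"

print(next_step({"kyc_documents_ok": True, "signals": {"new_device": True}}))
```

The division of labor matters: the score never finalizes a state transition, it only chooses between collecting more evidence and progressing with what exists.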
Although onboarding illustrates the issue clearly, the same pattern appears in credit adjudication, claims processing and dispute management. As adaptive signals enter these workflows, the architectural question shifts from adding branches to deciding where contextual judgment should reside. In my view, what is missing is not another conditional path but a different runtime model — one that interprets context and determines the next appropriate action within defined limits. This architectural layer, which I refer to as the Agent Tier, separates contextual reasoning from deterministic execution.
Introducing the agent tier: Separating execution from contextual judgment
In many enterprises, orchestration logic does not reside in a formal workflow platform. It is embedded in SPA applications, implemented in APIs, supported by rule engines and coordinated through service calls across systems. User journeys are assembled through API calls in predefined sequences, with eligibility or routing conditions evaluated at specific checkpoints.
This approach works well for repeatable, well-understood paths. When inputs are complete, risk signals are low and no exception handling is required, the clean path can be executed deterministically. State transitions are known in advance. Service calls follow predictable patterns. Human tasks are invoked at predefined points.
The difficulty arises when the workflow encounters ambiguity. Inputs may be incomplete. Signals may require interpretation rather than simple threshold comparison. Multiple systems may need to be coordinated in a sequence not explicitly modeled. Attempting to encode every such situation into SPA logic or orchestration APIs leads to increasingly complex condition trees and harder-to-maintain code. Instead of expanding hard-coded branching indefinitely, the runtime separates into two complementary lanes: repeatable execution and contextual reasoning.
Conceptually, the enterprise runtime evolves into a two-lane structure, illustrated below.

[Figure: the two-lane enterprise runtime structure. Credit: Nitesh Varma]
The deterministic lane retains control over authoritative state changes and rule enforcement. It manages eligibility checks, applies regulatory criteria, invokes known service sequences and finalizes cases in core systems. It continues to handle most predictable scenarios.
The runtime invokes the Agent Tier when contextual judgment is required. This may occur when additional evidence must be gathered before a rule can be evaluated, when multiple signals must be interpreted together rather than independently, or when coordination across systems cannot be expressed through a fixed sequence. The Agent Tier evaluates available actions and returns a bounded recommendation that allows deterministic execution to resume.
The movement between lanes is explicit. The deterministic workflow hands off when it reaches a point where static branching is insufficient. The Agent Tier performs synthesis or dynamic coordination. Once the Agent Tier produces a structured result, such as a completed evidence bundle, a validated set of inputs or a recommended next step, control returns to the deterministic lane for controlled progression and final state transition.
This separation allows incremental adoption. Existing SPA logic and orchestration APIs remain intact; ambiguity points can be redirected to the Agent Tier without destabilizing deterministic execution.
What happens inside the agent tier
The Agent Tier is not a single “AI decision.” It is a structured reasoning cycle that combines interpretation with controlled action.
When the deterministic workflow hands off a case, the Agent Tier interprets the current situation by assembling available context — user inputs, existing customer relationships, fraud signals, journey state and relevant policy constraints. Based on that composite view, it selects the next action from an approved set of enterprise capabilities. That action might involve retrieving additional information, invoking a verification service, requesting clarification from the user or coordinating multiple systems in sequence. Once the action completes, the result is evaluated and the cycle continues until deterministic execution can resume.
This alternating pattern of reasoning and action is common in agentic system design. In technical literature, it is often referred to as the ReAct (Reason and Act) pattern, which interleaves reasoning steps with structured action selection. Rather than attempting to reach a final answer in a single pass, the system gathers evidence, reassesses its position and proceeds incrementally. In enterprise settings, this pattern becomes a disciplined way to manage contextual interpretation.
Reasoning in the Agent Tier does not involve free-form system access. It proceeds through approved operations exposed via governed interfaces. In practice, these tools are enterprise primitives such as:
- APIs that retrieve or update enterprise data
- event triggers that initiate downstream processing
- workflow actions that advance a case
- controlled service calls into core or third-party systems
Each operation is defined by explicit input/output contracts and permission boundaries and carries metadata describing its purpose and constraints. The runtime selects from this governed catalog — a mechanism commonly referred to as tool calling. Some frameworks further group related tools into higher-level capabilities known as skills, reusable functions for objectives such as identity verification or KYC evidence assembly.
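The governed catalog described above can be sketched in a few lines of TypeScript. This is a minimal illustration of the pattern, assuming invented tool names, permission strings and contract shapes; it does not correspond to any specific platform's tool-calling API.

```typescript
// Illustrative sketch of a governed tool catalog. All names and shapes here
// are assumptions for demonstration, not a real vendor interface.
interface ToolContract {
  name: string;
  purpose: string;                    // metadata describing intent and constraints
  permissions: string[];              // the permission boundary for this operation
  run: (input: Record<string, string>) => Record<string, string>;
}

class ToolCatalog {
  private tools = new Map<string, ToolContract>();

  register(tool: ToolContract): void {
    this.tools.set(tool.name, tool);
  }

  // The runtime may only invoke approved operations, and only when the caller
  // holds every permission the tool's contract requires.
  invoke(
    name: string,
    granted: string[],
    input: Record<string, string>,
  ): Record<string, string> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unapproved operation: ${name}`);
    if (!tool.permissions.every(p => granted.includes(p))) {
      throw new Error(`permission denied for ${name}`);
    }
    return tool.run(input);
  }
}
```

A "skill" in this framing would simply be a named grouping of several registered tools that together serve one objective, such as identity verification.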
Before control returns to the deterministic lane, the agentic runtime can also perform a structured self-check. It can verify that required conditions are satisfied, confirm alignment with policy constraints and ensure that any necessary approvals have been identified. In technical discussions, this is often described as reflection.
Taken together, these patterns do not introduce unchecked autonomy. They provide a structured way to manage contextual synthesis and dynamic coordination without allowing adaptive logic to diffuse across SPA code and orchestration services. Deterministic systems continue to enforce authoritative state transitions. The Agent Tier prepares the conditions under which those transitions occur.
In many implementations, the Agent Tier does not directly control the workflow. Instead, it recommends the next step based on the available context. The deterministic tier remains responsible for execution. After each step is completed — retrieving evidence, invoking a verification service or preparing a review case — the updated context is returned to the Agent Tier, which evaluates the new state and recommends the next action. In this model, contextual reasoning informs progression while deterministic systems continue to enforce authoritative state transitions.
Returning to the onboarding example, the Agent Tier changes how the journey adapts to each applicant. The deterministic tier still executes core steps such as creating the customer profile, enforcing regulatory checks and committing account state in core systems. The Agent Tier evaluates the evolving context — customer relationships, fraud signals, identity verification results and available documentation — and recommends whether the workflow can proceed along the clean path, trigger additional verification or escalate to manual review. The result is not a new onboarding process but a workflow that adapts its progression dynamically while preserving the deterministic controls required for regulated operations.
Conceptually, the interaction between contextual reasoning and deterministic execution can be understood as a simple runtime loop, as illustrated below.

[Figure: the recommend, execute and feedback runtime loop. Credit: Nitesh Varma]
The workflow progresses through a continuous loop in which contextual reasoning recommends the next step, deterministic systems execute it and the resulting context feeds back into the next recommendation.
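The loop just described can be sketched compactly. This is a toy illustration, assuming invented context fields, thresholds and action names; it is not a vendor API, and real recommendation logic would involve a reasoning model rather than hard-coded conditions.

```typescript
// Minimal sketch of the recommend-execute-feedback loop. Context fields,
// thresholds and actions are illustrative assumptions.
type Context = { evidence: string[]; fraudScore: number; verified: boolean };
type Action = "proceed" | "verify" | "escalate";

// Agent Tier: interprets the assembled context, returns a bounded recommendation.
function recommend(ctx: Context): Action {
  if (ctx.fraudScore > 0.8) return "escalate";
  if (!ctx.verified) return "verify";
  return "proceed";
}

// Deterministic lane: executes the recommended step and updates the context.
function execute(ctx: Context, action: Action): Context {
  switch (action) {
    case "verify":
      return { ...ctx, verified: true, evidence: [...ctx.evidence, "id-check"] };
    case "escalate":
      return { ...ctx, evidence: [...ctx.evidence, "manual-review-case"] };
    case "proceed":
      return ctx; // authoritative state transition happens in core systems
  }
}

// Bounded iteration: the adaptive layer cannot loop indefinitely.
function run(ctx: Context): Context {
  for (let i = 0; i < 10; i++) {
    const action = recommend(ctx);
    ctx = execute(ctx, action);
    if (action === "proceed" || action === "escalate") return ctx;
  }
  return ctx;
}
```

Note that the Agent Tier only recommends; every state change flows through the deterministic `execute` function, mirroring the separation of responsibilities described above.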
Governing adaptive systems without losing control
Separating contextual reasoning from deterministic execution clarifies responsibility but does not eliminate risk. In regulated environments, adaptive sequencing must operate within explicit governance boundaries.
The trust and operations overlay represents cross-cutting controls across the runtime: audit logging, approval gates, observability, security enforcement and lifecycle management. Within this structure, authoritative state transitions remain deterministic. Core systems continue to create client profiles, enforce limits, record disclosures and apply regulatory thresholds. The Agent Tier may influence progression, but final state changes occur only through controlled interfaces.
This containment boundary preserves explainability. When progression changes — for example, when additional verification is triggered or escalation occurs — institutions must be able to reconstruct why. Which signals were assembled? Which tools were invoked? What reasoning produced the recommendation? Concentrating contextual evaluation within a defined runtime layer makes that traceability possible.
Operational experience reinforces the need for these guardrails. Engineering discussions of production agent systems emphasize constrained tool access, explicit action catalogs, bounded iteration and strong observability. In enterprise environments, contextual reasoning must likewise operate through governed tools and visible control points.
Approval gates remain part of this structure. High-risk actions such as credit issuance, account restrictions, large payments or regulatory filings may still require human authorization regardless of how the progression was determined. Reflection inside the Agent Tier can validate readiness, but authorization remains explicit.
Lifecycle discipline is equally important. Changes to models, identity providers, tool contracts or orchestration logic can alter workflow behavior. The Agent Tier should therefore operate as a governed platform capability with versioned reasoning logic, controlled tool catalogs and defined testing and rollback mechanisms.
The objective is not to eliminate probabilistic reasoning but to contain it within observable workflows and governed boundaries. As adaptive capabilities expand, the architectural question is not whether contextual reasoning will exist, but whether it is diffused across the stack or concentrated within a controlled runtime layer.
Architectural leadership in an adaptive era
Introducing an Agent Tier adds a new runtime component, but enterprise complexity is not new; it is already dispersed across channel code, orchestration services, rule engines and proliferating conditional branches. The architectural question is not whether complexity exists, but where it resides. As fraud models evolve, verification technologies improve and regulatory expectations shift, adaptive capabilities will continue to expand.
I believe architecture must evolve from enumerating state transitions to defining containment boundaries. Deterministic systems enforce regulatory and operational requirements and remain responsible for authoritative state changes. Adaptive reasoning operates within explicit policy constraints and informs how workflows progress toward those outcomes. Instead of encoding every possible path in advance, enterprises can move toward context-driven workflows in which deterministic execution handles authoritative actions while the Agent Tier determines the next appropriate step based on evolving context.
This evolution does not require wholesale reinvention. It can begin with a single high-impact workflow where contextual variability is already evident. By introducing a disciplined runtime layer that mediates uncertainty while preserving deterministic control, organizations can modernize incrementally. In that sense, the Agent Tier is not simply a new feature; it is a structural response to a changing runtime reality, one that allows adaptive systems to operate within clear architectural and governance boundaries.
This article is published as part of the Foundry Expert Contributor Network.
Ease into Azure Kubernetes Application Network 16 Apr 2026, 2:00 am
If you’re using Kubernetes, especially a managed version like Azure Kubernetes Service (AKS), you don’t need to think about the underlying hardware. All you need to do is build your application and it should run, its containers managed by the service’s orchestrator.
At least that’s the theory. However, implementing a platform that abstracts your code from the servers and network that support it brings its own problems, and a whole new discipline. Platform engineers fill the gap between software and hardware, supporting security and networking, as well as managing storage and other key services.
Kubernetes is part of an ecosystem of cloud-native services that provide the supporting framework for running and managing scalable distributed systems, including the tools needed to package and deploy applications, as well as components that extend the functionality of Kubernetes’ own nodes and pods.
Key components of this growing ecosystem are the various service meshes. These offer a way to manage connectivity between nodes and between your applications and the outside network, with tools for handling basic network security. Often implemented as “sidecar” containers, running alongside Kubernetes pods, these network proxies can consume added resources as your applications scale. That means more configuration and management, ensuring that configurations are kept up-to-date and that secrets are secure.
Istio goes ambient
One of the key service mesh implementations, Istio, has developed an alternate way of operating, what the project calls “ambient mode”. Here, instead of having individual sidecars for each pod, your service mesh is implemented as per-node proxies or as a single proxy that supports an entire Kubernetes namespace. It’s an approach that allows you to start implementing a service mesh without increasing the complexity of your platform, making it easy to go from a basic development Kubernetes implementation to a production environment without having to change your application pods.
It’s called ambient mode because there’s no need to add new service mesh elements as your application scales. Instead, the service mesh is always there, and your pods simply join it and take advantage of the existing configuration. The resulting implementation is both easier to use and easier to understand.
Microsoft has used Istio as part of Azure Kubernetes Service for many years. Istio is one of a suite of open-source tools that provide the backbone of Azure’s cloud-native computing platform.
Introducing Azure Kubernetes Application Network
So, it’s not surprising to learn that Microsoft is using Istio’s ambient mesh as the basis of Azure Kubernetes Application Network. The new service (available in preview) allows application developers to add managed network services to their applications without needing the support of a platform engineering team to implement a service mesh. It will even help you migrate away from the now-deprecated ingress-nginx, by providing access to the recommended Kubernetes Gateway API without needing more sidecars and by letting you use your existing ingress-nginx configurations while you complete your migration.
Microsoft describes the preview of Azure Kubernetes Application Network as “a fully managed, ambient-based service network solution for Azure Kubernetes Service (AKS).” The underlying data and control planes are managed by AKS, so all you need to do is connect your AKS clusters to an Application Network and AKS will then manage the service mesh for you, without any changes to your applications.
Like other implementations of Istio’s ambient mesh, there are two levels to Application Network: a core set of node-level proxies that handle connectivity and security for application services, and an optional set of Layer 7 proxies that support routing and apply network policies, acting as a software-defined network inside your Kubernetes environment.
This approach lets you build and test a Kubernetes application on your local development hardware without using Application Network features, then deploy it to AKS along with the required network configuration — simplifying both development and deployment. It also reduces development overheads, both in compute and developer resources.
Using Azure Kubernetes Application Network
Once deployed, Application Network connects the services in your application securely, automatically encrypting connections and managing the required certificates. It can also support unencrypted connections for when you aren’t sending confidential data and don’t need the associated overhead. As the service is managed by AKS, new pods automatically join the mesh as they are deployed, with the ambient mesh supporting both scale-up and scale-down operations.
The architecture of Application Network is much like that of an Istio ambient mesh. The main difference is that the service’s management and control planes are managed by Azure, with application owners limited to working with the service’s data plane, configuring operations and setting policies for their application workloads. Azure’s control of the management plane automates certificate management, ensuring that connections stay secure and there is little risk of certificate expiration, using the tools built into Azure Key Vault.
The Application Network data plane holds the proxies and gateways used by the service mesh, and these are deployed when the service is launched, along with the required Kubernetes configurations. The key to operation is ztunnel, a proxy that intercepts inter-service requests, secures the connection, and routes requests to another ztunnel running alongside the destination service. A gateway oversees connections between ztunnels running in remote clusters, allowing your service mesh to scale out with demand.
Building your first ambient service mesh in AKS
Getting started with Azure Kubernetes Application Network requires the Azure CLI. If you’re working with an existing AKS cluster, then you will need to enable integration with Microsoft Entra and enable OpenID Connect.
As the Application Network service is in preview, start by registering it in your account. This can take some time, but once it’s registered you can install the AppNet CLI extension that’s used to manage and control Application Network for your AKS clusters. You can now start to set up the ambient service mesh, either creating new clusters to use it, or adding the service mesh to existing AKS deployments.
Starting from scratch is the easiest way, as it ensures that you’re running in the same tenant. AKS clusters and Application Network can be in the same resource group if you want, but it’s not necessary. You’re free to use separate resource groups for management.
The appnet command makes it easy to create an Application Network from the command line; all you need is a name for the network, a resource group, a location, and an identity type. Once you’ve run the command to create your ambient mesh, wait for the mesh to be provisioned before joining a cluster to your network. Joining simply needs a name for the member, along with the cluster’s resource group and cluster name. At the same time, you define how the network will be managed, that is, whether you manage upgrades yourself or leave Azure to manage them for you. Additional clusters can be added to the network the same way.
With an Application Network and member clusters in place, the next step is to use Kubernetes’ own tooling to add support for the ambient mesh to your applications. Microsoft provides a useful example that shows how to use Application Network with the Kubernetes Gateway API to manage ingress. You need to use kubectl and istioctl commands to enable gateways and verify their operation, adding services and ensuring that they are visible to each other through their respective ztunnels.
Securing applications with policies
Policies can be used to control access from the application ingress to specific services as well as between services, reducing the risk of breaches and ensuring that you control how traffic is routed in your application. These policies can be locked down so that only specific methods can be used, allowing, say, only HTTP GET operations on a read-only service and POST where data needs to be delivered. Other options can be used to enforce OpenID Connect authorization at a mesh level.
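Because Application Network is built on Istio, a method-restriction policy of the kind described above can be expressed with a standard Istio AuthorizationPolicy resource. The service and namespace names below are illustrative, and the exact resources supported by the Application Network preview may differ; this is a sketch of the general Istio mechanism, not Azure-specific documentation.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: catalog-read-only
  namespace: shop            # illustrative namespace
spec:
  selector:
    matchLabels:
      app: catalog           # illustrative read-only service
  action: ALLOW
  rules:
  - to:
    - operation:
        methods: ["GET"]     # only GET is allowed; POST and others are denied
```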
Not all Azure Kubernetes clusters are supported in the preview, which is only available in Azure’s largest regions. For now, Application Network won’t work with private clusters or with Windows node pools. Once running, you can’t switch upgrade modes, and as the service is based on Istio, you can’t enable a separate Istio service mesh in your cluster. These restrictions aren’t showstoppers, and you should be able to start experimenting with the service while it’s still in preview.
AKS Application Network is a powerful tool that helps simplify and secure the process of building and running inter-cluster networks in an AKS application. As an ambient service, it can scale as necessary and can help provide secure bridges between clusters. By working at a Kubernetes level, it’s possible to use Application Network to provide policy-driven production network rules, allowing developers to build and test code in unrestricted environments before moving to test and production clusters.
As Application Network uses familiar Kubernetes and Istio constructions, it’s possible to build configurations into Helm charts and other deployment tools, ensuring that network configurations and policies are part of your build artifacts and are delivered with your code every time you push a new build – without needing platform engineering support.
The two-pass compiler is back – this time, it’s fixing AI code generation 16 Apr 2026, 2:00 am
If you came up building software in the 1990s or early 2000s, you remember the visceral satisfaction of determinism. You wrote code. The compiler analyzed it, optimized it, and emitted precisely the machine instructions you expected. Same input, same output. Every single time. There was an engineering rigor to it that shaped how an entire generation of developers thought about building systems.
Then large language models (LLMs) arrived and, almost overnight, code generation became a stochastic process. Prompt an AI model twice with identical inputs and you’ll get structurally different outputs—sometimes brilliant, sometimes subtly broken, occasionally hallucinated beyond repair. For quick prototyping that’s fine. For enterprise-grade software—the kind where a misplaced null check costs you a production outage at 2am—it’s a non-starter.
We stared at this problem for a while. And then something clicked. It felt familiar, like a pattern we’d encountered before, buried somewhere in our CS fundamentals. Then it hit us: the two-pass compiler.
A quick refresher
Early compilers were single-pass: read source, emit machine code, hope for the best. They were fast but brittle—limited optimization, poor error handling, fragile output. The industry’s answer was the multi-pass compiler, and it fundamentally changed how we build languages. The first pass analyzes, parses, and produces an intermediate representation (IR). The second pass optimizes and generates the final target code. This separation of concerns is what gave us C, C++, Java—and frankly, modern software engineering as we know it.

[Figure: The structural parallel between classical two-pass compilation and AI-driven code generation. Credit: WaveMaker]
The analogy to AI code generation is almost eerily direct. Today’s LLM-based tools are, architecturally, single-pass compilers. You feed in a prompt, the model generates code, and you get whatever comes out the other end. The quality ceiling is the model itself. There’s no intermediate analysis, no optimization pass, no structural validation. It’s 1970s compiler design with 2020s marketing.
Applying the two-pass model to AI code generation
Here’s where it gets interesting. What if, instead of asking an LLM to go from prompt to production code in one shot, you split the process into two architecturally distinct passes—just like the compilers that built our industry?
Pass 1 is where the LLM does what LLMs are genuinely good at: understanding intent, decomposing design, and reasoning about structure. The model analyzes the design spec, identifies components, maps APIs, resolves layout semantics—and emits an intermediate representation, an IR. Not HTML. Not Angular or React. A well-defined meta-language markup that captures what needs to be built without committing to how.
This is critical. By constraining the LLM’s output to a structured meta-language rather than raw framework code, you eliminate entire categories of failure. The model can’t inject malformed tags if it’s not emitting HTML. It can’t hallucinate nonexistent React hooks if it’s outputting component descriptors. You’ve reduced the stochastic surface area dramatically.
Pass 2 is entirely deterministic. A platform-level code generator—no LLM involved—takes that validated intermediate markup and emits production-grade Angular, React, or React Native code. This is the pass that plugs in battle-tested libraries, enforces security patterns, and applies framework-specific optimizations. Same IR in, same code out. Every time.
First pass gives you speed. Second pass gives you reliability. The separation of concerns is what makes it work.
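The split can be made concrete with a toy sketch of Pass 2. The IR schema, the allowed component vocabulary and the emitter below are invented for illustration; they are not WaveMaker's actual meta-language, just a minimal demonstration of validating an IR and generating code from it deterministically.

```typescript
// Toy IR: a constrained component descriptor the LLM would emit in Pass 1.
// The schema is an illustrative assumption, not a real product's meta-language.
interface IRComponent {
  kind: string;      // constrained vocabulary, validated below
  label: string;
  action?: string;
}

const ALLOWED_KINDS = new Set(["button", "input"]);

// IR boundary: hallucinated component kinds are stripped before generation.
function validateIR(ir: IRComponent[]): IRComponent[] {
  return ir.filter(c => ALLOWED_KINDS.has(c.kind));
}

// Pass 2: a deterministic emitter. Same IR in, same code out, every time.
function emitReact(ir: IRComponent[]): string {
  return ir
    .map(c =>
      c.kind === "button"
        ? `<button onClick={${c.action ?? "undefined"}}>${c.label}</button>`
        : `<input placeholder="${c.label}" />`
    )
    .join("\n");
}
```

Because the model never emits framework code directly, a hallucinated `"carousel"` kind simply fails validation at the IR boundary instead of reaching production output.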
Why this matters now
The advantages of this architecture compound in exactly the ways that matter for enterprise development. The meta-language IR becomes your durable context for iterative development—you’re not re-prompting the LLM from scratch every time you refine a component. Security concerns like script injection and SQL injection are structurally eliminated, not patched after the fact. Hallucinated properties and tokens get caught and stripped at the IR boundary before they ever reach generated code. And because Pass 2 is deterministic, you get reproducible, auditable, deployable output.
Pass 1 — LLM-powered:

- Translates design/spec to structured components and design tokens
- Enables iterative dev with meta-markup as persistent context
- Eliminates script/SQL injection by design

Pass 2 — Deterministic:

- Generates optimized, secure, performant framework code
- Validates and strips hallucinated markup and tokens
- Plugs in battle-tested libraries for reliability
If you’ve spent your career building systems where correctness isn’t optional, this should resonate. The industry spent decades learning that single-pass compilation couldn’t produce reliable software at scale. The two-pass architecture wasn’t just an optimization, but an engineering philosophy: separate understanding from generation, validate before you emit, and never let a single phase carry the entire burden of correctness.
We’re at the same inflection point with AI code generation right now. The models are powerful. The architecture around them has been naive. The fix isn’t to wait for a smarter model. It’s to apply the engineering discipline we’ve always known, and build systems where stochastic brilliance and deterministic reliability each do what they do best—in the right pass, at the right time.
Deterministic software engineering is cool again. Turns out it never really left.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
MuleSoft Agent Fabric adds new ways to keep AI agents in line 15 Apr 2026, 11:20 am
Salesforce first sought to tackle AI agent sprawl last year with Agent Fabric, a suite of capabilities and tools inside its MuleSoft Anypoint Platform. Now, it’s seeking to further rein in unruly AI agents, both on its own platform and on those of other vendors, with new governance tools and deterministic controls.
When enterprises adopt multiple agentic AI products, they can end up with redundant or siloed workflows scattered across teams and platforms, undermining operational efficiency and complicating governance as they try to scale AI safely and responsibly.
Agent Fabric, introduced in September 2025, started out as a place for enterprises to register, view, interconnect and govern agents. In January it added a deterministic scripting tool and the ability to scan for new agents and add them to the registry.
But enterprises still need more help to bring their AI agents under control, so Salesforce is adding more features.
First up is an expansion of the deterministic controls in the form of Agent Script for Agent Broker, an intelligent routing service inside Agent Fabric that is designed to connect agents across domains, dynamically matching user tasks with the best-fit agent. Salesforce said the controls will help developers codify workflows in multi-agent systems in order to ensure consistent and reliable outputs.
Rather than leave probabilistic agents to make all the decisions about how to resolve a problem, introducing an element of unpredictability, Agent Script for Agent Broker enables enterprises to steer some of the decision-making according to predetermined rules that require fewer computing resources than running a large language model.
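The deterministic-first pattern can be illustrated with a small sketch. To be clear, this is NOT Agent Script syntax; it is a hypothetical TypeScript rendering of the control pattern, with invented rule predicates and agent names, where cheap predetermined rules are consulted before falling back to a costlier model-based broker.

```typescript
// Conceptual sketch of deterministic-first routing. Rules, agent names and the
// broker callback are illustrative assumptions, not Agent Script.
type Route = { agent: string; via: "rule" | "model" };

const rules: Array<{ match: (task: string) => boolean; agent: string }> = [
  { match: t => t.includes("refund"), agent: "billing-agent" },
  { match: t => t.includes("password"), agent: "identity-agent" },
];

// Deterministic rules are checked first; only unmatched tasks fall through
// to the model-based broker, which costs far more compute per decision.
function route(task: string, modelBroker: (t: string) => string): Route {
  for (const r of rules) {
    if (r.match(task)) return { agent: r.agent, via: "rule" };
  }
  return { agent: modelBroker(task), via: "model" };
}
```

Every task resolved by a rule is one fewer LLM invocation, which is the cost and predictability argument behind deterministic controls.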
That’s welcome news for Robert Kramer, managing partner at KramerERP.
“Pure autonomous agents don’t necessarily work in production as enterprises need to ensure predictable outcomes. The deterministic controls should facilitate a secure handoff of control and rules while still allowing the model to engage in reasoning when it’s appropriate,” he said. “It’s a balance between control and flexibility, which is the norm for most real deployments.”
For Rebecca Wettemann, principal analyst at Valoir, providing both deterministic and probabilistic options within Agent Fabric enables developers and agent builders to take the lower-cost route to more accurate and predictable results from agentic systems.
Enterprises will have to wait to put this deterministic orchestration feature into production, though: Still in beta testing, it won’t be generally available until June 2026.
Centralized LLM governance tackles cost
Beyond orchestration, Salesforce has added a new LLM Governance capability in AI Gateway, the control layer within Agent Fabric that provides centralized visibility of token usage, costs, and data flows for third-party models.
Enterprises will be able to use LLM Governance, now generally available, to help them keep their AI operations on budget, Salesforce said.
This is becoming increasingly important as CIOs seek to bring disparate AI systems under centralized control and justify spiralling AI costs.
Info-Tech Research Group advisory fellow Scott Bickley warned that without centralized governance like this, different teams around a company may choose different models, negotiate their own API contracts, and manage token budgets locally.
“This results in sprawling costs, inconsistent security postures, and no enterprise-wide policy enforcement,” he said. “By positioning AI Gateway as the choke point through which all LLM traffic flows, enterprises gain visibility into AI usage patterns, the models in use, purpose of the usage, and cost data.”
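The choke-point idea is simple to sketch. The class below is a conceptual illustration with invented names and a crude token estimate, not Salesforce's AI Gateway API: every model call flows through one object, so usage and cost are recorded centrally per team and per model.

```typescript
// Conceptual sketch of a centralized LLM choke point. Names, pricing and the
// token estimate are illustrative assumptions, not a real gateway API.
interface UsageRecord { team: string; model: string; tokens: number; costUsd: number }

class LlmGateway {
  private log: UsageRecord[] = [];
  constructor(private pricePerMTokens: Record<string, number>) {}

  // All traffic passes through here, giving enterprise-wide visibility.
  complete(team: string, model: string, prompt: string,
           call: (p: string) => string): string {
    const response = call(prompt);
    // Very rough token estimate (~4 characters per token), for illustration only.
    const tokens = Math.ceil((prompt.length + response.length) / 4);
    const costUsd = (tokens / 1_000_000) * (this.pricePerMTokens[model] ?? 0);
    this.log.push({ team, model, tokens, costUsd });
    return response;
  }

  spendByTeam(team: string): number {
    return this.log
      .filter(r => r.team === team)
      .reduce((s, r) => s + r.costUsd, 0);
  }
}
```

With per-team spend available from one place, budget alerts and policy enforcement become queries over the gateway's log rather than audits of scattered API contracts.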
MCP additions simplify integration
Salesforce is also adding new Model Context Protocol (MCP) features, including MCP Bridge, which makes it easier to access legacy APIs, and Informatica-hosted MCPs, which it says will simplify how agents interact with enterprise data and APIs.
These could save developers time and simplify the building of cross-environment, multi-agent systems.
Bickley said MCP Bridge will help enterprises with thousands of legacy APIs (REST, SOAP, GraphQL) built long before MCP existed.
“Agents speaking MCP cannot call those APIs natively so they require wrappers around the API endpoint; this would be a massive engineering lift. MCP Bridge allows these APIs to be exposed as MCP-compatible tools without modifying the underlying code,” he said.
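The wrapper idea Bickley describes can be sketched generically. The shapes below are illustrative assumptions: real MCP servers speak JSON-RPC via the official SDK, calls are asynchronous, and MCP Bridge's actual mechanics may differ. The point is only that a legacy call can be exposed as a schema-described tool without modifying the underlying API.

```typescript
// Conceptual sketch of exposing a legacy endpoint as an MCP-style tool
// definition. Interfaces are invented for illustration; the call is kept
// synchronous only to keep the sketch short.
interface McpToolDef {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, { type: string }> };
  call: (args: Record<string, string>) => string;
}

// Generic adapter: wrap an existing API call as a described tool,
// leaving the legacy code untouched.
function bridgeLegacyApi(
  name: string,
  description: string,
  params: string[],
  legacyCall: (args: Record<string, string>) => string,
): McpToolDef {
  return {
    name,
    description,
    inputSchema: {
      type: "object",
      properties: Object.fromEntries(params.map(p => [p, { type: "string" }])),
    },
    call: legacyCall,
  };
}
```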
And Wettemann said Informatica-hosted MCPs will further reduce development overhead by bringing built-in data quality and governance capabilities into agent workflows, which is particularly critical for enterprises in regulated industries and those with heightened risk concerns.
But Bickley added a note of caution. “APIs can behave oddly and have their own nuanced behavior,” he said. “Enterprises should test how MCP Bridge handles edge cases.”
Informatica-hosted MCPs will not be a miracle solution either, he warned: “Even if the Informatica data quality and governance capabilities are cleanly integrated in the Agent Fabric registry, these are not instantaneous operations. Checking data fields for accuracy, deduplication, and cross-system matching take time and carry latency measured in milliseconds or even multiple seconds, and that is pre-integration.”
A pivot for MuleSoft?
Bickley sees the updates as a broader strategy for Salesforce to reposition MuleSoft, which it acquired in 2018 for $5.7 billion, from a traditional API integration platform to an infrastructure layer for enterprise AI agents.
By layering orchestration, governance, and connectivity into Agent Fabric, Salesforce appears to be trying to position MuleSoft as the system of record for how agents are discovered, routed, and governed across the enterprise, deepening its role beyond API management into core AI infrastructure, he said.
Not all CIOs will welcome that move.
“If your agent control plane runs on Agent Fabric, switching costs rise materially, and the more agents you register, the more orchestration rules and governance policies defined, the more difficult it becomes to move to an alternative solution,” the analyst said.
As with any critical infrastructure dependency, “CIOs need to ask: What is the exit path? What components of Agent Fabric are portable and what is locked in? What’s the pricing model? What is the integration depth with non-Salesforce agents and data sources?” he said.
For now, though, enterprises have plenty of AI agent orchestration options to choose from.
Tap into the AI APIs of Google Chrome and Microsoft Edge 15 Apr 2026, 2:00 am
With every passing year, local AI models get smaller, more efficient, and more comparable in power with their higher-end, cloud-hosted counterparts. You can run many of the same inference jobs on your own hardware, without needing an internet connection or even a particularly powerful GPU.
The hard part has been standing up the infrastructure to do it. Applications like ComfyUI and LM Studio offer ways to run models locally, but they’re big third-party apps that still require their own setup and maintenance. Wouldn’t it be great to run local AI models right in the browser?
Google Chrome and Microsoft Edge now offer that as a feature, by way of an experimental API set. With Chrome and Edge, you can perform a slew of AI-powered tasks, like summarizing a document, translating text between languages, or generating text from a prompt. All of these are accomplished with models downloaded and run locally on demand.
In this article I’ll show a simple example of Chrome and Edge’s experimental local AI APIs in action. While both browsers in theory implement the same set of experimental APIs, they support different subsets of functionality and use different models: Chrome uses Gemini Nano, while Edge uses Phi-4-mini.
The following demo of the Summarizer API works on both browsers, although the performance may differ between them. In my experience, Summarizer ran significantly slower on Edge.
The available AI APIs in Chrome and Edge
Chrome and Edge share a common codebase — the Chromium project — and the AI APIs available to both stem from what that project supports. As of April 2026, the available AI APIs in Chrome are:
- Translator API: Translate text from one language to another, assuming a model is available for that language pair.
- Language Detector API: Determine the language for a given input text.
- Summarizer API: Condense text into headlines, summaries, and bullet-point rundowns.
All three of these APIs are available to Chrome users out of the box. All but the Language Detector API are also available to Edge users; Edge support for language detection is planned for a future release.
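The other immediately available APIs follow the same availability/create pattern that the Summarizer example below uses. Here is a minimal sketch of the Translator API; the option names follow Chrome's documented API, and the guard lets the sketch run (and report why it can't translate) outside Chrome/Edge, such as in Node:

```javascript
// Sketch of the Translator API, which follows the same
// availability/create pattern as Summarizer.
// The guard makes the sketch degrade gracefully where the API is absent.
async function translateDemo() {
  if (!('Translator' in globalThis)) {
    return 'Translator API not available in this environment';
  }
  // Availability is checked per language pair.
  const availability = await Translator.availability({
    sourceLanguage: 'en',
    targetLanguage: 'es',
  });
  if (availability === 'unavailable') {
    return 'No model available for this language pair';
  }
  const translator = await Translator.create({
    sourceLanguage: 'en',
    targetLanguage: 'es',
  });
  return await translator.translate('Hello, world');
}

translateDemo().then(console.log);
```

In a browser without the model downloaded, `availability` would come back as `downloadable`, and `Translator.create()` would trigger the download.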
Several other APIs, which are in a more experimental state, are available in both browsers on an opt-in basis:
- Writer API: Generate text from a given prompt.
- Rewriter API: Rewrite an existing text based on instructions from a prompt.
- Prompt API: Make natural language requests directly to the model (e.g., “Search the web for up-to-date information about visiting Italy”).
- Proofreader API: Examine a text for spelling and grammatical errors and suggest corrections.
The long-term ambition is to have these APIs accepted as general web standards, but for now they’re specific to Chrome and Edge.
Using the Summarizer API
We’ll use the Summarizer API as an example for how to use these APIs generally. The Summarizer API is available on both Chrome and Edge, and the way it’s used serves as a good model for how the other APIs also work.
First, create a web page that you’ll access through some kind of local web server. If you have Python installed, you can create an index.html file in a directory, open that directory in the terminal, and use py -m http.server 8080 to serve the contents on port 8080. Don’t try to open the web page directly as a local file, as that may cause content-restriction rules to kick in and break things.
Here’s the source code of the page to create:
<div style="display: flex;">
<textarea style="width:50%; height:24em" id="input" placeholder="Type text to be summarized"></textarea><br>
<textarea style="width:50%; height:24em" id="output" placeholder="Summarization results"></textarea><br>
</div>
<textarea style="width:100%; height:4em" id="context" placeholder="Additional context"></textarea>
<label for="type">Type of summarization:</label>
<select id="type" name="type">
<option value="teaser">Teaser</option>
<option value="tldr">tl;dr</option>
<option value="headline">Headline</option>
<option value="key-points">Key points</option>
</select>
<label for="length">Length:</label>
<select id="length" name="length">
<option value="short">Short</option>
<option value="medium">Medium</option>
<option value="long">Long</option>
</select>
<button type="button" onclick="go();">Start</button>
<div style="background-color:beige" id="log"></div>
<script>
const $log = document.getElementById("log");
const $input = document.getElementById("input");
const $output = document.getElementById("output");
const $context = document.getElementById("context");
const $type = document.getElementById("type");
const $length = document.getElementById("length");
function log(text) {
    $log.innerHTML += text + "<br>";
}
async function summarize() {
    $log.innerHTML = "";
    // Note the parentheses: without them, ! would negate the string
    // 'Summarizer' rather than the result of the `in` test.
    if (!('Summarizer' in self)) {
        log("Summarizer not available");
        return false;
    }
    const availability = await Summarizer.availability();
    log(`Summarizer status: ${availability}`);
    const summarizer = await Summarizer.create({
        sharedContext: $context.value,
        type: $type.value,
        length: $length.value,
        format: 'markdown',
        monitor(m) {
            m.addEventListener('downloadprogress', (e) => {
                log(`Downloaded ${e.loaded * 100}%`);
            });
        }
    });
    log("Summarizer created, starting summarization");
    $output.value = "";
    const stream = summarizer.summarizeStreaming($input.value);
    for await (const chunk of stream) {
        $output.value += chunk;
    }
    log("Finished.");
}
function go() {
    summarize();
}
</script>
Most of what we want to pay attention to is in the summarize() function. Let’s walk through the steps.
Step 1: Verify the API is available
The line if (!('Summarizer' in self)) determines whether the Summarizer API is available in the browser at all. (The parentheses matter: without them, the ! would negate the string 'Summarizer' rather than the result of the in test.) The follow-up, const availability = await Summarizer.availability();, returns the status of the model required for the API:
- downloadable: The model needs to be downloaded first, so you’ll want to provide some kind of progress feedback for the download. (The code above shows one way to do this, via the monitor() function passed to the Summarizer.create() method.)
- available: The model is already on the device and can be used right away.
Step 2: Create the Summarizer object
The next step is to create the Summarizer object, which can take several parameters:
- sharedContext: Text that gives the summarizer additional context for how to do its work (e.g., “Format the output as a bullet list of questions”).
- type: One of four values that describes the format for the summary. teaser tries to create interest in the text’s contents without revealing full details; tldr provides a quick and concise summary, no more than a sentence or two; headline generates a suitable headline for the text; and key-points produces a bullet list of takeaways.
- length: One of short, medium, or long; this parameter controls how long the output should be.
- format: The format of the generated summary. markdown is the default; the other allowed value is plain-text. If you are using HTML as your source text, you may want to use .innerText to derive a text-only version of the input.
Step 3: Stream and iterate over the output
Most of the time, we want to see the output streamed a token at a time, so we have some sense that the model is working. To do this, we use const stream = summarizer.summarizeStreaming($input.value) to create an object we can iterate over ($input.value is the text to summarize). We then use for await (const chunk of stream){} to iterate over each chunk and add it to the $output field.
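Streaming isn’t mandatory, though. The Summarizer object also exposes a one-shot summarize() method that resolves with the complete summary. A minimal sketch, guarded so it runs (and reports) gracefully outside Chrome/Edge, such as in Node:

```javascript
// Sketch: the one-shot alternative to summarizeStreaming(), assuming the
// same Summarizer API described above. The guard makes the sketch
// degrade gracefully where the API is absent.
async function summarizeOnce(text) {
  if (!('Summarizer' in globalThis)) {
    return 'Summarizer API not available in this environment';
  }
  const summarizer = await Summarizer.create({
    type: 'tldr',
    length: 'short',
    format: 'plain-text',
  });
  // summarize() resolves with the full summary in one go -- simpler,
  // but the user sees nothing until the model finishes.
  return await summarizer.summarize(text);
}

summarizeOnce('Some long article text to condense.').then(console.log);
```

The trade-off is responsiveness: for long inputs, the streamed version gives the user visible progress, while the one-shot version is easier to plug into code that just needs the final string.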
Here’s an example of some input and output:

Example output for built-in text summarizer AI model in Chrome and Edge. The model runs entirely on the device hosting the browser and does not call out to an external service to deliver its results.
Caveats for using Summarizer (and other local AI APIs)
The first thing to keep in mind is that the model will take some time to download on first use. The sizes of the models vary, but you can expect them to be in the gigabyte range. That’s why it’s a good idea to provide some kind of UI feedback for the download process. Ideally, you’d want to provide some way to run the model download process and then ping the user when it’s ready for use.
Once models are downloaded, there’s no programmatic interface to how they’re managed — at least, not yet. On Google Chrome there’s a local URL, chrome://on-device-internals/, that shows which models have been loaded and provides statistics about them. You can use this page to remove models manually or inspect their stats for the sake of debugging, but the JavaScript APIs don’t expose any such functionality.
When you start the inference process, there may be a noticeable delay between the time the summarization starts and the appearance of the first token. Right now there’s no way for the API to give us feedback about what’s happening during that time, so you’ll want to at least let the user know the process has started.
Finally, while Chrome and Edge support a small number of local AI APIs now, how the future of browser-based local AI will play out is still open-ended. For instance, we might see a more generic standard emerge for how local models work, rather than the task-specific versions shown here. But you can still get going right now.
Where will developer wisdom come from? 15 Apr 2026, 2:00 am
I am a completely self-taught software developer. I’ve never taken a computer science course in my life. I was lucky enough to attend a junior high school in the 1970s that taught me BASIC. I loved it, and used to stay after school to write and play simple text-based games.
Now this may be hard to believe, but at that time, being a computer nerd wasn’t as cool as it is today, so I left it alone until the 1990s when the PC revolution was getting underway. Shareware was all the rage, and Windows was brand new. I combined the two to write some modest little applications in Turbo Pascal for Windows that had some success.
Looking back on those apps, the code I wrote was really, really bad. Like “I had no idea about passing parameters to functions, so everything was a global variable” kind of bad. I distinctly remember an enormous struggle with strings because I had no idea that I needed to explicitly allocate memory for them. I banged my head against the keyboard for many hours trying to get things to work.
Becoming developer wise
There was no Internet then, so I learned by asking questions on CompuServe forums and reading books. But mostly I learned through experimentation, trial and error, and by actually doing — by writing code, failing, and trying new ideas. Lightbulbs would go off in my head as I started to notice how to pass parameters to move information from one place to another, or when I finally understood object-oriented programming.
Eventually, I moved from asking to answering questions online and from reading to writing books on good software development techniques. I figured it out, and now I like to think I know what I’m doing when it comes to designing good systems.
This is all a rather long-winded way of saying that it took many years, but I somehow acquired wisdom in the domain of developing software. I didn’t need classroom training to get it, and I’m not sure that one can even gain developer wisdom in the classroom. One gains it by writing lots of bad code, reading lots of good (and bad) code written by others, and by seeing code work and seeing code fail. A lot of developer wisdom is gained by revisiting crappy code that one wrote six months ago, and seeing how hard it is to maintain, and vowing not to make that mistake again.
And perhaps we have arrived at a point today where all that wisdom that we longtime developers have gained is simply not needed anymore. Agentic coding has put us in the curious position of being able to create software without wisdom. In theory, all the wisdom of all the developers in the world is at your fingertips, and all you have to do now is ask. I asked Claude Code to implement an idea for a website, and he created it. It works.
And here’s my confession: I haven’t looked at the code. I didn’t even feel the need to do so. If there was a problem with the site, I would tell Claude about it, and he’d fix it. The site works. It works great, actually. Not only that, but it does things that I would have taken hours and hours to figure out. Things like making sure that contact forms don’t get spammed and that APIs are properly rate-limited. I asked Claude to review the site for vulnerabilities, and he found and fixed them.
The sum of all developer wisdom
Or put another way, Claude Code is a lot wiser than I am about how to build good, safe, properly functioning code. He’s a pretty good programmer, and he’s getting better every day. It’s amazing because having the wisdom of millions of developers at your fingertips is cool. It is terrifying because where will we be if acquiring wisdom becomes passé? The wisdom captured in Claude is a collection of all the smarts encapsulated in billions of lines of code on GitHub. If we do nothing but leverage existing wisdom, what will feed the next generation of Claude?
In the end, I lean towards amazing. New software developers will be gaining wisdom at a level of abstraction well above the code. Somehow, they will build software using a wisdom that my fellow code warriors and I don’t even understand yet. Debugging will involve understanding how to tell Claude what the problem is and what the solution needs to be. Building software will always require sound judgment — but the things we make judgments about are continually changing.
Eventually, writing code will be like learning Latin — cool, interesting, and good for your brain, but not necessary to function in the day-to-day world.
Curity looks to reinvent IAM with runtime authorization for AI agents 14 Apr 2026, 8:31 pm
In 2026, enterprise developers are building and deploying the first generation of powerful, increasingly autonomous AI agents at incredible speed. Now comes the hard part: working out how to secure them.
Vendors in the space are facing multiple challenges. To begin with, traditional identity and access management (IAM) tools were never designed to secure anything as complex as agentic AI. In addition, the number of agents, both those sanctioned by the enterprise and the undocumented ‘shadow’ agents created by a new generation of powerful tools that barely existed a year ago, is increasing at unprecedented speed. And now it has started to dawn on organizations that this risks leaving yawning governance and security gaps whose weaknesses could one day return to haunt their creators.
While a growing list of companies, including large identity platforms such as Okta, Ping Identity, and Microsoft’s Entra ID, is vying to fill the vacuum, a smaller competitor, Sweden’s Curity, argues that agents can’t be secured using traditional IAM. Instead, it is offering a different approach to the problem: This week, it announced Access Intelligence, an extension to its existing API identity and access management platform, Identity Server.
The problem it addresses is that traditional IAM tools assume that applications are accessed by human users or machine identities, governed by a one-time authentication process. But agents, which chain together long sequences of actions at incredible speed, don’t work like this. Instead, access becomes ephemeral, complex, and non-deterministic, which is to say, hugely unpredictable. Lock them down too much and they stop working; let them run free, and weak security follows in their wake.
Runtime enforcement
Curity’s approach is to treat agents as a special type of application. Like applications, agents call APIs, MCP servers, and each other, and are credentialed using OAuth tokens. Through a feature called Token Intelligence, Curity extends the role of OAuth tokens to not simply permit access, but to carry information on the agent’s purpose and intent. In Curity’s scheme, an agent can only access resources based on that purpose.
Instead of using static, pre-granted permissions, agent access is granted at runtime, on-the-fly. Each requested action generates a separate token that describes the access it needs. When an agent starts a new task, it needs a new token specifying a new set of permissions. If necessary, human authorization can be required when an agent is trying to perform a high-risk action such as transferring funds.
“Curity has always been application-centric,” said Cofounder and CTO Jacob Ideskog. “Our focus has always been on how we broker access.”
Multiple approaches to agent security
Today, agent security falls into one of several camps, which include increasingly inadequate inline approaches such as API gateways and web application firewalls (WAFs), and out-of-band analysis systems that infer intent by analyzing agent behavior against a baseline.
Curity’s Access Intelligence, by contrast, is a self-hosted microservice that acts as a glorified IAM layer through which every agent request must pass. “Because we let an agent do something now doesn’t mean we should be allowing it to do this a minute later,” Ideskog explained.
Access Intelligence also uses Identity Server’s centralized token validation to ensure that developers can fire up agents or APIs without registering them. If they lack this validation, agents are isolated from real-world actions.
Nothing does the whole job
The appearance of systems such as Access Intelligence is good news for enterprises. It indicates that vendors are starting to address the problem of agent security, often by extending existing API security platforms. But that still leaves open the question of which approach to take.
Ideskog believes it would be a mistake to see the different approaches as mutually exclusive. Curity’s Access Intelligence can be used in combination with other layers of agent security, he emphasized. In short, no one solution can do the whole job.
“Up to this point, the IAM industry has focused on the identity part. But the real question is the access. Enterprises are asking their privileged access management (PAM) vendors how they’re going to deal with this [agent security] and I don’t think the PAM vendors have good answers yet,” he said.
This article originally appeared on CSOonline.
GitHub adds Stacked PRs to speed complex code reviews 14 Apr 2026, 10:12 am
AI-aided development tools are churning out more lines of code than ever, leaving reviewers to contend with ever-larger pull requests. After toying with the idea of closing the door to AI-aided code submissions, GitHub is now looking to help enterprises manage big code changes in a more incremental way. It says a new feature, Stacked PRs, can improve the speed and quality of code reviews by breaking large changes into smaller units.
“Large pull requests are hard to review, slow to merge, and prone to conflicts. Reviewers lose context, feedback quality drops, and the whole team slows down,” the company said, announcing GitHub Stacked PRs on its website.
With the new, stacked approach it aims to reduce the overhead of managing dependent pull requests by minimizing rebasing effort, improving continuous integration (CI) and policy visibility across stacked changes, and preserving review context to enhance code quality.
Stacked PRs tracks how pull requests in a stack relate to one another, propagating changes automatically so developers don’t have to keep rebasing their code and letting reviewers assess each step in context, the company explained in the documentation.
The feature, GitHub wrote, is delivered through gh-stack, a new extension to GitHub CLI that manages the local workflow, including branch creation, rebasing, pushing changes, and opening pull requests with the correct base branches.
On the front end, all changes created via gh-stack are surfaced in the GitHub interface, where reviewers can navigate them through a stack map, with each layer presented as a focused diff and subject to standard rules and checks, the company added.
Developers can merge individual pull requests or entire stacks, including via the merge queue, after which any remaining changes are automatically rebased so the next unmerged PR targets the base branch.
Monorepos and platform engineering drive shift to modular development
For Pareekh Jain, principal analyst at Pareekh Consulting, Stacked PRs is GitHub’s response to a structural shift driven by large-scale monorepos and platform engineering, which are pushing teams toward more modular, parallel workflows.
“GitHub’s traditional PR model created a bottleneck where developers either waited long cycles for reviews or bundled work into large, hard-to-review PRs that increased risk and slowed merges. Stacking solves this by letting developers break a feature into smaller, dependent PRs such as database, API, and UI layers, so reviews happen incrementally while development continues in parallel,” Jain said.
“Stacked PRs is likely to see rapid adoption in mid-to-large enterprises, especially those managing monorepos. Its biggest impact is eliminating rebase hell — the manual effort of updating multiple dependent branches when the base changes,” Jain noted, adding that the feature’s integration into both the GitHub CLI and UI will also drive adoption as it removes the need for third-party tools.
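The dependent-branch topology Jain describes — a feature split into database, API, and UI layers, each building on the last — can be sketched with plain git. The repository and branch names here are hypothetical:

```shell
# Sketch (hypothetical repo): the branch topology behind a stack.
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q demo && cd demo
git config user.email dev@example.com && git config user.name dev
git commit -q --allow-empty -m "base"
# Each layer branches off the one below it; each would become its own PR.
git checkout -q -b feature-db  && git commit -q --allow-empty -m "db layer"
git checkout -q -b feature-api && git commit -q --allow-empty -m "api layer"
git checkout -q -b feature-ui  && git commit -q --allow-empty -m "ui layer"
# Without tooling, amending feature-db means manually rebasing both
# branches above it -- the "rebase hell" that stacking tools automate.
git log --format="%s" feature-ui
```

Reviewers see three small diffs instead of one large one; the cost, historically, was that any change to a lower branch rippled upward by hand, which is the propagation step Stacked PRs automates.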
Change management
The biggest obstacle to adoption of Stacked PRs will not involve changes to the code, but changes to coders’ habits, said Phil Fersht, CEO of HFS Research. “The constraint will not be the feature itself, but whether development teams adjust their workflow discipline to use stacking properly.”
That will involve them learning to organize large pull requests into neat stacks for the reviewer, which may be as challenging as reviewing a large PR.
That was echoed by Paul Chada, co-founder of agentic AI-based software startup Doozer AI: “Workflow shifts only happen when the pain of not changing exceeds the friction of learning,” he said.
AI-driven code velocity driving a new pressure point
The release of Stacked PRs comes amid a deeper structural shift in software development: the rise of AI-assisted coding. This is accelerating the pace of code generation, increasing the volume of changes and making traditional, linear review workflows harder to sustain.
“AI-assisted coding has changed the math. When humans wrote the code, big PRs were annoying but tolerable,” said Chada. “Now agents produce 2,000-line diffs across 40 files in seconds, and GitHub is staring down 14 billion projected commits this year versus 1 billion last year. That’s not a workflow problem, it’s a survival problem.”
GitHub appears to be betting that Stacked PRs changes the way development teams view a unit in software development by making it small, attributable, and revertible, regardless of whether the author is a senior engineer or an agent, Chada said.
But, he cautioned, integrating Stacked PRs with coding agents risks adding to toolchain sprawl for enterprises.
“The current dev toolchain — IDE plus Copilot plus Claude Code plus Codex plus stacking tools plus review bots plus CI/CD plus security scanners plus MCP servers — is squarely in the Cambrian explosion phase,” Chada pointed out.
Competitive pressures
GitHub Stacked PRs isn’t an entirely novel idea: Third-party tools that work with GitHub already offer similar functionality.
Jain said GitHub’s addition of the feature will likely impact Graphite CLI, a GitHub-focused tool that allowed stacking PRs when the functionality wasn’t natively available.
“Graphite has been the market leader in this space. GitHub’s entry validates the Stacking category but poses an existential threat to Graphite’s core value proposition,” Jain said. “To survive, Graphite will likely need to double down on superior UI/UX, faster performance, and features GitHub won’t touch like cross-platform stacking for GitLab and Bitbucket.”
That competitive pressure also reflects a broader platform play.
Stacked PRs, Jain further noted, represents a “strategic move” to internalize a workflow long used by high-velocity teams at companies like Google, Meta, and Uber, referring to the stacked differential code review model popularized by tools like Phabricator.
Stacked differentials, much like Stacked PRs, are a series of small, dependent code changes reviewed individually but designed to build on each other and land as a cohesive whole.
In effect, this means that GitHub is trying to pull enterprises away from such tools by making it easier to adopt these advanced workflows natively within its own platform, reducing the need for external tooling.
There is also a quieter platform economics angle emerging, Chada pointed out.
“GitHub is effectively building out infrastructure to absorb a surge of machine-generated activity that does not yet translate into proportional revenue, from third-party coding agents that compete with its own GitHub Copilot to the very workflows those agents are accelerating,” Chada said.
In that light, Stacked PRs looks as much like a scaling response as a developer experience upgrade — one that could foreshadow a shift in how GitHub monetizes its AI layer, with Copilot pricing likely to move toward more usage-based models over time, Chada added.
The hyperscalers are pricing themselves out of AI workloads 14 Apr 2026, 2:00 am
Large cloud providers still want the market to believe that AI infrastructure is a premium business where customers pay premium prices. That argument worked when buyers had few alternatives, when access to advanced GPUs was restricted, and the operational maturity of the hyperscalers created an advantage that smaller competitors could not easily match. However, the market is rapidly changing, making economics unavoidable. Recent comparisons show that neocloud providers are often much cheaper than major public clouds, with hyperscalers costing about three times to six times as much as specialized competitors for similar compute capacity.
That gap is not a rounding error. Enterprises cannot dismiss this as just the cost of doing business with a trusted vendor. The bills are significant enough to influence architectural choices, vendor strategies, and even the locations of AI innovation. One commonly cited example in current pricing comparisons shows that NVIDIA H100-class compute costs about $2.01 per hour on Spheron versus approximately $6.88 per hour on AWS for a similar workload category. That is roughly a difference of 3.4 times for comparable AI processing. Whether a specific enterprise secures better rates is almost irrelevant. The market now knows that lower-cost alternatives exist, and knowledge changes behavior.
In addition to neoclouds, private clouds, sovereign clouds, and even on-premises GPU strategies are becoming more appealing as buyers increasingly view AI infrastructure as a long-term operating expense rather than a short-term experiment. Once that shift occurs, even small differences in unit costs become strategic. Large cost gaps become hard to justify. That’s when a premium vendor stops appearing premium and begins to seem overpriced.
When ‘premium’ isn’t enough
For years, hyperscalers benefited from a straightforward value proposition. They could provide global reach, mature security controls, integrated tools, elastic capacity, and an ecosystem that minimized operational friction. These factors still matter and remain valuable. However, AI is revealing a flaw in the traditional cloud pricing model. When compute is the core and can be sourced elsewhere at a significantly lower cost, the value of the surrounding ecosystem must be exceptional to justify the markup. Today, in many cases, it is not.
This is where hyperscalers are making a strategic mistake. They seem to assume that AI buyers will continue to accept the same pricing strategies that worked for traditional cloud migrations. That assumption is risky. AI buyers are not just lifting and shifting old enterprise applications. They are training, fine-tuning, and deploying models in environments where utilization, throughput, latency, and token economics are monitored in real time. Their boards are asking tougher questions. Their investors are asking tougher questions. Their finance teams are asking the toughest questions of all. If the answer is that the enterprise is paying several times more for the same class of compute because it’s easier to stick with a familiar brand, that decision won’t go over well.
The real issue is not that AWS, Microsoft Azure, and Google Cloud are expensive in absolute terms. The issue is that they are becoming expensive relative to an expanding set of credible alternatives. That distinction matters. Buyers will always pay more for better outcomes. They will resist paying much more for little or no proportional benefit. In AI, proportional benefit is increasingly difficult for the hyperscalers to prove. A customer does not receive higher model accuracy just because the invoice came from a household cloud brand. A workload does not become inherently more strategic because it runs in a famous control plane. The chip is still the chip. The cluster is still the cluster. The economics are still the economics.
AI buyers become more rational
The next phase of the AI market won’t be about who can generate the most headlines. Instead, success will be based on consistently delivering reliable performance at sustainable costs. This shift favors disciplined operators and providers that are optimized for GPU availability, efficient scheduling, and simple commercial models. It also benefits enterprises willing to blend different environments rather than always relying on the largest cloud vendor for every workload.
The conversation is moving away from simple cloud preference and toward workload placement strategies. Enterprises are becoming more comfortable with the idea that different AI jobs belong in different places. Some workloads will stay on hyperscalers because the integration benefits are real. Others will move to private cloud because security, data gravity, or regulatory concerns demand it. Still others will land on sovereign platforms because national and industry-specific requirements leave no other option. A growing number will be routed to neoclouds because the price-performance equation is too compelling to ignore.
This isn’t a rejection of hyperscalers. It’s a rejection of careless pricing. The biggest cloud providers will continue to be highly important for AI. However, their role is shifting from the default choice to one option among many. This represents a major strategic downgrade, driven not by technological weakness but by pricing practices.
The market rewards discipline
The cloud industry has experienced this cycle before. Established companies believe that their size safeguards them, that customers prioritize convenience above everything else, and that their pricing power is everlasting. Then, a new group of competitors appears with a sharper value proposition and fewer outdated assumptions. Initially, incumbents dismiss them as niche players. However, these players improve, specialize, and attract the most cost-conscious innovators. By the time the incumbents take action, the market has already shifted.
That is exactly the risk hyperscalers face in AI today. If they continue treating GPU-driven workloads as a way to maintain high margins across compute, storage, networking, and managed services, they will train customers to look elsewhere. Once that becomes a habit, it will be hard to change. Customers who develop procurement discipline around lower-cost AI infrastructure won’t quickly return simply because a hyperscaler finally cuts prices.
The next winners in AI infrastructure may be the providers that understand a hard truth: When the market is scaling at this speed, adoption matters more than margin preservation. If AWS, Microsoft, and Google don’t learn that lesson quickly, they might find that they weren’t undercut by competitors, but that they priced themselves out all on their own.
HTMX 4.0: Hypermedia finds a new gear 14 Apr 2026, 2:00 am
HTMX has been considered feature-complete for some time. It is a successful project that achieves its ambitious goals and is widely hailed, not to mention widely deployed in production. HTMX 2.0 was considered the final word. The creator promised there would be no HTMX 3.0.
So of course, being developers, the HTMX team decided to rip out the engine and replace it with a new one based on JavaScript’s Fetch API. They named the new version HTMX 4.0 to keep the promise.
Here is a fascinating tale of architecture and implementation, one that gives us a beautiful window into the inner workings of the front-end industry.
Simpler web development
When asked for comment on the 3.0 leapfrogging, HTMX creator Carson Gross gave me a one-word quote:
“Oops.” – Carson Gross, creator of HTMX
Gross is one of my favorite industry personalities. It’s easy to see why. He created the Grug Brained Developer as well as HTMX. The former contains all the hard-bitten advice from a veteran coder that young grugs need to survive, delivered in the voice of a caveman. The latter is the physical manifestation of those ideas: a library that leverages HTML and REST to bring simplicity back to the web.
The core of HTMX 4.0 is that the old transport protocol, the XHR object, is being removed and replaced with the modern (and universal) Fetch API.
This is massive on two fronts. First, it was a huge amount of work for the HTMX team. Second, it’s a major performance and DX win.
Modern fetch() has a huge benefit in that it can stream responses. That means the front end can process and act on segments of the UI as they arrive, instead of waiting for the whole response to complete before taking action. This is similar to what React Server Components (RSC, a recent addition to React) do.
There is a certain infatuation with new and clever techniques in software development. That is understandable, but it can become pathological at times. As a result, there has always been a countercurrent that urges us to look closely at the requirements and pare away the unnecessary until the simplest solution is found.
HTMX is a banner carrier for that counterculture. In that spirit, the HTMX 2.0 line was declared the final version. It does what it was intended to, and is a complete and adequate solution.
But then there is the truth that sometimes a better way to do things is found and more engineering work is merited. That is HTMX 4.0.
Native streaming
In HTMX 2.x we used the XHR object, XMLHttpRequest, which has its roots in late-1990s Internet Explorer. The browser had to buffer the entire response before it could swap. By adopting ReadableStream via the Fetch API, HTMX 4.0 can process and inject HTML fragments as they arrive.
This achieves the holy grail of a streaming UI but does so with a tiny 14KB script. Moreover, we still are in the HTMX vernacular, meaning we can use any back end that can spit out a string. It’s a performance win that actually reduces your JavaScript footprint and makes the architecture simpler.
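The streaming pattern described above can be sketched in a few lines. This is an illustrative example of the Fetch-era ReadableStream mechanism HTMX 4.0 builds on, not HTMX source code; the stream here is constructed locally to stand in for a `fetch()` response body.

```javascript
// Stand-in for a fetch() response body; a real call would use
// (await fetch(url)).body instead.
function makeBody(fragments) {
  const encoder = new TextEncoder();
  return new ReadableStream({
    start(controller) {
      for (const f of fragments) controller.enqueue(encoder.encode(f));
      controller.close();
    },
  });
}

// Read fragments as they arrive instead of buffering the whole
// response, which is what XMLHttpRequest forced HTMX 2.x to do.
async function collectFragments(stream, onFragment) {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  const seen = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const fragment = decoder.decode(value, { stream: true });
    seen.push(fragment);
    if (onFragment) onFragment(fragment); // e.g., swap into the DOM here
  }
  return seen;
}

// Each fragment is available before the response is complete.
collectFragments(makeBody(["<li>one</li>", "<li>two</li>"]), (f) =>
  console.log("got", f)
);
```

The key point is the callback per chunk: the UI can react to the first fragment while later fragments are still in flight on the network.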
In short, the use of Fetch is a refactor that should net both more power and less complexity.
Idiomorph DOM merging
Although native streaming is the most fundamental new aspect in HTMX 4.0, there is another really cool thing that the 4.0 refactor opens up.
The idea of “morphing” or “diffing” HTML pages makes for superspeedy page changes with no additional complexity in many situations. It is perhaps best known from Hotwired. How it works: when a page (or fragment) arrives, an algorithm checks it against the existing content and replaces only the changes.
HTMX 4.0 enables the default use of the Idiomorph algorithm, a fancy new approach to morphing one DOM tree into another. (The Idiomorph project is also led by Carson.)
In HTMX 2.0, Idiomorph was an extension. In HTMX 4.0, you get it basically for free. Idiomorph, inspired by Morphdom, took the idea of comparing two HTML documents and updating only the minimal set of changes to the next level. Idiomorph has since been adopted by Hotwired itself.
There is a great technical description of the bottom-up, nested ID diffing algorithm that Idiomorph uses on its GitHub page.
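The minimal-update idea is easy to see in miniature. The toy sketch below is far simpler than Idiomorph itself (no nesting, no bottom-up pass), but it shows the core move: match nodes by id and touch only what actually changed, rather than replacing the whole subtree. Nodes here are plain objects standing in for DOM elements.

```javascript
// Compare old and new keyed node lists and plan the minimal set of
// operations: keep, update, add, or remove, matched by id.
function morphPlan(oldNodes, newNodes) {
  const oldById = new Map(oldNodes.map((n) => [n.id, n]));
  const ops = [];
  for (const next of newNodes) {
    const prev = oldById.get(next.id);
    if (!prev) ops.push({ op: "add", id: next.id });
    else if (prev.text !== next.text) ops.push({ op: "update", id: next.id });
    else ops.push({ op: "keep", id: next.id });
    oldById.delete(next.id);
  }
  // Anything left in the old map no longer appears in the new content.
  for (const id of oldById.keys()) ops.push({ op: "remove", id });
  return ops;
}

// Only #b changes, so #a and #c are left alone — in the real DOM
// equivalent, focus, scroll position, and media state survive.
console.log(
  morphPlan(
    [{ id: "a", text: "1" }, { id: "b", text: "2" }, { id: "c", text: "3" }],
    [{ id: "a", text: "1" }, { id: "b", text: "2!" }, { id: "c", text: "3" }]
  )
);
```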
Idiomorph is not directly a benefit of the Fetch refactor, but is a benefit of the simplicity that Fetch has brought to the codebase. Using Fetch made it possible for the team to add Idiomorph to the core.
Prop inheritance (a breaking change)
HTMX 4.0 introduces a breaking change to prop inheritance. This change was not necessitated by the Fetch upgrade, but was based on real world experience. The team determined that it is safer and better DX to make HTMX props not inherit by default.
In previous versions, attributes like hx-target inherited implicitly. This led to weird cases in which the children of an element were affected when it wasn’t clear why. In HTMX 4.0, inheritance is now explicit. You use the :inherited modifier to indicate which child elements should inherit, e.g., hx-target:inherited="#div".
This respects the principle of locality of behavior (a Grug brained favorite). You no longer have to hunt through parent templates to find out where a button is supposed to swap its content.
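The lookup rule described above can be sketched as follows. This is a hypothetical illustration of the explicit-inheritance semantics, not HTMX internals: an element uses its own hx-target if present, and otherwise inherits only from an ancestor that opted in with the :inherited modifier. Elements are modeled as plain objects with an attrs map and a parent pointer.

```javascript
// Resolve an element's effective target under explicit inheritance.
function resolveTarget(element) {
  if (element.attrs["hx-target"]) return element.attrs["hx-target"];
  for (let node = element.parent; node; node = node.parent) {
    // A plain hx-target on an ancestor is ignored; only the
    // :inherited form flows down to children.
    if (node.attrs["hx-target:inherited"]) {
      return node.attrs["hx-target:inherited"];
    }
  }
  return null; // no explicit or opted-in target anywhere up the tree
}

const form = { attrs: { "hx-target:inherited": "#result" }, parent: null };
const button = { attrs: {}, parent: form };
console.log(resolveTarget(button)); // "#result"
```

Because inheritance is opt-in at the ancestor, the answer to “why is this button swapping there?” is always written on the element or on an ancestor that visibly says :inherited.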
The history hack is history
HTMX 2.x included a fancy history engine that tried to snapshot the local DOM and save it to localStorage. This was an optimization. It turned out to be brittle. HTMX 4.0 has abandoned this in favor of standard behavior, i.e., a page reload.
Error handling and status-specific swapping
Most InfoWorld readers will never have to face a 500 or a 404 error. But for those rare situations where a server returns an error code (yes, this is sarcasm), the new HTMX 4.0 behavior will be to swap in the content. Before, the request failed silently. This change will help developers provide immediate visual feedback to the user rather than leaving them with a broken or frozen screen.
The really cool part of this update is the new status-specific swapping syntax. You can now define different behaviors for different HTTP codes directly in the HTML. By using syntax like hx-status:404="#not-found" or hx-status:5xx="#error-notice", you can elegantly route different server errors to distinct UI elements.
A new tag for targeted fragments
The addition of a new response-fragment tag is a major structural improvement. It allows the server to wrap fragments in a response so they explicitly target specific elements. It’s a cleaner, more readable alternative to the old “out-of-band” (OOB) swaps.
This is similar to Hotwired’s Turbo Streams. (So we see that HTMX and Hotwired are engaging in fruitful co-influence.)
The idea is that we allow the server to send a collection of fragments tagged with targets, and these are placed in the UI. This allows for complex, multi-point updates to the interface based on a single server response.
Native view transitions
To top it all off, HTMX 4.0 now integrates with the browser’s native View Transitions API by default. This allows for app-like, animated transitions between page states—like fading or sliding—with zero extra CSS or JavaScript required.
Sidestepping JavaScript complexity
Even if you are unfamiliar with HTMX, it is worth looking at to understand what it is trying to do. It is an important angle on web development. Along with friends like Hotwired, HTMX demonstrates the possibilities of taking the simplest standard features and wringing as much power from them as possible.
It’s impossible not to look at the whole landscape slightly differently once you get it. Even if you still use and appreciate reactive front ends (and I do).
HTMX asks the question: What could we accomplish with hypermedia if we were very clever? The answer is, we can accomplish quite a lot, and sidestep much JavaScript complexity in the process.
Google Cloud introduces QueryData to help AI agents create reliable database queries 13 Apr 2026, 9:50 am
A new tool from Google Cloud aims to improve the accuracy of AI agents querying databases in multi-agent systems or applications.
QueryData, which translates natural language into database queries with what the company claims is “near 100% accuracy,” is being pitched as an alternative to direct generation of queries by large language models (LLMs), which Google says can introduce inaccuracies due to their limited understanding of database schemas and their probabilistic reasoning.
However, to create that necessary understanding, enterprise teams using QueryData must first define what Google describes as “context”: encoded details about database schemas, including descriptions of tables, relationships, and business meaning, along with deterministic instructions that guide how queries are generated and executed.
Once the context and the guidelines are configured, teams can use the Context Engineering Assistant, a dedicated agent in Gemini CLI, to iteratively review the accuracy of queries against the Evalbench framework until they’re satisfied with the results.
After that, QueryData can be integrated into agent-driven workflows, where it acts as the execution layer between user requests and underlying databases.
It can be used within Google Cloud’s own data agents, currently available in BigQuery, or invoked via APIs by enterprises building custom agents and multi-agent systems. It currently supports AlloyDB, Cloud SQL for MySQL, Cloud SQL for PostgreSQL, and Spanner.
In custom setups, the agents handle reasoning and orchestration, while QueryData is responsible for generating, validating, and executing queries against data sources, returning results that can be used in downstream actions or decision-making, Google explained in a blog post.
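The division of labor described above can be sketched generically. None of the names below come from Google’s API; they are hypothetical, and the sketch only illustrates the shape of “the agent reasons, while a deterministic execution layer validates and runs the query against a configured context.”

```javascript
// Hypothetical execution layer: queries are checked against the
// configured context before anything touches the database.
function makeExecutionLayer(context, runQuery) {
  const allowed = new Set(context.tables);
  return {
    execute(sql) {
      // Deterministic guardrail: reject queries that reference tables
      // outside the configured context.
      const referenced = [...sql.matchAll(/\bfrom\s+(\w+)/gi)].map((m) => m[1]);
      for (const table of referenced) {
        if (!allowed.has(table)) {
          return { ok: false, error: `table not in context: ${table}` };
        }
      }
      return { ok: true, rows: runQuery(sql) };
    },
  };
}

const layer = makeExecutionLayer(
  { tables: ["orders"] },
  (sql) => [{ total: 42 }] // stand-in for a real database call
);
console.log(layer.execute("SELECT count(*) FROM orders")); // ok
console.log(layer.execute("SELECT * FROM salaries")); // rejected
```

The point of the pattern is that the agent never talks to the database directly; every query passes through a layer whose checks are fixed configuration, not probabilistic reasoning.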
Creates new workload category
The new tool, according to Pareekh Jain, principal analyst at Pareekh Consulting, marks a shift from tool-based AI to outcome-bound agents with built-in guardrails, which should help enterprises move multi-agentic systems and applications into production, and enable “decision-grade use cases” across finance, operations, and supply chain departments.
However, he cautioned, though QueryData reduces the need for prompt engineering for developers and improves reliability at runtime, it shifts the burden to upfront design and ongoing maintenance.
“It requires explicit schema understanding, deterministic instructions per data source, and ongoing maintenance as schemas evolve,” he pointed out. “This effectively creates a new workload category of data access engineering for agents.”
He said, “The tradeoff is clear. Without QueryData, systems are faster to build but unreliable in production; with it, they are slower to build but viable at scale.”
This tradeoff, according to Jain, will ultimately influence enterprise usage patterns, with adoption likely to be strongest in regulated and mission-critical environments, while remaining slower in lightweight or experimental use cases.
Targets data layer as rivals bet on connectors, copilots
Further, Jain noted that the new tool also signals a broader strategic play by Google Cloud.
“QueryData shows Google is trying to create a standard way for AI agents to safely access and use data. While OpenAI focuses on APIs, AWS on connectors, and Microsoft on apps like Copilot, Google is focusing on the data layer itself, on how agents actually talk to databases,” Jain said.
“This approach has strengths, especially with tight integration into Google BigQuery and Google’s data expertise. But it also has challenges, as it needs more upfront setup and is less flexible across platforms. Microsoft, in this case, seems to have an edge, because its tools are already built into everyday apps that people use,” he noted.
The risk for Google, Jain added, is that simpler approaches from AWS or Microsoft could confine QueryData to advanced use cases instead of making it a mainstream standard.
QueryData is currently in preview.
Critical flaw in Marimo Python notebook exploited within 10 hours of disclosure 13 Apr 2026, 5:38 am
A critical pre-authentication remote code execution vulnerability in Marimo, an open-source Python notebook platform owned by AI cloud company CoreWeave, was exploited in the wild less than 10 hours after its public disclosure, according to the Sysdig Threat Research Team.
The vulnerability, tracked as CVE-2026-39987 with a severity score of 9.3 out of 10, affects all Marimo versions before 0.23.0.
It requires no login, no stolen credentials, and no complex exploit. An attacker only needs to send a single connection request to a specific endpoint on an exposed Marimo server to gain complete control of the system, the Sysdig team wrote in a blog post.
The flaw allows an unauthenticated attacker to obtain a full interactive shell and execute arbitrary system commands on any exposed Marimo instance through a single connection, with no credentials required, the post said.
“Marimo has a Pre-Auth RCE vulnerability,” the Marimo team wrote in its GitHub security advisory. “The terminal WebSocket endpoint /terminal/ws lacks authentication validation, allowing an unauthenticated attacker to obtain a full PTY shell and execute arbitrary system commands.”
Marimo is a Python-based reactive notebook with roughly 20,000 stars on GitHub and was acquired by CoreWeave in October 2025.
How the flaw works
Marimo’s server includes a built-in terminal feature that lets users run commands directly from the browser. That terminal was accessible over the network without any authentication check, while other parts of the same server correctly required users to log in before connecting, the post said.
“The terminal endpoint skips this check entirely, accepting connections from any unauthenticated user and granting a full interactive shell running with the privileges of the Marimo process,” the post added.
In practical terms, anyone who could reach the server over the internet could walk straight into a live command shell, often with administrator-level access, without ever entering a password, the team at Sysdig said.
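This class of bug is easy to illustrate in the abstract. The sketch below is generic and not Marimo’s code: when authentication is enforced per-route rather than centrally with a default-deny policy, one route can silently skip the check.

```javascript
// Per-route auth flags: the terminal route's flag was simply forgotten.
const routes = {
  "/api/notebooks": { requiresAuth: true, handler: () => "notebook list" },
  "/terminal/ws": { handler: () => "interactive shell" }, // check missing
};

function dispatch(path, session) {
  const route = routes[path];
  if (!route) return { status: 404 };
  // A centralized default-deny policy would make the missing flag safe:
  // anything not explicitly marked public would require a session.
  if (route.requiresAuth && !session) return { status: 401 };
  return { status: 200, body: route.handler() };
}

console.log(dispatch("/api/notebooks", null)); // 401: auth enforced
console.log(dispatch("/terminal/ws", null)); // 200: shell, no login
```

With opt-in checks, a single omission is a full bypass; with default-deny, the same omission fails closed.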
Credentials stolen in under three minutes
To track real-world exploitation, the Sysdig team deployed honeypot servers running vulnerable Marimo instances across multiple cloud providers and observed the first exploitation attempt within 9 hours and 41 minutes of disclosure. No ready-made exploit tool existed at the time; the attacker had built one using only the advisory description, Sysdig researchers wrote.
The attacker worked in stages across four sessions. A brief first session confirmed the vulnerability was exploitable. A second session involved manually browsing the server’s file system. By the third session, the attacker had located and read an environment file containing AWS access keys and other application credentials. The entire operation took under three minutes, the post said.
“This is a complete credential theft operation executed in under 3 minutes,” the Sysdig team wrote.
The attacker then returned over an hour later to re-check the same files. The behavior was consistent with a human operator working through a list of targets rather than an automated scanner, the post said.
Part of a widening pattern
The pace of exploitation aligns with a trend seen across AI and open-source tooling. A critical flaw in Langflow was weaponized within 20 hours of disclosure earlier this year, also tracked by Sysdig. The Marimo case cut that window roughly in half, with no public exploit code in circulation at the time.
“Niche or less popular software is not safer software,” the Sysdig post said. Any internet-facing application with a published critical advisory is a target within hours of disclosure, regardless of its install base, it added.
The Marimo case had no CVE number assigned at the time of the first attack, meaning organizations dependent on CVE-based scanning would not have flagged the advisory at all, Sysdig noted.
The flaw also fits a pattern of critical RCE vulnerabilities in AI-adjacent developer tools — including MLflow, n8n, and Langflow — in which code-execution features built for convenience become dangerous when exposed to the internet without consistent authentication controls.
What organizations should do
Marimo released a patched version, 0.23.0, which closes the authentication gap in the terminal endpoint. Organizations running any earlier version should update immediately, Sysdig said.
Teams that cannot update right away should block external access to Marimo servers using firewall rules or place them behind an authenticated proxy, the post said. Any instance that has been publicly reachable should be treated as potentially compromised.
“Credentials stored on those servers, including cloud access keys and API tokens, should be rotated as a precaution,” Sysdig advised.
CoreWeave did not immediately respond to a request for comment.
The article originally appeared in CSO.
Mastering the dull reality of sexy AI 13 Apr 2026, 2:00 am
This week in New York, my Oracle team ran workshops for enterprise developers on building retrieval-augmented generation and agentic applications. Interest was so strong that we quickly had to figure out how to double the room’s capacity (much to the fire marshal’s chagrin). Interest in AI was clearly off the charts. But AI fluency was not. It was a different vibe (and audience) from what we’ve seen in a course we built with DeepLearning.AI, which attracts a more advanced audience ready to build memory-aware agents.
I recently argued that enterprise AI is arriving unevenly across companies and even across teams within the same company. But after watching developers plow through these different workshops, I believe this uneven adoption points to something even more telling: uneven engineering capability.
Put differently, the real divide in enterprise AI isn’t just between companies moving fast and companies moving slow. It’s between teams treating AI as a prompt-driven demo and teams learning, often painfully, that production AI is mostly a data and software engineering problem. Enterprise AI isn’t really in the agent era yet. We’re in the prerequisite era.
Building the building blocks
What do I mean by “engineering capability”? I definitely don’t mean model access. Most everyone has that—or soon will. No, I mean the practical disciplines that turn a model into a system: data modeling, retrieval, evaluation, permissions, observability, and memory. You know, the unsexy, “boring” stuff that makes enterprise projects, particularly enterprise AI projects, succeed.
This informed how my team built our workshops. We didn’t start with “here’s how to build an autonomous employee.” We started with the AI data layer: heterogeneous data, multiple representations, embeddings, vector indexes, hybrid retrieval, and the trade-offs among different data types (relational, document, etc.). In other words, we started with the stuff most AI marketing tries to skip. Much of the AI world seems to think AI starts with a prompt when it actually begins with things like multimodel schema design, vector generation, indexing, and hybrid retrieval.
That matters because enterprise data isn’t tidy. It lives in tables, PDFs, tickets, dashboards, row-level policies, and 20 years of organizational improvisation. If you don’t know how to model that mess for retrieval, you won’t have enterprise AI. You’ll simply achieve a polished autocomplete system. As I’ve pointed out, the hard part isn’t getting a model to sound smart. It’s getting the model to work inside the weird, company-specific reality where actual decisions are made.
For example, the industry talks about retrieval-augmented generation as if it were a feature. It’s not. It’s an engineering discipline. Chunking strategy, metadata design, retrieval quality, context packing, precision and recall, correctness and relevance… Those aren’t implementation details to clean up later. They’re the thing. The whole point. If your retriever is weak, your model will confidently elaborate on bad context. If your chunking is sloppy, your answer quality degrades before the model ever starts reasoning. If your metadata is thin, filtering breaks. And if you have no evaluation loop, you won’t know any of this until a user tells you the system is wrong.
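One of those disciplines can be made concrete in a few lines. This is a minimal sketch of fixed-size chunking with overlap, with metadata attached per chunk for later filtering; real pipelines use token-aware, structure-aware chunkers, and the point here is only that chunk boundaries are a design decision, not an afterthought.

```javascript
// Split text into overlapping fixed-size chunks, carrying metadata
// (e.g., source document, section, access policy) on each chunk so
// retrieval can filter before the model ever sees the context.
function chunk(text, size, overlap, metadata = {}) {
  if (overlap >= size) throw new Error("overlap must be smaller than chunk size");
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push({
      text: text.slice(start, start + size),
      start,
      ...metadata,
    });
  }
  return chunks;
}

const doc = "a".repeat(250);
const pieces = chunk(doc, 100, 20, { source: "policy.pdf" });
console.log(pieces.length); // 4 chunks, starting at 0, 80, 160, 240
```

Change the size or overlap and the retrieval results change with them, which is exactly why chunking belongs in the evaluation loop rather than in a config file nobody revisits.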
This is also where permissions and observability are so critical. In a demo, nobody asks the annoying questions like where an answer came from, or what the agent was authorized to touch. But in real-world production, those questions are the whole game. An enterprise agent with vague tool access isn’t sophisticated. It’s a massive security problem. In short, using AI tools is not the same thing as knowing how to build AI systems. Plenty of teams can prompt, but far fewer can measure retrieval quality, debug context assembly, define tool boundaries, or create feedback loops that improve the system.
Catching up with the enterprise
The contrast with the recent DeepLearning.AI short course on agent memory is useful here. That course is explicitly aimed at developers who want to go beyond single-session interactions, and it assumes familiarity with Python and basic concepts of large language models. In other words, that audience is already up the curve, talking about memory-aware agents as a next step. By contrast, my NYC enterprise-heavy audience was generally earlier in the journey. That’s not a criticism of enterprise developers. It’s a clue. Much of the “AI gap” in enterprise isn’t about willingness. It’s about how much explicit learning the teams still need before the tools become muscle memory.
That, in turn, is why I keep coming back to a much older argument I’ve made about MLOps. Back then, I wrote that machine learning gets hard the moment it leaves the notebook and enters the world of tools, integration, and operations. That was true in 2022, and it’s even more true now. Agentic AI has not repealed the basic law of enterprise software. It has simply added more moving parts and a bigger blast radius. The demo may be easier than ever, but the system is emphatically not.
I’d also caution that you probably shouldn’t tell enterprises they’re “behind” because they haven’t yet embraced multi-agent architectures or whatever the current fashion demands. In many cases, they’re learning exactly what they need to know: how to structure data for retrieval, how to evaluate outputs, how to constrain tools, how to inspect failures, and how to manage state. That may not make for sexy conference talks. It does, however, look suspiciously like how real platforms get built. As I’ve noted, most teams don’t need more architectural cleverness but do need much more engineering discipline.
So yes, uneven adoption is still a real thing. But I think the deeper, more useful story is this: Uneven adoption is mostly the surface expression of uneven AI engineering literacy. The real winners in AI will be those that teach their teams how to ground models in business data, evaluate what those models return, constrain what agents can do, and remember only what matters. That is, the winners will be those that know how to make AI boring.
Right now, boring is still very unevenly distributed.
Are AI certifications worth the investment? 13 Apr 2026, 2:00 am
Artificial intelligence has moved from the research lab into the boardroom, the data center and virtually every business function in between. Nearly 80 percent of organizations now use AI in at least one core business process, according to McKinsey, yet widespread adoption has surfaced a persistent problem: a deep shortage of professionals who can translate AI tools into measurable business results. That gap is driving extraordinary interest in AI certifications, and for good reason. The question most IT professionals are asking is not whether AI skills matter, but which credentials deliver the greatest return on investment.
The AI job market has responded accordingly. AI and machine learning hiring grew 88 percent year-over-year in 2025, according to Ravio’s 2026 Compensation Trends report, while administrative role hiring simultaneously dropped 35 percent. This is not a niche trend. Dice.com reports that approximately 36 percent of tech job postings now require AI skills, with major consulting firms such as Deloitte, Accenture, PwC and KPMG among the top 25 AI hirers in the United States. The window for early movers is open, but it will not remain so indefinitely.
The most valuable AI certifications for business
Not all AI credentials carry equal weight with employers. The certifications that consistently appear in job postings, command hiring manager attention and translate into measurable salary gains share a few common characteristics: they come from recognized vendors or institutions, require hands-on project work and address real-world business problems rather than theoretical exercises alone. The following credentials rank among the most valuable for professionals entering or advancing within AI-focused roles.
Google Professional Machine Learning Engineer
This certification is widely considered one of the highest-value credentials available for professionals working with cloud-based machine learning systems. At a $200 exam fee, it requires three to five months of focused preparation for most candidates, though experienced engineers sometimes complete preparation in under 30 days. Holders report average salaries near $130,318, and professionals already working in data or engineering roles frequently cite a salary bump of approximately 25 percent following certification, according to community data compiled by Nucamp. Google and AWS certifications appeared in 40 percent more job postings than competing credentials, with demand increasing 21 percent year-over-year, according to an analysis of more than 15,000 job postings from Q4 2025 through Q1 2026 published by Skillupgradehub.
AWS Machine Learning Specialty
Amazon Web Services offers one of the most employer-recognized AI credentials for professionals working in enterprise environments. The $300 exam assumes substantial hands-on experience with Amazon SageMaker and the broader AWS data stack, and most candidates invest four to six months in preparation. Hiring surveys consistently associate this certification with roughly a 20 percent salary boost in existing data and engineering roles, particularly within organizations that have standardized on AWS infrastructure, according to Nucamp’s certification ROI analysis.
Microsoft Certified: Azure AI Engineer Associate (AI-102)
For professionals operating within Microsoft-centric enterprises, the Azure AI Engineer Associate certification validates the ability to build and deploy AI solutions using Azure Cognitive Services, Azure Machine Learning and related tools. The exam costs approximately $165, and study typically requires three to six months. DigitalOcean’s 2025 analysis of top AI credentials lists this certification among the most recognized across Microsoft ecosystem environments, which represent a large share of the enterprise market.
IBM AI Developer Professional Certificate
IBM’s professional certificate program, available through Coursera for approximately $49 per month, functions as a comprehensive entry point for professionals transitioning into AI roles. The program covers machine learning, prompt engineering, data analysis, neural networks, Python libraries and deploying large language models. IBM refreshed the generative AI content in March 2025, keeping the curriculum current with production-grade techniques. For career changers, the ROI can be dramatic. Skillupgradehub’s 2026 analysis documents cases in which professionals moved from $52,000 salaries to $78,000 AI engineering positions following completion of this program.
PMI AI+ Certification
Launched in 2025 after PMI acquired Cognilytica, the PMI AI+ is the first major project management credential specifically designed for AI initiatives. It targets project managers, program managers, product owners and scrum masters who lead or support AI deployments. A distinctive feature is that preparation earns candidates 21 PDUs toward other PMI certifications, covering more than a third of PMP renewal requirements. Unlike most credentials in this space, it currently carries no expiration date, eliminating ongoing renewal fees. For business-side leaders who need credibility when overseeing AI programs without building models themselves, this certification fills a critical gap, according to Dataquest’s 2026 certification guide.
NVIDIA Deep Learning Institute Certifications
NVIDIA’s credential portfolio addresses advanced technical roles focused on computer vision, GPU optimization and deep learning model development. Courses range from $2,500 to $4,700 each, with a $325 application fee and a requirement to complete 16 or more days of coursework within 36 months. DigitalOcean lists NVIDIA credentials among the most recognized for highly specialized technical positions where deep learning and GPU-accelerated computing are central to the role.
The salary impact: What the data shows
The financial case for AI certification is compelling, though professionals should approach salary estimates with realistic expectations. The gains are real, but they vary significantly based on prior experience, role type, geographic market and the specific certification earned.
According to Payscale data cited by Dumpsgate, the average salary of a certified AI professional in the United States is approximately $144,000. Entry-level roles typically begin around $80,000, while experienced practitioners can reach $162,000 or more. Glassdoor data, reported by Coursera in February 2026, shows median total base pay ranging from $99,578 for AI researchers to $134,188 for AI engineers.
Certified AI professionals earn between 23 and 47 percent more than their non-certified peers in 2026, according to salary analysis by Skillupgradehub drawn from more than 10,000 job postings. PassItExams research documents that AI-certified professionals command salary premiums reaching 47 percent above non-certified peers in some roles. However, Ravio’s 2026 Compensation Trends report provides a more moderate benchmark from actual payroll data: AI and ML roles command a 12 percent salary premium at the Professional/Individual Contributor level and a 3 percent premium at the Management level compared to non-AI roles. The smaller premium for managers reflects employer emphasis on hands-on contributors who can directly integrate AI into workflows.
For specific certifications, estimated salary impacts include the following ranges. The Google Professional Machine Learning Engineer is associated with a target salary of approximately $130,318 and a 25 percent bump for data and engineering professionals. The AWS Machine Learning Specialty carries a target salary range of roughly $120,000 to $155,000 and a 20 percent boost for existing practitioners. Entry-level credentials such as the IBM AI Developer program can produce career transitions that lift salaries from the $65,000 to $75,000 range into the $90,000 to $115,000 range for junior AI engineering roles. Data engineers who add AI capabilities through specialized credentials command 25 to 35 percent salary premiums over traditional data engineering peers, according to PassItExams.
Indeed data indicates that generative AI skills specifically can boost salaries by as much as 47 percent, cited by USAII, reflecting employer urgency around LLM deployment skills that remain scarce in the market.
The pros: Why AI certifications deliver real value
The business case for AI certification extends well beyond salary projections. Pearson VUE’s 2025 Value of IT Certification Candidate Report, based on survey responses from nearly 24,000 IT professionals globally, documented organizational impacts that go directly to the bottom line.
- Quality improvement: 79 percent of certified respondents reported a better quality of work output following certification.
- Innovation: 76 percent reported an increased ability to innovate and enhance work processes and outcomes.
- Productivity: 70 percent reported greater on-the-job productivity.
- Career advancement: 82 percent of respondents gained concrete career benefits, including promotions, salary increases and expanded responsibilities.
- Lifelong learning signal: 84 percent of certified professionals planned to pursue another IT certification within 12 months.
For hiring managers, certifications from recognized vendors and institutions function as trusted quality signals in a market flooded with self-reported AI experience. LinkedIn research highlights that candidates listing well-known certifications see higher recruiter engagement compared to those without them, according to DigitalDefynd’s 2026 analysis of AI certification value.
Certifications also address a structural problem that employers cite repeatedly. While 79 percent of organizations now use AI, DigitalOcean’s 2025 Currents report found that 41 percent struggle to integrate AI into existing workflows, 35 percent face challenges in model selection and 30 percent navigate data privacy minefields. Certified professionals are specifically trained to solve these problems, which is why demand for them consistently outpaces supply.
The cons: Limitations and honest cautions
AI certifications are valuable tools, but they are not magic credentials. Professionals and employers alike should understand their limitations before making significant investments.
- Time commitment is substantial: Comprehensive AI certification programs frequently require 10 to 15 hours of study per week over several months. For full-time professionals, this means evenings and weekends dedicated to coursework. Coursera’s Global Skills Report and Statista both rank time constraints among the top three barriers to professional upskilling.
- Cost can be prohibitive: Exam fees range from approximately $100 for entry-level credentials to $8,780 for multi-day intensive programs. Premium programs through institutions such as MIT Sloan can reach $2,500 to $4,700 per course. Not all employers reimburse these costs upfront, creating a financial barrier for some professionals.
- No universal accreditation standard: Unlike regulated professions such as law or medicine, AI certifications remain largely market-driven. Two programs with similar titles may differ dramatically in academic depth and practical rigor. This creates information asymmetry that forces learners to do careful due diligence before committing time and money.
- Experience still matters most: Employer surveys consistently show that hiring decisions prioritize experience, problem-solving ability and demonstrated business impact over certifications alone, according to LinkedIn and McKinsey data cited by DigitalDefynd. One hiring manager quoted by The Interview Guys put it plainly: "We are desperate for people who actually understand RAG architecture, not just people who have used it through an API."
- Rapid obsolescence risk: AI technology evolves faster than most certification bodies can update their curricula. A credential earned in 2023 may reflect concepts that have already been superseded by newer techniques. Professionals should prioritize programs with documented update cycles and verify content currency before enrolling.
- Theory-practice gap: Programs that lack hands-on labs, capstone projects or real-world datasets can leave learners with theoretical knowledge that does not transfer to production environments. Without applied components, certification may signal effort without demonstrating actual capability.
Choosing the right credential for your role
The best AI certification depends on where a professional is starting and where they want to go. DigitalOcean outlines a practical decision framework: consider your technical foundation, your time and financial investment capacity, the industry recognition of the credential and your specific career trajectory.
For hands-on technical roles such as machine learning engineer, data scientist or AI engineer, vendor certifications from Google, AWS, Microsoft or NVIDIA carry the greatest market weight. For business leaders, product managers and project managers who need to speak credibly about AI without building models, credentials such as the PMI AI+ or IBM’s business-focused programs provide a more appropriate entry point. Andrew Ng’s AI for Everyone, offered through Coursera, remains one of the most effective programs for non-technical professionals who need AI literacy without programming requirements; the program can be completed in under ten hours.
For cybersecurity professionals specifically, the AI security credential landscape is expanding rapidly. AI security roles are paying between $180,000 and $280,000 in 2026, according to Practical DevSecOps, driven by demand for professionals who can secure LLM deployments, prevent prompt injection attacks and lock down AI pipelines. Specialized credentials such as the Certified AI Security Professional carry a salary premium of 15 to 20 percent over peers holding only generalist security certifications.
AI certification learning path: Beginner to expert
Here is a learning path you can explore, with each tier showing the certifications, typical timeframe and cost estimates. The path is built around four tiers that map to experience level rather than time in tech, so someone coming from a non-technical background can still progress through it systematically.
Tier 1 is about AI literacy and foundational awareness. Andrew Ng’s AI for Everyone is the fastest on-ramp for business professionals, while the IBM AI Developer and cloud foundational certs (AWS AI Practitioner, Azure AI-900) begin building hands-on vocabulary. These do not require prior coding experience.

Tier 1 certifications, typical timeframes, and costs.
Paul Frenken
Tier 2 moves into applied skills, where professionals start building and deploying real solutions. The Microsoft AI-102 and TensorFlow Developer credentials are natural follow-ons for cloud engineers and developers. The PMI AI+ is the right branch for project managers and business leaders who want to lead AI initiatives rather than build them.

Tier 2 certifications, typical timeframes, and costs.
Paul Frenken
Tier 3 is where serious salary premiums kick in. The Google Professional ML Engineer and AWS ML Specialty are the two most employer-recognized advanced credentials and are tied to documented 20–25% salary bumps for professionals already working in data and engineering roles.

Tier 3 certifications, typical timeframes, and costs.
Paul Frenken
Tier 4 represents specialization tracks rather than a single next step. NVIDIA’s DLI targets computer vision and GPU-intensive work; the CAISP is the path for cybersecurity professionals pivoting into AI security; and Stanford’s graduate certificate is the academic credentialing route for those targeting research leadership or senior advisory roles.

Tier 4 certifications, typical timeframes, and costs.
Paul Frenken
Note: The salary figures in the tables above are based on 2025–2026 market data from Glassdoor, Indeed, Payscale, PassItExams and Ravio. Individual outcomes vary by location, experience and employer. Costs reflect exam fees unless otherwise noted.
The bottom line
AI certifications represent one of the highest-return professional investments available to IT professionals in 2026. The market is generating enormous demand, the skills gap remains wide and salary data consistently confirms that certified professionals out-earn their non-certified counterparts. For entry-level professionals, the right credential can produce a $20,000 to $30,000 salary lift within the first year. For experienced practitioners adding AI specialization to an existing technical foundation, gains of 20 to 47 percent above current compensation are well-documented in the literature.
The cautions are real. Certification alone will not substitute for demonstrated experience, hands-on project work and business acumen. The best candidates combine credentials with portfolios, practical skills and the ability to translate AI capabilities into organizational outcomes. For professionals willing to make the investment, the question is not whether AI certification is worth it. The data answers that plainly. The real question is which credential best positions you for the role you are targeting, and whether you are prepared to back it up with the practical skills that employers are urgently seeking.
This article is published as part of the Foundry Expert Contributor Network.
Hands-on with the Google Agent Development Kit 13 Apr 2026, 2:00 am
The Google Agent Development Kit (ADK) is a flexible and modular open-source framework for developing and deploying AI agents. It is optimized for Gemini and the Google ecosystem, but the ADK is model-agnostic, deployment-agnostic, and built for compatibility with other frameworks. The ADK was designed to make agent development feel more like software development, to help developers create, deploy, and orchestrate agentic architectures.
Google recommends deploying your ADK agent to its own Vertex AI Agent Engine Runtime, which is a fully managed Google Cloud service designed for deploying, managing, and scaling AI agents built with frameworks such as ADK. Alternatively, you can containerize your agent and deploy it essentially anywhere, including Google Cloud Run.
Direct competitors to the ADK include Amazon Bedrock AgentCore, Azure AI Foundry Agents, and Databricks Agent Bricks. Additional competitors include the OpenAI Agents SDK, LangChain/LangGraph, CrewAI, and SmythOS. I reviewed Amazon Bedrock AgentCore in March.
ADKs for Python, Go, Java, and TypeScript
There are currently four Google ADK languages: Python, Go, Java, and TypeScript. In addition, there is ADK Web, a browser-based developer UI, which we will discuss later.
The Python ADK seems to be the oldest of the four, based on its commit history, and has the most samples. It features a rich tool ecosystem, code-first development, agent config (which lets you build an ADK workflow without writing code), a tool confirmation flow (human in the loop, or HITL), modular multi-agent systems, and near-universal deployment. The newest features include custom service registration, the ability to rewind a session to before a previous invocation, and a class that supports executing agent-generated code using the Vertex AI Code Execution Sandbox API. The ADK supports the A2A protocol for remote agent-to-agent communication.
You can install the latest stable version of the Python ADK using pip (or pip3, depending on your Python installation):
pip install google-adk
There are quite a few dependencies. You can also install the ADK with uv, as demonstrated in some of the samples. Using a virtual environment is recommended.
The Go ADK offers essentially the same features as the Python ADK, plus idiomatic Go. You can add the Go ADK to your project by running:
go get google.golang.org/adk
The Java ADK boasts the same features, development UI, and interface as the Python ADK. The installation requires adding google-adk to your Maven dependencies.
The TypeScript ADK has the same features, development UI, and interface as the Python ADK. To install it, you can run:
npm install @google/adk
ADK Quickstarts
There are at least five quickstarts for the ADK. One is for Python with the Vertex AI SDK; the other four, as you might expect, are for Python, Go, Java, and TypeScript. Those four follow the same pattern: install the ADK, use it to create an agent project, add a little code to the project, set your Gemini API key, and run your agent both as a CLI and through a web interface.

Running the sample ADK agent as a CLI. Note that the time is hard-wired (mocked) in the sample.
Foundry

Running the sample ADK agent as a web UI.
Foundry
ADK Web: Local development environment
ADK Web is the built-in dev UI that comes with the ADK for easier development and debugging. The prerequisites are npm, Node.js, the Angular CLI, and google-adk for Python or Java. To install, you clone the ADK Web repo, install its node dependencies, then run both the ADK Web and ADK API servers, in separate terminals.
If all is well, the UI will be at localhost:4200. It shows you events, traces, artifacts, and evaluations, and offers you an agent builder and assistant.

ADK Web agent builder and assistant.
Foundry
Core ADK concepts and capabilities
ADK agents can be LLM-driven agents, deterministic workflow agents, or custom agents, which let you define arbitrary orchestration logic. Agents can call tools to interact with external APIs, search, or run code. They can also load and save artifacts.
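To make the distinction concrete, here is an illustrative sketch (hypothetical names, not the ADK’s own classes) of what makes a workflow agent deterministic: it runs a fixed pipeline of steps, in contrast to an LLM agent that decides its own next action.

```python
# Illustrative sketch only: a deterministic "workflow agent" as a fixed
# pipeline of steps. The class and step names here are hypothetical,
# not the ADK's API.
from typing import Callable, List


class SequentialWorkflowAgent:
    """Runs a fixed, ordered list of steps; each step's output feeds the next."""

    def __init__(self, steps: List[Callable[[str], str]]):
        self.steps = steps

    def run(self, user_input: str) -> str:
        result = user_input
        for step in self.steps:  # deterministic: same order, every time
            result = step(result)
        return result


# Hypothetical steps standing in for tool calls or sub-agents.
normalize = lambda text: text.strip().lower()
classify = lambda text: f"category:billing|query:{text}"

agent = SequentialWorkflowAgent([normalize, classify])
print(agent.run("  Why was I charged twice?  "))
# category:billing|query:why was i charged twice?
```

The point of the sketch is auditability: because the orchestration is fixed code rather than model output, the same input always takes the same path.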
You can provide callbacks to run at specific points in the agent’s process. The ADK handles the context of a session, its events, and the agent’s short-term state, much like a web server supports a web application. The ADK also supports long-term memory across multiple sessions.
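The callback idea can be sketched in plain Python (again with hypothetical names, not the ADK’s callback signatures): hooks fire before and after a turn, and a session object carries short-term state and events between turns.

```python
# Illustrative sketch of before/after callbacks around an agent turn,
# with a session object holding short-term state. Hypothetical names only.
class Session:
    def __init__(self):
        self.state = {}   # short-term state, scoped to this session
        self.events = []  # ordered record of what happened


def run_turn(session, user_input, before=None, after=None):
    if before:
        before(session, user_input)        # e.g. validate or log the input
    reply = f"echo: {user_input}"          # stand-in for the model call
    session.events.append(("turn", user_input, reply))
    if after:
        after(session, reply)              # e.g. redact or audit the output
    return reply


session = Session()
count_turns = lambda s, _: s.state.update(turns=s.state.get("turns", 0) + 1)
reply = run_turn(session, "hello", before=count_turns)
print(session.state["turns"], reply)  # 1 echo: hello
```

In the real framework, long-term memory would persist a session’s state across multiple sessions, which the sketch above does not attempt.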
Planning is a way to break down goals before trying to accomplish them. Runners manage the execution flow and orchestrate agent interactions.
The ADK supports applications composed of multiple, specialized agents that can interact.
The ADK includes a command-line interface (CLI) and a developer UI for running agents, inspecting execution steps (events, state changes), debugging interactions, and visualizing agent definitions.
The framework includes tools to create multi-turn evaluation data sets and run evaluations. It also strives to be open and extensible. As I mentioned earlier, the ADK is optimized for Gemini and the Google ecosystem, but is nevertheless model-agnostic, deployment-agnostic, and compatible with other frameworks.
Agent skills
Agent skills are a simple, open format for giving agents new capabilities and expertise. They are folders of instructions, scripts, and resources that agents can discover and use. Skills provide the context that agents need to do real work. Skills can enable domain expertise, new capabilities, repeatable workflows, and interoperability.
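Assuming the open agent-skills convention of one folder per skill containing a SKILL.md file (the format Anthropic published), a minimal discovery pass over a skills directory might look like this sketch:

```python
# Minimal sketch of skill discovery, assuming the open agent-skills
# convention of one folder per skill containing a SKILL.md file.
import os
import tempfile


def discover_skills(root: str) -> list:
    """Return the names of folders under `root` that contain a SKILL.md."""
    skills = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path) and os.path.isfile(os.path.join(path, "SKILL.md")):
            skills.append(entry)
    return skills


# Demo: build a throwaway skills directory with one valid skill folder.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "pdf-report"))
    with open(os.path.join(root, "pdf-report", "SKILL.md"), "w") as f:
        f.write("---\nname: pdf-report\ndescription: Build PDF reports\n---\n")
    os.makedirs(os.path.join(root, "not-a-skill"))  # no SKILL.md inside
    print(discover_skills(root))  # ['pdf-report']
```

A real implementation would also parse the SKILL.md metadata so the agent knows when a skill applies; the sketch only shows the folder-scanning step.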
The agent skills format was originally developed by Anthropic and released as an open standard. It is supported by many AI development tools in addition to the ADK, such as Visual Studio Code, GitHub, Frontier LLMs, agentic coding tools, and AI-capable databases such as Snowflake and Databricks.
Agent runtimes
ADK offers several ways to run your agents for development and test. These include:
- adk web, which launches a browser-based interface
- adk run, which lets you interact with your agents in the terminal
- adk api_server, which exposes your agents through a RESTful API
ADK samples and community repos
The ADK Samples Repo contains ADK sample agents for the four supported languages, although Python agents dominate. The ADK Python community repo is home to a growing ecosystem of community-contributed tools, third-party service integrations, and deployment scripts that extend the core capabilities of the ADK.
Examining the customer service sample
The ADK Samples Customer Service example is a conversational, multimodal Python agent for a fictional big-box retailer specializing in home improvement, gardening, and related supplies. I chose it as the sample closest to the customer service agent I tried for my Amazon Bedrock AgentCore review. The agent’s flow diagram is below.
The customer service agent uses mocks for its tools, so not everything works exactly as you’d expect. To implement this agent with actual back-end integration, you will need to edit customer_service/tools/tools.py, which is where all the agent tools are currently mocked with wired-in responses, and replace the mocks with API calls.
customer_service/tools/agent.py imports those tools and lists them all in its root_agent/tools array, part of the Agent constructor. The tools/config.py logic defines this agent as deployable to Vertex AI in the us-central region, and deployment/deploy.py defines a Vertex AI bucket and agent wheel file.
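To make the mock-to-real swap concrete, here is a generic sketch of the pattern (the function names below are hypothetical; the sample’s actual tool signatures live in customer_service/tools/tools.py):

```python
# Generic sketch of the mocked-tool pattern described above. The sample
# wires fixed responses into each tool; a real deployment replaces the
# body with a back-end API call. All names here are hypothetical.
def get_order_status(order_id: str) -> dict:
    """Mocked tool: returns a wired-in response instead of calling an API."""
    return {"order_id": order_id, "status": "shipped", "eta_days": 2}


def get_order_status_real(order_id: str) -> dict:
    """What the un-mocked version might look like (placeholder endpoint)."""
    import json
    import urllib.request

    url = f"https://example.invalid/orders/{order_id}"  # not a real endpoint
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# The agent's tool list stays the same either way; only the body changes.
print(get_order_status("A-1001"))
# {'order_id': 'A-1001', 'status': 'shipped', 'eta_days': 2}
```

Because the tool’s signature and return shape are unchanged, the swap is invisible to the Agent constructor’s tools list.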
Comparing this sample to the Amazon Bedrock AgentCore sample, I find the ADK sample more capable in general. The one feature of AgentCore that the ADK seems to lack is policies external to the agent that are implemented in the framework; in the ADK example, a limit on discounts is implemented in Python code, which seems like a more reasonable approach to me, and probably to most programmers.

Workflow diagram of the cymbal_retail_agent, which is the core of the ADK Customer Service Example from the ADK Samples repo.
Foundry
The bottom line
The Google Agent Development Kit is a capable and mostly complete framework for developing agents. It can build workflow agents and custom agents in addition to LLM agents, supports multi-agent architectures, and allows you to extend your agent capabilities with AI models, artifacts, tools, integrations, plug-ins, skills, and callbacks. An ADK agent can act as a Model Context Protocol (MCP) client, and you can expose ADK tools via an MCP server. The downside of all those capabilities would be the time and effort needed to learn the framework.
Overall, I like the Google ADK, and its architecture makes more sense to me than Amazon Bedrock AgentCore. The ADK also offers more programming language options than AgentCore, as well as better development tooling.
For my review, I only went deep on the Python ADK and its customer service example. If there are deficiencies in the other three language ADKs other than a paucity of examples, I wouldn’t know about them.
Cost
The Google ADK framework is free and open source; Vertex AI Agent Engine pricing is primarily usage-based.
Platform
Development requires Python, TypeScript, Go, or Java environments. You can deploy to the Vertex AI Agent Engine, to Google Cloud Run, to Google Kubernetes Engine, or to any other container or Kubernetes environment.
Pros
- Capable and mostly complete framework for developing agents
- Supports Python, TypeScript, Go, and Java
- Optimized for Gemini models, but model-agnostic
- Many Python examples
Cons
- Few TypeScript, Go, or Java examples
- Extensive framework has a significant learning curve
