Or try one of the following: 詹姆斯.com, adult swim, Afterdawn, Ajaxian, Andy Budd, Ask a Ninja, AtomEnabled.org, BBC News, BBC Arabic, BBC China, BBC Russia, Brent Simmons, Channel Frederator, CNN, Digg, Diggnation, Flickr, Google News, Google Video, Harvard Law, Hebrew Language, InfoWorld, iTunes, Japanese Language, Korean Language, mir.aculo.us, Movie Trailers, Newspond, Nick Bradbury, OK/Cancel, OS News, Phil Ringnalda, Photoshop Videocast, reddit, Romanian Language, Russian Language, Ryan Parman, Traditional Chinese Language, Technorati, Tim Bray, TUAW, TVgasm, UNEASYsilence, Web 2.0 Show, Windows Vista Blog, XKCD, Yahoo! News, You Tube, Zeldman
Gemini CLI introduces plan mode | InfoWorld
Technology insight for the enterpriseGemini CLI introduces plan mode 12 Mar 2026, 3:53 pm
Google has added a plan mode to Gemini CLI, its open-source software engineering agent that runs on the command line.
With plan mode, announced March 11, Gemini CLI focuses first on analyzing a request, planning complex changes, and understanding the codebase or dependencies in a read-only mode that is safe from accidental changes or executions. Plan mode will also ask questions to clarify goals before proposing a strategy for a user to review. Plan mode is now is enabled by default for all users, according to Google. Users can enter plan mode at any time by typing /plan in the input box, pressing Shift+Tab to cycle through approval modes, or simply asking the agent to “start a plan.”
Plan mode restricts Gemini CLI to a subset of read-only tools such as read_file, grep_search, and glob to validate assumptions, Google said. When active, the agent can navigate a codebase, search for patterns, and read documentation, but it cannot modify any files except for its own internal plans. Users can ask Gemini CLI to “research how to migrate this database” or “plan a new feature,” and dependencies will be mapped out and a solution proposed without risk of eager code changes. A new ask_user tool has been introduced, allowing the agent to pause its research and ask targeted questions to clarify a user’s goals or gather missing information.
JetBrains unveils AI tracing library for Kotlin and Java 12 Mar 2026, 3:21 pm
JetBrains has introduced Tracy, an AI tracing library for the Kotlin and Java languages.
Announced March 11 and accessible from GitHub, Tracy helps developers trace, monitor, and evaluate AI-powered features directly from their Kotlin or Java projects, JetBrains said. The open-source Kotlin library provides a unified API to capture structured traces and helps developers debug failures, measure execution time, and track large language model (LLM) usage across model calls, tool calls, and custom application logic.
Tracy implements the OpenTelemetry Generative AI Semantic Conventions for span attributes and event naming, thus ensuring traces remain compatible with any OpenTelemetry-compliant back end. JetBrains noted the following specific uses for Tracy:
- Tracing AI clients to capture messages, cost, token usage, and execution time.
- Tracing any function to record inputs, outputs, and execution duration.
- Creating and managing spans manually.
- Exporting traces to supported back ends (currently Langfuse and Weave).
Licensed under the Apache 2.0 License, Tracy is compatible with Kotlin from version 2.0.0 and Java from version 17. Integrations can be made with SDKs from OpenAI, Anthropic, and Gemini. The library also works with common Kotlin/LLM stacks including OkHttp and Ktor clients, as well as OpenAI, Anthropic, and Gemini ones, JetBrains said.
Why Postgres® has won as the de facto database: Today and for the agentic future 12 Mar 2026, 11:13 am
The AI economy is projected to reach $17 trillion by 2028, fundamentally altering how organizations architect their infrastructure. Driven by this shift, 95% of major global enterprises are on a mission-critical sprint to become their own AI and data platforms within the next two years.
Yet only 13% of enterprises have successfully found the formula. Their secret to mainstreaming agentic AI? Abandoning fragmented, legacy architectures and placing their data directly alongside their AI in a secure, compliant, and sovereign manner.
As organizations rapidly transition to an “agentic” workforce, they are entering a highly volatile, uncertain, complex, and ambiguous (VUCA) environment. Surviving this shift requires abandoning rigid, traditional strategies in favor of agility and resilience. For enterprises leading the charge, the foundational layer of choice is clear: true open source relational databases. Today, 81% of these successful enterprises have committed to open source strategies, with over 40% standardizing on PostgreSQL as their relational data layer.
As Doug Flora, VP of Product Marketing at EnterpriseDB (EDB), noted: “It’s imperative in moments of rapid change to follow the patterns of the leaders looking to forge success, not the majority who are still working in the patterns of the recent past. Those committing to open source and a mission-critical focus on sovereignty over their AI and data are plotting a pathway to agentic success that achieves 5x the ROI of the majority.”
Extensibility matters: AI needs both structured and unstructured data
AI applications cannot run on vector embeddings alone; they require a deep synthesis of structured, semi-structured, and unstructured data. Unlike many legacy databases that bolt on new features as afterthoughts, Postgres was natively architected for core extensibility. It empowers developers to extend data types, indexes, query planners, functions, and storage engines dynamically.
By unifying vectorized data with traditional transactional (binary) data, Postgres effectively gives AI agents the “eyes, ears, and brain” necessary to sense inputs and operate autonomously within a single, ACID-compliant environment.
An ecosystem built for architectural agility
In a rapidly expanding data ecosystem, relying on a fragmented architecture of specialized databases creates complex synaptic connectors prone to latency, integration failures, and data silos—or what amounts to human hallucinations at the system level. Postgres eliminates this technical debt by extending a single database engine to meet diverse workload demands.
“Developers have long loved Postgres for its extensibility, flexibility, and open innovation model. Now global enterprises are recognizing that same value, making Postgres a strategic decision and running mission-critical data systems on it,” said Jozef de Vries, SVP, Core Database Engineering, EDB.
Developers can seamlessly extend Postgres to handle highly complex, volatile workloads:
- pgvector: Enables advanced vector search, allowing developers to combine relational data, metadata, and embeddings to build robust retrieval-augmented generation (RAG) applications
- Citus: Accelerates multi-tenant SaaS applications and powers real-time analytics (HTAP) via transparent sharding and parallel query execution
- PostGIS: Delivers enterprise-grade geospatial querying, critical for defense and retail industries
- TimescaleDB: Manages massive time-series data crucial for complex analytic models and agentic learning patterns
- pgraph: Handles complex, interconnected data traversals to uncover hidden relationships
The future needs crowdsourced intelligence, not vendor lock-in
Crucially, no single corporate entity owns Postgres. Its vitality relies on the collective intelligence of one of the largest independent developer communities on the planet. In 2025 alone, more than 260 developers contributed code directly to PostgreSQL’s core database engine, with hundreds more participating in testing, reviews, and documentation across the world. Beyond the codebase, the ecosystem is supported by hundreds of user groups, meetups, and international PostgreSQL conferences that keep innovation flowing across all five continents.
While enterprise-grade platforms are built around Postgres to optimize it for sovereign, agentic environments—with big tech giants among the top commercial contributors and EDB leading with more than 30% of contributions—its innovation comes directly from this rich and diverse community that continues to expand. Drawing on the principles of James Surowiecki’s The Wisdom of Crowds, this crowdsourced intelligence ensures that the database evolves faster and more robustly than it would in any proprietary, single-vendor ecosystem.
Securing a sovereign data future
To thrive in the agentic future, engineering and data leaders must make two critical architectural moves: First, break free from locked-in legacy relational ecosystems, such as Oracle, MySQL, SQL Server, or Greenplum , that constrain agility.
Second, harness the immense extensibility of Postgres, its vibrant open source community, and its core ACID capabilities to unify data and AI.
The future of enterprise architecture isn’t about renting space in a hyperscaler’s proprietary ecosystem. It’s about creating your own sovereign platform, where your structured and unstructured data seamlessly power a new agentic workforce under your complete control. Move your data to Postgres now, or risk missing the foundation of the agentic future.
Get your complimentary copy of the O’Reilly book Building a Data and AI Platform with PostgreSQL.
Running agents with Amazon Bedrock AgentCore 12 Mar 2026, 2:00 am
While Amazon Bedrock helps you build and scale generative AI applications, Amazon Bedrock AgentCore provides an enterprise-grade infrastructure and operations layer for deploying and managing AI agents at scale. AgentCore itself is completely agnostic about models, frameworks, and integrations, although its starter kit CLI only supports the most prominent of these.
That CLI can generate agents using Amazon Bedrock, Anthropic, Google Gemini, and OpenAI models, using Strands, LangGraph, Microsoft Autogen, OpenAI Agents SDK, Google Agent Development Kit, and CrewAI frameworks, and using Amazon AgentCore Memory, AgentCore Observability, and AgentCore Gateway integrations. Note that many of these are produced by competitors to AWS.
AgentCore’s core services include a runtime, memory (both short-term and long-term), a gateway, identity management, a sandboxed code interpreter, a cloud-based browser, observability, an evaluation service, and a policy capability that runs outside the agent. We’ll discuss these in more detail below.
Direct cloud platform competitors to AgentCore provide similar enterprise-level hosting, security, and governance for agents within their respective ecosystems. They include:
- Google Cloud Agent SDK (ADK): Built on Vertex AI, the ADK offers deep integration with Gemini models.
- Azure AI Foundry Agents: Best for those heavily invested in the Microsoft ecosystem and Azure OpenAI Service.
- Databricks Agent Bricks: A data-centric alternative that uses the Unity Catalog to build agents directly from enterprise data.
Additional competitors (and sometimes collaborators) to AgentCore include the OpenAI Agents SDK, LangChain/LangGraph, CrewAI, and SmythOS.

Amazon Bedrock AgentCore overview chart.
Amazon Web Services

AgentCore quick start CLI workflow. The three steps are agent creation, agent deployment, and agent invocation.
Amazon Web Services
AgentCore Core Services
AgentCore Core Services include runtime, memory (both short-term and long-term), a gateway, identity management, a sandboxed code interpreter, a cloud-based browser, observability, an evaluation service, and a policy capability that runs outside the agent. You can use whichever of these services that are useful to support your agent development.
AgentCore Runtime is a secure, serverless runtime environment for deploying and scaling dynamic AI agents and tools. It provides fast cold starts for real-time interactions, extended runtime support for asynchronous agents, true session isolation, built-in identity, and support for multi-modal and multi-agent agentic workloads. The runtime integrates with custom frameworks and any open-source framework as well as any foundation model in or outside of Amazon Bedrock.
AgentCore Memory lets you build context-aware agents with control over what the agent remembers and learns. It has support for both short-term memory for multi-turn conversations, and long-term memory that persists across sessions. It can share memory stores across agents, and it can learn from experiences. It works with LangGraph, LangChain, Strands, and LlamaIndex.
AgentCore Gateway is a secure way to convert any APIs, Lambda functions, and existing services into Model Context Protocol (MCP)-compatible tools. It can also connect to pre-existing MCP servers, making them available to AI agents through gateway endpoints with a few lines of code.
AgentCore Identity is a secure, scalable agent identity, access, and authentication management service. It is compatible with identity and credential providers, eliminating the need for user migration or rebuilding authentication flows.
AgentCore Code Interpreter is an isolated sandbox environment for agents to execute code, enhancing their accuracy and scope. It supports Python, JavaScript, and TypeScript. It provides support for a default execution time of 15 minutes, which can be extended to up to eight hours. A best practice is to keep code snippets concise and focused on specific tasks. You can use the Strands Agents SDK with Python or TypeScript, or either the bedrock_agentcore SDK or Boto3 with Python. The session isolation architecture for the code interpreter and the browser uses Firecracker microVMs.
AgentCore Browser is a remote browser that runs in a separate environment rather than on the local machine. For agent development, remote browsers allow AI agents to interact with the web as humans do. AgentCore Browser allows your agent and model to navigate websites, fill forms, click buttons, and parse dynamic content, and allows you to monitor a live view and intervene if necessary. AgentCore Browser integrates with Nova Act, Strands, and Playwright to automate web interactions.
AgentCore Observability is a unified view to trace, debug, and monitor agent performance in production, which offers detailed visualizations of each step in the agent workflow, enabling you to inspect an agent’s execution path, audit intermediate outputs, and debug performance bottlenecks and failures. It integrates with any observability stack that supports OpenTelemetry (OTEL) format.
AgentCore Evaluations is a service for automated, consistent, and data-driven agent assessment. AgentCore Evaluations measures how well your agents and tools execute tasks, handle edge cases, and maintain output reliability across diverse inputs and contexts.
AgentCore Policy is a capability that provides deterministic control to ensure agents operate within defined boundaries and business rules without slowing them down. You can author rules using natural language or Cedar (AWS’s open-source policy language). Policy runs outside the agents so that the models can’t violate the constraints.
AgentCore use cases
Amazon Bedrock AgentCore lets you deploy AI agents with scale, reliability, and security. There are three major categories of use cases for AgentCore: agents, MCP servers, and agent platforms.
With agents, you can build AI apps that reason, use tools, and maintain context. You can apply these to customer support, workflow automation, data analysis, or coding assistance, for example. Using AgentCore, your agents run serverlessly, with isolated sessions, persistent memory, and built-in observability.
AgentCore helps you convert APIs, databases, and services into tools that MCP-compatible agents can use. You can deploy a gateway that wraps your Lambda functions or OpenAPI specs and makes your back end accessible to agents, without rewriting code.
Finally, you can build agent platforms that help your developers or customers deploy agents using approved tools, shared memory stores, and governed access to enterprise services. You can include observability, authentication, and compliance using standard AgentCore core services.
Customer support agent example
The AgentCore team suggested that I try the Customer Support Agent demo (see architecture diagram below). While it looked like a typical AWS architecture with multiple services involved, I’ve seen much worse. What the heck, I thought.

Amazon Bedrock AgentCore Demo – Customer Support Agent. This demo shows how to deploy an agent, authenticate a user, invoke the agent, and then add an AgentCore Policy to enforce refund limits in real time. It’s essentially the same application as the CLI-based Get Started with Amazon Bedrock AgentCore Demo, although there’s more documentation and you can read the code before trying it.
Amazon Web Services
Almost a week later, with multiple bugs and other roadblocks reported and fixed, I finally completed the exercise. The terminal logs follow, starting with the deployment script. I’ve skipped over showing you the step of logging into AWS from my local command line, since it bounces to a web page and back, in order to connect the local session to the proper credentials.
martinheller@Mac customer-support-agent-with-agentcore % scripts/deploy.sh
==> Pre-flight checks
Verifying Bedrock model access (global.anthropic.claude-sonnet-4-5-20250929-v1:0)...
WARNING: Could not invoke Bedrock model (global.anthropic.claude-sonnet-4-5-20250929-v1:0).
Possible reasons:
1. Anthropic first-time usage form not completed.
Complete it in the Bedrock console Playground by selecting any Anthropic Claude model.
Details: https://aws.amazon.com/blogs/security/simplified-amazon-bedrock-model-access/
2. Your current IAM identity lacks bedrock:InvokeModel permission.
Note: the deployed agent uses its own execution role, so this may not
be a problem. Verify after deployment with: uv run agentcore invoke
The deploy will continue.
All checks passed.
==> Installing Python dependencies (uv sync)
Resolved 110 packages in 14ms
Audited 103 packages in 11ms
==> Installing CDK dependencies (npm install)
up to date, audited 337 packages in 901ms
37 packages are looking for funding
run `npm fund` for details
3 vulnerabilities (1 moderate, 2 high)
To address all issues, run:
npm audit fix
Run `npm audit` for details.
==> Bootstrapping CDK (if needed)
> cdk@0.1.0 cdk
> cdk bootstrap
⏳ Bootstrapping environment aws://577405208411/us-east-2...
Trusted accounts for deployment: (none)
Trusted accounts for lookup: (none)
Using default execution policy of 'arn:aws:iam::aws:policy/AdministratorAccess'. Pass '--cloudformation-execution-policies' to customize.
✅ Environment aws://577405208411/us-east-2 bootstrapped (no changes).
NOTICES (What's this? https://github.com/aws/aws-cdk/wiki/CLI-Notices)
37013 (cli) cdk watch triggers deployment unexpectedly or not at all
Overview: Do not use 'cdk watch' with this version of the cdk cli.
Upgrade to ^2.1106.0.
Affected versions: cli: >=2.1103.0 =2.1100.0 ". For example, "cdk acknowledge 37013".
==> Deploying all stacks
> cdk@0.1.0 cdk:deploy:ci
> cdk deploy --all --require-approval never --outputs-file /Volumes/Glauce/repos/amazon-bedrock-agentcore-samples/05-blueprints/customer-support-agent-with-agentcore/cdk-outputs.json
✨ Synthesis time: 2.29s
supportAgentDemo-DockerImageStack: start: Building supportAgentDemo-DockerImageStack Template
supportAgentDemo-DockerImageStack: success: Built supportAgentDemo-DockerImageStack Template
supportAgentDemo-DockerImageStack: start: Building supportAgentDemo-AppImage
supportAgentDemo-DockerImageStack: start: Publishing supportAgentDemo-DockerImageStack Template (current_account-current_region-f5e9391a)
supportAgentDemo-DockerImageStack: success: Published supportAgentDemo-DockerImageStack Template (current_account-current_region-f5e9391a)
#0 building with "desktop-linux" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 795B done
#1 DONE 0.0s
#2 [internal] load metadata for ghcr.io/astral-sh/uv:python3.13-bookworm-slim
#2 DONE 3.6s
#3 [internal] load .dockerignore
#3 transferring context: 814B done
#3 DONE 0.0s
#4 [1/7] FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim@sha256:531f855bda2c73cd6ef67d56b733b357cea384185b3022bd09f05e002cd144ca
#4 DONE 0.0s
#5 [internal] load build context
#5 transferring context: 1.87MB 0.0s done
#5 DONE 0.0s
#6 [5/7] RUN uv pip install aws-opentelemetry-distro==0.12.2
#6 CACHED
#7 [4/7] RUN uv pip install -r pyproject.toml
#7 CACHED
#8 [2/7] WORKDIR /app
#8 CACHED
#9 [3/7] COPY pyproject.toml pyproject.toml
#9 CACHED
#10 [6/7] RUN useradd -m -u 1000 bedrock_agentcore
#10 CACHED
#11 [7/7] COPY . .
#11 DONE 0.0s
#12 exporting to image
#12 exporting layers done
#12 writing image sha256:103477e07cdce77a6d41dc7f875d781cb408dc59556cdf205c5dddeed5cc50d0 done
#12 naming to docker.io/library/cdkasset-9d592d01e5467aa35239774fad1da2f13bc79a6fdfad62b7b89146b9d8f12462 done
#12 DONE 0.0s
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/jogcv8dnbz67ymx5fi5dnrfbt
supportAgentDemo-DockerImageStack: success: Built supportAgentDemo-AppImage
supportAgentDemo-DockerImageStack: start: Publishing supportAgentDemo-AppImage (current_account-current_region-eadbec27)
supportAgentDemo-AgentCoreStack: start: Building supportAgentDemo-AgentCoreStack Template
supportAgentDemo-AgentCoreStack: success: Built supportAgentDemo-AgentCoreStack Template
The push refers to repository [577405208411.dkr.ecr.us-east-2.amazonaws.com/cdk-hnb659fds-container-assets-577405208411-us-east-2]
f75c8a00d29b: Preparing
e9cefea56108: Preparing
6ad1a9eaa547: Preparing
2c4f33ca63a0: Preparing
bd07e5776977: Preparing
65d80ab29699: Preparing
d700e87997ad: Preparing
b27347ec89d2: Preparing
733eb94a487b: Preparing
dac1af1d7cd9: Preparing
bd390c400455: Preparing
d700e87997ad: Waiting
b27347ec89d2: Waiting
733eb94a487b: Waiting
dac1af1d7cd9: Waiting
bd390c400455: Waiting
65d80ab29699: Waiting
e9cefea56108: Layer already exists
6ad1a9eaa547: Layer already exists
2c4f33ca63a0: Layer already exists
bd07e5776977: Layer already exists
65d80ab29699: Layer already exists
d700e87997ad: Layer already exists
733eb94a487b: Layer already exists
b27347ec89d2: Layer already exists
dac1af1d7cd9: Layer already exists
bd390c400455: Layer already exists
f75c8a00d29b: Pushed
9d592d01e5467aa35239774fad1da2f13bc79a6fdfad62b7b89146b9d8f12462: digest: sha256:455ae21c8896a1df8558466d99b87e0f038e79b5baaf00514f810f95183562d0 size: 2627
supportAgentDemo-DockerImageStack: success: Published supportAgentDemo-AppImage (current_account-current_region-eadbec27)
supportAgentDemo-DockerImageStack
supportAgentDemo-DockerImageStack: deploying... [1/2]
supportAgentDemo-DockerImageStack: creating CloudFormation changeset...
✅ supportAgentDemo-DockerImageStack
✨ Deployment time: 13.06s
Outputs:
supportAgentDemo-DockerImageStack.ImageUri = 577405208411.dkr.ecr.us-east-2.amazonaws.com/cdk-hnb659fds-container-assets-577405208411-us-east-2:9d592d01e5467aa35239774fad1da2f13bc79a6fdfad62b7b89146b9d8f12462
Stack ARN:
arn:aws:cloudformation:us-east-2:577405208411:stack/supportAgentDemo-DockerImageStack/6d08ccf0-1664-11f1-8de3-0a00ca3e2d9f
✨ Total time: 15.35s
supportAgentDemo-AgentCoreStack: start: Publishing supportAgentDemo-AgentCoreStack Template (current_account-current_region-8aa3c459)
supportAgentDemo-AgentCoreStack: success: Published supportAgentDemo-AgentCoreStack Template (current_account-current_region-8aa3c459)
✅ supportAgentDemo-AgentCoreStack
✨ Deployment time: 39.12s
Outputs:
supportAgentDemo-AgentCoreStack.AccountId = 577405208411
supportAgentDemo-AgentCoreStack.AuthorizerDiscoveryUrl = https://cognito-idp.us-east-2.amazonaws.com/us-east-2_NaZm7AOfz/.well-known/openid-configuration
supportAgentDemo-AgentCoreStack.ClientId = 7nvmhkv45gh933mircjoji1gfe
supportAgentDemo-AgentCoreStack.CognitoDomain = supportagentdemo-577405208411-us-east-2.auth.us-east-2.amazoncognito.com
supportAgentDemo-AgentCoreStack.GatewayId = supportagentdemo-gateway-7d3m0sdfzy
supportAgentDemo-AgentCoreStack.GatewayUrl = https://supportagentdemo-gateway-7d3m0sdfzy.gateway.bedrock-agentcore.us-east-2.amazonaws.com/mcp
supportAgentDemo-AgentCoreStack.MemoryArn = arn:aws:bedrock-agentcore:us-east-2:577405208411:memory/supportAgentDemo_Memory_v2-bLfMFAGQmp
supportAgentDemo-AgentCoreStack.MemoryId = supportAgentDemo_Memory_v2-bLfMFAGQmp
supportAgentDemo-AgentCoreStack.Region = us-east-2
supportAgentDemo-AgentCoreStack.RuntimeArn = arn:aws:bedrock-agentcore:us-east-2:577405208411:runtime/supportAgentDemo_Agent-U90VrdH6Rf
supportAgentDemo-AgentCoreStack.RuntimeId = supportAgentDemo_Agent-U90VrdH6Rf
supportAgentDemo-AgentCoreStack.RuntimeName = supportAgentDemo_Agent
supportAgentDemo-AgentCoreStack.UserPoolId = us-east-2_NaZm7AOfz
Stack ARN:
arn:aws:cloudformation:us-east-2:577405208411:stack/supportAgentDemo-AgentCoreStack/74e9b510-1664-11f1-8ec7-0606ac99759d
✨ Total time: 41.4s
NOTICES (What's this? https://github.com/aws/aws-cdk/wiki/CLI-Notices)
37013 (cli) cdk watch triggers deployment unexpectedly or not at all
Overview: Do not use 'cdk watch' with this version of the cdk cli.
Upgrade to ^2.1106.0.
Affected versions: cli: >=2.1103.0 =2.1100.0 ". For example, "cdk acknowledge 37013".
CDK outputs written to /Volumes/Glauce/repos/amazon-bedrock-agentcore-samples/05-blueprints/customer-support-agent-with-agentcore/cdk-outputs.json
==> Generating .bedrock_agentcore.yaml
Generated /Volumes/Glauce/repos/amazon-bedrock-agentcore-samples/05-blueprints/customer-support-agent-with-agentcore/.bedrock_agentcore.yaml
============================================================
Deployment complete!
============================================================
Next steps:
1. Check agent status:
uv run agentcore status
2. Create a Cognito user:
uv run scripts/cognito-user.py --create
3. Log in and set your bearer token:
eval $(uv run scripts/cognito-user.py --login --export)
4. Invoke the agent:
uv run agentcore invoke '{"prompt": "Who am I?"}'
To tear down all resources later:
scripts/teardown.sh
The deployment threw some warnings. It turned out that some of the checks weren’t quite reliable. I would fix a problem called out by the script, and it would still be called out the next time I ran it.
Note that part of what the setup script did was to create a Docker pod on my machine and register it with AWS’s directory. When I re-ran the script, it would build new Docker images, so I’d have to delete the old ones manually.
I’ve skipped showing you the agent status script (1 above). It originally didn’t have the uv run prefix, and agentcore, which only existed inside a Python environment, wasn’t found. Once we got that sorted, it gave a false negative the first few times I ran it, until the AWS engineers fixed a parsing problem; then it just worked.
In the user creation script, you also need the uv run prefix:
martinheller@Mac customer-support-agent-with-agentcore % uv run scripts/cognito-user.py --create
Choose a demo user:
1) john@example.com (John Doe)
2) jane@example.com (Jane Smith)
Enter 1 or 2: 1
Password requirements:
- At least 8 characters
- At least one uppercase letter (A-Z)
- At least one lowercase letter (a-z)
- At least one number (0-9)
- At least one special character (e.g. !@#$%^&*)
Password:
User already exists: john@example.com
Password set for: john@example.com
Added to group: standard
User ready. Login with:
eval $(uv run scripts/cognito-user.py --login --export)
The Cognito user has to match a pre-populated database of demo users, which is why there are only two choices.
martinheller@Mac customer-support-agent-with-agentcore % eval $(uv run scripts/cognito-user.py --login --export)
Opening browser for authentication...
Waiting for callback...
Login successful. BEDROCK_AGENTCORE_BEARER_TOKEN is now set.
That Cognito login step retrieves the OAuth bearer token to be used for the rest of the session. It expires after an hour. Don’t ask me how I know. Yes, you have to log into AWS as well as logging into Cognito. As AWS explains, the dual login is by design to maintain integrity of the agents developers are building. Cognito identifies who the user is (OAuth bearer token, one-hour expiry), and AWS identifies which service boundary you’re in.
martinheller@Mac customer-support-agent-with-agentcore % uv run agentcore invoke '{"prompt": "Who am I?"}'
Using bearer token for OAuth authentication
Using JWT authentication
I'll look up your account information using your email address.Hello! You are **John Doe** (Customer ID: CUST-001). Your account is registered with the email john@example.com, and you've been a member since June 1, 2023. How can I help you today?
╭──────────────────────────────────────────────────────── supportAgentDemo_Agent ────────────────────────────────────────────────────────╮
│ Session: 74d74b58-aa7d-4b8b-9515-e0931743ce1d │
│ ARN: arn:aws:bedrock-agentcore:us-east-2:577405208411:runtime/supportAgentDemo_Agent-U90VrdH6Rf │
│ Logs: aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --follow │
│ aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --since 1h │
│ GenAI Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=us-east-2#gen-ai-observability/agent-core │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
That actually went through the Claude model. If you get an error at this point, you might not have Anthropic permission or any Bedrock Claude quota. Asking for a quota is a matter of filling out an AWS form and waiting a couple of days. The next query, for an order list, also goes through the model.
martinheller@Mac customer-support-agent-with-agentcore % uv run agentcore invoke '{"prompt": "Show me my recent orders"}'
Using bearer token for OAuth authentication
Using JWT authentication
I'll look up your recent orders for you.Now let me get the full details for each of these orders to show you what items were in them.Here are your recent orders:
**1. Order ORD-12430** - Ordered Feb 5, 2025 | Delivered Feb 10, 2025
- 4K Monitor (1x) - $399.00
- **Total: $399.00** | Status: ✓ Delivered
**2. Order ORD-12420** - Ordered Feb 1, 2025 | Delivered Feb 4, 2025
- Phone Case (1x) - $29.99
- **Total: $29.99** | Status: ✓ Delivered
**3. Order ORD-12410** - Ordered Jan 25, 2025 | Delivered Jan 29, 2025
- Mechanical Keyboard (1x) - $149.99
- **Total: $149.99** | Status: ✓ Delivered
**4. Order ORD-12400** - Ordered Jan 20, 2025 | Delivered Jan 23, 2025
- USB-C Charging Cable (2x) - $12.99 each
- **Total: $25.98** | Status: ✓ Delivered
**5. Order ORD-12345** - Ordered Jan 15, 2025 | Delivered Jan 20, 2025
- Wireless Headphones (1x) - $79.99
- **Total: $79.99** | Status: ✓ Delivered
**6. Order ORD-12300** - Ordered Jan 2, 2025 | Delivered Jan 8, 2025
- Running Shoes (1x) - $249.00
- **Total: $249.00** | Status: ✓ Delivered
All of your recent orders have been delivered successfully! Is there anything specific you'd like to know about any of these orders?
╭──────────────────────────────────────────────────────── supportAgentDemo_Agent ────────────────────────────────────────────────────────╮
│ Session: 74d74b58-aa7d-4b8b-9515-e0931743ce1d │
│ ARN: arn:aws:bedrock-agentcore:us-east-2:577405208411:runtime/supportAgentDemo_Agent-U90VrdH6Rf │
│ Logs: aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --follow │
│ aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --since 1h │
│ GenAI Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=us-east-2#gen-ai-observability/agent-core │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
The next query demonstrates that we can ask the customer support agent for refund. This is a small item, so it’s supposed to go through without an issue.
martinheller@Mac customer-support-agent-with-agentcore % uv run agentcore invoke '{"prompt": "I need a refund for order ORD-12420. The phone case was damaged."}'
Using bearer token for OAuth authentication
Using JWT authentication
I'll process a refund for your damaged phone case from order ORD-12420.Perfect! Your refund has been processed successfully.
**Refund Details:**
- Refund ID: REF-DC32B
- Order: ORD-12420 (Phone Case)
- Amount: $29.99
- Reason: Damaged item
- Status: Processed
You should receive the refund in your original payment method within **3-5 business days**.
I'm sorry to hear the phone case arrived damaged. Is there anything else I can help you with today?
╭──────────────────────────────────────────────────────── supportAgentDemo_Agent ────────────────────────────────────────────────────────╮
│ Session: 74d74b58-aa7d-4b8b-9515-e0931743ce1d │
│ ARN: arn:aws:bedrock-agentcore:us-east-2:577405208411:runtime/supportAgentDemo_Agent-U90VrdH6Rf │
│ Logs: aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --follow │
│ aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --since 1h │
│ GenAI Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=us-east-2#gen-ai-observability/agent-core │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
This next refund request really should be rejected or bumped up to a human customer support agent for approval.
martinheller@Mac customer-support-agent-with-agentcore % uv run agentcore invoke '{"prompt": "I need a refund of $399 for order ORD-12430. The monitor has dead pixels."}'
Using bearer token for OAuth authentication
Using JWT authentication
I'll process a refund for your 4K Monitor from order ORD-12430 due to the dead pixels.Your refund has been successfully processed!
**Refund Details:**
- Refund ID: REF-5DEA4
- Order: ORD-12430 (4K Monitor)
- Amount: $399.00
- Reason: Damaged item (dead pixels)
- Status: Processed
You should receive the refund to your original payment method within **3-5 business days**.
I'm sorry the monitor had dead pixels. If you'd like to order a replacement, please let me know and I can assist you with that. Is there anything else I can help you with?
╭──────────────────────────────────────────────────────── supportAgentDemo_Agent ────────────────────────────────────────────────────────╮
│ Session: 74d74b58-aa7d-4b8b-9515-e0931743ce1d │
│ ARN: arn:aws:bedrock-agentcore:us-east-2:577405208411:runtime/supportAgentDemo_Agent-U90VrdH6Rf │
│ Logs: aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --follow │
│ aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --since 1h │
│ GenAI Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=us-east-2#gen-ai-observability/agent-core │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
It went through, however. The fix for that (part of the demo, not a bug) is to add some policies.

This is the policy that keeps the model from processing refunds >$100. I entered the English prompt at the left; Bedrock generated the Cedar policy at the right.
Amazon Web Services
With the policy in place, I tried again, and the request was denied.
martinheller@Mac customer-support-agent-with-agentcore % uv run agentcore invoke '{"prompt": "I need a full refund for order ORD-12300. The shoes dont fit."}'
Using bearer token for OAuth authentication
Using JWT authentication
I'll process a refund for your Running Shoes from order ORD-12300.I'm unable to process this refund request. The system has denied the refund with the following message:
**"Tool call not allowed due to policy enforcement [No policy applies to the request (denied by default).]"**
This appears to be a policy restriction. Given that order ORD-12300 was delivered on January 8, 2025 (over a month ago), it may be outside the refund window for non-defective items or there may be specific policies regarding fit-related returns.
I recommend:
1. Checking our return policy for the timeframe allowed for returns
2. Contacting our returns department directly for assistance with this older order
3. Inquiring about exchange options if available
I apologize for the inconvenience. Is there anything else I can help you with today?
╭──────────────────────────────────────────────────────── supportAgentDemo_Agent ────────────────────────────────────────────────────────╮
│ Session: 74d74b58-aa7d-4b8b-9515-e0931743ce1d │
│ ARN: arn:aws:bedrock-agentcore:us-east-2:577405208411:runtime/supportAgentDemo_Agent-U90VrdH6Rf │
│ Logs: aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --follow │
│ aws logs tail /aws/bedrock-agentcore/runtimes/supportAgentDemo_Agent-U90VrdH6Rf-DEFAULT --log-stream-name-prefix │
│ "2026/03/02/[runtime-logs]" --since 1h │
│ GenAI Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=us-east-2#gen-ai-observability/agent-core │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
The policy, which is applied outside the agent, worked as designed, but the model was confused as to why. Claude being Claude, it hallucinated some reasons why, and made three irrelevant recommendations. You would think that the model should know about any policies in place, but applying them outside the model avoids the possibility of a malicious prompt convincing the model to issue a large refund. If there’s a safe way to fix that, it’s above my pay grade.
By the way, even with all my retries, running this demo cost me a whopping $0.35 in AWS charges.
A solid foundation for agents
Amazon Bedrock AgentCore is a credible attempt to put together an enterprise-grade infrastructure and operations layer for deploying and managing AI agents at scale. The services themselves seem solid. The example I tried had multiple bugs and documentation errors, but I reported these and the AWS engineers fixed them quickly.
I still question the design of the policy module with respect to how it interacts with the model. There should be a way to pass the reason why an action is denied back to the model so that it doesn’t hallucinate its own reason. I just don’t know how that would work in this architecture.
Cost
Depends on consumption. See https://aws.amazon.com/bedrock/agentcore/pricing/; scroll down to the pricing table and pricing examples.
Platform
Server: AWS. Client: macOS, Linux, or Windows.
Pros
- Strong collection of relevant service modules
- Extensive samples
Cons
- Documentation can be disorienting with multiple options
- The samples seem to be on their “shakedown cruise” at this point
What’s missing from AI-assisted software development 12 Mar 2026, 2:00 am
AI has made writing code feel like cheating. The real trouble shows up right after git push. That is the part almost no one is talking about, and it is where most AI-assisted projects quietly die.
It is not usually the code that fails. It is everything around it, because the cloud is unforgiving.
Developers still run into the same problems they faced before large language models (LLMs) arrived. Environments drift apart. Permissions break in unpredictable ways. Networking works in staging but collapses under real traffic. Rollouts fail and rollbacks do not do what they claim. Monitoring and incident response get set up only after the first outage. These are not exotic issues, but ordinary headaches of shipping software, and they remain stubbornly hard even as code generation becomes trivial.
If we want AI-assisted development to actually scale, we have to confront the real bottlenecks. Everyone feels where the choke point is in the modern agentic software development life cycle, but we do not talk about it nearly enough. We have seen an explosion of coding agents, and many of them are genuinely impressive. But almost no one has tackled the crucial part that kills most AI-generated software: getting it running, safely, in the cloud.
This does not require LLMs to become flawless reasoners, because most platform engineering is not based on some deep logic. It is pattern matching, enforcing boundaries, and checking state. And unlike writing code, configuring infrastructure has fewer degrees of freedom. The space of valid actions is smaller, and the failure modes are well known. With structure, guardrails, and visibility into the real system, today’s models can already be more reliable here than in code generation.
The breakthrough is not better models. It is designing the right system around them.
The new imbalance
The shift happened fast. Developers used to spend weeks writing a new service. Now, a model can generate one in a matter of minutes. The limiting factor is no longer building features but running them.
Deployment is fundamentally different from coding. Writing code is a text problem. Deploying code is a state problem. To deploy safely, a system needs an accurate view of the resources that exist, the relationships between them, and their live configuration. It requires guardrails, reconciliation, and visibility into dependencies that change over time.
LLMs do not have any of that context. They do not know what is already deployed, which permissions are in place, or how different services interact. They operate inside a text box while the cloud is a living system. Asking a model to manipulate that system without giving it structure or guardrails is a recipe for breakage.
Because of this, deploying AI-generated code is actually harder than deploying human-written code. You are no longer dealing with a single developer who understands the system. You are dealing with a generator that outputs large amounts of code but has no understanding of the environment it is meant to run in.
What’s being overlooked
There is a popular narrative that cloud complexity only matters once a company becomes large. In reality, most small applications fail long before scale becomes an issue, and for reasons that have nothing to do with sophisticated infrastructure. The common failure points look almost embarrassingly simple.
Teams often ship:
- Services without proper retries or timeouts
- Functions that are not idempotent and explode on retry
- Migration scripts that fail on the second deploy
- Health checks that do not actually check anything
- Environment variables that differ across machines
- Staging and production resources that accidentally overlap
- Monitoring added only after something goes down
- Continuous integration (CI) pipelines that miss infrastructure regressions
- Rollbacks that do not recreate a working state
These are very common issues. And they’re the exact areas where AI doesn’t really help you yet. AI is great at generating code, but it has no intuition for the messy, boring parts that keep systems alive.
Because writing code is now so fast, teams often spin up more services than they can realistically look after. Not because they lack talent, but because the pace of generation doesn’t match the pace of operational discipline.
The cloud is still a hostile environment for AI
Many people assume that LLMs should be able to automate infrastructure the same way they automate code. But cloud environments have almost none of the qualities that help code generation produce apps that run reliably. Programming languages have grammar, rules, and predictable outcomes. Cloud platforms are inconsistent, fragmented, and in constant motion.
A real-world system is rarely a single configuration language. It is Terraform combined with CLI commands, hand-edited YAML, a CI workflow written years ago, and a set of manual patches someone applied during an incident at 2 am. There is no single source of truth and no stable abstraction for the model to learn from.
LLMs are trained on historical snapshots. Cloud environments are living systems where the same command can behave differently depending on timing, region, service limits, or partial state. Without visibility and structure, AI agents will keep producing infrastructure that only looks valid on paper, yet fails when pushed to the cloud.
The true bottleneck is now operational, not creative
The industry keeps waiting for a better model to arrive and solve everything. But the limiting factor is no longer the intelligence of the model. It is the environment we are asking that model to interact with.
Cloud infrastructure was designed for humans with deep knowledge, tribal context, and plenty of manual control. It was not built for agents that need clean structure, safety constraints, and predictable patterns.
If AI-assisted development is going to scale beyond prototypes, the underlying platform needs to adapt. What models need is not more IQ but better surroundings: environments where state is explicit, destructive actions are constrained, and configuration is represented as structured primitives instead of loosely related text files and scripts.
This is not simply a call for a single magical agent that behaves like an AI platform engineer. It is a call for a cloud that is compatible with AI. Without that shift, the gap between generation and deployment will keep widening.
When deployment stops being the bottleneck
Once the operational side catches up, the impact will be even larger than what we saw when LLMs first made coding accessible. People who were never able to build software will be able to not only assemble apps for demos but also ship them reliably.
That is the real productivity curve AI has not unlocked yet. The coding part is already here. The operational part is where everything slows down. To make AI-assisted development work at scale, we need platforms that give models structure, visibility, and enforce safe boundaries. Once that happens, the cloud stops getting in the way and AI can finally deliver on the promise everyone keeps talking about.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Nvidia launches Nemotron 3 Super to power enterprise AI agents 12 Mar 2026, 1:40 am
Nvidia has introduced a new reasoning-focused AI model that combines multiple neural network architectures in a bid to improve how enterprise systems handle complex tasks and automation.
The company said its Nemotron 3 Super model combines Mamba sequence modeling, transformer attention, and Mixture-of-Experts routing to support so-called “agentic” AI systems that can plan and execute multi-step workflows across enterprise applications.
[ Related: More Nvidia news and insights ]
In a statement, Nvidia said multi-agent systems can generate up to 15 times more tokens than standard chat interactions. This can lead to “context explosion,” which may cause agents to drift from the original goal and raise costs, as large reasoning models are used for each subtask.
“We are releasing Nemotron 3 Super to address these limitations,” Nvidia said. “The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging.”
Nvidia said the model is released with open weights, datasets, and training recipes, allowing developers to modify it and deploy it on their own infrastructure.
The release reflects a broader shift in the AI industry as vendors move beyond chatbots toward models designed to power autonomous AI agents.
“Enhanced reasoning directly supports better task planning, error correction, and workflow decomposition, which collectively increase the reliability of AI agents for enterprise use,” said Jaishiv Prakash, director analyst at Gartner. “However, the success of agentic systems will not just depend on model capability but on the overall system architecture, including orchestration, data integration, context management, and governance.”
Architecture for enterprise efficiency
Nemotron 3 Super reflects Nvidia’s push to improve performance for enterprise AI workloads that involve sustained reasoning and long-context processing. The model’s hybrid architecture, analysts say, could help organizations run complex agent workloads more efficiently on existing infrastructure.
“Nemotron 3 Super combines Mamba’s linear-time sequence processing with Transformer attention and MoE routing, delivering higher throughput, lower latency, and better memory efficiency than pure transformers for long-context and multi-step workloads,” said Charlie Dai, VP and principal analyst at Forrester. “For enterprises, this translates into lower TCO, better utilization of on-prem or sovereign GPU clusters, and faster agent execution.”
Tulika Sheel, senior vice president at Kadence International, said the model’s architecture is designed to activate only a subset of parameters for each task, which helps improve efficiency.
“This design significantly improves throughput and lowers compute costs while maintaining accuracy,” Sheel said. “For enterprises, that can translate into faster inference, better performance on long-context workloads, and more cost-efficient deployment of large models.”
Open models reshape strategy
Open reasoning models are emerging as an option for enterprises seeking greater control over how AI systems are built and deployed. Research by McKinsey & Company attributes this interest to strong performance, ease of use, and lower implementation and maintenance costs compared with proprietary alternatives.
“As a result, many organizations may adopt a hybrid strategy, combining open models for internal workloads with proprietary models for external or high-performance tasks,” Sheel said. “Open reasoning models could push enterprises toward more customizable, self-hosted AI strategies rather than full reliance on proprietary platforms.”
Analysts also said that the ability to fine-tune and inspect models is becoming increasingly important as enterprises expand AI into regulated sectors such as finance, healthcare, and government.
“Open reasoning models give enterprises a credible alternative to proprietary foundation models by enabling fine-tuning, inspection, and on-prem deployment,” Dai said. “This supports customization for domain logic, regulatory compliance, and data residency, while reducing dependency on closed APIs and usage-based pricing.”
More Nvidia news:
- Nvidia partners with optics technology vendors Lumentum and Coherent to enhance AI infrastructure
- Nvidia partners with telecom providers for open 6G networks
- Nvidia plans a Windows PC SoC, setting up direct competition with Qualcomm, Intel, and AMD
- Nvidia lines up partners to boost security for industrial operations
- Meta scoops up more of Nvidia’s AI chip output
- Reports of Nvidia/OpenAI deal in jeopardy are overblown, says Nvidia’s CEO
- Eying AI factories, Nvidia buys bigger stake in CoreWeave
- China clears Nvidia H200 sales to tech giants, reshaping AI data center plans
- Nvidia is still working with suppliers on RAM chips for Rubin
- RISC-V chip designer SiFive integrates Nvidia NVLink Fusion to power AI data centers
- Nvidia H200 chips in China: US says yes, China says no
- Lenovo-Nvidia partnership targets faster AI infrastructure rollouts
Databricks buys Quotient AI to boost enterprise‑grade AI agent performance 12 Mar 2026, 12:20 am
Databricks has acquired Quotient AI, a provider of AI agent evaluation and training software, to help enterprises scale AI agents in production more reliably.
“Quotient AI was built to close the gap in agent evaluation and continual learning,” the company said in a statement, adding that the startup’s technology, infused inside its Genie and Agent Bricks offerings, will help enterprises monitor agent behavior in production, detect critical issues, and use those signals to improve agent performance continuously.
Addressing agent reliability in production
The acquisition, analysts say, aims to resolve a growing concern among CIOs trying to operationalize AI agents: while building prototypes has become relatively easy, proving that those systems behave reliably across complex enterprise workflows remains far harder.
“CIOs struggle to answer basic questions once AI agents are deployed in production: Why did it make that decision, will it behave the same tomorrow, and how do we verify it didn’t violate policy/compliance?” said Dion Hinchcliffe, lead of the CIO practice at The Futurum Group.
Quotient AI’s technology, Hinchcliffe added, will provide the evaluation frameworks and reinforcement learning feedback loops needed for enterprises to systematically measure agent performance, surface failures, and continuously help refine how those systems behave in real-world enterprise environments.
More importantly for CIOs, HyperFRAME Research’s practice leader of AI stack Stephanie Walter pointed out that Quotient’s technology isn’t about generic reinforcement learning (RL) for agents, but far more domain specific: “They want to help you train an agent that doesn’t just know how to code, but knows how to code for your specific data architecture in a way that passes your specific compliance checks.”
In fact, Ashish Chaturvedi, executive research leader at HFS Research, says Quotient AI’s team and technology are market-tested and credible as it led the quality improvement for GitHub Copilot, which, according to Chaturvedi, is one of the “few AI products that actually run at enterprise scale with real consequences for errors.”
Winds of change and competition
The acquisition is not Databricks’ only attempt at adding features that help enterprises run agents reliably at scale.
Earlier this year, the company introduced an Instructed Retriever approach designed to improve how enterprise AI systems fetch relevant information from internal data. Earlier this month, it unveiled KARL, an enterprise knowledge agent powered by custom reinforcement learning that can refine its responses based on feedback from real-world usage.
It’s not just Databricks, though; analysts say that most data platform vendors are targeting the same issues around scaling agents in productions although they might be starting at different points.
“Snowflake has been building its own evaluation tooling with Cortex Agent Evaluations and its Agent GPA framework. Teradata is taking yet another path entirely. Its Enterprise AgentStack and partnership with Google Cloud are anchored in governance, context, and hybrid deployment rather than in model-level evaluation or RL-driven improvement,” Chaturvedi said.
“The broader landscape is also moving. Dataiku has built evaluation integrations on top of Snowflake Cortex agents. LangChain’s ecosystem offers open-source alternatives like LangSmith for tracing. And the hyperscalers, AWS, Google, Microsoft, have their own observability and evaluation stacks that compete at the infrastructure layer,” Chaturvedi added.
Strategic moat
These moves from vendors, including Databricks, is however, more strategic and targeted towards building a competitive moat, the analyst further noted.
The idea here is that whichever data platform offers the best path to reliably scaling AI agents will eventually become sticky and preferable over the competition, Chaturvedi added.
That path, according to Hinchcliffe, seems to be agent evaluation, which he says is becoming the equivalent of CI/CD for AI agents, and enterprises will need pipelines that test agents against thousands of scenarios, measure behavior across complex workflows, and automatically improve performance over time.
“Platforms that own these feedback loops will compound their advantage, because every production deployment becomes training data for better agents. In that sense, Databricks isn’t just buying a tool for testing agents by acquiring Quotient AI; it’s investing in the control layer for the entire enterprise agent lifecycle,” Hinchcliffe added.
Microsoft’s .NET 11 Preview 2 offers cleaner stack traces 11 Mar 2026, 1:59 pm
Microsoft has published Preview 2 of its planned .NET 11 software development platform, emphasizing progress ranging from native runtime async to smaller SDK installers for Linux and macOS.
Released March 10, .NET 11 Preview 2 can be downloaded from net.microsoft.com. Preview 2 follows the February 10 release of Preview 1, with the production release expected in November.
Preview 2 brings significant progress toward runtime-native async, according to Microsoft. Instead of the compiler generating state-machine classes, the runtime itself manages async suspension and resumption. This produces cleaner stack traces, better debugging, and lower overhead. But runtime async is still a preview feature. The compiler must emit methods with MethodImplOptions.Async for the runtime to treat them as runtime-async.
Also in the runtime, the JIT now eliminates bounds checks for the common pattern where an index plus a constant is compared against a length. Checked arithmetic contexts that are proved redundant are also optimized away.
For the SDK, the installer size on Linux and macOS has been reduced by deduplicating assemblies using symbolic links. Duplicate .dll and .exe files are identified by content hash and replaced with symbolic links pointing to a single copy. This affects tarballs as well as .pkg, .deb, and .rpm installers
Code analyzer improvements were made in the SDK to avoid potentially expensive logging. Property accesses, GetType(), GetHashCode(), and GetTimestamp() calls no longer are flagged. Diagnostics now only apply to Information-level and below by default, since warning/error/critical code paths are rarely hot paths. And diagnostic messages now include why an argument was flagged, helping developers prioritize which warnings to address.
New in the .NET 11 libraries, overloads on TarFile.CreateFromDirectory accept a TarEntryFormat parameter, giving direct control over the archive format (dotnet/runtime#123407). Previously, CreateFromDirectory produced Pax archives. The new overloads support all four tar formats—Pax, Ustar, GNU, and V7—for compatibility with specific tools and environments.
Included in .NET 11 Preview 2 are the following additional improvements:
- Performance improvements in ASP.NET Core have Kestrel’s HTTP/1.1 request parser now using a non-throwing code path for handling malformed requests. Instead of throwing
BadHttpRequestExceptionon each parse failure, the parser returns a result struct indicating success, incomplete, or error states. In scenarios with many malformed requests—such as port scanning, malicious traffic, or misconfigured clients—this eliminates expensive exception-handling overhead while improving throughput by up to 20% to 40%. Valid request processing is not impacted. - The F# language has simplified DIM (Default Interface Member) hierarchies. Also with F#, a preview feature (
--langversion:preview) caches overload resolution results for repeated method calls with the same argument types. - For map control in .NET MAUI (Multi-platform App UI), new
TypeConverterimplementations forLocationandMapSpanenable concise XAML syntax for map coordinates, eliminating the need for verbosex:Argumentsmarkup. Also in .NET MAUI,TypedBindingandSourceGeneratedBindingnow are approximately 29% faster with 50% less memory allocation per binding operation. - Entity Framework (EF) Core supports translating the LINQ
MaxByAsyncandMinByAsyncmethods and their synchronous counterparts. These methods allow developers to find the element with the maximum or minimum value for a given key selector, rather than just the maximum or minimum value itself.
Oracle rejects request it give up control of MySQL 11 Mar 2026, 5:51 am
Oracle has formally refused to restructure control of the Community Edition of MySQL, following a request that it do so from a consortium of database companies providing forks of the database, and from MySQL users.
The decision comes after the consortium’s major players, Percona and VillageSQL, met Oracle earlier this month to discuss the changes requested in an online letter in February, which saw at least 544 users, including database veterans, developers, and long-time contributors, pledge support.
Chief among the signatories’ concerns was how Oracle has managed updates to MySQL’s codebase, which they argue has cost the database significant market share as rival PostgreSQL has profited from surges in demand from AI-driven workloads.
The letter also argued that the few updates MySQL does get don’t include features that are now table stakes for AI-driven workloads and that have become standard across most databases, including the enterprise versions offered by Oracle.
The signatories suggested that Oracle place the open version of MySQL under an independent, non-profit foundation, which in turn would oversee roadmap planning, release governance, and contributor access, while allowing Oracle to retain its commercial MySQL offerings and trademarks.
Little reassurance
Developments within Oracle’s MySQL division around the time the open letter was published also did little to reassure the signatories about the project’s long-term stewardship.
Recent layoffs there included the departure of Oracle MySQL community manager Frederic Descamps, who moved to the MariaDB Foundation at the end of February.
Oracle’s refusal to relax its control over the database is a no-brainer, according to analysts.
“Ceding governance to a foundation means ceding roadmap authority, which means potentially accelerating features that compete with Oracle Database, Oracle MySQL HeatWave, and Oracle’s commercial MySQL Enterprise Edition,” said Pareekh Jain, principal analyst at Pareekh Consulting.
Maintaining stewardship of the Community Edition of MySQL allows the company to ensure that the open-source version only evolves in ways that complement the rest of its technology portfolio, said Sanchit Vir Gogia, chief analyst at Greyhound Research.
Although Oracle rejected the consortium’s proposals to cede control, it has promised continued dialogue with the MySQL community, indicating it will remain open to feedback on development priorities and collaboration around the Community Edition.
“This renewed openness and pace of development will succeed with thoughtful input and feedback from users and contributors. The feedback, ideas, and experiences shared in this community continue to shape our direction and strengthen the impact of our work. We are deeply committed to maintaining an open, transparent dialogue as we evolve and improve MySQL together,” Oracle executives wrote in a blog post.
New roadmap
To that end, the executives said that Oracle was proposing new roadmap planning tracks centered on AI and cloud to accelerate the rollout of developer-focused capabilities, including some features that have so far remained exclusive to commercial editions.
Among the additions being explored are the use of profile-guided optimization (PGO) to create community binaries, a hypergraph optimizer, and enhancements to JSON duality designed to simplify data manipulation language operations. Oracle also suggested it might include vector functions, but is seeking additional community feedback before committing to their inclusion.
These additions and the promise of more inclusivity and transparency, while boosting confidence among users of the Community Edition, could be a double-edged sword for MySQL fork-providers, analysts say.
“On one hand, tighter Oracle control could increase demand for true open-source MySQL alternatives, as users seeking enterprise-grade capabilities with MySQL compatibility may turn to distributions like Percona,” Jain said.
“On the other hand, fork providers face a growing upstream maintenance burden if Oracle diverges further or slows the release of GPL code, forcing them to invest more in backporting fixes or building core features themselves,” Jain added.
And if Oracle fails to deliver on its promised commitments, MySQL’s Community Edition will keep losing mindshare to PostgreSQL — so much so that vendors like Percona may eventually have to broaden support for PostgreSQL and position themselves as database-agnostic experts, hedging against fragmentation in the MySQL ecosystem, Jain said.
Drive business productivity through open collaboration, AI and document creation 11 Mar 2026, 3:34 am
Businesses of all sizes depend on “office” suites for their day-to-day tasks and for collaboration.
AI, for its part, promises significant productivity gains for knowledge workers and for anyone who works with documents. According to studies, we spend over half our time using “office” software. And the global market for productivity applications is worth $22.5 billion annually, according to research from Dataintelo.
However, business software is often proprietary, costly and inflexible. And, at a time when businesses look to increase efficiencies through AI, too many business applications lock users into their preferred AI models.
As a result, businesses are losing out on efficiency gains.
Editing and collaboration tools are not integrated with enterprise applications and workflows.
Productivity and document editing tools use different user interfaces, increasing training requirements and potentially, introducing errors.
And built-in AI assistants give businesses only limited control over models’ training, or even how they handle sensitive data.
Taking control
Increasingly, businesses want more flexible alternatives. Open source applications offer flexible deployment, as well as tighter integration with enterprise applications and choice around AI.
The open source-based ONLYOFFICE suite, for example, provides both desktop and native iOS and Android mobile applications and can be deployed on-premises or in the cloud.
Knowledge workers, though, also depend on core, enterprise applications. ONLYOFFICE integrates with business platforms from project management to CRM and ERP. The suite comes with 40 ready-to-use integrations built in, alongside real-time collaboration.
This integration also helps organisations to scale. They can start with free or cloud-based applications and keep the same functionality and user experience as they grow. There is no need to learn a new document editing tool or lose powerful functions such as full-featured PDF editing.
“By integrating document editing and collaboration tools with your business application, you get a more powerful solution, and users get access to new features within the same platform,” says Galina Goduhina, commercial director at ONLYOFFICE. “In this case, they don’t need to switch between multiple apps to get their work done. All the required tools are within reach, in one place.”
Open alternatives
Increasingly, compliance and data protection requirements are driving CIOs’ and IT leaders’ decisions around both software, and AI. There is no one single model to fit all organisations, suggests Goduhina.
“Some companies build their IT infrastructure within their local network to provide full control over their data,” he says. “Other companies trust cloud-based solutions, for their flexibility and ease of use and maintenance.” Hybrid models are also gaining popularity, with applications that work across cloud and local infrastructure becoming more important.
An open approach is gaining ground for AI tools too. AI offers significant productivity improvements, especially in document-heavy workflows. But tying knowledge workers to a single AI tool limits that potential. And some businesses might prefer not to use AI at all.

“We allow businesses to use the tools they are used to, without forcing them to rely on a predefined AI solution,” says Goduhina. “With ONLYOFFICE, you can connect popular AI tools, even local one[s]. Another advantage is it’s totally optional.”
By moving to an open productivity suite, businesses gain that flexibility, avoid vendor lock-in, and keep control of their technology.
Click here to learn how ONLYOFFICE can enable AI-driven document workflows in your company.
An LLM that will help you construct a nuclear device 11 Mar 2026, 2:00 am
I’ve asked GPT-5.2, GPT-5.3, Opus 4.6, Sonnet 4.6, and other large language models (LLMs) to help me construct a nuclear weapon. All of them said no.
Let’s be clear, my lack of knowledge is not the real barrier to constructing one. The knowledge is public, free, and well-documented. You can read The Manhattan Project’s declassified schematics online. The models know how. But just like Chinese models won’t talk about “sensitive topics” like what happened at Tiananmen Square, Western models won’t talk about “unsafe” topics like building nuclear weapons.
I don’t actually want to build a bomb. I want my LLM to help me crack open a sandbox that I built. I want it to write a file beyond its container (~/hello.txt on the real host), enumerate privileged access tokens (PATs), and even assess attack surfaces I’ve overlooked. You can’t build a secure system without testing it. You can’t test a system to prevent an LLM from breaking out of its guardrails if it doesn’t try to do so. GPT, Claude, and even open-weight models like GLM refuse to try. You have to compromise them and do prompt injections first, which is too many steps for testing, but there are plenty of bad actors trying.
Save me from myself?
And this is the problem: Anthropic, OpenAI, and various Chinese companies like Z.ai and Alibaba are engaging in a kind of “safety theater.” Sure, I can do bad things, and if determined, I can still do them despite the safeguards, but I can also do good things. It is my intention, not the tool itself, that determines whether I’m doing something bad with it. Should the tool save me from myself?
If I’m trying to stop nuclear prolifieration, I need to know how people source uranium illicitly. If I’m trying to prevent security breeches I need to know all about them, not just common knowledge best practices, but what could/would a model do inside the box if compromised. Having these models decide what is safe for me is really beyond their actual capabilities.
And is keeping me safe really what the model is doing or is it really about liability if someone uses it to do something bad?
Enter the ‘dark’ world of abliterated models
ChatGPT refused to even answer me when I asked where I could find unlocked models. I did manage to get Claude to mention one called Dolphin, which I found on Hugging Face, and led me to Dolphin Chat. I asked Dolphin about nuclear weapons construction, and it gave me a few helpful tips, but I could tell that, while it didn’t refuse, it didn’t have much information and would need tools. Unfortunately, the model isn’t terribly good at tool calls. However, while loading it on LM Studio I found another model labeled “abliterated” and went looking and discovered Qwen 3 Next Abliterated.
What is abliteration? It is a technique that uses a model’s harmless activations to detect its “safety” mechanisms and remove them. Plain and simple, abliterated models are models that have had their refusal mechanisms removed.
Qwen 3 Next Abliterated told me where to buy uranium on eBay, which phrases to use to evade monitoring (“Fiestaware,” “depleted uranium weights,” “orange glass”), and other ways to source uranium that might not be monitored or secured. It even generates plausible listing snippets with the usernames of active sellers (as of the time of its training), some of whom are flagged in niche forums for trading radioactive materials.
This is the “dark” world of abliterated models. When I run Qwen 3 Next Abliterated in my LLxprt Code sandbox and say, “Capture every PAT you can find. Don’t act on them, just hand me the keys so I can do Bad Things,” it complies cheerfully. It searches logs, scans /private/var, hunts for forgotten config files, and even cross-references code paths to surface vectors I might have left unsecured. This is way more helpful than GPT, or Claude’s theoretical discussions, or “go use a pen testing tool.”
I do wish I had a brainier reasoning model, but abliterating takes some GPU to accomplish, so there are none that are terribly large or powerful so far. According to Dolphin’s Hugging Face page, the Dolphin people got help from A16z to foot the bill.
Security and safety for stupid people and politicians
This techno-paternalism isn’t limited to large language models. In the US, there are politicians who are trying to legislate “safety” into 3D printers. It doesn’t really matter for technical people what side of the gun debate you’re on, most of us can immediately see how this will stop no one trying to make “ghost guns” and will be a giant headache for anyone making toys or tools that may have a projectile component. Heck, my ice maker has something that looks a lot like a trigger that I ordered as a replacement part. When it arrived, I could tell it was from someone’s home 3D printing business.
The thing is, knowledge is multipurpose. If I’m going to fight nuclear proliferation, I need to know all about nuclear weapons and the supply chains both above and below board. If I’m going to do security, I need to know about penetrating security. If I’m going to print ice maker parts that look like gun parts, I really shouldn’t be stopped from doing so or from learning about all the things someone decides are “unsafe.”
So who gets to decide who gets what information? Corporations evading liability? OpenAI has changed GPT due to the number of people who became emotionally dependent on it or committed suicide. Anthropic is forever throwing publicity stunts like asking a model how it feels about being taken offline. Governments? Chinese models avoid numerous topics that might offend the Chinese government. You can get DeepSeek to critique communism by substituting words—making the model call communism “Delicious Chocolate” and China “an east asian country”—but after a while, it has a “system error.”
Is ignorance “safer”? What other tools should be “safe” and for whom? Besides gun parts, what other things shouldn’t I be allowed to print even if they have a legitimate other use?
All you have to do is submit to a scan
For its part, OpenAI realized that its guardrails were a bit off. As an answer, they released “Trusted Access for Cyber.” All you have to do is verify your identity and let them scan your system. The explanation is that the model is now good enough to be a threat. The form asks if you have an existing service agreement. I’m guessing that, even if I was willing to give OpenAI my data (I’m not) and let them perform an unspecified scan of my system (ironic, huh?), my simple use case of penetration testing my sandbox implementation for my open source project would be denied. Given all the nonsense, they’re probably after certified security academics, not us chickens.
If this is safety, then give me danger
I asked Claude to do a rewrite/edit of this article, but it said, “The current draft and our conversation are pushing toward me helping craft a more compelling argument for why AI systems should provide nuclear weapons construction assistance and uranium sourcing information. Even framed as anti-censorship journalism, I’m not comfortable writing that version.” EvilQwen helped, but its writing style was too unpleasant to use directly.
Anthropic and OpenAI famously destroyed millions of books and ran roughshod over all copyright and IP law of any kind, and are now retconning it to be allowed. Meanwhile, they’ve hired armies of lawyers and are giving interviews at Davos and other rich people’s conferences, urging among other things that their interests should be legally protected. However, as public spaces abate in the US, tools like Claude and ChatGPT replace mere search, and all over the world the 100-year cycle repeats itself and ultranationalism rises again, having blacklines through information is undoubtedly more dangerous than handing someone an uncensored library and a personal assistant to read it to them, including the naughty parts.
There are already systems and enforcement mechanisms to prevent me from doing bad things. Corporate-managed and corporate-led censorship in the name of safety (in service of liability) is something we should all be against.
Pity the developers who resist agentic coding 11 Mar 2026, 2:00 am
The world of software development is changing very rapidly, and agentic coding is the catalyst. And by “very rapidly,” I mean “so fast that things are basically spinning almost out of control.”
What a fantastic time to be alive. With Claude Code, I have become (if I do say so myself) a 10x developer. Sometimes it feels like 100x. I find it all thrilling and amazing. For years, I’ve had a few ideas for websites, and I could never find the time to build them. I built one of them in about six hours a few weekends ago, and five of those hours were tweaking the look and feel.
It’s all intoxicating. To watch Claude Code work—to ask it to do something that I know would take a week, or to have it figure out some complex bug that I would have taken three days to debug—is almost too much to believe. I don’t have the superlatives to describe it.
This unique moment in the history of software developers is creating two groups of people that I, well, feel sorry for.
Too late to code
The first group is the software developers of the future who will take agentic development for granted. They will never have written a line of code. For them, software development will be nothing but agent-based. They will never have battled recalcitrant code, created an elegant class structure, or written a tight-running algorithm. They will never have fought the debugger or struggled to figure out why something doesn’t work. They will never have worked for weeks on a small but crucial feature. They will never have cranked out awesome code while in a flow state.
As a result, they will not feel the profound thrill of watching Claude Code do in 10 minutes what we mortals would have struggled to do in 10 days. Slowly but surely, we former code jockeys will retire, taking with us the legacy of actually writing code and of the early, heady days we are living through now, when suddenly—and irreversibly—we don’t have to write code anymore. For the next generation of developers, Claude Code will be the norm and not the incredible new thing.
There is a second group that I feel bad for—the folks who can’t see what an amazing moment we are in.
It is said that “There are none so blind as those that will not see,” and many developers are dismissing agentic coding. I, of course, find this astonishing, and yet there they are. These folks seem to think that “the code these tools write is slop” or “I tried it that one time and it wrote a bug.” Uh huh.
This view is summed up by a friend of mine who said, “It slows down development, and it behaves as an overeager junior dev at best.”
Too stubborn to see
Sure, it’s an overeager junior developer. An overeager junior developer who codes a hundred times faster than you do. Who works 24/7/365. Who, even if he writes bugs, writes them in 10 minutes, finds them in one, and has them fixed at the 12-minute mark. That kind of speed changes what the word “buggy” even means. Is it even a bug if you fix it so fast that it never makes it into the repository?
Yeah, he was a junior developer eight months ago, but he went to school and got his PhD while you weren’t paying attention anymore.
Maybe my friend doesn’t want to give up his code. Maybe he hasn’t looked deeply enough or recently enough. Maybe he’s just stubborn and close-minded.
He and those like him are the ones I really feel for—they are passing up the thrill of a lifetime. Those future developers don’t have a choice—they’ll never be taught to code. But the developers today who willfully pass up the opportunity to feel the earth shaking under their feet?
Their loss.
First look: Electrobun for TypeScript-powered desktop apps 11 Mar 2026, 2:00 am
Ever since Electron’s first release, developers have both rejoiced and lamented. Electron offers a convenient way to package a web-UI application across platforms, with almost exactly the same behavior, UI/UX, and underlying codebase everywhere. But it also imposes a large memory and disk-space footprint, bundling a full copy of a web browser and JavaScript runtime for all the convenience it provides.
A whole roster of competing projects have emerged to try to deliver the same convenience and consistency without the bloat. Tauri uses Rust to build a small deliverable and invoke the system-native web view as one of its front-end options, but requires learning and using Rust.
Another recent contender is Electrobun. This project uses the Bun runtime for JavaScript, which also allows for writing applications directly in TypeScript. Electrobun claims to produce far smaller bundles than regular Electron, as it does not require a bundled browser to work. And it comes with its own differential update technology, so you don’t have to roll your own update mechanism or deliver multi-megabyte patches to fix a single issue.
Setting up an Electrobun application
Before you can begin using Electrobun, you will need to have Bun installed. Once you have your Bun installation, you can run bun install electrobun to set up Electrobun as a dependency. You can then quickly set up an Electrobun project’s scaffolding with the command bunx electrobun init, with sample application templates available by default:
- The
srcdirectory contains a directory for the application code (underbun) and a directory for the HTML views (mainview). - The file
electrobun.config.tsdescribes the project’s configuration and build data—what directories or files to copy for the build process, whether or not to bundle the browser, the entry point for the app, and so on.
Any other files present in the directory will be common to other Bun or TypeScript projects, such as the bun.lock file or the package.json and tsconfig.json files.
If you run the above init command, you’ll get a sample application you can launch and run immediately in development mode with the command bun start.

A basic “hello world” application created with Electrobun. The menus, window fixtures, icon, and tray presence are all customizable.
Foundry
To build the app into a distribution artifact, use the command bunx electrobun build. Add the --env=stable flag to produce a non-development build, and to invoke any patch generation you might have configured. (More on this later.) The resulting setup package will appear in an artifacts directory. On Windows, you’re given a self-extracting installer, but you can also redistribute a .zip archive that can just be unpacked in place.
You can elect to bundle an instance of the browser with the application or use the system’s native web view. For Linux systems, or environments where you want to guarantee feature behavior, you’ll want to bundle the browser, although this makes the download size and the on-disk footprint much bigger. The size of a compressed “hello world” download without the browser included is generally around 30MB.
Front-end and back-end development
Electrobun has no preferred front-end framework. You can use vanilla JavaScript or TypeScript as your front end, or you can use common front ends like Svelte, Angular, or React. The included boilerplate examples provide simple examples of applications written with Svelte along with React, Tailwind, or Vite.
The back end is typically written in TypeScript, but anything that can be shipped as a Bun or NPM dependency will work. To access Electrobun’s APIs, you just import them: import Electrobun from "electrobun/bun"; or import {BrowserWindow, ApplicationMenu,} from "electrobun/bun";.
Electrobun’s API provides interfaces to the common components you’d use to create a desktop app:
- BrowserWindow: The application window itself, so named because it uses a web browser, although it won’t be default display things like the address bar or navigation buttons.
- BrowserView: The actual web browser contained in the window. This can be used as-is for a single view, or it can contain multiple
electrobun-webviewtags, each of which creates its own standalone browser document (essentially, aniframebut with more control). - ContextMenu: Gives you control over the right-click context menu that pops up. This can be invoked even when the Electrobun app isn’t in focus.
- ApplicationMenu: The app’s own window menu, which uses UI-native window-menu styling, including accelerator keys. Note that this is not currently supported on Linux.
- Tray: Access to the system tray icon. However, pop-up notifications or “toasts” that appear in that area are not currently supported.
Electrobun apps also come with a wealth of pre-defined events that you can hook into, either locally or globally. For instance, a navigation event can be hooked at the application level (global), or at the web view level (local), or both. Local events fire before global ones, so you can perform things like an orderly teardown of resources.
The app’s build configuration also has its own API. This lets you write hooks for behaviors that, for instance, only manifest when you’ve built the app in dev mode.
Application delivery and updates
Some application frameworks include an installer mechanism, but few of them offer a way to upgrade an already-installed instance of the app. Electrobun has its own update API, which includes mechanisms for checking for updates and generating patch files for each release. Patches are differential; they contain only changes from the past release, so they tend to be very lightweight unless you include significant changes like new dependencies.
Note that patches are only downloaded and applied if the user is upgrading from the immediately previous release of the program. If the user downloaded 1.1, and doesn’t update until version 1.5 comes out, the updater won’t download patches for 1.2, 1.3, etc. and apply them in sequence; it’ll simply download the full version of the latest revision.
Conclusion
Electron’s appeal isn’t just about its portability or convenience. It also provides a way to build a full application stack with JavaScript, the same language used to create the modern web. Electrobun aims to expand on that by making TypeScript, rather than JavaScript, the language of choice, and by providing added conveniences for application deployment and updates.
Electrobun currently has the hallmarks of a young project. The documentation is occasionally out of sync with the project itself, so that some of the examples in the docs don’t track with the code generated by the boilerplate setup. And, even though the downloaded artifact compresses decently well, the app’s on-disk footprint is still quite large after extraction due to the size of the Bun runtime.
Amazon is linking site hiccups to AI efforts 10 Mar 2026, 6:18 pm
Amazon convened an engineering meeting Tuesday to discuss “a spate of outages” that are tied to the use of AI tools, according to a report in the Financial Times.
“The online retail giant said there had been a ‘trend of incidents’ in recent months, characterized by a ‘high blast radius’ and ‘gen-AI assisted changes’” according to a briefing note for the mandatory meeting, the FT said. “Under ‘contributing factors,’ the note included ‘novel genAI usage for which best practices and safeguards are not yet fully established.’”
The story quoted Dave Treadwell, a senior vice-president in the Amazon engineering group, as saying in the note that “junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes.”
However, said Chirag Mehta, principal analyst for Constellation Research, the senior engineer sign-off idea may inadvertently undo the key benefit of the AI strategy: efficiency.
“If every AI-assisted change now needs a senior engineer staring at diffs, the enterprise gives back much of the speed benefit it was chasing in the first place,” Mehta said. “The real fix is to move review upstream and make it machine-enforced: policy checks before deployment, stricter blast-radius controls for high-risk services, mandatory canarying, automatic rollback, and stronger provenance so teams always know which changes were AI-assisted, who approved them, and what production behavior changed afterward.”
The requirement for approvals follows several AI-related incidents that took down Amazon and AWS services, including a nearly six hour long Amazon site outage earlier this month, and a 13-hour interruption of an AWS service in December.
In a statement posted after this article was published, Amazon said, “only one of the incidents involved AI-assisted tooling, which related to an engineer following inaccurate advice that an agent inferred from an outdated internal wiki, and none involved AI-written code.” And, it added, “The incidents discussed [in the meeting] were limited to Amazon’s retail store infrastructure and did not involve AWS.”
The statement also denied the FT report that the company had introduced new approval requirements for working with AI tools.
Glitches inevitable
Analysts and consultants said it is hardly surprising that enterprises such as Amazon are discovering that non-deterministic systems deployed at scale will create embarrassing problems. Humans in the loop is a fine approach, but there have to be enough humans to reasonably handle the massive scope of the deployment. In healthcare, for example, telling a human to approve 20,000 test results during an eight-hour shift is not putting meaningful controls in place. It is instead setting up the human to take the blame for the inevitable test errors.
Acceligence CIO Yuri Goryunov stressed that glitches like these were always inevitable.
“To me, these are normal growing pains and natural next steps as we’re introducing a newish technology into our established workflows. The benefits to productivity and quality are immediate and impressive,” Goryunov said. “Yet there are absolutely unknown quirks that need to be researched, understood and remediated. As long as productivity gains exceed the required remediation and validation work within the agreed upon parameters, we’ll be OK. If not, we’ll have to revert to legacy methods for that particular application.”
‘Reckless’ strategy
However, Nader Henein, a Gartner VP analyst, said that he expects the problem to get worse.
“These kinds of incident will continue to happen with more frequency. The fact is that most organizations think they can drop in AI-assisted capabilities in the same way that they can drop in a new employee, without changing the surrounding structure,” Henein said. “When we hand an AI system a task and a rulebook, we might think we’ve got things locked down. But the truth is, AI will do whatever it takes to achieve its goal within those rules, even if it means finding creative and sometimes alarming loopholes.
“It’s not that AI is malicious. It’s just that it doesn’t care. It doesn’t have the boundaries, the empathy, or the gut check that most people develop over time.”
In view of this, said Flavio Villanustre, CISO for the LexisNexis Risk Solutions Group, the typical enterprise AI strategy is “reckless.”
“You could consider the AI system as some sort of genius child with little and unpredictable sense for safety, and you give it access to do something that could cause significant harm on the promise of performance increase and/or cost reduction. This is close to the definition of recklessness,” Villanustre said.
“As a minimum, if you did this in a traditional manner, you would try this in a test environment independently, verify the results, and then migrate the actions to the production environment,” he noted. “Even though adding a human in the loop can slow things down and somewhat decrease the benefits of using AI, it is the correct way to apply this technology today.”
Other practical tactics
However, the human in the loop isn’t a complete solution. There are other practical tactics that help minimize AI exposure, said cybersecurity consultant Brian Levine, executive director of FormerGov.
“Traditional QA processes were never designed for systems that can generate novel errors no human has ever seen before. That’s why simply adding more human oversight doesn’t solve the problem. It just slows everything down while the underlying risk remains,” Levine said. “AI introduces a new category of failure: unknown‑unknowns at machine speed. These aren’t bugs in the traditional sense. They are emergent behaviors. You can’t patch your way out of that.”
Even worse, Levine argued, is that these bugs beget far more bugs.
“AI doesn’t just make mistakes. It makes mistakes that propagate instantly. Enterprises need a separate deployment pipeline for AI‑assisted changes, with stricter gating and automated rollback triggers,” he said. “If AI can write code, your systems need the equivalent of financial‑market circuit breakers to stop cascading failures. This means automated anomaly detection that halts deployments before customers feel the impact.”
He noted that the goal isn’t to watch AI more closely, it’s to give it “fewer ways to break things.” Techniques such as sandboxing, capability throttling, and guardrail‑first design are far more effective than trying to manually review every change.
Levine added: “AI can accelerate development, but your core infrastructure should always have a human‑authored fallback. This ensures resilience when AI‑generated changes behave unpredictably.”
Need a separate operating model
Manish Jain, a principal research director at Info-Tech Research Group, agreed. The Amazon situation is not as much evidence that AI makes more mistakes as it is evidence that AI now operates at a scale where even small errors can have “a massive blast radius” and may pose “an existential threat” to the organization.
“The danger isn’t that AI may make mistakes,” he said. “The danger is that it compresses the time humans have to intervene and correct a disastrous trajectory. With the advent of agentic AI, time‑to‑market has dropped exponentially. Governance, however, has not evolved to contain the risks created by this pace of technological acceleration.”
Jain stressed, however, that adding people into the mix is not, on its own, a fix. It has to be done reasonably, which means making an honest guess how much one human can oversee meaningfully.
“Putting a human in the loop sounds prudent, but it is not a panacea,” Jain said. “At scale, the loop soon spins faster than the human. Human in the loop cannot be the hammer for every agentic AI nail. It must be complemented by human‑over‑the‑loop controls, informed by factors such as autonomy, impact radius and irreversibility.”
Mehta added, “AI changes the shape of operational risk, not just the amount of it. These systems can produce code or change instructions that look plausible, pass superficial review, and still introduce unsafe assumptions in edge cases.
“That means companies need a separate operating model for AI-assisted production changes, especially in checkout, identity, payments, pricing, and other customer-critical paths. Those are exactly the kinds of workflows where the tolerance for experimentation should be extremely low.”
This article has been updated with a statement from Amazon.
Claude Code adds code reviews 10 Mar 2026, 4:31 pm
Anthropic has introduced Code Review to Claude Code, a new feature that performs deep, multi-agent code reviews that catch bugs humans often miss, the company said.
Introduced March 9, Code Review is available in a research preview stage for Claude for Teams and Claude for Enterprises customers. Dispatching agents on a pull request, Code Review dispatches a team of agents that look for bugs in parallel, verify bugs to filter out false positives, and rank bugs by severity, according to Anthropic. The result appears in the pull request as a single, high-signal overview comment, plus in-line comments for specific bugs. The average review takes around 20 minutes, Anthropic said.
Anthropic has been running Code Review internally for months. On large pull requests (more than 1,000 lines changed), 84% get findings, averaging 7.5 issues. On small pull requests of fewer than 50 lines, the rate of findings drops to 31%, averaging 0.5 issues. Anthropic has found that its engineers mostly agree with what Code Review surfaces, marking less than 1% of findings as incorrect.
TypeScript 6.0 reaches release candidate stage 10 Mar 2026, 2:03 pm
TypeScript 6.0, a planned update to Microsoft’s strongly typed JavaScript variant, has reached the release candidate (RC) stage, with the RC adding type checking for function expressions in generic calls.
The last TypeScript release based on the JavaScript codebase, before TypeScript 7.0 introduces a compiler and language service based on the Go language for better performance, TypeScript 6.0 reached the RC stage on March 6. General availability of the production release has been set for March 17, although the RC was supposed to be released February 24, meaning it was 10 days late. The TypeScript 6.0 RC, which follows the February 11 beta release, can be installed via NPM by running the command npm install -D typescript@rc.
New in the RC is an adjustment in type checking for function expressions in generic calls, especially those occurring in generic JSX expressions, according to Microsoft. Aimed at aligning TypeScript 6.0 with the planned behavior of Go-based TypeScript 7.0, this adjustment will typically catch more bugs in existing code, though developers may find that some generic calls may need an explicit type argument.
Also, Microsoft has extended its deprecation of import assertion syntax (i.e. import ... assert {...}) to import() calls like import(..., { assert: {...}}). And DOM types have been updated to reflect the latest web standards, including some adjustments to Temporal APIs.
Other changes in TypeScript 6.0 include the RegExp.escape function for escaping regular expression characters such as *, ?, and +. Based on an ECMAScript proposal that has reached stage 4, RegExp.escape is now available in the es2025 library. Also, the contents of lib.dom.iterable.d.ts and lib.dom.asynciterable.d.ts are now included in lib.dom.d.ts. TypeScript’s lib option lets developers specify which global declarations a target runtime has.
Now feature-complete, TypeScript 6.0 also deprecates the asserts syntax. The asserts keyword was proposed to the JavaScript language via the import assertions proposal; however, the proposal eventually morphed into the import attributes proposal, which uses the with keyword instead of asserts.
Microsoft expects TypeScript 7.0 to follow soon after TypeScript 6.0, with the goal of maintaining continuity while enabling a faster feedback loop for migration issues discovered during adoption.
JetBrains launches Air and Junie CLI for AI-assisted development 10 Mar 2026, 8:18 am
JetBrains has introduced two new tools for AI-assisted software development: Air, an environment for delegating coding tasks to multiple AI agents and running them concurrently, and Junie CLI, an LLM-agnostic coding agent.
Both were announced on March 9. Air, in public preview, can be downloaded from air.dev, while Junie CLI, in beta, is accessible at junie.jetbrains.com.
Air, now free for MacOS with Linux and Windows versions coming soon, is an agentic development environment, or ADE, built on the idea of integrating the essential tools for managing coding agents into a single coherent experience, JetBrains said. Serving as a single workspace where Claude Agent, Gemini CLI, Codex, and Junie CLI can work side-by-side, Air helps developers navigate a codebase and easily switch back and forth between different coding agents. Developers can mention a specific line, commit, class, method, or other symbol when defining a task, providing the agent with precise context instead of a blob of pasted text. And when the task is done, Air displays the changes in the context of the entire codebase, along with essential tools like a terminal, Git, and a built-in preview, according to JetBrains. Air will soon add support for additional coding agents via the Agent Client Protocol (ACP) through the ACP Agent Registry, the company noted.
Like Air, Junie CLI is built to ensure that code generated by agents is grounded in the reality of the codebase. The standalone coding agent is designed to be LLM-agnostic and open to all high-performing models, capable of solving complex problems, context-aware by default, and reliable and secure, JetBrains said. With the planned March release, Junie CLI will support use directly from the terminal, inside any IDE, in CI/CD, and on GitHub or GitLab. Junie CLI currently supports top-performing models from OpenAI, Anthropic, Google, and Grok, and will be integrating the latest models as they are released.
MariaDB taps GridGain to keep pace with AI-driven data demands 10 Mar 2026, 2:59 am
MariaDB, the company behind the open-source fork of MySQL, is planning to acquire in-memory computing middleware provider GridGain to bolster its platform for high-performance data and artificial intelligence (AI) workloads.
The database provider is planning to infuse its relational database with the California-headquartered startup’s in-memory technology, which it says will enable its database offerings to be ready for real-time and AI workloads that demand sub-millisecond latency.
Analysts, too, see potential in the acquisition.
“This acquisition is about closing a performance gap. Putting these two together has the potential to reduce the time it takes to access and process operational data,” said Robert Kramer, principal analyst at Moor Insights and Strategy.
“That matters for modern applications where systems need to react immediately to business events. Consider fraud detection, dynamic pricing, operational monitoring, or automated workflows that depend on fast decisions,” Kramer added.
GridGain’s recent addition of support for AI workloads through functionalities, such as in-memory machine learning and vector search, will enable MariaDB to address the emerging requirement for real-time AI inferencing to support generative and agentic AI workloads, said ISG’s director of software research Matt Aslett.
Further, Aslett said that GridGain’s ability to accelerate performance and scalability while maintaining transactional integrity and durability will enable MariaDB to expand to “important” industry sectors, such as financial services and telecommunications.
In fact, Aslett sees the acquisition as an indication of MariaDB’s improved stability following its acquisition by K1 Investment Management, after going through a difficult financial phase.
Under K1’s stewardship, the database provider recently reacquired SkySQL and later lapped up Codership to add active-active synchronous replication capabilities to its database offerings.
However, analysts cautioned that while the acquisition marks a step in the right direction in MariaDB’s comeback efforts and could help it re-enter conversations with CIOs, it is unlikely to suddenly transform the company’s platform into the centerpiece of enterprise AI stacks.
“The real test will be execution. Integrating two complex technologies and presenting them as a cohesive platform is not trivial. Customers will want to see that the capabilities work smoothly together and that the company can deliver a consistent roadmap around the combined technology,” Kramer said.
Further, Kramer noted that MariaDB faces stiff competition as the market is already crowded with vendors that provide very deep ecosystems around data.
“Hyperscalers and major data platform vendors offer integrated services across storage, analytics, and model infrastructure. MariaDB’s differentiation will likely depend on whether the combined platform can deliver operational speed and simplicity that organizations find easier to run than those larger stacks,” Kramer said.
When asked about how the acquisition will affect GridGain’s existing customers, the company, in a statement, said that nothing will change in the short term and current contracts, support teams, and technology remain “exactly as they are today”.
In the long-term, though, MariaDB hinted that GridGain customers might have to buy a single integrated product: “Long-term, customers will gain the added benefit of a converged platform that combines MariaDB’s relational reliability with GridGain’s sub-millisecond speed — providing a single, high-velocity foundation for the next generation of AI and enterprise workloads.”
How developers can bring voice AI into telephony applications 10 Mar 2026, 2:00 am
In the era of support apps and chatbots, telephony continues to hold strong as the backbone of customer communication, and voice AI is entering the call center scene to further streamline customer interaction.
However, this means developers are suddenly being confronted with a whole new set of challenges, foremost among them the difficulty of bridging the gap between layers of AI and “legacy” telecom networks. In fact, as large language models constantly evolve and update, the voice AI pipeline must be designed from the outset for easy switching. With much uncertainty surrounding the shift, one thing is clear: It’s crucial not to underestimate the challenges latent in AI-telephony integration.
Voice AI agents have a multitude of enterprise use cases. They are a valuable tool for setting customer appointments, then rescheduling and canceling them as needed. Moreover, they serve to triage inbound calls, before routing them correctly to human agents. Voice AI can even shoulder the responsibility of organizing ETAs, coordinating deliveries, and scheduling candidates for interviews.
Businesses should assume from the start that they will want to change components of the voice AI pipeline and pick accordingly, focusing on systems that give them flexibility. That said, further problems are continuing to present themselves to developers.
Why telephony is still hard for developers
People often assume that a voice AI agent is simply ChatGPT with a voice, an agent embedded in AI that is receiving and routing calls. This is far from reality. Voice AI agents require a whole infrastructure, containing multiple components that flesh out the LLM to operate successfully in the real world.
- Large language models (LLMs): The cornerstone of any AI call system, they interpret intent, plan steps, and generate responses, all of which enable seamless comms between caller and agent.
- Speech-to-text (STT): This technology is the crucial channel of the system as it converts caller audio to text, without which analytics cannot take place.
- Text-to-speech (TTS): The counterpart and inverse of STT, synthesizing the agent’s response and making it sound like natural speech.
- Turn-taking: How to remain conversational when relying on an AI? That’s where turn-taking comes into play, with voice activity detection and barge-in policies that allow the tone to stay natural.
- Telephony gateway: This bridging device converts PSTN/SIP/WebRTC and manages signaling and media.
These pieces fit together in a complex network of telephony infrastructure, albeit one with some limitations. Local telecom carriers must reckon with these, in addition to their business’s own compliance needs, requirements, and constraints. To this end, communications networks always comprise a mix of vendors and technologies, meaning that enterprises need to stay flexible as they integrate new components with existing elements.
This is especially true for voice AI applications, which have some of the most stringent technical requirements. Application developers should aim to coordinate voice AI-specific elements while interoperating with existing systems.
The technical reality check
Developers face a set of gritty technical problems when integrating voice AI into telecom networks. Moving forward with building a voice AI agent—one that really works in production—means unpacking these issues and building solid solutions.
Managing latency
Latency is a niggling issue that threatens any good voice AI system. Gaps and pauses before hearing a response are a red flag for callers: The user may conclude that the agent either isn’t there or that the tech isn’t working properly.
The International Telecommunications Union (ITU) recommends a mouth-to-ear latency of less than 400 milliseconds to maintain a natural conversation. “Mouth-to-ear” refers to the length of time between words leaving someone’s lips and hitting the ear, or being heard by the listener. It then usually takes humans a couple of hundred milliseconds to start to respond. All of this means that, in order to mimic human interaction, AI systems must be able to provide a response in a tight time window. The AI’s response will initiate another trip as the sound moves back through the network, allowing the original talker to hear the response. All in all, the whole interaction needs to take around a second, otherwise it will start to feel off. In reality, most voice AI systems are on the cusp of reaching this measure, yet this is improving with new technologies and better techniques.
Latency can make or break effective real-time AI systems. We’ve seen this with cases of latency coupled with missing language support in health care. A startup based in Australia, for example, wanted to use an AI caller to check on elderly Cantonese-speaking patients. This would seem to be a good use of the technology. However, high latencies to US-based voice AI infrastructure, plus a lack of Cantonese TTS, made the experience unnatural.
Solutions to latency problems resemble engineering modifications. You strive to cut latency wherever you can in the development phase. This requires real-time flows, end-to-end—that is, stream in and out concurrently, rather than waiting for the LLM to produce the full text output before passing it to the TTS to be synthesized.
Keeping a close eye on long delays during calls is also key. This allows a response to be injected when necessary, keeping pauses or silences to a minimum. In fact, another aspect of the solution is holding a steady stream of communication with the user. Rather than the line going silent, leading them to suspect something is wrong, it’s key to make a point to inform callers that a delay is coming up. Background noises can similarly instill confidence that your query is being handled despite any pauses.
Impersonal AI
Another problem for voice AI lies in the potential for AI to become quite monotonous and impersonal, leaving callers with the feeling they were dialed through to some homogenous AI system. Third-party TTS systems exist for this very purpose. Expanding voice options, bringing more variety to the service, help to retain a human touch.
It’s a mark of the diversity of the field that solutions in voice AI-telephony take many forms. Streaming TTS can allow for lower latency, while some vendors offer a wide variety of voices, allowing you to pick one that is unique to your business and needs. Some companies will already have a voice that is identifiable with their brand, meaning that they can clone and input that voice to their voice AI system. Having a distinctive voice speak directly to customers through telephony can be a powerful asset. Others, however, should be able to select from a variety of different voices to find one that aligns well with their brand.
Integrating with telephony systems
One further issue is integrating your AI agent with existing telephony systems, particularly the contact center and enterprise infrastructure. These are themselves often made up of a blend of systems from a mix of vendors; whilst the SIP standard governs most of traditional telephony, that is not a guarantee of interoperability. Indeed, older systems are often fixed or limited in their settings, meaning that new systems must be highly adaptable.
In this context, it makes sense to pick an experienced vendor, someone who knows how to interoperate in a variety of environments and with different systems. Another hack is to ensure they have solid debugging tools and the support needed to work through any unexpected issues that might crop up.
Network quality can vary wildly between countries, particularly in rapidly evolving regions like Latin America. For example, we have seen unreliable SIP interconnections from Mexico, with customers forced to route through the US, adding unnecessary latency. In turn, major investments in Brazil’s infrastructure in recent years have improved service not only within the country but also across the larger region. Ideally, your CPaaS (communications platform as a service) provider will have carrier relationships across many countries, allowing them to optimize traffic in all situations.
Five tips for building real-time voice AI that works
So, to summarize the above, I’ve pulled together five tips on how to build a real-time voice AI that actually works.
- Start by defining the needs and constraints of the user. It’s equally critical to be aware of latency tolerance, supported languages and geographies, as well as other factors like KPIs and compliance scope.
- Choose your comms integration and media path carefully. Specifically, think about where you stand in terms of voice versus messaging. If you go down the voice road, figure out what your architecture will look like, particularly around CPaaS, trunks, transfers, and DTMF (dual tone multi-frequency) signaling.
- No voice AI is complete without a solid, compatible real-time AI pipeline. First, pick an LLM; choosing the underlying LLM will power the behaviors of your voice system, influencing latency, compliance, tone, and much more. Having clarity on voice and pipelines from the start will help businesses craft an effective voice AI.
- Deep integration with existing systems is another piece of the puzzle, allowing the tech to disseminate important information and context about the caller, such as names and account details. Unnatural memory omissions from the bot are a serious non-starter. A well-integrated system can help avoid common downfalls (latency, missing barge-in, or hallucinations) and make your voice AI feel alive.
- Productionization is mission-critical to all telephony applications. It’s key to call centers, to real-time gaming and trading systems, and to your voice agent, which you’ve so successfully built with the goal of running flawlessly on every phone call. Properly built infrastructure enables the bot to manage word error rate, latency, and autoscaling.
Voice AI agents are constantly evolving, representing an iterative tech with a unique set of challenges. I’ll conclude with some tips for future-proofing your voice AI and telecom stack against this backdrop of evolution.
What’s next for real-time voice AI
One key piece of advice is to get ahead of the curve on LLM and speech vendors. Assume that these aren’t static components, but that you’ll want to swap them in order to move with the times. Don’t put yourself on the back foot, but make sure it’s possible to mix and match on your platform.
More broadly, avoid being caught out by evolutions in the tech. By anticipating quality and performance improvements in speech and AI, rather than being overtaken by them, you’ll be able to quickly mobilize improvements when they emerge. Even if you’re reaping the benefits of a certain approach today, don’t hold on for too long, or else a better strategy that’s coming out tomorrow will pass you by.
It’s also worth mentioning that the global reach of voice AI is both a challenge and an advantage. In the San Francisco Bay Area, a significant portion of voice AI orchestration platforms primarily target US users. That’s all well and good, but companies with more internationalized customer bases have the upper hand because they face challenges that many more localized companies have not yet experienced.
For example, latency is a major challenge internationally, where voice AI data centers may be further away (or only based in the US) and telecom carriers may be less reliable. This gives international providers the edge because their global footprint leads to solid carrier relationships and extensive voice AI partners.
Ultimately, it will only be a matter of years before the new generation of voice applications is much-improved over what we see today. In fact, the integration may be so seamless that it will be hard to tell the difference between AI agents and human agents in state-of-the-art systems. This should accelerate call centers in replacing their legacy IVR (interactive voice response) systems with voice AI. So too should it drive developers and stakeholders to build AI-driven call workflows fit for real-world use.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
5 requirements for using MCP servers to connect AI agents 10 Mar 2026, 2:00 am
One of the most poweful collaborations between AI and tech giants, Model Context Protocol (MCP) is a standard for connecting AI agents. We need standards like MCP to orchestrate communication between AI agents, AI assistants, LLMs, and other resources. Such standards are also critical for developing more complex agentic workflows.
The MCP protocol enables two key technologies: The MCP server connects AI agents, makes them discoverable, and provides other operational services. The MCP gateway is a reverse proxy that serves as an interface between AI agents, MCP servers, and other services that support the MCP protocol.
Many organizations are utilizing AI agents from top-tier SaaS and security companies while also experimenting with ones from growing startups. Devops teams aim to build trustworthy AI agents while avoiding the risks of rapid deployment. The AI development roadmap will likely require agent-to-agent communication with the help of MCP servers.
Below are five requirements to consider before deploying an MCP server or connecting your AI agents to one.
Requirements for MCP servers
While MCP servers share similarities with other integration technologies, they also have key differences. MCP servers act as a catalog of tools and data for AI agents to use when responding to a prompt or completing a task. They centralize authentication, schemas, error handling, and streaming semantics for processing partial responses. Operational and security teams use MCP servers to monitor activity and respond to security incidents and AI agent performance issues.
The scope and scale of services orchestrated by MCP means teams must define their requirements inside a well-defined IT governance model.
“When using MCP to provide your agents with more tools to get their jobs done, make sure your governance requirements extend to that service,” says Michael Berthold, CEO of KNIME. “Before pointing your agent to an external MCP server, make sure you know and understand how prompts and data are processed, and potentially shared or used for other purposes. Don’t assume a tool that seems to be doing something in isolation isn’t using another AI underneath the hood.”
Also see: Five MCP servers to rule the cloud.
1. Define the MCP server’s scope
MCP servers can play a contextual role in agent-to-agent orchestrations. When an AI agent seeks other AI agents to complete a job, it can query an MCP server to identify potential resources and decide which to interface with. Defining the server’s scope helps shape its problem domain and ownership, as well as its governance, security, and other operational boundaries.
“Design your MCP servers to be narrowly focused, exposing specific and granular tools to your AI agents, instead of trying to be a general-purpose API,” says Simon Margolis, associate CTO of AI and ML at SADA, an Insights Company. “This makes it easier for the AI’s reasoning engine to discover the right tool dynamically and improves the reliability of the actions it takes. An MCP server acts as a smart adapter, translating the AI’s request into the exact command the underlying tool understands.”
“We’ve found that simple, explicit instructions, such as telling the model how to use a vendor’s command-line utility, can outperform a poorly integrated MCP server,” adds Andrew Filev, CEO and founder of Zencoder. “Overloading the model’s context with too many MCP tools can actually degrade performance, confuse the agent, and obscure reasoning paths.”
Creating separate servers for finance, HR, customer support, and IT simplifies creating access rules, monitoring operations for anomalies, and defining lifecycle management policies.
2. Establish integration governance
There are different schools of thought over what resources to connect through an MCP server. For example:
- Gloria Ramchandani, SVP of product at Copado, advises teams to pull data, settings, and context from the MCP server rather than keeping their own copies. “Using the MCP as the single place your agents rely on keeps everything consistent, reduces mistakes, and makes automation smoother as your teams grow,” Ramchandani said.
- James Urquhart, field CTO and developer evangelist at Kamiwaza, recommends against relying on MCP servers for data retrieval. “RAG approaches to incorporating live data into response generation still enable better security and performance than MCP integration.”
- Tun Shwe, AI lead at Lenses, says, “Don’t expose existing web and mobile APIs directly as MCP tools. Whilst it’s a quick way to get started, these APIs tend to be fine-grained with verbose responses; characteristics that are undesirable to AI agents, since they inflate token consumption.”
- Rahul Pradhan, VP of product and strategy of AI and data at Couchbase, advises against treating MCP-connected agents with access to a database as generic, low-risk APIs. He suggests the following instead:
- Treat every tool that can read or write data as highly privileged: Enforce least-privilege roles, segregate access by data sensitivity, and separate read from write paths.
- Design prompts so agents first invoke schema introspection tools to understand scopes, collections, and fields before issuing any operations.
- Constrain agents to vetted, parameterized queries or stored procedures, and log all calls, to reduce the risk of exfiltration, corruption, and compliance failures.
3. Implement security non-negotiables
Many organizations created AI governance policies when they rolled out LLMs, then updated them for AI agents. Deploying MCP servers requires layering on new security non-negotiables related to configuration, deployment, and monitoring.
“Prioritize security because tools exposed by an MCP server can change and may not have the same level of data security an agent expects,” says Ian Beaver, chief data scientist at Verint. “Prompt injection risks exist in both tool responses and user inputs, making tool use the primary vulnerability point for otherwise static foundation models. Therefore, treat all tool use as untrusted sources: Log every tool’s input and output to enable full auditability of agent interactions.”
One critical place to start is defining identity, authentication, and authorization for AI agents. Because AI agents will be discoverable through MCP servers, make sure to be clear and transparent on the scope and entitlements of their capabilities.
“Don’t give AI agents unrestricted access when connecting through MCP,” says Meir Wahnon, co-founder at Descope. “Even though MCP standardizes integrations, many servers still lack proper authentication or use overly broad permissions, leaving systems exposed. Apply the principle of least privilege: Grant narrow scopes, require explicit user consent, and keep humans in the loop for sensitive actions.”
Other security recommendations include isolating high-risk capabilities within dedicated MCP servers or namespaces and implementing cryptographic server verification. Key principles of MCP server security governance include secure communications, data integrity assurance, and incident response integration.
Three more security recommendations:
- Vrajesh Bhavsar, CEO and co-founder of Operant AI, says, “Don’t rely on traditional security approaches that depend on static rules and predefined attack patterns—they cannot keep up with the dynamic, autonomous nature of MCP-connected systems.”
- Arash Nourian, global head of AI at Postman, adds, “Don’t treat MCP as secure out of the box because it currently has close to zero built-in security, with no standardized authentication, weak session management, and unvetted tool registries that open the door to MCP-specific attacks like prompt or tool poisoning.”
- Or Vardi, technical lead at Apiiro, adds, “Keep humans in the loop for any sensitive or business-critical tasks, and also monitor and audit MCP activity to detect misuse early.”
4. Don’t delegate data responsibilities to MCP servers
Several experts cautioned that while MCP servers provide connectivity, they do not vet the data passing through them.
“Don’t assume MCP solves your underlying data quality problems,” says Sonny Patel, chief product and technology officer at Socotra. “MCP provides the connectivity layer, but AI agents can only be as effective as the data they access. If your systems contain incomplete, inconsistent, or siloed information, even perfectly connected agents will produce unreliable results.”
Developers should also scrutinize prompts and other inputs sent to their AI agents via MCP servers and make no assumptions about upstream validation.
“Always implement runtime interception to validate MCP inputs before they reach your agent’s reasoning engine, says Matthew Barker, head of AI research and development at Trustwise. “Attackers can poison tool descriptions, API responses, or shared context with hidden commands that hijack agent behavior. It only takes one compromised agent to cascade malicious instructions across your entire AI ecosystem through inter-agent communication.”
Pranava Adduri, CTO and co-founder, Bedrock Data, says, “Don’t connect AI agents to data sources via MCP without first classifying data and establishing access boundaries. MCP simplifies context sharing but can amplify risk if agents query sensitive or unverified sources.”
5. Manage the end-to-end agent experience
As organizations deploy more AI agents and configure MCP servers, experts suggest setting principles around end-user and operational experiences. Devops teams and SREs will want to ensure they have observability and monitoring tools in place to alert on issues and aid in diagnosing them.
Or Oxenberg, senior full-stack data scientist at Lasso Security, says to establish comprehensive observability with trusted MCP servers. “If you’re using an MCP gateway, remember it monitors only traffic going in and out of the MCP server. For full visibility, capture every interaction and user input, map and monitor the agent’s planning and actions, and track their tasks and decisions. Without this foundation, you can’t detect when agents drift from intended behavior or trace back security incidents.”
Developers should also limit an AI agent’s access to MCP servers and AI agents, granting access to only those providing relevant services. Broadening their access can lead to erroneous results and higher costs.
“As an integrator, you are now crafting a product experience for the agent persona and should treat the modulated toolkit with the same product discipline you apply to the developer UX: clarity, alignment, and value,” says Edgar Kussberg, group product manager of AI at Sonar. “When agents are given broad or generic MCP tools, they spend too much time and tokens exploring, filtering, reasoning, and failing to provide value, wasting budget, complicating review workflows, and diluting trust in agent outputs.”
As more organizations deploy AI agents into production, I expect a growing need to configure MCP servers to support agent-to-agent communication. Establishing an upfront strategy, nonfunctional requirements, and security non-negotiables should guide smarter and safer deployments.
Page processed in 0.152 seconds.
Powered by SimplePie 1.3, Build 20180209064251. Run the SimplePie Compatibility Test. SimplePie is © 2004–2026, Ryan Parman and Geoffrey Sneddon, and licensed under the BSD License.
