Visual Studio Code update shines on coding agents 5 Feb 2026, 2:13 pm
Microsoft has released Visual Studio Code 1.109, the latest update of the company’s popular code editor. The new release brings multiple enhancements for agents, including improvements for optimization, extensibility, security, and session management.
Released February 4 and also known as the January 2026 release, VS Code 1.109 can be downloaded for Windows, Linux, and macOS at code.visualstudio.com.
Microsoft said that with this release it is evolving VS Code to become “the home for multi-agent development.” New session management capabilities allow developers to run multiple agent sessions in parallel across local, background, and cloud environments, all from a single view. Users can jump between sessions, track progress at a glance, and let agents work independently.
Agent Skills, now generally available and enabled by default, allow developers to package specialized capabilities or domain expertise into reusable workflows. Skill folders contain tested instructions for specific domains like testing strategies, API design, or performance optimization, Microsoft said.
A preview feature, Copilot Memory, allows developers to store and recall important information across sessions. Agents work smarter with Copilot Memory and experience faster code search with external indexing, according to Microsoft. Users can enable Copilot Memory by setting github.copilot.chat.copilotMemory.enabled to true.
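For reference, this is a user or workspace setting; a minimal settings.json sketch (the setting key is taken from the release notes, everything else is illustrative):

{
  // Preview feature in VS Code 1.109: let Copilot store and recall information across sessions
  "github.copilot.chat.copilotMemory.enabled": true
}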
VS Code 1.109 also adds Claude Agent support, enabling developers to leverage Anthropic’s agent SDK directly. The new release also adds support for MCP apps, which render interactive visualizations in chat. The update also introduces terminal sandboxing, an experimental capability that restricts file and network access for agent-executed commands, and auto-approval rules, which skip confirmation for safe operations.
VS Code 1.109 follows last month’s release of VS Code 1.108, which introduced support for agent skills. Other improvements in VS Code 1.109 include the following:
- For Chat UX, streaming improvements show progress as it happens, while support for thinking tokens in Claude models provides better visibility into what the model is thinking.
- For coding and editing, developers can customize the text color of matching brackets using the new editorBracketMatch.foreground color theme token (see the settings sketch after this list).
- An integrated browser lets developers preview and inspect localhost sites directly in VS Code, complete with DevTools and authentication support. This is a preview feature.
- Terminal commands in chat now show richer details, including syntax highlighting and working directory. New options let developers customize sticky scroll and use terminals in restricted workspaces.
- Finalized Quick Input button APIs offer more control over input placement and toggle states. Proposed APIs would enable chat model providers to declare configuration schemas.
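For the bracket color item above, the token is applied through VS Code's color customizations; a minimal sketch, assuming the token name editorBracketMatch.foreground from the release notes and an arbitrary example color:

{
  "workbench.colorCustomizations": {
    // Assumed token name for the new matching-bracket text color; the hex value is just an example
    "editorBracketMatch.foreground": "#ffcc00"
  }
}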
Databricks adds MemAlign to MLflow to cut cost and latency of LLM evaluation 5 Feb 2026, 2:55 am
Databricks’ Mosaic AI Research team has added a new framework, MemAlign, to MLflow, its managed machine learning and generative AI lifecycle development service.
MemAlign is designed to help enterprises lower the cost and latency of training LLM-based judges, in turn making AI evaluation scalable and trustworthy enough for production deployments.
The new framework, according to the research team, addresses a critical bottleneck most enterprises face today: efficiently evaluating and governing the behavior of agentic systems and the LLMs driving them, even as demand for their rapid deployment continues to rise.
Traditional approaches to training LLM-based judges depend on large, labeled datasets, repeated fine-tuning, or prompt-based heuristics, all of which are expensive to maintain and slow to adapt as models, prompts, and business requirements change.
As a result, AI evaluation often remains manual and periodic, limiting enterprises’ ability to safely iterate and deploy models at scale, the team wrote in a blog post.
MemAlign’s memory-driven alternative to brute-force retraining
In contrast, MemAlign uses a dual memory system that replaces brute-force retraining with memory-driven alignment based on feedback from human subject matter experts, feedback that is needed in far smaller amounts, and far less often, than with conventional training methods.
Instead of repeatedly fine-tuning models on large datasets, MemAlign separates knowledge into a semantic memory, which captures general evaluation principles, and an episodic memory, which stores task-specific feedback expressed in natural language by subject matter experts, depending on the use case.
This allows LLM judges to rapidly adapt to new domains or evaluation criteria using small amounts of human feedback, while retaining consistency across tasks, the research team wrote.
This reduces the latency and cost required to reach efficient, stable judgments, making the approach more practical for enterprises to adopt, the team added.
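Databricks has not published the MemAlign API alongside the announcement, so the following is only a conceptual sketch of the dual-memory idea as described: general principles sit in semantic memory, expert feedback sits in an episodic store and is recalled per task, and both are injected into the judge's prompt instead of retraining the model. All names are illustrative.

// Conceptual sketch only; not the MLflow MemAlign API. Names are illustrative.
interface Feedback { taskKind: string; note: string } // episodic: expert feedback in natural language

interface JudgeMemory {
  semantic: string[];   // general evaluation principles that rarely change
  episodic: Feedback[]; // task-specific feedback from subject matter experts
}

// Recall the most recent feedback notes for this kind of task.
// (A real system would retrieve by embedding similarity from a vector store.)
function recallEpisodic(memory: JudgeMemory, taskKind: string, limit = 3): string[] {
  return memory.episodic.filter(f => f.taskKind === taskKind).slice(-limit).map(f => f.note);
}

// Build the judge's prompt from both memories rather than fine-tuning the judge.
function buildJudgePrompt(memory: JudgeMemory, taskKind: string, candidate: string): string {
  return [
    "You are an evaluation judge.",
    "General principles:", ...memory.semantic,
    "Relevant expert feedback:", ...recallEpisodic(memory, taskKind),
    "Evaluate the following output:", candidate,
  ].join("\n");
}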
In Databricks-controlled tests, MemAlign showed the same efficiency as approaches built on labeled datasets.
Analysts expect the new framework to benefit enterprises and their development teams.
“For developers, MemAlign helps reduce the brittle prompt engineering trap where fixing one error often breaks three others. It provides a delete or overwrite function for feedback. If a business policy changes, the developer can update or overwrite relevant feedback rather than restarting the alignment process,” said Stephanie Walter, practice leader of AI stack at HyperFRAME Research.
Walter was referring to the framework’s episodic memory, which is stored in a highly scalable vector database, enabling it to handle millions of feedback examples with minimal retrieval latency.
The ability to keep LLM-based judges aligned with changing business requirements is critical because it avoids destabilizing production systems, something that matters more and more to enterprises as agentic systems scale, according to Moor Insights and Strategy principal analyst Robert Kramer.
Agent Bricks may soon get MemAlign
Separately, a company spokesperson told InfoWorld that Databricks may soon embed MemAlign into its AI-driven agent development interface, Agent Bricks.
That is largely because the company believes the new framework will be more efficient at evaluating and governing agents built on the interface than previously introduced capabilities, such as Agent-as-a-Judge, Tunable Judges, and Judge Builder.
Judge Builder, which was previewed in November last year, is a visual interface to create and tune LLM judges with domain knowledge from subject matter experts and utilizes the Agent-as-a-Judge feature that offers insights into an agent’s trace, making evaluations more accurate.
“While the Judge Builder can incorporate subject matter expert feedback to align its behavior, that alignment step is currently expensive and requires significant amounts of human feedback,” the spokesperson said.
“MemAlign will soon be available in the Judge Builder, so users can build and iterate on their judges much faster and much more cheaply,” the spokesperson added.
The ‘Super Bowl’ standard: Architecting distributed systems for massive concurrency 5 Feb 2026, 2:00 am
In the world of streaming, the “Super Bowl” isn’t just a game. It is a distributed systems stress test that happens in real time before tens of millions of people.
When I manage infrastructure for major events (whether it is the Olympics, a Premier League match or a season finale) I am dealing with a “thundering herd” problem that few systems ever face. Millions of users log in, browse and hit “play” within the same three-minute window.
But this challenge isn’t unique to media. It is the same nightmare that keeps e-commerce CTOs awake before Black Friday or financial systems architects up during a market crash. The fundamental problem is always the same: How do you survive when demand exceeds capacity by an order of magnitude?
Most engineering teams rely on auto-scaling to save them. But at the “Super Bowl standard” of scale, auto-scaling is a lie. It is too reactive. By the time your cloud provider spins up new instances, your latency has already spiked, your database connection pool is exhausted and your users are staring at a 500 error.
Here are the four architectural patterns we use to survive massive concurrency. These apply whether you are streaming touchdowns or processing checkout queues for a limited-edition sneaker drop.
1. Aggressive load shedding
The biggest mistake engineers make is trying to process every request that hits the load balancer. In a high-concurrency event, this is suicide. If your system capacity is 100,000 requests per second (RPS) and you receive 120,000 RPS, trying to serve everyone usually results in the database locking up and zero people getting served.
We implement load shedding based on business priority. It is better to serve 100,000 users perfectly and tell 20,000 users to “please wait” than to crash the site for all 120,000.
This requires classifying traffic at the gateway layer into distinct tiers:
- Tier 1 (Critical): Login, Video Playback (or for e-commerce: Checkout, Inventory Lock). These requests must succeed.
- Tier 2 (Degradable): Search, Content Discovery, User Profile edits. These can be served from stale caches.
- Tier 3 (Non-Essential): Recommendations, “People also bought,” Social feeds. These can fail silently.
We use adaptive concurrency limits to detect when downstream latency is rising. As soon as the database response time crosses a threshold (e.g. 50ms), the system automatically stops calling the Tier 3 services. The user sees a slightly generic homepage, but the video plays or the purchase completes.
For any high-volume system, you must define your “degraded mode.” If you don’t decide what to turn off during a spike, the system will decide for you, usually by turning off everything.
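As a rough sketch of the idea (not the author's production code; routes and thresholds are illustrative), the gateway classifies each request into a tier and sheds the lowest tiers as soon as downstream latency crosses a limit:

// Illustrative tier-based load shedding; routes and thresholds are examples only.
type Tier = 1 | 2 | 3; // 1 = critical, 2 = degradable, 3 = non-essential

const TIER_BY_ROUTE: Record<string, Tier> = {
  "/login": 1, "/playback": 1, "/checkout": 1,
  "/search": 2, "/profile": 2,
  "/recommendations": 3, "/social-feed": 3,
};

let dbLatencyMs = 0; // updated by a background probe of database response time

function routeDecision(route: string): "serve" | "serve-stale" | "shed" {
  const tier = TIER_BY_ROUTE[route] ?? 3;
  if (tier === 3 && dbLatencyMs > 50) return "shed";         // drop non-essential calls first
  if (tier === 2 && dbLatencyMs > 100) return "serve-stale"; // answer Tier 2 from stale cache
  return "serve";                                            // Tier 1 always gets a real attempt
}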
2. Bulkheads and blast radius isolation

On a cruise ship, the hull is divided into watertight compartments called bulkheads. If one section floods, the ship stays afloat. In distributed systems, we often inadvertently build ships with no walls.
I have seen massive outages caused by a minor feature. For example, a third-party API that serves “user avatars” goes down. Because the “Login” service waits for the avatar to load before confirming the session, the entire login flow hangs. A cosmetic feature takes down the core business.
To prevent this, we use the bulkhead pattern. We isolate thread pools and connection pools for different dependencies.
In an e-commerce context, your “Inventory Service” and your “User Reviews Service” should never share the same database connection pool. If the Reviews service gets hammered by bots scraping data, it should not consume the resources needed to look up product availability.
We strictly enforce timeouts and Circuit Breakers. If a non-essential dependency fails more than 50% of the time, we stop calling it immediately and return a default value (e.g. a generic avatar or a cached review score).
Crucially, we prefer semaphore isolation over thread pool isolation for high-throughput services. Thread pools add context-switching overhead. Semaphores simply limit the number of concurrent calls allowed to a specific dependency, rejecting excess traffic instantly without queuing. The core transaction must survive even if the peripherals are burning.
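A semaphore bulkhead is small enough to sketch; the version below is illustrative rather than taken from any of these systems. It caps concurrent calls to a single dependency and rejects the overflow immediately instead of queuing it.

// Illustrative semaphore-style bulkhead: at most `limit` concurrent calls to one dependency.
class Bulkhead {
  private inFlight = 0;
  constructor(private readonly limit: number) {}

  async run<T>(call: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.inFlight >= this.limit) return fallback(); // reject instantly, no queue
    this.inFlight++;
    try {
      return await call();
    } catch {
      return fallback(); // a failing dependency degrades to a default value
    } finally {
      this.inFlight--;
    }
  }
}

// Usage idea: give user reviews their own small budget so inventory lookups stay unaffected.
// const score = await reviewsBulkhead.run(() => fetchReviewScore(id), () => cachedScoreOrDefault());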
3. Taming the thundering herd with request collapsing
Imagine 50,000 users all load the homepage at the exact same second (kick-off time or a product launch). All 50,000 requests hit your backend asking for the same data: “What is the metadata for the Super Bowl stream?”
If you let all 50,000 requests hit your database, you will crush it.
Caching is the obvious answer, but standard caching isn’t enough. You are vulnerable to the “Cache Stampede.” This happens when a popular cache key expires. Suddenly, thousands of requests notice the missing key and all of them rush to the database to regenerate it simultaneously.
To solve this, we use request collapsing (often called “singleflight”).
When a cache miss occurs, the first request goes to the database to fetch the data. The system identifies that 49,999 other people are asking for the same key. Instead of sending them to the database, it holds them in a wait state. Once the first request returns, the system populates the cache and serves all 50,000 users with that single result.
This pattern is critical for “flash sale” scenarios in retail. When a million users refresh the page to see if a product is in stock, you cannot do a million database lookups. You do one lookup and broadcast the result.
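Request collapsing itself fits in a few lines; this illustrative sketch keys in-flight lookups by cache key so concurrent misses share one backend call:

// Illustrative "singleflight" collapsing: concurrent misses for the same key share one fetch.
const inFlight = new Map<string, Promise<unknown>>();

async function collapsed<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>; // join the fetch already in progress

  const result = fetcher().finally(() => inFlight.delete(key)); // clear the slot once it settles
  inFlight.set(key, result);
  return result;
}

// 50,000 callers of collapsed("event-metadata", loadFromDb) result in one loadFromDb call.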
We also employ probabilistic early expiration (or the X-Fetch algorithm). Instead of waiting for a cache item to fully expire, we re-fetch it in the background while it is still valid. This ensures the user always hits a warm cache and never triggers a stampede.
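The usual way to implement that is the X-Fetch rule from the cache-stampede literature: each reader refreshes early with a probability that grows as expiry approaches. A compact sketch, with the timing inputs and beta value as assumptions:

// Illustrative X-Fetch check: refresh early with rising probability as expiry approaches.
// deltaMs is how long the last recomputation took; beta > 1 makes refreshes more eager.
function shouldRefreshEarly(expiresAtMs: number, deltaMs: number, beta = 1.0): boolean {
  const jitter = deltaMs * beta * Math.log(1 - Math.random()); // log of a value in (0, 1] is <= 0
  return Date.now() - jitter >= expiresAtMs;                   // subtracting a negative shifts "now" forward
}

// If this returns true, recompute the value in the background while still serving the cached copy.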
4. The ‘game day’ rehearsal
The patterns above are theoretical until tested. In my experience, you do not rise to the occasion during a crisis; you fall to the level of your training.
For the Olympics and Super Bowl, we don’t just hope the architecture works. We break it on purpose. We run game days where we simulate massive traffic spikes and inject failures into the production environment (or a near-production replica).
We simulate specific disaster scenarios:
- What happens if the primary Redis cluster vanishes?
- What happens if the recommendation engine latency spikes to 2 seconds?
- What happens if 5 million users log in within 60 seconds?
During these exercises, we validate that our load shedding actually kicks in. We verify that the bulkheads actually stop the bleeding. Often, we find that a default configuration setting (like a generic timeout in a client library) undoes all our hard work.
For e-commerce leaders, this means running stress tests that exceed your projected Black Friday traffic by at least 50%. You must identify the “breaking point” of your system. If you don’t know exactly how many orders per second breaks your database, you aren’t ready for the event.
Resilience is a mindset, not a tool
You cannot buy “resilience” from AWS or Azure. You cannot solve these problems just by switching to Kubernetes or adding more nodes.
The “Super Bowl Standard” requires a fundamental shift in how you view failures. We assume components will fail. We assume the network will be slow. We assume users will behave like a DDoS attack.
Whether you are building a streaming platform, a banking ledger or a retail storefront, the goal is not to build a system that never breaks. The goal is to build a system that breaks partially and gracefully so that the core business value survives.
If you wait until the traffic hits to test these assumptions, it is already too late.
This article is published as part of the Foundry Expert Contributor Network.
Deno Sandbox launched for running AI-generated code 5 Feb 2026, 1:00 am
Deno Land, maker of the Deno runtime, has introduced Deno Sandbox, a secure environment built for code generated by AI agents. The company also announced the long-awaited general availability of Deno Deploy, a serverless platform for running JavaScript and TypeScript applications. Both were announced on February 3.
Now in beta, Deno Sandbox offers lightweight Linux microVMs running as protected environments in the Deno Deploy cloud. Deno Sandbox defends against prompt injection attacks, in which a user or AI attempts to run malicious code, the company said. Secrets such as API keys never enter the sandbox and will only appear when an outbound HTTP request is sent to a pre-approved host, according to the company.
Deno Sandbox was created in response to the rise in AI-driven development, explained Deno co-creator Ryan Dahl, as more LLM-generated code is being released with the ability to call external APIs using real credentials, without human review. In this scenario, he wrote, “Sandboxing the compute isn’t enough. You need to control network egress and protect secrets from exfiltration.” Deno Sandbox provides both, according to Dahl. It specializes in workloads where code must be generated, evaluated, or safely executed on behalf of an untrusted user.
Developers can create a Deno Sandbox programmatically via Deno’s JavaScript or Python SDKs. The announcement included the following workload examples for Deno Sandbox:
- AI agents and copilots that need to run code as they reason
- Secure plugin or extension systems
- Vibe-coding and collaborative IDE experiences
- Ephemeral CI runners and smoke tests
- Customer-supplied or user-generated code paths
- Instant dev servers and preview environments
Also announced on February 3 was the general availability of Deno Deploy, a platform for running JavaScript and TypeScript applications in the cloud or on a user’s own infrastructure. It provides a management plane for deploying and running applications with the built-in CLI or through integrations such as GitHub Actions, Deno said. The platform is a rework of Deploy Classic, with a new dashboard and a new execution environment that uses Deno 2.0.
What is context engineering? And why it’s the new AI architecture 5 Feb 2026, 1:00 am
Context engineering is the practice of designing systems that determine what information an AI model sees before it generates a response to user input. It goes beyond formatting prompts or crafting instructions, instead shaping the entire environment the model operates in: grounding data, schemas, tools, constraints, policies, and the mechanisms that decide which pieces of information make it into the model’s input at any moment. In applied terms, good context engineering means establishing a small set of high-signal tokens that improve the likelihood of a high-quality outcome.
Think of prompt engineering as a predecessor discipline to context engineering. While prompt engineering focuses on wording, sequencing, and surface-level instructions, context engineering extends the discipline into architecture and orchestration. It treats the prompt as just one layer in a larger system that selects, structures, and delivers the right information in the right format so that an LLM can plausibly accomplish its assigned task.
What does ‘context’ mean in AI?
In AI systems, context refers to everything a large language model (LLM) has access to when producing a response — not just the user’s latest query, but the full envelope of information, rules, memory, and tools that shape how the model interprets that query. The total amount of information the system can process at once is called the context window. The context consists of a number of different layers that work together to guide model behavior:
- The system prompt defines the model’s role, boundaries, and behavior. This layer can include rules, examples, guardrails, and style requirements that persist across turns.
- A user prompt is the immediate request — the short-lived, task-specific input that tells the model what to do right now.
- State or conversation history acts as short-term memory, giving the model continuity across turns by including prior dialog, reasoning steps, and decisions.
- Long-term memory is persistent and spans many sessions. It contains durable preferences, stable facts, project summaries, or information the system is designed to reintroduce later.
- Retrieved information provides the model with external, up-to-date knowledge by pulling relevant snippets from documents, databases, or APIs. Retrieval-augmented generation turns this into a dynamic and domain-specific knowledge layer.
- Available tools consist of the actions an LLM is capable of performing with the help of tool calling or MCP servers: function calls, API endpoints, and system commands with defined inputs and outputs. These tools help the model take actions rather than only produce text.
- Structured output definitions tell the model exactly how its response should be formatted — for example, requiring a JSON object, a table, or a specific schema.
Together, these layers form the full context an AI system uses to generate responses that are hopefully accurate and grounded. However, a host of difficulties with context in AI can lead to suboptimal results.
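To make the layering concrete, here is a minimal, vendor-neutral sketch of how a system might assemble those layers into the input for a single model call; all names are illustrative:

// Illustrative assembly of the context layers described above into one model call.
interface Message { role: "system" | "user" | "assistant"; content: string }

interface ContextParts {
  systemPrompt: string;    // role, rules, guardrails
  history: Message[];      // short-term conversation state
  longTermNotes: string[]; // durable preferences and facts
  retrieved: string[];     // snippets pulled in by retrieval-augmented generation
  toolSpecs: string[];     // descriptions of callable tools
  outputSchema: string;    // required response format
  userPrompt: string;      // the immediate request
}

function buildContext(parts: ContextParts): Message[] {
  const system = [
    parts.systemPrompt,
    "Known about the user:\n" + parts.longTermNotes.join("\n"),
    "Reference material:\n" + parts.retrieved.join("\n---\n"),
    "Available tools:\n" + parts.toolSpecs.join("\n"),
    "Respond using this schema:\n" + parts.outputSchema,
  ].join("\n\n");
  return [{ role: "system", content: system }, ...parts.history, { role: "user", content: parts.userPrompt }];
}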
What is context failure?
The term “context failure” describes a set of common breakdown modes when AI context systems go wrong. These failures fall into four main categories:
- Context poisoning happens when a hallucination or other factual error slips into the context and then gets used as if it were truth. Over time, the model builds on that flawed premise, compounding mistakes and derailing reasoning.
- Context distraction occurs when the context becomes too large or verbose. Instead of reasoning from training data, the model can overly focus on the accumulated history — repeating past actions or clinging to old information rather than synthesizing a fresh, relevant answer.
- Context confusion arises when irrelevant material — extra tools, noisy data, or unrelated content — creeps into context. The model may treat that irrelevant information as important, leading to poor outputs or incorrect tool calls.
- Context clash occurs when new context conflicts with earlier context. If information is added incrementally, earlier assumptions or partial answers may contradict later, clearer data — resulting in inconsistent or broken model behavior.
One of the advances that AI players like OpenAI and Anthropic have offered for their chatbots is the capability to handle increasingly large context windows. But size isn’t everything, and indeed larger windows can be more prone to the sorts of failures described here. Without deliberate context management — validation, summarization, selective retrieval, pruning, or isolation — even large context windows can produce unreliable or incoherent outcomes.
What are some context engineering techniques and strategies?
Context engineering aims to overcome these types of context failures. Here are some of the main techniques and strategies to apply:
- Knowledge base or tool selection. Choose external data sources, databases, documents or tools the system should draw from. A well-curated knowledge base directs retrieval toward relevant content and reduces noise.
- Context ordering or compression. Decide which pieces of information deserve space and which should be shortened or removed. Systems often accumulate far more text than the model needs, so pruning or restructuring keeps the high-signal material while dropping noise. For instance, you could replace a 2,000-word conversation history with a 150-word summary that preserves decisions, constraints, and key facts but omits chit-chat and digressions. Or you could sort retrieved documents by relevance score and inject only the top two chunks instead of all twenty. Both approaches keep the context window focused on the information most likely to produce a correct response.
- Long-term memory storage and retrieval design. This defines how persistent information — including user preferences, project summaries, domain facts, or outcomes from prior sessions — is saved and reintroduced when needed. A system might store a user’s preferred writing style once and automatically reinsert a short summary of that preference into future prompts, instead of requiring the user to restate it manually each time. Or it could store the results of a multi-step research task so the model can recall them in later sessions without rerunning the entire workflow.
- Structured information and output schemas. These allow you to provide predictable formats for both context and responses. Giving the model structured context — such as a list of fields the user must fill out or a predefined data schema — reduces ambiguity and keeps the model from improvising formats. Requiring structured output does the same: for instance, demanding that every answer conform to a specific JSON shape lets downstream systems validate and consume the output reliably.
- Workflow engineering. You can link multiple LLM calls, retrieval steps, and tool actions into a coherent process. Rather than issuing one giant prompt, you design a sequence: gather requirements, retrieve documents, summarize them, call a function, evaluate the result, and only then generate the final output. Each step injects just the right context at the right moment. A practical example is a customer-support bot that first retrieves account data, then asks the LLM to classify the user’s issue, then calls an internal API, and only then composes the final message.
- Selective retrieval and retrieval-augmented generation. This technique applies filtering so the model sees only the parts of external data that matter. Instead of feeding the model an entire knowledge base, you retrieve only the paragraphs that match the user’s query. One common example is chunking documents into small sections, ranking them by semantic relevance, and injecting only the top few into the prompt. This keeps the context window small while grounding the answer in accurate information.
Together, these approaches allow context engineering to deliver a tighter, more relevant, and more reliable context window for the model — minimizing noise, reducing the risk of hallucination or confusion, and giving the model the right tools and data to behave predictably.
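As a minimal sketch of two of those techniques, ranked selective retrieval plus a hard cap on injected text, using a toy word-overlap score in place of real embedding similarity:

// Illustrative selective retrieval + compression: keep only top-ranked chunks, capped in size.
interface Chunk { text: string }

// Toy relevance score based on word overlap; production systems use embedding similarity.
function overlapScore(query: string, text: string): number {
  const queryWords = new Set(query.toLowerCase().split(/\W+/));
  return text.toLowerCase().split(/\W+/).filter(w => w && queryWords.has(w)).length;
}

function selectContext(query: string, chunks: Chunk[], topK = 2, maxChars = 2000): string {
  return chunks
    .map(c => ({ text: c.text, score: overlapScore(query, c.text) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)      // inject only the top K chunks, not the whole knowledge base
    .map(c => c.text)
    .join("\n---\n")
    .slice(0, maxChars); // hard cap keeps the context window small
}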
Why is context engineering important for AI agents?
Context engineering gives AI agents the information structure they need to operate reliably across multiple steps and decisions. Strong context design treats the prompt, the memory, the retrieved data, and the available tools as a coherent environment that drives consistent behavior. Agents depend on this environment because context is a critical but limited resource for long-horizon tasks.
Agents fail most often when their context becomes polluted, overloaded, or irrelevant. Small errors in early turns can accumulate into large failures when the surrounding context contains hallucinations or extraneous details. Good context engineering improves their efficiency by giving them only the information they need while filtering out noise. Techniques like ranked retrieval and selective memory keep the context window focused, reducing unnecessary token load and improving responsiveness.
Context also enables statefulness — that is, the ability for agents to remember preferences, past actions, or project summaries across sessions. Without this scaffolding, agents behave like one-off chatbots rather than systems capable of long-term adaptation.
Finally, context engineering is what allows agents to integrate tools, call functions, and orchestrate multi-step workflows. Tool specifications, output schemas, and retrieved data all live in the context, so the quality of that context determines whether the agent can act accurately in the real world. In tool-integrated agent patterns, the context is the operating environment where agents reason and take action.
Context engineering guides
Want to learn more? Dive deeper into these resources:
- LlamaIndex’s “What is context engineering — what it is and techniques to consider”: A solid foundational guide explaining how context engineering expands on prompt engineering, and breaking down the different types of context that need to be managed.
- Anthropic’s “Effective context engineering for AI agents”: Explains why context is a finite but critical resource for agents, and frames context engineering as an essential design discipline for robust LLM applications.
- SingleStore’s “Context engineering: A definitive guide”: Walks you through full-stack context engineering: how to build context-aware, reliable, production-ready AI systems by integrating data, tools, memory, and workflows.
- PromptingGuide.ai’s “Context engineering guide”: Offers a broader definition of context engineering (across LLM types, including multimodal), and discusses iterative processes to optimize instructions and context for better model performance.
- DataCamp’s “Context engineering: A guide with examples”: Useful primer that explains different kinds of context (memory, retrieval, tools, structured output), helping practitioners recognize where context failures occur and how to avoid them.
- Akira.ai’s “Context engineering: Complete guide to building smarter AI systems”: Emphasizes context engineering’s role across use cases from chatbots to enterprise agents, and stresses the differences with prompt engineering for scalable AI systems.
- Latitude’s “Complete guide to context engineering for coding agents”: Focuses specifically on coding agents and how context engineering helps them handle real-world software development tasks accurately and consistently.
These guides form a strong starting point if you want to deepen your understanding of context engineering — what it is, why it matters, and how to build context-aware AI systems in practice. As models grow more capable, mastering context engineering will increasingly separate toy experiments from reliable, production-grade agents.
Beyond NPM: What you need to know about JSR 5 Feb 2026, 1:00 am
NPM, the Node Package Manager, hosts millions of packages and serves billions of downloads annually. It has served developers well over the years but has its shortcomings, including TypeScript build complexity and weak package provenance. Recently, NPM’s provenance issues have resulted in prominent security breaches, leading more developers to seek alternatives.
The JavaScript Registry (JSR), brought to us by Deno creator Ryan Dahl, was designed to overcome these issues. With enterprise adoption already underway, it is time to see what the next generation of JavaScript packaging looks like.
The intelligent JavaScript registry
Whether or not you opt to use it right away, there are two good reasons to know about JSR. First, it addresses technical issues in NPM, which are worth exploring in detail. And second, it’s already gathering steam, and may reach critical mass toward ubiquity in the near future.
JSR takes a novel approach to resolving known issues in NPM. For one thing, it can ship you compiled (or stripped) JavaScript, even if the original package is TypeScript. Or, if you are using a platform that runs TypeScript directly (like Deno), you’ll get the TypeScript. It also makes stronger guarantees and offers security controls you won’t find in NPM. Things like package authentication and metadata tracking are more rigorous. JSR also handles the automatic generation of package documentation.
Every package consumer will appreciate these features, but they are even more appealing to package creators. Package authors don’t need to run a build step for TypeScript, and the process of publishing libraries is simpler and easier to reproduce. That’s probably why major shops like OpenAI and Supabase are already using JSR.
JSR responds to package requests based on your application setup and requires little additional overhead. It is also largely (and seamlessly) integrated with NPM. That makes trying it out relatively easy. JSR feels more like an NPM upgrade than a replacement.
How JSR works
How to handle TypeScript code in a JavaScript stack has become a hot topic lately. Mixing TypeScript and JavaScript is popular in development, but leads to friction during the build step, where it necessitates compiling the TypeScript code beforehand. Making type stripping a permanent feature in TypeScript could eventually solve that issue, but JSR resolves it today, by offloading compilation to the registry.
JSR is essentially designed as a superset of the NPM registry, but one that knows the difference between TypeScript and JavaScript.
When a package author publishes to JSR, they upload the TypeScript source directly, with no build step required. JSR then acts as the intelligent middle layer. If you request a package from a Deno environment, JSR serves the package in the original TypeScript version. But if you request it from a Node environment (via npm install), JSR automatically transpiles the code to ESM (ECMAScript Modules)-compliant JavaScript and generates the necessary type definition (.d.ts) files on the fly.
This capability eliminates the so-called TypeScript tax that has plagued library authors for years. There’s no need to configure complex build pipelines or maintain separate distribution folders. You simply publish code, and the registry adapts it to the consumer’s runtime.
It’s also important to know that JSR has adopted the modern ESM format as its module system. That isn’t a factor unless you are relying on an older package based in CommonJS (CJS). Using CJS with JSR is possible but requires extra work. Adopting ESM is a forward-looking move by the JSR team, supporting the transition of the JavaScript ecosystem to one approved standard: ESM.
The ‘slow types’ tradeoff
To make its real-time transpilation possible, JSR introduces a constraint known as “no slow types.” JSR forbids global type inference for exported functions and classes. You must explicitly type your exports (e.g., by defining a function’s return type rather than letting the compiler guess it). This is pretty much a common best practice, anyway.
Explicitly defining return types allows JSR to generate documentation and type definitions instantly without running a full, slow compiler pass for every install. It’s a small tradeoff in developer experience for a massive performance gain in the ecosystem.
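In practice the rule just means annotating what you export; a small before-and-after sketch (illustrative):

// Flagged by JSR's "no slow types" rule: the exported return type must be inferred.
// export function parseConfig(raw: string) {
//   return JSON.parse(raw) as { name: string; version: string };
// }

// Accepted: the exported signature is explicit, so JSR can generate docs and .d.ts files
// without running a full type-checking pass.
export interface Config { name: string; version: string }

export function parseConfig(raw: string): Config {
  return JSON.parse(raw) as Config;
}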
Security defaults
Perhaps most handy for enterprise users, JSR tackles supply-chain security directly, through provenance. It integrates with GitHub Actions via OpenID Connect.
When a package is published, JSR generates a transparency log (using Sigstore). The log cryptographically links the package in the registry to the specific commit and CI (continuous integration) run that created it. Unlike the “blind trust” model of the legacy NPM registry, JSR lets you verify that the code you are installing came from the source repository it claims.
This also makes it easier for package creators to provide the security they want without a lot of extra wrangling. What NPM has recently tried to backfill via trusted publishing, JSR has built in from the ground up.
Hands-on with the JavaScript Registry
The best part of JSR is that it doesn’t require you to abandon your current tooling. You don’t need to install a new CLI or switch to Deno to use it. It works with npm, yarn, and pnpm right out of the box via a clever compatibility layer.
There are three steps to adding a JSR package to a standard Node project.
1. Ask JSR to add the package
Instead of the install command, you use the JSR tool to add the package. For example, to add a standard library date helper, you would enter:
npx jsr add @std/datetime
Here, we use the npx command to do a one-time run of the jsr tool with the add command. This adds the datetime library to the NPM project in the working directory.
2. Let JSR configure the package
Behind the scenes, the npx jsr add command creates an .npmrc file in your project root that maps the @jsr scope to the JSR registry (we will use this alias in a moment):
@jsr:registry=https://npm.jsr.io
You don’t have to worry about this mapping yourself; it just tells NPM, “When you see a @jsr package, talk to JSR, not the public NPM registry.”
3. Let JSR add the dependency
Next, JSR adds the package to your package.json using an alias. The dependency will look something like this:
"dependencies": {
"@std/datetime": "npm:@jsr/std__datetime@^0.224.0"
}
Again, you as developer won’t typically need to look at this. JSR lets you treat the dependency as a normal one, while NPM and JSR do the work of retrieving it. In your program’s JavaScript files, the dependency looks like any other import:
import { format } from "@std/datetime";
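From there the package behaves like any other dependency. For example, assuming the standard library's format helper takes a Date and a pattern string, as documented for @std/datetime:

import { format } from "@std/datetime";

// Print today's date using a pattern string (see the @std/datetime docs for supported tokens).
console.log(format(new Date(), "yyyy-MM-dd"));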
JSR and the evolution of JavaScript
JSR comes from the same mind that gave us Node.js: Ryan Dahl. In his well-known 2018 talk, 10 things I regret about Node.js, Dahl discussed ways the ecosystem had drifted from web standards and spoke on the complexity of node_modules. He also addressed how opaque the Node build process could be.
Dahl initially attempted to correct these issues by creating Deno, a secure runtime that uses standard URLs for imports. But JSR is a more pragmatic pivot. While it is possible to change the runtime you use, the NPM ecosystem of 2.5 million packages is too valuable to abandon.
JSR is effectively a second chance to get the architecture right. It applies Deno’s core philosophies (security by default and first-class TypeScript) to the broader JavaScript world. Importantly, to ensure it serves the whole ecosystem (not only Deno users), the project has moved to an independent open governance board, ensuring it remains a neutral utility for all runtimes.
The future of JSR and NPM
Is JSR on track to be an “NPM killer”? Not yet; maybe never. Instead, it acts as a forcing function for the evolving JavaScript ecosystem. It solves the specific, painful problems of TypeScript publishing and supply-chain security that NPM was never designed to handle.
For architects and engineering leads, JSR represents a safe bet for new internal tooling and libraries, especially those written in TypeScript. JSR’s interoperability with NPM ensures the risk of lock-in is near zero. If JSR disappears tomorrow, you’ll still have the source code.
But the real value of JSR is simpler: It lets developers write code, not build scripts. In a world where configuration often eats more time than logic, that alone makes JSR worth exploring.
How to reduce the risks of AI-generated code 5 Feb 2026, 1:00 am
Vibe coding is the latest tech accelerator, and yes, it kind of rocks. New AI-assisted coding practices are helping developers ship new applications faster, and they’re even allowing other business professionals to prototype workflows and tools without waiting for a full engineering cycle.
Using a chatbot and tailored prompts, vibe coders can build applications in a flash and get them into production within days. Gartner even estimates that by 2028, 40% of new enterprise software will be built with vibe coding tools and techniques, rather than traditional, human-led waterfall or agile software development methods. The speed is intoxicating, so I, for one, am not surprised by that prediction.
The challenge here is that when those who aren’t coders—and even some of those who do work with code for a living—get an application that does exactly what they want, they think the work is over. In truth, it has only just begun.
After the app, then comes the maintenance: updating the app, patching it, scaling it, and defending it. And before you expose real users and data to risk, you must first understand the route that AI took to get your new app working.
How vibe coding works
Vibe coding tools and applications are built on large language models (LLMs) and trained on existing code and patterns. You prompt the model with ideas for your application and, in turn, it generates artifacts like code, configurations, UI components, etc. When you try to run the code, or look at the application’s front end, you’ll see one of two things: the application will look and run the way you were expecting, or an error message will be generated. Then comes the iterative phase, tweaking or changing code until you finally get the desired outcome.
Ideally, the end result is a working app that follows software development best practices based on what AI has learned and produced before. However, AI might just help you produce an application that functions and looks great, but is fragile, inefficient, or insecure at the foundational level.
The biggest issue is that this approach does not often account for the learned security and engineering experience that is needed to operate securely in production, where attackers, compliance requirements, customer trust, and operational scale all converge at once. So, if you’re not a security professional or accustomed to developing apps, what do you do?
Just because it works doesn’t mean it’s ready
The first step to solving a problem is knowing that it exists. A vibe-coded prototype can be a win as a proof of concept, but the danger is in treating it as production-ready.
Start with awareness. Use existing security frameworks to check that your application is secure. Microsoft’s STRIDE threat model is a practical way to sanity-check a vibe-coded application before it goes live. STRIDE stands for:
- Spoofing
- Tampering
- Repudiation
- Information disclosure
- Denial of service
- Elevation of privilege
Use STRIDE as a guide to ask yourself the uncomfortable questions before someone else does. For example:
- Can someone pretend to be another user?
- Does the app leak data through errors, logs, or APIs?
- Are there rate limits and timeouts, or can requests be spammed?
To prevent those potential issues, you can check that your new vibe-coded application handles identities correctly and is secure by default. On top of this, you should make sure that the app code doesn’t have any embedded credentials that others can access.
These real-world concerns are common to all applications, whether they’re built by AI or humans. Being aware of issues preemptively allows you to take practical steps toward a more robust defense. This takes you from “it works” to “we understand how it could fail.”
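For the rate-limit question above, even a vibe-coded service can carry a small, dependency-free guard. A rough sketch (illustrative, not a complete defense; window and limit values are arbitrary):

// Illustrative fixed-window rate limiter: reject clients exceeding LIMIT requests per window.
const WINDOW_MS = 60_000; // one-minute window
const LIMIT = 100;        // max requests per client per window
const counters = new Map<string, { windowStart: number; count: number }>();

function allowRequest(clientId: string, now = Date.now()): boolean {
  const entry = counters.get(clientId);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(clientId, { windowStart: now, count: 1 }); // start a fresh window for this client
    return true;
  }
  entry.count++;
  return entry.count <= LIMIT; // over the limit: caller should respond with HTTP 429
}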
Humans are still necessary for good vibes
Regardless of your personal opinion on vibe coding, it’s not going anywhere. It helps developers and line-of-business teams build what they need (or want), and that is useful. That newfound freedom and ability to create apps, however, must be matched with awareness that security is necessary and cannot be assumed.
The goal of secure vibe coding isn’t to kill momentum—it’s to keep the speed of innovation high and reduce the potential blast radius for threats.
Whatever your level of experience with AI-assisted coding, or even coding in general, there are tools and practices you can use to ensure your vibe-coded applications are secure. When these applications are developed quickly, any security steps must be just as fast-paced and easy to implement. This begins with taking responsibility for your code from the start, and then maintaining it over time. Start on security early–ideally, as you plan your application and begin its initial reviews. Earlier is always better than trying to bolt security on afterward.
After your vibe-coded app is complete and you’ve done some initial security due diligence, you can then look into your long-term approach. While vibe coding is great for testing or initial builds, it is not often the best approach for full-scale applications that must be able to support a growing number of users. At this point, you can implement more effective threat modeling and automated safety guardrails for more effective security. Bring in a developer or engineer while you’re at it, too.
There are many other security best practices to begin following at this point in the process, too. Using software scanning tools, for example, you can see what your application relies on in terms of software packages and/or additional tools, and then check that list for potential vulnerabilities. Alongside evaluating third-party risk, you can move to CI/CD pipeline security checks, such as blocking hardcoded secrets with pre-commit hooks. You can also use metadata around any AI-assisted contributions within the application to show what was written with AI, which models were used to generate that code, and which LLM tools were involved in building your application.
Ultimately, vibe coding helps you build quickly and deploy what you want to see in the world. And while speed is great, security should be non-negotiable. Without the right security practices in place, vibe coding opens you up to a swarm of preventable problems, a slough of undue risk, or worse.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Apple’s Xcode 26.3 brings integrated support for agentic coding 4 Feb 2026, 11:46 am
Apple is previewing Xcode 26.3 with integrated support for coding agents such as Anthropic’s Claude Agent and OpenAI’s Codex.
Announced February 3, Xcode 26.3, Apple’s IDE for development across its platforms, is available as a release candidate for members of the Apple Developer Program, with a release coming soon to the App Store. This latest version expands on intelligence features introduced in Xcode 26 in June 2025, which offered a coding assistant for writing and editing in the Swift language.
In Xcode version 26.3, coding agents have access to more of Xcode’s capabilities. Agents like Codex and Claude Agent can work autonomously throughout the development lifecycle, supporting streamlined workflows and faster iteration. Agents can search documentation, explore file structures, update project settings, and verify their work visually by capturing Xcode previews and iterating through fixes and builds, according to Apple. Developers now can incorporate agents’ advanced reasoning directly into their development workflow.
Combining the power of these agents with Xcode’s native capabilities provides the best results when developing for Apple platforms, according to Apple. Also new in Xcode 26.3 is support for Model Context Protocol (MCP), an open standard that gives developers the flexibility to use any compatible agent or tool with Xcode.
GitHub eyes restrictions on pull requests to rein in AI-based code deluge on maintainers 4 Feb 2026, 3:53 am
GitHub helped open the floodgates to AI-written code with its Copilot. Now it’s considering closing the door, at least in part, for the short term.
GitHub is exploring what already seems like a controversial idea that would allow maintainers of repositories or projects to delete pull requests (PRs) or turn off the ability to receive pull requests as a way to address an influx of low-quality, often AI-generated contributions that many open-source projects are struggling to manage.
Last week, GitHub product manager Camilla Moraes posted a community discussion thread on GitHub seeking feedback on the solutions that it was mulling in order to address the “increasing volume of low-quality contributions” that was creating significant operational challenges for maintainers of open source projects and repositories.
“We’ve been hearing from you that you’re dedicating substantial time to reviewing contributions that do not meet project quality standards for a number of reasons — they fail to follow project guidelines, are frequently abandoned shortly after submission, and are often AI-generated,” Moraes wrote before listing out the solutions being planned.
AI is breaking the trust model of code review
Several users in the discussion thread agreed with Moraes’ principle that AI-generated code was creating challenges for maintainers.
Jiaxiao Zhou, a software engineer on Microsoft’s Azure Container Upstream team and maintainer of Containerd’s Runwasi project and SpinKube, for one, pointed out that AI-generated code was making it unsustainable for maintainers to review line by line for any code that is shipped.
The maintainer lists several reasons, such as a breakdown in the trust model behind code reviews, where reviewers can no longer assume contributors fully understand what they submit; the risk that AI-generated pull requests may appear structurally sound while being logically flawed or unsafe; and the fact that line-by-line reviews remain mandatory for production code but do not scale to large, AI-assisted changes.
To address these challenges in the short term, Moraes wrote that GitHub was planning to provide configurable pull request permissions, meaning the ability to control pull request access at a more granular level by restricting contributions to collaborators only or disabling PRs for specific use cases like mirror repositories.
“This also eliminates the need for custom automations that various open source projects are building to manage contributions,” Moraes wrote.
Community pushes back on disabling or deleting pull requests
However, the specific suggestion on disabling PRs was met with skepticism.
A user with the handle ThiefMaster suggested that GitHub should not look at restricting access to previously opened PRs as it may lead to loss of content or access for someone. Rather, the user suggested that GitHub should allow users to access them with a direct link.
Moraes, in response, seemed to align with the user’s view and said that GitHub may include the user’s suggestion.
In addition, GitHub is also mulling the idea of providing maintainers with the ability to remove spam or low-quality PRs directly from the interface to improve repository organization.
This suggestion was met with even more skepticism from users.
While ThiefMaster suggested that GitHub should allow a limited timeframe for a maintainer to delete a PR, most likely due to low activity, other users, such as Tibor Digana, Hayden, and Matthew Gamble, were completely against the idea.
The long-term suggestions from Moraes, which included using AI-based tools to help maintainers weed out “unnecessary” submissions and focus on the worthy ones, too, received considerable flak.
While Moraes and GitHub argue that these AI-based tools would cut down the time required to review submissions, users such as Stephen Rosen argue that AI-based tools would do quite the opposite because they are prone to hallucinations, forcing the maintainer to go through each line of code anyway.
AI should reduce noise, not add uncertainty
Paul Chada, co-founder of agentic AI software startup Doozer AI, argued that the usefulness of AI-based review tools will hinge on the strength of the guardrails and filters built into them.
Without those controls, he said, such systems risk flooding maintainers with submissions that lack project context, waste review time, and dilute meaningful signal.
“Maintainers don’t want another system they have to second-guess. AI should act like a spam filter or assistant, not a reviewer with authority. Used carefully, it reduces noise. Used carelessly, it adds a new layer of uncertainty instead of removing one,” Chada said.
GitHub has also made other long-term suggestions to address the cognitive load on reviewers, such as improved visibility and attribution when AI tools are used throughout the PR lifecycle, and more granular controls for determining who can create and review PRs beyond blocking all users or restricting to collaborators only.
Azure outage disrupts VMs and identity services for over 10 hours 4 Feb 2026, 3:36 am
Microsoft’s Azure cloud platform suffered a broad multi-hour outage beginning on Monday evening, disrupting two critical layers of enterprise cloud operations. The outage, which lasted over 10 hours, began at 19:46 UTC on Monday and was resolved by 06:05 UTC on Tuesday.
The incident initially left customers unable to deploy or scale virtual machines in multiple regions. This was followed by a related platform issue with the Managed Identities for Azure Resources service in the East US and West US regions between 00:10 UTC and 06:05 UTC on Tuesday. The disruption also briefly affected GitHub Actions.
Policy change at the core of disruption
A policy change unintentionally applied to a subset of Microsoft-managed storage accounts, including those used to host virtual machine extension packages, led to this outage. The change blocked public read access, disrupting scenarios such as virtual machine extension package downloads, according to Microsoft’s status history.
In the incident, logged under tracking ID FNJ8-VQZ, some customers experienced failures when deploying or scaling virtual machines, including errors during provisioning and lifecycle operations. Other services were impacted as well.
Azure Kubernetes Service users experienced failures in node provisioning and extension installation, while Azure DevOps and GitHub Actions users faced pipeline failures when tasks required virtual machine extensions or related packages. Operations that required downloading extension packages from Microsoft-managed storage accounts also saw degraded performance.
Although an initial mitigation was deployed within about two hours, it triggered a second platform issue involving Managed Identities for Azure Resources. Customers attempting to create, update, or delete Azure resources, or acquire Managed Identity tokens, began experiencing authentication failures.
Microsoft’s status history page, logged under tracking ID M5B-9RZ, acknowledged that following the earlier mitigation, a large spike in traffic overwhelmed the managed identities platform service in the East US and West US regions.
This impacted the creation and use of Azure resources with assigned managed identities, including Azure Synapse Analytics, Azure Databricks, Azure Stream Analytics, Azure Kubernetes Service, Microsoft Copilot Studio, Azure Chaos Studio, Azure Database for PostgreSQL Flexible Servers, Azure Container Apps, Azure Firewall, and Azure AI Video Indexer.
After multiple infrastructure scale-up attempts failed to handle the backlog and retry volumes, Microsoft ultimately removed traffic from the affected service to repair the underlying infrastructure without load.
“The outage didn’t just take websites offline, but it halted development workflows and disrupted real-world operations,” said Pareekh Jain, CEO at EIIRTrend & Pareekh Consulting.
Cloud outages on the rise
Cloud outages have become more frequent in recent years, with major providers such as AWS, Google Cloud, and IBM all experiencing high-profile disruptions. AWS services were severely impacted for more than 15 hours when a DNS problem rendered the DynamoDB API unreliable.
In November, a bad configuration file in Cloudflare’s Bot Management system led to intermittent service disruptions across several online platforms. In June, an invalid automated update disrupted Google Cloud’s identity and access management (IAM) system, resulting in users being unable to use Google to authenticate on third-party apps.
“The evolving data center architecture is shaped by the shift to more demanding, intricate workloads driven by the new velocity and variability of AI. This rapid expansion is not only introducing complexities but also challenging existing dependencies. So any misconfiguration or mismanagement at the control layer can disrupt the environment,” said Neil Shah, co-founder and VP at Counterpoint Research.
Preparing for the next cloud incident
This is not an isolated incident. For CIOs, the event only reinforces the need to rethink resilience strategies.
In the immediate aftermath when a hyperscale dependency fails, waiting is not a recommended strategy for CIOs, and they should focus on a strategy of stabilize, prioritize, and communicate, stated Jain. “First, stabilize by declaring a formal cloud incident with a single incident commander, quickly determining whether the issue affects control-plane operations or running workloads, and freezing all non-essential changes such as deployments and infrastructure updates.”
Jain added that the next step is to prioritize restoration by protecting customer-facing run paths, including traffic serving, payments, authentication, and support, and, if CI/CD is impacted, shifting critical pipelines to self-hosted or alternate runners while queuing releases behind a business-approved gate. Finally, communicate and contain by issuing regular internal updates that clearly state impacted services, available workarounds, and the next update time, and by activating pre-approved customer communication templates if external impact is likely.
Shah noted that these outages are a clear warning for enterprises and CIOs to diversify their workloads across CSPs or go hybrid and add necessary redundancies. To prevent future outages from impacting operations, they should also manage the size of the CI/CD pipelines and keep them lean and modular.
The strategy for scaling real-time versus non-real-time workloads, especially for crucial code or services, should also be thought through carefully. CIOs need a clear understanding of, and operational visibility into, hidden dependencies, knowing what could be affected in such scenarios, and should have a robust mitigation plan in place.
AI is not coming for your developer job 4 Feb 2026, 1:00 am
It’s easy to see why anxiety around AI is growing—especially in engineering circles. If you’re a software engineer, you’ve probably seen the headlines: AI is coming for your job.
That fear, while understandable, does not reflect how these systems actually work today, or where they’re realistically heading in the near term.
Despite the noise, agentic AI is still confined to deterministic systems. It can write, refactor, and validate code. It can reason through patterns. But the moment ambiguity enters the equation—where human priorities shift, where trade-offs aren’t binary, where empathy and interpretation are required—it falls short.
Real engineering isn’t just deterministic. And building products isn’t just about code. It’s about context—strategic, human, and situational—and right now, AI doesn’t carry that.
Agentic AI as it exists today
Today’s agentic AI is highly capable within a narrow frame. It excels in environments where expectations are clearly defined, rules are prescriptive, and goals are structurally consistent. If you need code analyzed, a test written, or a bug flagged based on past patterns, it delivers.
These systems operate like trains on fixed tracks: fast, efficient, and capable of navigating anywhere tracks are laid. But when the business shifts direction—or strategic bias changes—AI agents stay on course, unaware the destination has moved.
Sure, they will produce output, but their contribution will be sideways or even negative rather than moving the work forward in sync with where the company is going.
Strategy is not a closed system
Engineering doesn’t happen in isolation. It happens in response to business strategy—which informs product direction, which informs technical priorities. Each of these layers introduces new bias, interpretation, and human decision-making.
And those decisions aren’t fixed. They shift with urgency, with leadership, with customer needs. A strategy change doesn’t cascade neatly through the organization as a deterministic update. It arrives in fragments: a leadership announcement here, a customer call there, a hallway chat, a Slack thread, a one-on-one meeting.
That’s where interpretation happens. One engineer might ask, “What does this shift mean for what’s on my plate this week?” Faced with the same question, another engineer might answer it differently. That kind of local, interpretive decision-making is how strategic bias actually takes effect across teams. And it doesn’t scale cleanly.
Agentic AI simply isn’t built to work that way—at least not yet.
Strategic context is missing from agentic systems
To evolve, agentic AI needs to operate on more than static logic. It must carry context—strategic, directional, and evolving.
That means not just answering what a function does, but asking whether it still matters. Whether the initiative it belongs to is still prioritized. Whether this piece of work reflects the latest shift in customer urgency or product positioning.
Today’s AI tools are disconnected from that layer. They don’t ingest the cues that product managers, designers, or tech leads act on instinctively. They don’t absorb the cascade of a realignment and respond accordingly.
Until they do, these systems will remain deterministic helpers—not true collaborators.
What we should be building toward
To be clear, the opportunity isn’t to replace humans. It’s to elevate them—not just by offloading execution, but by respecting the human perspective at the core of every product that matters.
The more agentic AI can handle the undifferentiated heavy lifting—the tedious, mechanical, repeatable parts of engineering—the more space we create for humans to focus on what matters: building beautiful things, solving hard problems, and designing for impact.
Let AI scaffold, surface, validate. Let humans interpret, steer, and create—with intent, urgency, and care.
To get there, we need agentic systems that don’t just operate in code bases, but operate in context. We need systems that understand not just what’s written, but what’s changing. We need systems that update their perspective as priorities evolve.
Because the goal isn’t just automation. It’s better alignment, better use of our time, and better outcomes for the people who use what we build.
And that means building tools that don’t just read code, but that understand what we’re building, who it’s for, what’s at stake, and why it matters.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Six reasons to use coding agents 4 Feb 2026, 1:00 am
I use coding agents every day. I haven’t written a line of code for any of my side projects in many weeks. I don’t use coding agents in my day job yet, but only because the work requires a deeper knowledge of a huge code base than most large language models can deal with. I expect that will change.
Like everyone else in the software development business, I have been thinking a lot about agentic coding. All that thinking has left me with six thoughts about the advantages of using coding agents now, and how agentic coding will change developer jobs in the future.
Coding agents are tireless
If I said to a junior programmer, “Go and fix all 428 hints in this code base, and when you are done, fix all 434 warnings,” they would probably say, “Sure!” and then complain about the tediousness of the task to all of their friends at the lunch table.
But if I were to ask a coding agent to do it, it would do it without a single complaint. Not one. And it would certainly complete the task a hundred times faster than that junior programmer would. Now, that’s nothing against humans, but we aren’t tireless. We get bored, and our brains turn fuzzy with endless repetitive work.
Coding agents will grind away endlessly, tirelessly, and diligently at even the most mind-numbingly boring task that you throw at them.
Coding agents are slavishly devoted
Along those same lines, coding agents will do what you ask. Not that people won’t, but coding agents will do pretty much exactly what you ask them to do. Now, this does mean that we need to be exacting about what we ask for. A high level of fastidious prompting is the next real skill that developers will need. It’s probably both a blessing and a curse, but if you ask for a self-referential sprocket with cascading thingamabobs, a coding agent will build one for you.
Coding agents ask questions you didn’t think to ask
One thing I always do when I prompt a coding agent is to tell it to ask me any questions that it might have about what I’ve asked it to do. (I need to add this to my default system prompt…) And, holy mackerel, if it doesn’t ask good questions. It almost always asks me things that I should have thought of myself.
Coding agents mean 10x more great ideas
If a coding agent can turn your idea into working code in two hours, you can bring far more ideas to life over the course of a month or a year. If you can add a major feature to your existing app in a day, then you could add many more major features to your app in a month than you could before. When coding is no longer the bottleneck, you will be limited not by your ability to code but by your ability to come up with ideas for software.
Ultimately, this last point is the real kicker. We no longer have to ask, “Can I code this?” Instead, we can ask ourselves, “What can I build?” If you can add a dozen features a month, or build six new applications a week, the real question becomes, “Do I have enough good ideas to fill out my work day?”
Coding agents make junior developers better
A common fear is that inexperienced developers won’t be employable because they can’t be trusted to monitor the output of coding agents. I’m not sure that is something to worry about.
Instead of mentoring juniors to be better coders, we’ll mentor them to be better prompt writers. Instead of saying, “Code this,” we’ll be saying, “Understand this problem and get the coding agent to implement it.”
Junior developers can learn what they need to know in the world of agentic coding just like they learn what they need to know in the world of human coding. With coding agents, juniors will spend more time learning key concepts and less time grinding out boilerplate code.
Everyone has to start somewhere. Junior developers will start from scratch and head to a new and different place.
Coding agents will not take your job
A lot of keyboards have been pounded over concerns that coding agents are coming for developer jobs. However, I’m not the least bit worried that software developers will end up selling apples on street corners. I am worried that some developers will struggle to adapt to the changing nature of the job.
Power tools didn’t take carpentry jobs away; they made carpenters more productive and precise. Developers might find themselves writing a lot less code, but we will be doing the work of accurately describing what needs implementing, monitoring those implementations, and making sure that our inputs to the coding agents are producing the correct outputs.
We may even end up not dealing with code at all. Ultimately, software development jobs will be more about ideas than syntax. And that’s good, right?
4 self-contained databases for your apps 4 Feb 2026, 1:00 am
For any application that works with a lot of data, the obvious approach is to store it in some kind of database. Depending on your ambitions, that can be something as modest as SQLite, or as upscale as PostgreSQL.
What often complicates things is not which database to use, but how to use it. Enterprise-scale databases like PostgreSQL give you great power but impose an infrastructure tax: Setting up and configuring a database server is nobody’s idea of fun. Docker containers are an option, but if you’re not already invested in Docker as a solution, moving everything to containers is an outsized project.
One reaction to all this has been the rise of self-contained database solutions. Everything is packed into a single executable or living in a single directory, with no external dependencies and no formal installation. SQLite is the obvious example of this style, but it’s not always the most robust solution. There are more powerful databases that do the same thing—including some familiar A-list options. Even better, some come as drop-in application libraries for popular languages.
MariaDB
If you want a fully self-contained instance of MariaDB, you have three choices: Create one yourself with a little work; use a third-party pre-packaged version; or use a portable full-stack solution that includes a web server and other tools along with the database.
Nathaniel Sabanski has outlined the steps required to manually set up MariaDB in a standalone edition. Download and unpack the binaries, create a basic configuration file, then start the server and any management tools you’d need along with it. Any upgrades or maintenance are entirely on you, but the gist describes how to get it done without too much sweat. Note that if you’re using Microsoft Windows, get the .ZIP file to make it easier to extract, and use the mysql_install_db.exe application to set things up (the defaults should be fine).
For even more convenient downloading and unpacking, Andy Targino provides self-contained versions of the MariaDB binaries.
If you need to stand up a web server along with the database, and maybe a few other components, too, look to the XAMPP stack. This all-in-one solution contains MariaDB plus the Apache web server, the PHP runtime, the Mercury SMTP mail server, web-based controls for all the components, and a service manager for the desktop. It even includes OpenSSL for proper https support.
PostgreSQL
Various repackagings of PostgreSQL as a standalone application have come and gone over the years (see this project, for instance), but it takes relatively little work to set up your own standalone PostgreSQL application. Obtain the binaries minus the setup tools, unpack them into a directory, and run initdb to configure the basic setup. You can then use pg_ctl to start and stop the database as needed.
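Scripting those steps is straightforward. The Python sketch below is a minimal illustration, assuming the PostgreSQL binaries (initdb and pg_ctl) have already been unpacked and are on the PATH; the data directory and log file names are arbitrary.

```python
import subprocess
from pathlib import Path

DATA_DIR = Path("pgdata")   # where the self-contained cluster will live (arbitrary)
LOG_FILE = "pg.log"

# One-time setup: initialize a cluster in DATA_DIR.
if not DATA_DIR.exists():
    subprocess.run(["initdb", "-D", str(DATA_DIR)], check=True)

# Start the server in the background, writing server output to LOG_FILE.
subprocess.run(["pg_ctl", "-D", str(DATA_DIR), "-l", LOG_FILE, "start"], check=True)

# ... connect on the default port and do your work ...

# Stop the server cleanly when the application is done with it.
subprocess.run(["pg_ctl", "-D", str(DATA_DIR), "stop", "-m", "fast"], check=True)
```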
Python developers have a truly slick option for adding a self-contained PostgreSQL instance to an application: pgserver, a pip-installable library that contains a fully standalone instance of PostgreSQL. The entire database engine, binaries and all, lives in your Python program’s virtual environment. It does add about 30MB to the base footprint of the venv, but the resulting convenience is hard to match.
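Based on the project’s documented usage, embedding it looks roughly like the snippet below; treat the exact API as something to confirm against the pgserver README, since the library is still evolving.

```python
# pip install pgserver
import pgserver

# Create (or reuse) a self-contained PostgreSQL instance whose binaries
# and data live entirely under the given directory inside your project.
db = pgserver.get_server("./pgdata")

# get_uri() returns a standard postgresql:// connection string that any
# Postgres client library (psycopg, SQLAlchemy, etc.) can use.
print(db.get_uri())
```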
MongoDB
MongoDB binaries are available without an installer, so in theory they can be run from any directory. But if you want tighter integration with a given application without a formal install, that’s possible.
What the pgserver library does for PostgreSQL and Python, portable-mongodb does for MongoDB and Node.js. Add it to a Node project and you’ll have a private, automatically deployed instance of MongoDB for that project, or for other project environments if you so choose.
If you want to do this yourself, and want more granular control over the deployment, Atikur Rahman has published step-by-step instructions for creating a self-contained MongoDB instance. It’s less complex than you might think.
Redis
On Linux, Windows, and macOS, a common way to run Redis is to simply use a Docker container, or to install by way of a package manager (APT, RPM, Snap, Homebrew). But for a fully portable experience without Docker on Windows, a third-party build of Redis provides standalone Windows binaries. These are built directly from source and kept in synchrony with recent releases.
Vercel revamps AI-powered v0 development platform 3 Feb 2026, 7:02 pm
Vercel’s v0 development platform, an AI-powered environment centered on vibe coding, has been fitted with security features and integrations geared toward shipping real software rather than just demos, the company said. The platform has been rebuilt from the ground up to close the prototype-to-production gap for vibe coding in the enterprise, according to the company.
Called the “new v0” and detailed in a February 3 blog post, the update evolves the vibe coding platform for building production apps and agents. Developers can log in to v0.app to give the release a try. On the security front, v0 is built on the Vercel core cloud platform, where protections can be configured for common compliance needs, Vercel said. Users can set deployment protection requirements, connect securely to enterprise systems, and set proper access controls for each app. Also featured are secure integrations with Snowflake and AWS databases, enabling the building of custom reporting, adding rich context to internal tools, and automating data-triggered processes. All code generated by v0 is designed to plug into Vercel’s standard Git-based workflows and its preview and production deployment infrastructure on the Vercel cloud platform.
The new v0 release also has a new sandbox-based runtime that can import any GitHub repo and automatically pull environment variables and configurations from Vercel. Every prompt generates production-ready code in a real environment, and it lives in the user’s repo, Vercel said. A new Git panel lets developers create a new branch for each chat, open pull requests against a project’s main branch in the connected GitHub repository, and deploy on merge. Anyone on a team, not just engineers, can ship production code through proper Git workflows, according to Vercel. Future plans call for enabling developers to build end-to-end agentic workflows in v0, AI models included, and deploy them on Vercel self-driving infrastructure.
Snowflake: Latest news and insights 3 Feb 2026, 9:41 am
Snowflake (NYSE:SNOW) has rapidly become a staple for data professionals and has arguably changed how cloud developers, data managers and data scientists interact with data. Its architecture is designed to decouple storage and compute, allowing organizations to scale resources independently to optimize costs and performance.
For cloud developers, Snowflake’s platform is built to be scalable and secure, allowing them to build data-intensive applications without needing to manage underlying infrastructure. Data managers benefit from its data-sharing capabilities, which are designed to break down traditional data silos and enable secure, real-time collaboration across departments and with partners.
Data scientists have gravitated to Snowflake’s capability to handle large, diverse datasets and its integration with machine learning tools. The platform is designed to let them rapidly prepare raw data and build, train, and deploy models directly within Snowflake to produce actionable insights.
Watch this page for the latest on Snowflake.
Snowflake latest news and analysis
Snowflake debuts Cortex Code, an AI agent that understands enterprise data context
February 3, 2026: Snowflake has launched Cortex Code, an AI-based coding agent that aims to extend AI assistance beyond SQL generation and conversational analytics into data and app development tasks.
Snowflake to acquire Observe to boost observability in AIops
January 9, 2026: Snowflake plans to acquire AI-based SRE platform provider Observe to strengthen observability capabilities across its offerings and help enterprises with AIOps as they accelerate AI pilots into production.
Snowflake software update caused 13-hour outage across 10 regions
December 19, 2025: A software update knocked out Snowflake’s cloud data platform in 10 of its 23 global regions for 13 hours on December 16, leaving customers unable to execute queries or ingest data.
Snowflake to acquire Select Star to enhance its Horizon Catalog
November 21, 2025: Snowflake has signed an agreement to acquire startup Select Star’s team and context metadata platform to enhance its Horizon Catalog offering, the company said in a statement. Horizon Catalog is a unified data discovery, management, and governance suite inside the cloud-based data warehouse provider’s Data Cloud offering.
Databricks fires back at Snowflake with SQL-based AI document parsing
November 13, 2025: Databricks and Snowflake are at it again, and the battleground is now SQL-based document parsing. In an intensifying race to dominate enterprise AI workloads with agent-driven automation, Databricks has added SQL-based AI parsing capabilities to its Agent Bricks framework, just days after Snowflake introduced a similar ability inside its Intelligence platform.
Snowflake to acquire Datometry to bolster its automated migration tools
November 11, 2025: Snowflake will acquire San Francisco-headquartered startup Datometry, for an undisclosed sum, to bolster SnowConvert AI, part of its existing set of migration tools.
Snowflake brings analytics workloads into its cloud with Snowpark Connect for Apache Spark
July 29, 2025: Snowflake plans to run Apache Spark analytics workloads directly on its infrastructure, saving enterprises the trouble of hosting an Apache Spark instance elsewhere, and eliminating data transfer delays between it and the Snowflake Data Cloud.
Snowflake customers must choose between performance and flexibility
June 4, 2025: Snowflake is boosting the performance of its data warehouses and introducing a new adaptive technology to help enterprises optimize compute costs. Adaptive Warehouses, built atop Snowflake’s Adaptive Compute, is designed to lower the burden of compute resource management by maximizing efficiency through resource sizing and sharing.
Snowflake takes aim at legacy data workloads with SnowConvert AI migration tools
June 3, 2025: Snowflake is hoping to win business with a new tool for migrating old workloads. SnowConvert AI is designed to help enterprises move their data, data warehouses, business intelligence (BI) reports, and code to its platform without increasing complexity.
Snowflake launches Openflow to tackle AI-era data ingestion challenges
June 3, 2025: Snowflake introduced a multi-modal data ingestion service — Openflow — designed to help enterprises solve challenges around data integration and engineering in the wake of demand for generative AI and agentic AI use cases.
Snowflake acquires Crunchy Data to counter Databricks’ Neon buy
June 3, 2025: Snowflake plans to buy Crunchy Data, a cloud-based PostgreSQL database provider, for an undisclosed sum. The move is an effort to offer developers an easier way to build AI-based applications by offering a PostgreSQL database in its AI Data Cloud. The deal, according to the Everest Group, is an answer to rival Databricks’ acquisition of open source serverless Postgres company Neon.
Snowflake’s Cortex AISQL aims to simplify unstructured data analysis
June 3, 2025: Snowflake is adding generative AI-powered SQL functions to help organizations analyze unstructured data with SQL. The new AISQL functions will be part of Snowflake’s Cortex, a managed service inside its Data Cloud that provides the building blocks for using LLMs without the need to manage complex GPU-based infrastructure.
Snowflake announces preview of Cortex Agent APIs to power enterprise data intelligence
February 12, 2025: Snowflake announced the public preview of Cortex Agents, a set of APIs built on top of the Snowflake Intelligence platform, a low-code offering that was first launched in November at Build, the company’s annual developer conference.
Snowflake open sources SwiftKV to reduce inference workload costs
January 16, 2025: Cloud-based data warehouse company Snowflake has open-sourced SwiftKV, a previously proprietary approach designed to reduce the cost of inference workloads for enterprises running generative AI-based applications. SwiftKV was launched in December.
OpenAI launches Codex app as enterprises weigh autonomous AI coding tools 3 Feb 2026, 1:47 am
OpenAI has launched a standalone Codex app to manage multiple AI coding agents across projects, pushing beyond chat-based code generation as enterprises weigh the impact of more autonomous tools on development workflows and governance.
The move comes as OpenAI faces intensifying competition from rivals such as Anthropic and GitHub. Last month, Anthropic introduced Cowork, a research-preview feature that extends Claude Code beyond programming into broader enterprise workflows.
“The Codex app provides a focused space for multi-tasking with agents,” OpenAI said in a statement, noting that agents “run in separate threads organized by projects,” allowing developers to switch between tasks without losing context.
The Codex app is currently available on macOS for users on OpenAI’s paid ChatGPT plans. OpenAI said it plans to make it available on Windows.
Codex is also moving beyond code generation, with “skills” that allow agents to gather information, solve problems, and carry out broader tasks on a developer’s computer, the company added.
Implications for developers
Agentic AI-based coding is gaining interest in more enterprise development teams, pushing workflows beyond traditional IDE-centric models.
“These agent-driven development shells can speed up coding, debugging, and deployment, although they introduce higher enterprise risks,” said Neil Shah, VP for research at Counterpoint Research.
Based on early indications, OpenAI has taken a meaningful but incremental step with Codex in AI-assisted development, rather than a radical change, said Tulika Sheel, senior vice president at Kadence International.
“It doesn’t change the fundamentals of how code is written or reviewed, but it does streamline workflows by letting developers manage longer, more complex coding tasks in one place rather than through scattered IDE prompts,” Sheel said. “Over time, this could subtly reshape how developers plan, review, and maintain code by treating AI as a continuous collaborator rather than a helper used on a moment-to-moment basis.”
A standalone app also signals a shift from AI assisting with individual lines of code to handling larger blocks of work, said Abhivyakti Sengar, practice director at Everest Group. “Developers spend less time typing and more time reviewing and directing, closer to managing a junior engineer than using an autocomplete tool,” Sengar said.
Risks and challenges
The risks of using AI remain a topic of debate among enterprises. Concerns are likely to increase as multi-agent systems take on a larger role in the software development lifecycle.
“Autonomous AI coders need the same levels of oversight as human ones,” Sheel added. “This includes review, accountability, and clear ownership of the code they produce.”
Companies would also need clarity on intellectual property ownership and licensing to avoid surprises when AI-generated code is reused or shipped.
“Also important is maintaining control over the workflow layer,” Shah added. “Enterprises should opt for tools that support open integration with existing systems such as GitHub to avoid being locked into vertically integrated AI IDEs.”
Vendor lock-in could also become a real concern as models and agents learn deeply from an enterprise’s code and workflows, analysts say.
“Prioritizing tools that embrace open standards for agentic protocols and workflow while promising transparency around data and IP handling should be non-negotiable,” Shah said. “This, coupled with a stronger governance framework from token usage monitoring, policy enforcement, and auditable controls, will be key to ensuring these tools do not compromise on enterprise sovereignty or security.”
Snowflake debuts Cortex Code, an AI agent that understands enterprise data context 3 Feb 2026, 1:33 am
Snowflake has launched Cortex Code, an AI-based coding agent that aims to extend AI assistance beyond SQL generation and conversational analytics into data and app development tasks.
Cortex Code is designed to understand enterprise data context, including schemas, governance rules, compute constraints, and production workflows, Christian Kleinerman, EVP of Product at Snowflake, said during a press briefing.
[ Related: More Snowflake news and insights ]
As a result, developers and data teams can use natural language to build, optimize, and deploy data pipelines, analytics, machine learning (ML) workloads, and AI agents, Kleinerman added.
Snowflake’s existing AI features, such as Cortex AISQL and Snowflake Intelligence, help users query and analyze data.
Analysts say the native coding agent can help enterprise teams move from experimentation to scalable data and app deployments faster. Unlike generic coding agents, it understands critical context such as which tables are sensitive, which transformations are expensive, which pipelines are production-critical, and how analytics, ML, and agents are supposed to work together.
That contextual understanding, according to Stephanie Walter, AI stack practice leader at HyperFRAME Research, can reduce the manual effort required to move from an experimental idea to a reliable, governed solution that is ready to run at enterprise scale.
All the more so, Walter added, because the real risk for enterprises isn’t bad code but code that breaks governance, is expensive, or can’t scale.
Cortex Code as a CLI tool in code editors
In order to further help enterprises scale production-grade data and AI applications, Cortex Code, apart from being made available in Snowsight, is also being made available as a command-line interface (CLI) tool via code editors like VS Code and Cursor.
The deployment as a CLI tool, according to Moor Insights and Strategy principal analyst Robert Kramer, will help developers retain their enterprise’s data context, specifically data stored in Snowflake, while still being available in the code editor of their choice, locally on their machines.
The retention of context is important at the code editor level as most development work starts there, Kramer said.
Additionally, since the enterprise context is built in at the pilot stage locally, there are fewer chances of it failing in production or when it hits the warehouse, Kramer noted.
“The same Snowflake-aware agent that helps you prototype in local development workflows can follow the work into Snowflake Workspaces, Notebooks, and production pipelines. That continuity reduces the rewrite and revalidation steps where many AI pilots stall.”
Rival data warehouse and data cloud providers are pursuing comparable strategies to embed AI assistance deeper into data and application development workflows.
While Databricks is focusing more on notebook-centric development and in-platform assistants rather than local-first workflows, Google Cloud is leaning toward analyst-driven discovery through BigQuery, Looker, and Gemini, with a stronger emphasis on in-platform experiences than on local-first, Snowflake-style continuity across environments, Kramer pointed out.
Teradata, on the other hand, is prioritizing agent orchestration, governance, and control over developer ergonomics.
“The right choice depends on whether an organization’s biggest bottleneck is experimentation, governance, or operationalizing AI at scale,” Kramer said.
Cortex Code in Snowsight, Snowflake’s web interface, will be made generally available soon, the company said, adding that the CLI version is available now.
Weighing the benefits of AWS Lambda’s durable functions 3 Feb 2026, 1:00 am
The pace of innovation in cloud computing continues unabated, with AWS recently unveiling durable functions for AWS Lambda. This move could have significant implications for how enterprises design and orchestrate complex serverless workflows. As someone who has tracked and analyzed cloud evolution for decades, I view Amazon’s new offering as both a testament to the maturation of serverless practices and an opportunity for organizations to reassess the fit and risks of deeper integration with AWS’s serverless ecosystem.
A common criticism of serverless computing is its limited ability to orchestrate multistep or long-running workflows. AWS Lambda, arguably the most well-known serverless compute service, excels at handling single, stateless, short-lived tasks such as image processing, data transformation, or lightweight back-end APIs. However, for complex operations like order processing, multistage onboarding, or AI-driven decisions that span hours or weeks, the existing Lambda model requires substantial custom code or external orchestration tools, such as AWS Step Functions or even third-party platforms.
AWS Lambda’s new durable functions directly address this pain point with native state management, automatic checkpointing, and workflow orchestration. Developers can now define a sequence of steps and introduce waits of up to a year without incurring charges during pauses, a significant economic benefit for workflows subject to lengthy delays or external dependencies.
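To make the idea concrete, the toy Python sketch below shows the checkpoint-and-resume pattern that durable functions automate. It is not AWS’s API; it is simply the bookkeeping (persisting each step’s result so a restart can skip completed work) that developers previously had to build themselves or hand off to Step Functions, with illustrative step names.

```python
import json
from pathlib import Path

CHECKPOINT = Path("workflow_state.json")  # stand-in for the managed state durable functions provide


def load_state() -> dict:
    return json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}


def save_state(state: dict) -> None:
    CHECKPOINT.write_text(json.dumps(state))


def run_step(name: str, fn, state: dict):
    """Run a step once; on replay, return the recorded result instead of re-running it."""
    if name not in state:
        state[name] = fn()
        save_state(state)
    return state[name]


def workflow():
    state = load_state()
    order = run_step("reserve_inventory", lambda: {"sku": "A-1", "qty": 2}, state)
    payment = run_step("charge_card", lambda: {"status": "ok"}, state)
    # A durable wait would pause here, potentially for months, without billing;
    # in this toy version the process can simply exit and be re-run later.
    return run_step("ship_order", lambda: {"order": order, "payment": payment}, state)


if __name__ == "__main__":
    print(workflow())
```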
This type of orchestration differentiates proof-of-concept serverless apps from resilient, production-grade systems that withstand partial failures, recover from interruptions, and remain cost-effective throughout their life cycle. Durable functions provide templated state management, built-in error handling, and robust failure recovery, all of which reduce the engineering burden. Consequently, serverless becomes better suited to enterprise-level, workflow-centric applications where reliability and agility are essential.
A meaningful step, but not a panacea
So, should every enterprise jump to refactor their applications around Lambda’s durable functions? The answer, as always, depends on the circumstances.
On the positive side, durable functions extend the serverless model for organizations already using Lambda, offering first-class support for Python (3.13, 3.14) and Node.js (22, 24), and integration with AWS’s CLI, SDK, and orchestration tools. They lower entry barriers for teams already familiar with AWS, simplifying app development that previously relied on container-based or traditional VM-driven architectures. Teams that prioritize agility, business logic, and rapid experimentation will likely realize significant value from durable functions’ abstraction of infrastructure and orchestration.
However, organizations must weigh the trade-offs of deepening serverless adoption, especially with proprietary abstractions like durable functions. Serverless models promote agility and efficiency, but can also increase vendor dependence. For example, migrating complex workflows from AWS Lambda durable functions to another cloud platform (or back to on-premises infrastructure) will be costly and complex because the code relies on AWS-specific APIs and orchestration that don’t translate directly to Microsoft Azure, Google Cloud, or open source options.
There’s also a broader architectural consideration. Serverless, by its very nature, expects statelessness and composability, but it also introduces new patterns for observability, testing, and operational troubleshooting. While AWS Lambda durable functions make workflow orchestration less burdensome, they also increase the “magic” that must happen behind the curtain, sometimes making debugging and understanding cross-step failures more challenging. Enterprisewide visibility, compliance, and cost control require investments in new monitoring practices and possibly some third-party or proprietary tools.
Pros and cons of serverless lock-in
Some in the cloud community have taken a myopic approach to vendor lock-in, sounding alarms at any whiff of proprietary technology adoption. In reality, completely avoiding lock-in isn’t practical, and seeking absolute portability can undermine access to genuine innovation, such as Lambda durable functions. The calculus should focus on risk management and exit strategies: Does the value delivered by automation, embedded error recovery, and operational efficiency justify the increased dependency on a specific cloud provider at this stage of your evolution?
For enterprises aggressively pursuing digital transformation, serverless and Lambda durable functions may align with near-term and medium-term goals, offering agility, lower overhead, and a better developer experience. However, investments in cloud-native features should balance current benefits with long-term flexibility. For extensive multicloud or future-proofing needs, consider encapsulating some application logic outside these proprietary serverless constructs or adopting architectures that decouple workflows from the cloud runtime.
Looking past the hype cycle
AWS Lambda durable functions are a significant innovation. For the right workloads, they dramatically raise the ceiling on what’s achievable with serverless. They help realize the original vision of serverless and enable developers to focus on business logic rather than distributed state management or infrastructure plumbing. However, they’re not a blanket prescription for all workloads, and they’re not exempt from the strategic concerns that have always accompanied rapid platform innovation.
Enterprise architects and leaders now face a familiar balancing act. The key is to recognize where Lambda durable functions deliver transformative value and where they risk pulling your architecture into a pattern that is hard to unwind later. As always, the best approach is a clear-eyed, strategic assessment guided by organizational priorities—never fad-driven adoption for its own sake.
To that end, enterprises considering AWS Lambda durable functions should weigh the following:
- The degree of vendor lock-in
- Migration costs
- The fit with existing skill sets and development workflows
- The maturity of monitoring and observability solutions
- Regulatory and compliance implications
- Total cost of ownership versus more portable or traditional alternatives
Wise adoption requires more than technical enthusiasm—it demands business and architectural discipline. AWS Lambda durable functions are a meaningful evolution for serverless, but they become truly transformative only in the context of an informed, balanced enterprise strategy.
Building AI agents with the GitHub Copilot SDK 3 Feb 2026, 1:00 am
GitHub Copilot is one of the more mature AI assistants in use, having begun life as a way to use AI tools for code completion. Since then, Copilot has added features, becoming a resource for coordinating and orchestrating a wide variety of development-focused agents and services. Part of that development has been making Copilot available everywhere developers are: inside the browser, in your code editor, and now, with the Copilot CLI, in your terminal.
GitHub Copilot is especially useful when combined with Microsoft services like the Work IQ MCP server, enabling you to build prompts that combine code with specifications, and more importantly, with the discussions that take place outside traditional development environments. A missed email may hold a key feature request or an important user story. Using the Copilot CLI and Work IQ to surface the requirement and convert it into code helps reduce the risk of later rework and ensures that a project is still aligned to business needs.
Having GitHub Copilot at your fingertips is surprisingly helpful, but it does require switching context from application to terminal and back again. With it embedded in Visual Studio and Visual Studio Code, that isn’t too much of a problem, but what if you wanted to take that model and implement it inside your own applications? Building an agent command-and-control loop isn’t easy, even if you’re using platforms like Semantic Kernel.
Introducing the GitHub Copilot SDK
This is where the new GitHub Copilot SDK offers a solution. If you’ve got the Copilot CLI binaries installed on your device, whether it’s Windows, macOS, or Linux, then you can use the SDK (and the CLI’s own JSON APIs) to embed the Copilot CLI in your code, giving you access to its orchestration features and to GitHub’s Model Context Protocol (MCP) registry. Having an integrated registry simplifies the process of discovery and installation for MCP servers, quickly bringing new features into your agent applications.
When your code uses the GitHub Copilot SDK with the CLI, the SDK treats the CLI as a server. This lets it run headless, so you won’t see the interactions between your code and the underlying agents. You don’t need to run the CLI on every device running a GitHub Copilot SDK application; thanks to remote access, you can install it on a central server. Users will still need a GitHub Copilot license to use GitHub Copilot SDK applications, though, even if they’re working against a remote instance.
The SDK lets you treat the Copilot CLI server as a tool for executing and managing models and MCP servers instead of having to build your own, which simplifies development significantly. Running the CLI in server mode is a simple matter of launching it as a server and defining the port used for prompting. You can then use the fully qualified domain name of the machine running the server and the defined port as a connection string.
Once the CLI is installed, add the appropriate SDK dialect. Official support is provided for JavaScript and TypeScript via Node.js, .NET, Python, and Go. Node support is distributed via npm, .NET via NuGet, and Python via pip. For now, the Go SDK can be found in the project’s GitHub repository and can be installed using Go’s get command. It’s a fast-moving project, so be sure to regularly check for updates.
Using the GitHub Copilot SDK
Calling the SDK is simple enough. In .NET you start by creating a new CopilotClient and adding a session with a supported model, then sending an asynchronous session message containing a prompt to the Copilot CLI. The response contains the agent’s answer and can then be handled by your code.
This is one of the simplest ways to interact with a large language model (LLM) from your code, with similar approaches used in other languages and platforms. You’ll need to write code to display or parse the data returned from the SDK, so it’s useful to build base prompts in your code that can force the underlying GitHub Copilot LLM to return formats that can be parsed easily.
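One language-agnostic way to do that is to prepend a base prompt that pins the output to a machine-readable shape and then validate whatever comes back. The Python sketch below illustrates the pattern; send_prompt is a placeholder for whatever call your chosen SDK dialect uses to submit a session message and read the reply, not an actual GitHub Copilot SDK function, and the expected JSON keys are illustrative.

```python
import json
from typing import Callable

# Base prompt that forces a machine-readable response shape (illustrative schema).
BASE_PROMPT = (
    "Answer ONLY with a JSON object of the form "
    '{"summary": string, "files": [string]} and no extra text.\n\n'
)


def ask_structured(send_prompt: Callable[[str], str], question: str) -> dict:
    """Wrap a question with the base prompt and parse the JSON reply.

    `send_prompt` stands in for the SDK call that submits a session message
    and returns the agent's text reply (hypothetical, not a real Copilot SDK API).
    """
    reply = send_prompt(BASE_PROMPT + question)
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        raise ValueError(f"Agent did not return valid JSON: {reply[:200]!r}")
    if not {"summary", "files"}.issubset(data):
        raise ValueError("JSON reply is missing expected keys")
    return data
```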
The default is to wait for Copilot CLI to return a complete response, but there’s support for displaying response data as it’s returned by adding a streaming directive to the session definition. This approach can be helpful if you’re building an interactive user experience or where LLM responses can take time to generate.
Like Semantic Kernel and other agent development frameworks, the GitHub Copilot SDK is designed to use tools that link your code to LLMs, which can structure calls to data that Copilot will convert to natural language. For example, you could build a tool that links to a foreign exchange platform to return the exchange rate between two currencies on a specific date, which can then be used in a Copilot SDK call, adding the tool name as part of the session definition.
Using tools and MCP servers
Tools are built as handlers linked to either local code or a remote API. Working with API calls allows you to quickly ground the LLM and reduce the risks associated with token exhaustion. Once you have a tool, an interactive session can let users build their own prompts and deliver them to the grounded LLM, whether as part of a larger application or as a task-specific chatbot in Teams or another familiar application.
You don’t have to write your own tool; if you’re working with existing platforms and data, your SDK code can call an MCP server, like the one offered by Microsoft 365, to provide quick access to larger data sources. As the GitHub Copilot SDK builds on the capabilities of the Copilot CLI, you can start by defining a link to the relevant MCP endpoint URLs and then allow the tool to call the necessary servers as needed. An application can include links to more than one MCP server, so linking Work IQ to your GitHub repositories bridges the gap between code and the email chains that inspired it.
MCP servers can be local or remote, using HTTP for remote connections and stdio for local ones. You may need to include authorization tokens in your MCP server definition, and you can either accept all of a server’s tools or choose specific tools for your GitHub Copilot SDK agent to use.
Other options for the SDK agent session include defining a base prompt for all queries to help avoid inconsistencies and to provide some context for interactions outside of the user prompt. Different agents in the same application can be given different base prompts and separate names for separate functions. A local MCP server might give your application access to a PC’s file system, while a remote server might be GitHub’s own MCP server, giving your application access to code and repositories.
You’re not limited to the official language support. There are community releases of the SDK for Java, Rust, C++, and Clojure, so you can work with familiar languages and frameworks. As they’re not official releases, they may not be coordinated with GitHub’s own SDKs and won’t have the same level of support.
Working in the Microsoft Agent Framework
Usefully, the Microsoft Agent Framework now supports the GitHub Copilot SDK, so you can integrate and orchestrate its agents with ones built from other tools and frameworks, such as Fabric or Azure OpenAI. This lets you build complex AI-powered applications from proven components, using Agent Framework to orchestrate workflow across multiple agents. You’re not limited to a single LLM, either. It’s possible to work with ChatGPT in one agent and Claude in another.
Tools like the GitHub Copilot SDK are a useful way to experiment with agent development, taking the workflows you’ve built inside GitHub and Visual Studio Code and turning them into their own MCP-powered applications. Once you’ve built a fleet of different single-purpose agents, you can chain them together using higher-level orchestration frameworks, thereby automating workflows that bring in information from across your business and your application development life cycle.
Three ways AI will change engineering practices 3 Feb 2026, 1:00 am
The impacts of AI are mostly viewed through the lens of improving business processes, but AI is poised to have long-term impacts on engineering too. Not only can AI make it easier for engineers to bring their ideas to life, but it can also remove the burden of manual toil that keeps them from focusing on more business-critical issues. The growing integration of AI into engineering workflows will drive both engineering efficiencies and business advantages as teams continue to unlock valuable use cases.
Here are three ways AI will bring long-lasting change to engineering practices.
AI will accelerate prototype building and idea validation
For teams assessing their current architectures and gauging where they want to be in the next few years, AI can help identify the changes they would need to make and the prototypes they would need to build. An engineer may compile bullet points to map out their thought process, which can then serve as the initial AI prompt to assist in building out the plan. The feedback generated from the AI prompts can enable teams to make decisions faster and get into the proof-of-concept stage at an accelerated pace.
AI can also help engineers become more efficient by directing them where to begin research. For example, when building a chat UI, AI can identify various factors the engineer needs to think through, such as performance and network considerations, and create a list of recommended actions. As a result, teams can get a head start on their projects and ultimately get to solutions faster.
AI will increase the importance and quality of documentation
It’s important to remember that AI tools need direction, and they’re only as good as the data they’re trained on. If the AI doesn’t have good input, it can’t create good output.
The industry will move toward stronger documentation practices because of how AI works. In order for the AI to perform as desired, organizations need proper, up-to-date documentation to prevent the AI from making incorrect assumptions. Well-formed prompts that are focused on an outcome, and that give the AI easier ways to discover the documentation, make a significant difference in performance because these prompts can give the AI stronger context and direction to work with.
Plus, AI can be leveraged to ensure that technical documentation is kept up-to-date and relevant. Similar to prototyping, engineers can automate the initial stages of documentation to help reduce the amount of time they spend on it. The engineer can then use their expertise to review and refine the AI-produced documentation, further guiding the AI toward the correct results.
AI will increase the focus on compliance
Compliance remains an ongoing challenge for organizations as they navigate data and privacy regulations and frameworks around the world, including GDPR and SOC 2. Maintaining compliance is crucial not only to avoiding costly fines and penalties from regulatory bodies, but also to protecting the organization from potential reputational damage. AI use exposes a new liability, as organizations are now challenged with managing the data it can access.
As organizations feed sensitive company and customer data into their AI tools, they need to ensure that the data stays within these systems. Teams need to maintain visibility into how accessible the AI data is internally, as well as what other tools may be exposed to it. Additionally, strong guardrails and security permissions must be established to prevent any unauthorized access or use of customer data. While every organization is eager to deploy AI, they need to be vigilant around the level of data access they grant these tools.
The future of engineering
The ongoing integration of AI will transform the ways engineers work. They can kick off their projects faster and offload their more boring and tedious tasks, freeing them to focus on the higher-impact issues that require human expertise. Between speeding up prototype building and increasing productivity, engineers will be empowered to find more ways to leverage AI as they continue to guide their organizations toward innovation.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.