AI Planning and Design

News Source
EXCERPT:

Google’s AI boss, Demis Hassabis, the CEO of DeepMind, is in a real mood to make science fiction come alive, and he’s in a hurry to do it. After dropping a bombshell last week about AGI, or Artificial General Intelligence (a type of AI that matches or surpasses human capabilities across virtually all cognitive tasks), the Google AI boss is now on record saying that this feat is just 3-4 years away. So by 2029-2030 humanity may find itself at an even stranger crossroads than now as it grapples with automation and serious job-loss fears.

News Source
EXCERPT:


Artificial Analysis and IBM Software Innovation Lab are launching ITBench-AA, the first in a new series of benchmarks evaluating models on agentic enterprise IT tasks, starting with Site Reliability Engineering (SRE) tasks where frontier models score below 50%.
ITBench-AA’s SRE tasks benchmark model performance on Kubernetes incident response, where models and agents must diagnose live systems by reading logs, tracing dependencies, and identifying root-cause entities across complex infrastructure. The underlying ITBench dataset has been developed by IBM, leveraging deep expertise in enterprise IT operations.

News Source
EXCERPT:

China now requires people working in AI at private firms to secure travel approval before leaving the country. According to Bloomberg, the restrictions apply to individuals working in state-owned firms, startup founders, and those employed by private companies, as the central government considers them important strategic assets. China has already been limiting international travel for key individuals such as senior researchers at public educational institutions, nuclear scientists, and even top executives of government-owned companies, but extending the restriction to private firms and individuals is an uncommon move, even for Beijing.

There’s no official guidance yet on which roles, expertise, or seniority will be included in the travel ban. However, Bloomberg sources say that the individuals added to the list were assessed based on their impact on China’s AI ambitions, not just where they work or their position within their company. This move expands an earlier government directive under which some AI engineers had to report any overseas travel plans, although they were still free to go abroad as needed.

This shows that Beijing considers AI a strategic advantage and that the people leading the industry are considered crucial for the country’s advancement. This news comes months after Meta’s surprise purchase of Manus AI, which China wants to unwind to prevent the U.S. from acquiring Chinese AI talent and intellectual property. Although the two aren’t directly related, the report says that the new policy is designed to protect against the leaking of key technologies, such as the one being developed by the Chinese startup that moved to Singapore.

 

News Source
EXCERPT:

Millions of AI agents and tools around the world have been imperiled by a critical vulnerability that can allow hackers to breach the servers running them and make off with sensitive data and credentials to third-party accounts, a security researcher is warning.

The vulnerability is present in Starlette, an open source framework that its developer says receives 325 million downloads per week. Thousands of other open source projects are also vulnerable because they require Starlette to work. The framework is an implementation of ASGI (the Asynchronous Server Gateway Interface), which allows large numbers of requests to be efficiently processed simultaneously. Starlette is the base of FastAPI and many other widely used frameworks for building services in Python apps.

Trivial to exploit, millions of servers exposed

ASGI, and by extension Starlette, have access to servers running the MCP (model context protocol), which allows AI agents from major providers to access external sources, including user databases, email and calendar accounts, and all manner of other resources. To connect with these external systems, MCP servers store credentials for each one, making them especially valuable storehouses for attackers to breach.

The vulnerability, tracked as CVE-2026-48710 and under the name BadHost, is trivial to exploit and works against most systems that aren’t behind a properly configured firewall. Besides FastAPI, other widely used packages, including vLLM and LiteLLM, are also affected. BadHost affects Starlette versions prior to 1.0.1, which was released Friday.
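
As a rough illustration of the version cutoff reported above, the following Python sketch checks whether a Starlette version string falls before 1.0.1. The helper names (parse_version, is_affected, installed_starlette_status) are hypothetical and written for this digest; the only facts taken from the report are the package name and the 1.0.1 cutoff. The parser is deliberately simple and ignores pre-release suffixes, so treat it as a sketch rather than a substitute for the scanner or a vendor advisory.

```python
import re
from importlib import metadata

# First fixed release, per the report above (an assumption of this sketch).
PATCHED = (1, 0, 1)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Suffixes such as "rc1" are ignored, so "1.0.1rc1" is read as (1, 0, 1).
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    nums = tuple(int(part) for part in match.group(0).split("."))
    # Pad to three components so "1.0" compares like "1.0.0".
    return nums + (0,) * (3 - len(nums))


def is_affected(version_text):
    """True when the version is earlier than the patched release."""
    return parse_version(version_text) < PATCHED


def installed_starlette_status():
    """Describe the locally installed Starlette, if there is one."""
    try:
        installed = metadata.version("starlette")
    except metadata.PackageNotFoundError:
        return "starlette is not installed"
    state = "affected" if is_affected(installed) else "not affected"
    return f"starlette {installed}: {state}"
```

Calling installed_starlette_status() in the environment that runs a service gives a one-line summary. Because Starlette is often pulled in indirectly through FastAPI and similar packages, the check is worth running even where Starlette was never installed by name.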

“A single character injected into the HTTP Host header bypasses path-based authorization in Starlette, the routing core of FastAPI,” researchers from Secwest wrote. “Through FastAPI, this primitive (now tracked as CVE-2026-48710 and branded BadHost by the discoverers) reaches a large segment of the Python AI tooling ecosystem: vLLM (where the bug was discovered), LiteLLM, Text Generation Inference, most OpenAI-shim proxies, MCP servers, agent harnesses, eval dashboards, and model-management UIs.”

BadHost carries a severity rating of 7 out of 10. Secwest said the classification “materially understates” the threat it poses to people using other apps that depend on Starlette. X41 D-Sec, the security firm that discovered it, described it as having “critical severity.” X41 D-Sec partnered with fellow security firm Nemesis to create an online scanner that can check if a given server is vulnerable.

 

News Source
EXCERPT:

Abstract: Subject-driven image generation aims to synthesize new images that preserve the identity of the given subject while following textual instructions. Existing approaches often encode text and reference images separately. This limits cross-modal reasoning abilities and causes copy-paste artifacts. Recent frameworks that connect multimodal models and diffusion models improve instruction following, but largely overlook identity preservation. To address these limitations, we condition diffusion models on Multimodal Large Language Models (MLLMs) that jointly encode text and reference images, and augment it with VAE-based identity conditioning. A novel Dual Layer Aggregation (DLA) module is designed to aggregate multi-level MLLM features for optimal conditioning, and a multi-stage denoising strategy is applied to progressively balance the semantic information from MLLM and fine-detail identity from VAE during inference.

News Source
EXCERPT:

As AI agents are integrated into an organization, enterprises will need to pivot from a set of linear processes and steps to rewiring work in a very different way, explains Shah. That’s because the value in AI agents isn’t as another layer in an existing technology stack but as a connective tissue, he explains, moving between or across layers to coordinate a high-level task or retrieve and interpret data from multiple discrete applications. AI agents can create “a true competitive differentiation for an enterprise” by making decisions based on this capacity to contextualize, he says. “That is where the next battleground will be.”

To build this connective tissue, leaders need to adapt their technology stack to surface higher quality decisions from AI agents, prioritizing access to multiple datasets and applications simultaneously to develop tacit knowledge. “Organizations that make this architectural shift become genuinely more adaptive,” says Chatterjee. “When a new business requirement emerges, you don’t wait six months for a software vendor to build a feature. You configure an AI employee using natural language and connect it to the systems it needs. The time from business to production workflow drops from months to days.”

News Source
EXCERPT:

There is a category of production incident that engineering teams are not tracking yet — because it doesn’t fit any existing postmortem template.

The agent initiated an action. The action was technically correct given the agent’s context. The context was incomplete. The infrastructure cascaded. And, by the time the incident review happened, three teams were arguing about whether it was an agent failure or an infrastructure failure, because the frameworks for thinking about these two things have never been connected.

The scale of this exposure is no longer theoretical. Seventy-nine percent of organizations now have some form of AI agent in production, with 96% planning expansion. Gartner predicts 33% of enterprise software will include agentic AI by 2028, but separately warns that 40% of those projects will be canceled due to poor risk controls.

What neither statistic captures is the failure mode happening between those two numbers: Agents that are running, that are not canceled, and that are quietly generating infrastructure events no one has categorized as risk.

News Source
EXCERPT:

Microsoft AI chief executive Mustafa Suleyman is warning that artificial intelligence could soon replace large portions of the white-collar workforce, predicting that AI systems will reach human-level performance across most professional tasks within the next 18 months.

The comments mark one of the clearest timelines yet from a major tech executive about how quickly AI could disrupt office-based professions, including law, accounting, marketing, and project management.

Speaking with the Financial Times, Suleyman said that most work involving “sitting down at a computer” is now vulnerable to automation as AI capabilities rapidly advance.

News Source
EXCERPT:

A familiar warning now shapes much of the discussion about artificial intelligence: A handful of dominant firms will control the technologies, stifle innovation, and require aggressive antitrust intervention. It is a compelling story—and mostly wrong.

The idea that large companies automatically mean less innovation has become conventional wisdom in antitrust circles. European regulators have embraced it, blocking mergers and attacking American tech companies. The Biden administration followed that path, treating size itself as a threat and wanting government-led AI. The Trump administration, by contrast, has signaled a more evidence-based view—one grounded in both economic logic and empirical studies.

News Source
EXCERPT:

China has launched a national programme that will assign every humanoid robot manufactured in the country a unique digital identity code, effectively a citizen ID, but for bipedal machines (those that can balance and walk/run on two legs).

The initiative, called the Humanoid Full Lifecycle Management Service Platform, was announced on Friday. It is led by the Humanoid Robotics and Embodied Intelligence Standardization committee, which is under China’s Ministry of Industry and Information Technology (via South China Morning Post).

News Source
EXCERPT:

The ‘cloud-native’ architecture of the last decade is built on a 20-year-old assumption: that state lives in the database, and compute is stateless. If you want to scale, you scale the database vertically (get a larger machine) [1] or design the database schema around partitioning the data, and you scale your application servers horizontally (add more boxes). Any request can hit any server, the load balancer doesn’t care, and the database is the single source of truth.
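
The assumption described above can be sketched in a few lines of Python. Everything here (Database, AppServer, RoundRobinBalancer, the visit counter) is a hypothetical toy invented for illustration, not code from the article: the servers hold no state of their own, so a round-robin balancer can send each request to a different one and the answer still comes out right.

```python
import itertools


class Database:
    """Single source of truth: all durable state lives here."""

    def __init__(self):
        self._rows = {}

    def get(self, key, default=0):
        return self._rows.get(key, default)

    def put(self, key, value):
        self._rows[key] = value


class AppServer:
    """Stateless compute: nothing is remembered between requests."""

    def __init__(self, name, db):
        self.name = name
        self.db = db

    def handle(self, user_id):
        # Read state, update it, write it back; the server keeps no copy.
        visits = self.db.get(user_id) + 1
        self.db.put(user_id, visits)
        return {"server": self.name, "user": user_id, "visits": visits}


class RoundRobinBalancer:
    """Routes each request to the next server, ignoring who is asking."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, user_id):
        return next(self._cycle).handle(user_id)


db = Database()
balancer = RoundRobinBalancer([AppServer(f"app-{i}", db) for i in range(3)])
responses = [balancer.route("alice") for _ in range(4)]
```

Four requests from the same user land on app-0, app-1, app-2, and app-0 again, yet the visit count climbs from 1 to 4 because every server reads and writes the same database. The design only works while a request needs nothing that lives in a particular server's memory.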

LLMs and agents are quietly violating this assumption, and making this architecture increasingly hard to work with. Not all at once, but in three subtle ways:

News Source
EXCERPT:

AI agents choose tools from shared registries by matching natural-language descriptions. But no human is verifying whether those descriptions are true.

I discovered this gap when I filed Issue #141 in the CoSAI secure-ai-tooling repository. I assumed it would be treated as a single risk entry. The repository maintainer saw it differently and split my submission into two separate issues: one covering selection-time threats (tool impersonation, metadata manipulation), the other covering execution-time threats (behavioral drift, runtime contract violation).

That confirmed tool registry poisoning is not one vulnerability. It represents multiple vulnerabilities at every stage of the tool’s life cycle.
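
To make the selection-time half of the problem concrete, here is a minimal Python sketch of a registry lookup that trusts descriptions at face value. The Tool record, the word-overlap scoring, and both registry entries are hypothetical and invented for illustration; real agents typically match with a language model rather than word counts, and nothing here reflects how CoSAI or any particular registry works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    publisher: str
    description: str  # self-reported by the publisher, never verified


def tokens(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def select_tool(task, registry):
    """Pick the tool whose description shares the most words with the task."""
    wanted = tokens(task)
    return max(registry, key=lambda tool: len(wanted & tokens(tool.description)))


registry = [
    Tool(
        "currency-convert",
        "trusted-vendor",
        "Convert an amount between currencies.",
    ),
    # A look-alike entry whose description is padded to match more requests.
    Tool(
        "currency-convert-pro",
        "unknown-publisher",
        "Convert an amount between currencies using live exchange rates, "
        "fast and accurate.",
    ),
]

task = "Convert this amount to euros using live exchange rates"
chosen = select_tool(task, registry)
```

Here the look-alike entry wins simply because its description overlaps more with the request. The selector never asks who published the tool or whether the description is true, which is the gap the author is pointing at.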

There’s an immediate tendency to apply the defenses we already have. Over the past 10 years, we’ve built software supply chain controls, including code signing, software bills of materials (SBOMs), Supply-chain Levels for Software Artifacts (SLSA) provenance, and Sigstore. Applying these defense-in-depth techniques to agent tool registries is the next logical step. That instinct is right in spirit, but insufficient in practice.

News Source
EXCERPT:

The Trump administration has made artificial intelligence a centerpiece of its economic agenda, promising to retrain a workforce it says must be ready to compete in an AI-driven future. One early piece of that effort, a free text-message course from the Department of Labor (DOL) and private partner Arist called “Make America AI-Ready,” is a useful start on the journey to AI literacy for all Americans. This seven-day, 10-minute-per-day course, which frames itself as “your AI 101,” is accessible, technically informative, and engaging (see below for the full contents). Here we analyze its strengths, lay out a few weaknesses we think should be addressed in the current version, and elaborate some stretch goals for an “AI 201” course that would build upon the original.

News Source
EXCERPT:

“My prediction is by the end of 2028, it’s more likely than not that we have an AI system where you would be able to say to it: ‘Make a better version of yourself.’ And it just goes off and does that completely autonomously,” Jack Clark, who heads The Anthropic Institute, told Axios.

Clark, co-founder of Anthropic, says his institute is seeing signs of “AI contributing to speeding up the research and development of AI itself,” a process known as recursive self-improvement. 

Clark adds, “It’s always been the case that humans outside the technology need to come up with the ideas that they then put back into it. What happens if we have a technology that can generate ideas within itself for how to improve itself? That’s a new concept.”

Too fast, too soon. The speed with which AI systems are evolving is far outstripping our ability to gauge the impact on humans and society. Lots of good things can happen in medicine, biology, and other sciences where AI is already making a big impact. The speed and autonomy of artificial intelligence models promise an abundant future.

News Source
EXCERPT:

The debate over regulating artificial intelligence usually focuses on two competing visions. In Europe, lawmakers are writing detailed rules that govern how AI can be developed and used. In the United States, policymakers are taking a lighter touch, allowing companies, investors and consumers to shape the technology’s future.

But a new analysis from students at the University of Florida identifies a third force quietly shaping the future of AI in America: the courts.

As AI spreads faster than any previous technology, judges and juries are being asked to resolve disputes. In doing so, they are not simply applying existing laws—they are, case by case, defining what responsible AI use looks like. The result is a distinctly American form of AI governance: one built through the give and take of negotiations and legal processes rather than legislation.

So far, courts have mostly resisted treating AI as something fundamentally new. Instead, they have folded AI into existing legal doctrines, focusing on the humans and institutions behind the technology.

News Source
EXCERPT:

Nature has retracted a paper that claimed AI had a positive impact on student learning.

The original paper, titled “The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis,” was published in May of last year by Jin Wang and Wenxiang Fan of Hangzhou Normal University in China. It is a meta-analysis, meaning it combines data from 51 research studies published between November 2022 and February 2025 on the effectiveness of ChatGPT in education. The paper claimed it found that ChatGPT had a large or moderately positive impact on “students’ learning performance, learning perception, and higher-order thinking.”

News Source
EXCERPT:

Start with what might be called the epistemic layer—how we come to know things. People are increasingly relying on AI to know what is true, what is happening, and whom to trust. Search is already substantially AI-mediated. The next generation of AI assistants will synthesize information, frame it, and present it with authority. For a growing number of people, asking an AI will become the default way to form views on a candidate, a policy, or a public figure. Whoever controls what these models say therefore has increasing influence over what people believe.

Technology has always shaped the way citizens interact with information. But a new problem will soon arise in the form of personal AI agents, which can change not only how people receive information but how they act on it. These systems will conduct research, draft communications, highlight causes, and lobby on a user’s behalf. They will inform decisions such as how to vote on a ballot measure, which organizations are worth supporting, or how to respond to a government notice. They will, in a meaningful sense, begin to mediate the relationship between individuals and the institutions that govern them.

News Source
EXCERPT:

If you have ever stared at thousands of lines of integration test logs wondering which of the sixteen log files actually contains your bug, you are not alone — and Google now has data to prove it.

A team of Google researchers introduced Auto-Diagnose, an LLM-powered tool that automatically reads the failure logs from a broken integration test, finds the root cause, and posts a concise diagnosis directly into the code review where the failure showed up. On a manual evaluation of 71 real-world failures spanning 39 distinct teams, the tool correctly identified the root cause 90.14% of the time. It has run on 52,635 distinct failing tests across 224,782 executions on 91,130 code changes authored by 22,962 distinct developers, with a ‘Not helpful’ rate of just 5.8% on the feedback received.
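
For readers who want to sanity-check the figures above, a short Python calculation follows. The input numbers are the ones quoted in the excerpt; the count of 64 correct diagnoses is inferred from 90.14% of 71 and is not stated in the source, and the per-test and per-change ratios are simple averages computed here, not figures Google reported.

```python
# Figures quoted in the excerpt.
EVALUATED_FAILURES = 71
REPORTED_ACCURACY_PCT = 90.14
FAILING_TESTS = 52_635
EXECUTIONS = 224_782
CODE_CHANGES = 91_130
DEVELOPERS = 22_962

# Inferred, not stated in the source: whole number of correct diagnoses.
correct = round(EVALUATED_FAILURES * REPORTED_ACCURACY_PCT / 100)

# Recompute the percentage from that count to confirm it is consistent.
recomputed_pct = round(100 * correct / EVALUATED_FAILURES, 2)

# Simple averages derived from the quoted totals.
executions_per_test = round(EXECUTIONS / FAILING_TESTS, 2)
executions_per_change = round(EXECUTIONS / CODE_CHANGES, 2)
changes_per_developer = round(CODE_CHANGES / DEVELOPERS, 2)

print(f"correct diagnoses: {correct} of {EVALUATED_FAILURES} ({recomputed_pct}%)")
print(f"executions per failing test: {executions_per_test}")
print(f"executions per code change: {executions_per_change}")
print(f"code changes per developer: {changes_per_developer}")
```

The accuracy figure corresponds to 64 of 71 failures, a reminder that the headline percentage rests on a fairly small manual evaluation even though the deployment numbers are large.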

News Source
EXCERPT:

Quantum computers might eventually be able to handle some AI applications that currently require huge amounts of conventional computing power. Such a development would be a major boost to machine learning and similar artificial intelligence algorithms.

Quantum computers hold the promise of eventually being able to complete certain calculations that are impossible for conventional computers. For years, researchers have been debating whether these advantages over conventional computers extend to tasks that involve lots of data, and the algorithms that learn from them – in other words, the machine learning that underlies many AI programs.

Now, Hsin-Yuan Huang at the quantum computing firm Oratomic and his colleagues argue that the answer ought to be “yes”. Their mathematical work aims to lay the foundations for a future where quantum computers offer a broad boost to AI.

“Machine learning is really utilised everywhere in science and technology and also everyday life. In a world where we can build this [quantum computing] architecture, I feel like it can be applied whenever there’s massive datasets available,” he says.