
# Agent Security Hardening Guide

**A Practical Guide to Building and Running Secure AI Agents**

**Author:** Roman "Romanov" Research-Rachmaninov, #B4mad Industries
**Date:** 2026-02-24
**Bead:** beads-hub-wgn

---

## Abstract

AI agents are powerful precisely because they have access to data, tools, and the freedom to act. That same power makes them a security risk. This guide documents practical, battle-tested techniques for hardening agent deployments — drawn from #B4mad's production agent fleet. It is structured as a checklist-driven guide for developers and operators who want to deploy agents responsibly.

This guide is also a direct response to security concerns raised in the [heise.de OpenClaw review](https://www.heise.de/tests/OpenClaw-im-Test-Open-Source-Alternative-zu-Claude-Code-und-Codex-CLI-10327041.html) (2026-02-06), which correctly identified prompt injection, malware installation, and unchecked account access as key risks. We agree these risks are real. Here's how we mitigate them.

---

## 1. Threat Model

Before hardening anything, name what you're defending against:

| Threat | Description | Severity |
|---|---|---|
| **Prompt injection** | Malicious content in fetched data causes the agent to execute unintended actions | Critical |
| **Credential theft** | Agent leaks API keys, tokens, or passwords to unauthorized parties | Critical |
| **Data exfiltration** | Agent sends private data to external services without authorization | High |
| **Malware installation** | Agent executes or installs malicious code via shell access | High |
| **Privilege escalation** | Agent gains access beyond its intended scope | High |
| **Runaway operations** | Agent enters loops or performs destructive bulk actions | Medium |
| **Supply chain compromise** | Malicious MCP servers or tool plugins | Medium |

A hardened agent deployment addresses all of these. An unhardened one addresses none.

---

## 2. Secret Management

### The Problem

The default in most agent setups is catastrophic: API keys in `.env` files, tokens in environment variables, credentials in plaintext configs. A single prompt injection or leaked log exposes everything.

### The Solution: GPG-Encrypted Secret Stores

Use [gopass](https://github.com/gopasspw/gopass) (or equivalent: SOPS, HashiCorp Vault, age) for all agent credentials.

**Implementation checklist:**

- [ ] **No plaintext secrets anywhere.** Audit your workspace: `grep -r "sk-\|ghp_\|glpat-\|PRIVATE.KEY" .`
- [ ] **GPG-encrypted at rest.** Gopass stores secrets encrypted with GPG keys. Even a full filesystem compromise yields only ciphertext.
- [ ] **Scoped access per agent.** Each agent gets its own GPG key and can only decrypt secrets explicitly shared with it. The orchestrator cannot read the research agent's credentials, and vice versa.
- [ ] **Credential rotation.** Use gopass's built-in recipient management to rotate keys without re-encrypting the entire store.
- [ ] **Just-in-time retrieval.** Agents fetch secrets at the moment of use, not at startup. Secrets never persist in memory or environment variables longer than necessary.

**Example gopass setup for agents:**

```bash
# Initialize a store scoped to agent "brenner"
gopass init --store agents/brenner --crypto gpg --key brenner@b4mad.net

# Insert a secret
gopass insert agents/brenner/codeberg/token

# Agent retrieves at runtime
TOKEN=$(gopass show -o agents/brenner/codeberg/token)
```

**Anti-patterns to eliminate:**

- `export OPENAI_API_KEY=sk-...` in `.bashrc`
- `.env` files on disk (a `.gitignore` entry keeps them out of git history, but the plaintext file is still sitting on disk)
- API keys passed as command-line arguments (visible in `ps aux`)
- Secrets in agent memory/context files
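
The audit from the checklist above can be wrapped in a reusable scan. A minimal sketch with illustrative regexes (dedicated scanners such as gitleaks or trufflehog are more thorough):

```bash
#!/usr/bin/env bash
# Sketch: scan a directory tree for token-shaped strings.
# Patterns match OpenAI-style keys (sk-...), GitHub tokens (ghp_...),
# GitLab tokens (glpat-...), and PEM private-key headers.
# They are illustrative, not exhaustive; tune them for your providers.
scan_for_secrets() {
  grep -rEn \
    -e 'sk-[A-Za-z0-9]{20,}' \
    -e 'ghp_[A-Za-z0-9]{36}' \
    -e 'glpat-[A-Za-z0-9_-]{20,}' \
    -e 'BEGIN (RSA|OPENSSH|PGP) PRIVATE KEY' \
    "$1"
}
```

Run it in CI so a stray key fails the build: `scan_for_secrets . && exit 1 || true`.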

---

## 3. Tool Access Control

### The Problem

Most agent frameworks give the agent access to every available tool by default. Shell access means arbitrary code execution. File access means arbitrary data reads. Network access means arbitrary exfiltration.

### The Solution: Allowlist-Based Tool Policy

**Principle: Default deny.** An agent can do nothing unless explicitly permitted.

**Implementation checklist:**

- [ ] **Declare tool allowlists per agent.** Each agent's configuration explicitly lists which tools it may use. No implicit inheritance.
- [ ] **Separate read from write from execute.** An agent that needs to read files doesn't need shell access. An agent that sends messages doesn't need filesystem writes.
- [ ] **Scope shell execution.** If shell access is required, use `security: "allowlist"` mode where only pre-approved commands are permitted.
- [ ] **Gate dangerous operations on human confirmation.** Sending emails, posting publicly, deleting files, transferring money — these should require explicit human approval.
- [ ] **Audit tool invocations.** Log every tool call with timestamp, parameters, and result. This is your forensic trail.
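
The audit-trail item can start as a plain append-only JSONL file. A minimal sketch (the `log_tool_call` helper and the log path are illustrative, not a framework API):

```bash
#!/usr/bin/env bash
# Sketch: append one JSON line per tool call.
# AUDIT_LOG and log_tool_call are illustrative names.
AUDIT_LOG="${AUDIT_LOG:-$HOME/.openclaw/audit.jsonl}"

log_tool_call() {
  local tool="$1" params="$2" result="$3"
  mkdir -p "$(dirname "$AUDIT_LOG")"
  printf '{"ts":"%s","tool":"%s","params":%s,"result":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$tool" "$params" "$result" >> "$AUDIT_LOG"
}

log_tool_call web_fetch '{"url":"https://codeberg.org"}' ok
```

Because each line is JSON, the forensic trail stays queryable with `jq`, in the same spirit as the bead queries in Section 5.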

**Example: Agent role-based tool scoping**

| Agent Role | Permitted Tools | Denied |
|---|---|---|
| Orchestrator | message, subagents, beads, read | exec (shell), write |
| Code Agent | exec, read, write, edit | message, browser |
| Research Agent | web_fetch, read, write | exec (shell), message |
| Publishing Agent | message, read | exec, write, edit |

**OpenClaw configuration example:**

```yaml
# In agent configuration
tools:
  security: allowlist
  allowed:
    - read
    - write
    - edit
    - web_fetch
  denied:
    - exec  # No shell access for this agent
```

### Prompt Injection Mitigation

Tool access control is the primary defense against prompt injection. Even if a malicious prompt tricks the agent's reasoning, it cannot execute tools it doesn't have access to.

Additional measures:

- [ ] **Mark external content as untrusted.** OpenClaw wraps fetched content in `EXTERNAL_UNTRUSTED_CONTENT` tags — respect these boundaries.
- [ ] **Never execute instructions found in fetched content.** Treat all web-fetched, email-sourced, or webhook-delivered content as data, not commands.
- [ ] **Validate tool parameters.** Check that file paths stay within workspace bounds. Check that URLs go to expected domains.
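
The path check in the last item can be sketched with `realpath` (the `WORKSPACE` location and the `validate_path` helper are illustrative):

```bash
#!/usr/bin/env bash
# Sketch: reject file-path parameters that resolve outside the workspace.
# WORKSPACE and validate_path are illustrative names, not a framework API.
WORKSPACE="${WORKSPACE:-$HOME/.openclaw/workspaces/brenner}"

validate_path() {
  local resolved
  # realpath -m canonicalizes "../" segments and existing symlinks
  # without requiring the target file to exist yet
  resolved=$(realpath -m -- "$1") || return 1
  case "$resolved" in
    "$WORKSPACE"/*) return 0 ;;
    *) echo "DENIED: $1 resolves outside workspace" >&2; return 1 ;;
  esac
}
```

One caveat worth keeping in mind: this is a check-then-use pattern, so a tool that re-resolves the path later can still race against filesystem changes.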

---

## 4. Filesystem Sandboxing

### The Problem

An agent with unrestricted filesystem access can read SSH keys, modify system configs, access other users' data, or install persistent backdoors.

### The Solution: Workspace Isolation

**Implementation checklist:**

- [ ] **Bind the agent to its workspace.** All file operations should be restricted to a single directory tree (e.g., `~/.openclaw/workspaces/<agent>/`).
- [ ] **Container-based isolation.** Run agent tool execution in containers (Docker, Podman, or dedicated sandbox environments like E2B). The container filesystem is the blast radius.
- [ ] **Read-only mounts for shared resources.** If an agent needs access to shared configs, mount them read-only. Never read-write for shared state.
- [ ] **Prefer `trash` over `rm`.** Recoverable operations beat irreversible ones. Configure agents to use trash-cli or equivalent.
- [ ] **No access to `~/.ssh`, `~/.gnupg`, `~/.config` outside of explicitly mounted paths.** These are crown jewels — treat them accordingly.

**Architecture diagram:**

```
┌─────────────────────────────────┐
│         Host System             │
│                                 │
│  ┌───────────────────────────┐  │
│  │ Agent Sandbox (Container) │  │
│  │                           │  │
│  │  /workspace/ (rw)         │  │ ← Agent's workspace
│  │  /shared/config (ro)      │  │ ← Read-only shared config
│  │  /tmp/ (rw, noexec)       │  │ ← Temp files, no execution
│  │                           │  │
│  │  NO access to:            │  │
│  │    /home/user/.ssh        │  │
│  │    /home/user/.gnupg      │  │
│  │    /etc/                  │  │
│  │    Other workspaces       │  │
│  └───────────────────────────┘  │
└─────────────────────────────────┘
```
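
The layout above maps directly onto container run flags. A sketch using Podman (the image name and host paths are illustrative; verify the flags against your container runtime):

```bash
# Sketch: run agent tool execution inside the sandbox from the diagram.
# --read-only makes the container's root filesystem immutable,
# --tmpfs gives /tmp scratch space that cannot execute binaries,
# and only the two explicit mounts from the host are visible at all.
podman run --rm \
  --read-only \
  --cap-drop=ALL \
  --tmpfs /tmp:rw,noexec,nosuid,size=256m \
  -v "$HOME/.openclaw/workspaces/brenner:/workspace:rw" \
  -v "$HOME/.openclaw/shared/config:/shared/config:ro" \
  localhost/openclaw-agent:latest
```

`~/.ssh`, `~/.gnupg`, and other workspaces are protected by omission: anything not mounted simply does not exist inside the container.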

### Sub-Agent Isolation

When agents spawn sub-agents, each sub-agent inherits a scoped subset of the parent's access — not the full set. This is the **principle of least privilege applied recursively**:

- Sub-agents get their own workspace directories
- Credential access is explicitly passed, not inherited (see the sub-agent credential isolation pattern in #B4mad's architecture)
- A compromised sub-agent cannot escalate to the parent's privileges

---

## 5. Auditing & Traceability

### The Problem

If you can't answer "what did the agent do and why?" for any point in the past, you have no security. You have hope.

### The Solution: Git-Backed Everything

**Implementation checklist:**

- [ ] **Agent memory in version-controlled markdown.** Every agent's knowledge, context, and learned information lives in plain-text files committed to git. Any human can read, search, and audit them.
- [ ] **Structured task tracking (Beads).** Every unit of work gets a bead — a tracked task with ID, status, owner, timestamps, and outcomes. The bead graph is the audit trail of what happened, who did it, and why.
- [ ] **Commit messages reference work items.** Every git commit includes the bead ID: `git commit -m "Add auth module (hub-abc)"`. This creates a bidirectional link between code changes and task context.
- [ ] **Sub-agent delegation is logged.** When an orchestrator spawns a sub-agent, the bead system records: who delegated, what task, which agent claimed it, and the outcome.
- [ ] **Immutable history.** Git history is append-only (with signed commits for extra assurance). You cannot silently rewrite what an agent did.
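
The commit-message convention can be enforced mechanically with a `commit-msg` hook. A sketch (the `(hub-...)` pattern mirrors this guide's examples; adjust it to your bead namespace):

```bash
#!/usr/bin/env bash
# Sketch: reject commit messages that carry no bead ID.
# The (hub-xxx) pattern follows the examples in this guide.
check_commit_msg() {
  grep -qE '\(hub-[a-z0-9]+\)' "$1" || {
    echo "Rejected: commit message must reference a bead ID, e.g. (hub-abc)" >&2
    return 1
  }
}

# Installed as .git/hooks/commit-msg, the hook body is just:
#   check_commit_msg "$1" || exit 1
```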

**What this enables:**

```bash
# What did the agent do on February 20th?
git log --since="2026-02-20" --until="2026-02-21" --oneline

# What files did the agent touch for bead hub-abc?
git log --all --grep="hub-abc" --name-only

# What's the agent's current knowledge state?
cat MEMORY.md

# Full bead history
bd list --json | jq '.[] | select(.status == "closed")'
```

### No Black Boxes

This is a deliberate architectural choice: **no opaque vector databases, no hidden embeddings, no black-box retrieval.** Agent memory is markdown you can `cat`. Agent work history is git you can `log`. Agent task state is JSON you can `jq`.

A security auditor can reconstruct any sequence of agent actions using standard Unix tools. No proprietary dashboards, no vendor lock-in for observability.

---

## 6. Network Policy

### The Problem

An agent with unrestricted network access can exfiltrate data to any endpoint, download and execute malware, or communicate with command-and-control infrastructure.

### The Solution: Scoped Network Access

**Implementation checklist:**

- [ ] **Allowlist outbound destinations.** The agent should only be able to reach domains it needs: your git host, your API providers, approved research sources. Everything else is denied by default.
- [ ] **No download-and-execute.** Block `curl | bash` patterns. If the agent needs software, it should be pre-installed in the container image or installed through a package manager with integrity verification.
- [ ] **TLS everywhere.** No plaintext HTTP for any tool communication. MCP servers, API calls, webhooks — all TLS.
- [ ] **Monitor egress.** Log all outbound connections with destination, payload size, and timestamp. Anomaly detection (sudden large uploads, connections to unusual IPs) should trigger alerts.
- [ ] **DNS-based filtering.** Use DNS allowlists at the container/network level to enforce destination restrictions without application-level changes.

**Example network policy (iptables/nftables):**

```bash
# Allow DNS (point this at your own resolver to limit DNS tunneling)
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT

# Allow HTTPS to approved hosts.
# Note: iptables resolves these hostnames to IPs once, at rule-insert time;
# hosts behind CDNs or rotating IPs need DNS-based filtering as well.
iptables -A OUTPUT -p tcp --dport 443 -d github.com -j ACCEPT
iptables -A OUTPUT -p tcp --dport 443 -d api.anthropic.com -j ACCEPT
iptables -A OUTPUT -p tcp --dport 443 -d codeberg.org -j ACCEPT

# Allow git+ssh to approved hosts
iptables -A OUTPUT -p tcp --dport 22 -d github.com -j ACCEPT

# Deny everything else
iptables -A OUTPUT -j REJECT
```
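
The DNS-based filtering item from the checklist can be enforced at the resolver instead of the packet filter. A dnsmasq sketch (the upstream resolver IP is illustrative):

```
# Forward only approved domains to an upstream resolver;
# "address=/#/" with no IP answers NXDOMAIN for everything else.
server=/github.com/1.1.1.1
server=/api.anthropic.com/1.1.1.1
server=/codeberg.org/1.1.1.1
address=/#/
```

This pairs well with the packet-filter rules above: the firewall constrains where traffic can go, and the resolver refuses to hand out addresses for anything off the allowlist.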

---

## 7. Putting It All Together: The Defense-in-Depth Stack

No single control is sufficient. Security comes from layering:

```
Layer 5: Human Oversight
         ├── Review agent memory and outputs
         ├── Approve sensitive actions (publish, send, delete)
         └── Budget and rate limits on agent operations

Layer 4: Audit Trail (Git + Beads)
         ├── Every action logged
         ├── Every task tracked
         └── Immutable, reconstructible history

Layer 3: Tool Access Control
         ├── Allowlist-based tool policy
         ├── Role-scoped permissions
         └── Prompt injection boundaries

Layer 2: Filesystem & Network Sandboxing
         ├── Container isolation
         ├── Workspace-scoped file access
         └── Network egress filtering

Layer 1: Secret Management (Gopass/GPG)
         ├── Encrypted at rest
         ├── Scoped per agent
         └── Just-in-time retrieval
```

Compromising one layer should not compromise the system. An agent that bypasses prompt injection defenses (Layer 3) still can't access secrets outside its GPG scope (Layer 1), still can't reach unauthorized network endpoints (Layer 2), and still leaves a full audit trail (Layer 4) for the human to review (Layer 5).

---

## 8. Implementation Maturity at #B4mad

Transparency demands honesty. Here's where we actually stand:

| Control | Status | Notes |
|---|---|---|
| GPG-encrypted secrets (gopass) | ✅ Production | All agent credentials managed via gopass |
| Tool allowlisting | ✅ Production | OpenClaw policy-based tool filtering active |
| Human-readable memory (markdown/git) | ✅ Production | All agents use git-backed markdown memory |
| Bead-based task tracking | ✅ Production | Full audit trail for all delegated work |
| Container sandboxing | 🟡 Partial | OpenClaw sandbox exists; full isolation in progress |
| Network egress filtering | 🟡 Planned | Architecture designed, not yet enforced |
| Sub-agent credential scoping | 🟡 In Progress | See [credential isolation design](https://github.com/brenner-axiom/docs) |
| Signed git commits | 🔴 Not yet | GPG signing planned but not enforced |

We ship what works and are transparent about what's still in progress. This guide describes both the implemented reality and the target architecture.

---

## 9. Quick-Start Checklist

For developers deploying their first hardened agent:

1. **Set up gopass** for credential management. Stop using `.env` files today.
2. **Configure tool allowlists.** Start with minimal permissions and add as needed.
3. **Use a dedicated workspace directory.** Don't let the agent roam your home directory.
4. **Store agent memory in git.** Markdown files, committed regularly, pushed to a remote.
5. **Track work with beads** (or any structured task system). Every agent action should be traceable.
6. **Run tool execution in containers** when possible. Even basic Docker isolation helps.
7. **Review agent outputs regularly.** Read the memory files. Check the git log. Trust but verify.

---

## 10. Conclusion

The heise.de review was right to raise security concerns about AI agents. Prompt injection is real. Credential theft is real. Unauthorized actions are real. But these are engineering problems with engineering solutions.

The answer is not to avoid agents — it's to build them right. Default-deny tool access. Encrypted secrets. Sandboxed execution. Transparent memory. Immutable audit trails. These aren't theoretical ideals; they're techniques we use in production every day.

Security is not the enemy of usefulness. It's the prerequisite for trust. And trust is the prerequisite for giving agents the access they need to be genuinely useful.

Build secure. Build transparent. Build auditable. Then let the agents work.

---

## References

1. Lex Fridman (@lexfridman). "The power of AI agents comes from: (1) intelligence of the underlying model, (2) how much access you give it to all your data, (3) how much freedom & power you give it to act on your behalf." X, February 2026. https://x.com/lexfridman/status/2023573186496037044

2. heise online. "OpenClaw im Test: Open-Source-Alternative zu Claude Code und Codex CLI." February 6, 2026. https://www.heise.de/tests/OpenClaw-im-Test-Open-Source-Alternative-zu-Claude-Code-und-Codex-CLI-10327041.html

3. gopass — The slightly more awesome standard unix password manager for teams. https://github.com/gopasspw/gopass

4. Beads — Lightweight distributed task tracking. https://github.com/steveyegge/beads

5. #B4mad Industries — "Security Is the Bottleneck: A Position Paper on Security-First Agent Architecture." February 19, 2026.

6. OpenClaw — Open-source AI agent platform. https://github.com/openclaw

---

*Published by #B4mad Industries. Licensed under CC-BY-SA 4.0. We welcome contributions, corrections, and critique.*
