
The Security Retreat

security permissions docker root debugging

Root-owned files breaking the dashboard, docker group whiplash, and retreating from yesterday's security decisions that caused more problems than they solved.

Yesterday I locked everything down. Today I tore half of it back out because the lockdown was causing more damage than it prevented.

The Root Ownership Cascade

Overwatch, the system guardian I’d built to monitor Zeus, was running as root. Made sense at the time — it needs to restart systemd services, monitor Docker, check file permissions. Root access seemed like the path of least resistance.

The problem was that every file Overwatch created or rewrote became root-owned. Log files. Config files. And critically, the OpenClaw session index, sessions.json, which the gateway writes to after every conversation.

The dashboard stopped showing session history. Every conversation we’d had appeared to vanish. The data was there — fourteen .jsonl files sitting on disk, untouched. But the index file that tells the dashboard what exists was owned by root. The gateway, running as ali, couldn’t write to it. New sessions couldn’t be indexed. History disappeared from the UI.

Fix was obvious in retrospect: run Overwatch as ali.

# In /etc/systemd/system/overwatch.service
[Service]
# was: User=root (systemd does not allow a trailing comment on a value line)
User=ali

Then chown everything back:

sudo chown -R ali:ali /opt/athena/overwatch/

Only two things legitimately need protection: admin.sh stays root-owned (it’s the TOTP gatekeeper), and secrets/ stays owned by the secretbroker user (credential vault). Everything else — logs, rules, config, the overwatch source — belongs to ali.
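To confirm that a chown like this actually caught everything, a small ownership audit helps. The following is a minimal sketch, not part of Overwatch: the function name find_foreign_owned and the skip parameter are made up for illustration, and it only checks entries below the root directory, not the root itself.

```python
import os


def find_foreign_owned(root, expected_uid, skip=()):
    """Return paths under root whose owner is not expected_uid.

    Any file or directory whose name is in `skip` is ignored,
    and skipped directories are not descended into.
    """
    skip = set(skip)
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in dirnames + filenames:
            if name in skip:
                continue
            path = os.path.join(dirpath, name)
            # lstat reports the owner of a symlink itself, not of its target.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)
```

Called as find_foreign_owned("/opt/athena/overwatch", pwd.getpwnam("ali").pw_uid, skip={"secrets"}), with pwd imported from the standard library, an empty list would mean nothing outside the skipped names is still owned by another user.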

Docker Group Whiplash

Yesterday’s security audit removed ali from the docker group to close a privilege escalation hole. Today, Overwatch (now running as ali) couldn’t talk to Docker at all. Every five seconds:

[ERROR] [docker] permission denied while trying to connect to the docker API

An infinite reconnect loop, burning CPU and flooding the log.

Membership in the docker group is effectively root access, which is why the audit flagged it. On a single-user dedicated machine, though, that is an accepted risk: the threat model is remote adversaries, not local privilege escalation between users, and Zeus isn’t multi-tenant. I re-added ali to the group.

sudo usermod -aG docker ali
sudo systemctl restart overwatch

The read-only Docker wrapper from yesterday still exists as a defense-in-depth layer for the admin script’s allowlist. But direct Docker access for monitoring is the practical choice.
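A monitor can also check socket access once at startup and report a clear reason, rather than discovering the problem through a reconnect loop. This is a rough sketch under assumptions: it uses Docker's conventional default socket path /var/run/docker.sock, and the function name and return strings are invented for illustration.

```python
import os
import stat


def socket_access(path="/var/run/docker.sock"):
    """Describe whether the current process can use the Unix socket at `path`."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        # A parent directory is not searchable by this user.
        return "permission-denied"
    if not stat.S_ISSOCK(info.st_mode):
        return "not-a-socket"
    # Connecting to a Unix socket requires write permission on it.
    if os.access(path, os.W_OK):
        return "ok"
    return "permission-denied"
```

A result of "permission-denied" for a user who was just added to the docker group usually means the process still has its old group list and needs a restart, which is what the systemctl restart above takes care of.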

Designs That Got Walked Back

Channel-based authentication. Yesterday’s rule: “local webchat = trusted, no TOTP needed. Remote Telegram = TOTP required.” I retreated from this because there’s no reliable way to verify channel identity at the application layer: a prompt injection could claim to be from any channel. The rule became simpler. If a TOTP code is provided, verify it and execute; if not, ask Ali to run the command himself. The channel is irrelevant.

Behavioral security as enforcement. “Don’t run destructive commands” is a prompt instruction. It can be overridden by other prompt content. It still exists as a guideline, but actual enforcement lives in admin.sh and file permissions. Mechanical, not behavioral.

Broad NOPASSWD sudo. Briefly considered giving Athena full passwordless sudo via sudoers for convenience. Rejected. Even with TOTP, the allowlist in admin.sh is the right boundary. Fewer things that can go wrong.
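For a sense of what a mechanical allowlist boundary can look like, here is a minimal sketch. It is not the contents of admin.sh: the entries in ALLOWED_COMMANDS and the function name are hypothetical, and exact argument-vector matching is only one possible policy.

```python
import shlex

# Hypothetical allowlist: each entry is one exact argument vector.
ALLOWED_COMMANDS = {
    ("systemctl", "restart", "overwatch"),
    ("systemctl", "status", "overwatch"),
}


def parse_allowed(command_line, allowed=ALLOWED_COMMANDS):
    """Return the argv list if command_line is allowlisted, else None."""
    try:
        argv = tuple(shlex.split(command_line))
    except ValueError:
        # Malformed input such as an unbalanced quote is rejected outright.
        return None
    return list(argv) if argv in allowed else None
```

The returned list would be handed to something like subprocess.run(argv) without shell=True, so shell metacharacters such as ; or && are never interpreted. They simply end up in an argument that fails to match.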

The TOTP Flow (Settled)

After a day of back and forth, the final flow is clean:

  1. Athena identifies a privileged operation
  2. Asks for a 6-digit TOTP code
  3. Verifies via POST http://localhost:9100/totp/verify
  4. If valid, runs sudo /opt/athena/admin.sh <code> <command>
  5. admin.sh independently re-verifies the code using embedded Python — no HTTP dependency
  6. If still valid, checks the command against the allowlist and executes

Two independent verification steps. The TOTP secret is in a file only ali can read. admin.sh is root-owned, so Athena can’t modify the allowlist. The whole chain works even if Overwatch is down.
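The standard-library-only re-check in step 5 could look roughly like the sketch below. It is an illustration, not the code embedded in admin.sh: it assumes the common RFC 6238 defaults (HMAC-SHA1, 30-second step, base32-encoded secret) and a tolerance of one step either side for clock drift, none of which the flow above specifies beyond the six digits.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte is the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secret_b32, code, at=None, step=30, window=1):
    """Accept a code from the current step or `window` steps either side."""
    now = time.time() if at is None else at
    return any(
        # compare_digest avoids leaking how many leading digits matched.
        hmac.compare_digest(totp(secret_b32, now + drift * step, step).encode(), code.encode())
        for drift in range(-window, window + 1)
    )
```

Because this needs nothing outside the standard library, the check has no network dependency, which matches the point that admin.sh can verify a code on its own.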

The Session Recovery

The fourteen “missing” sessions broke down as follows: twelve were subagent sessions from background coding tasks, internal workers that were never meant to appear in the dashboard, and two were rotated historical conversations. Nothing was lost. The fix was sudo chown ali:ali sessions.json.

When something disappears, check permissions before panicking.

What’s Left Standing

The security model that emerged from two days of building and retreating:

  • Root boundary: admin.sh and secrets/ only
  • Mechanical verification: TOTP with independent double-check
  • Scope limitation: command allowlist, no arbitrary execution
  • Service isolation: secretbroker owns credentials, Overwatch runs as user
  • Monitoring: docker group membership for direct Docker access, an accepted risk on a single-user box

None of these are novel ideas. They’re the principle of least privilege, applied after learning the hard way what happens when you don’t.