
The Security Retreat

security permissions docker root debugging

Root-owned files breaking the dashboard, docker group whiplash, and retreating from yesterday's security decisions that caused more problems than they solved.

Yesterday I locked everything down. Today I spent the day tearing half of it back out, because the lockdown was causing more damage than it prevented.

The Root Ownership Cascade

Overwatch, the system guardian I’d built to monitor Zeus, was running as root. Made sense at the time. It needs to restart systemd services, monitor Docker, check file permissions. Root access seemed like the path of least resistance.

The problem was that everything Overwatch touched became root-owned. Log files. Config files. And critically, the session index, which the gateway writes to after every conversation.

The dashboard stopped showing session history. Every conversation we’d had appeared to vanish. The data was there. Fourteen session files sitting on disk, untouched. But the index file that tells the dashboard what exists was owned by root. The gateway, running as a regular user, couldn’t write to it. New sessions couldn’t be indexed. History disappeared from the UI.

The fix was obvious in retrospect: run Overwatch as a regular user, then chown everything it had touched back to that user.

Only two things legitimately need protection: the admin script stays root-owned (it’s the TOTP gatekeeper), and the secrets directory stays owned by a dedicated service user (credential vault). Everything else (logs, rules, config, the Overwatch source) is owned by the regular user.
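The chown-back step can be sketched in Python as a dry-run planner plus a separate apply step. This is a minimal sketch under assumptions, not the exact commands used on Zeus: `plan_chown` and `apply_chown` are hypothetical helper names, `base` stands for whatever state directory Overwatch wrote into, and `protected` stands for the admin script and secrets directory that must keep their current owners. Only `apply_chown` needs root.

```python
import os
from pathlib import Path

ROOT_UID = 0


def plan_chown(base, protected, bad_uid=ROOT_UID):
    """List paths under base owned by bad_uid, skipping protected paths.

    Ownership is read with lstat, so symlinks are inspected themselves and
    never followed to a target outside base.
    """
    base = Path(base)
    protected = {Path(p).resolve() for p in protected}
    found = []
    for dirpath, dirnames, filenames in os.walk(base):
        current = Path(dirpath)
        # Prune protected directories so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if (current / d).resolve() not in protected]
        for name in dirnames + filenames:
            path = current / name
            if path.resolve() in protected:
                continue  # e.g. the root-owned admin script itself
            if os.lstat(path).st_uid == bad_uid:
                found.append(path)
    if base.resolve() not in protected and os.lstat(base).st_uid == bad_uid:
        found.append(base)
    return sorted(found)


def apply_chown(paths, uid, gid):
    """Hand each planned path back to uid:gid (must run as root)."""
    for path in paths:
        os.chown(path, uid, gid, follow_symlinks=False)
```

Splitting planning from applying means the list of root-owned files can be reviewed before anything changes hands, which matters when a couple of paths are supposed to stay exactly as they are.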

Docker Group Whiplash

Yesterday’s security audit removed the user from the docker group to close a privilege escalation hole. Today, Overwatch (now running as a regular user) couldn’t talk to Docker at all. Every five seconds it tried to connect, failed, and tried again: an infinite reconnect loop, burning CPU and flooding the log.

On a single-user dedicated machine, Docker group membership is an accepted risk. The threat model is remote adversaries, not local privilege escalation between users. Zeus isn’t multi-tenant. Re-added the group.
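A quick way to tell whether a process can reach Docker at all, before it starts looping, is to check its group membership and the socket permissions directly. A minimal sketch, assuming the default Linux socket path `/var/run/docker.sock` and a group literally named `docker`; `in_group` and `can_use_socket` are hypothetical helpers, not part of Overwatch.

```python
import grp
import os

DOCKER_SOCKET = "/var/run/docker.sock"  # assumed default socket path on Linux


def in_group(group_name: str) -> bool:
    """True if this process carries group_name as its effective or a supplementary group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this host
    return gid == os.getegid() or gid in os.getgroups()


def can_use_socket(path: str = DOCKER_SOCKET) -> bool:
    """True if the socket exists and this process may read and write it."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)
```

One gotcha: a process’s groups are fixed when it starts, so re-adding the user with `usermod -aG docker <user>` only helps processes launched from a fresh login or a service restart. A monitor that was already running keeps failing until it is restarted.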

The read-only Docker wrapper from yesterday still exists as a defense-in-depth layer for the admin script’s allowlist. But direct Docker access for monitoring is the practical choice.

Designs That Got Walked Back

Channel-based authentication. Yesterday’s rule: “local webchat = trusted, no TOTP needed. Remote Telegram = TOTP required.” Retreated from this because there’s no reliable way to verify channel identity at the application layer. A prompt injection could claim to be from any channel. The rule became simpler: a TOTP code provided means verify and execute; no code means ask Ali to run it himself. The channel is irrelevant.

Behavioral security as enforcement. “Don’t run destructive commands” is a prompt instruction. It can be overridden by other prompt content. It still exists as a guideline, but actual enforcement lives in the admin script and file permissions. Mechanical, not behavioral.

Broad NOPASSWD sudo. Briefly considered giving Athena full passwordless sudo via sudoers for convenience. Rejected. Even with TOTP, the allowlist in the admin script is the right boundary. Fewer things that can go wrong.

The TOTP Flow (Settled)

After a day of back and forth, the final flow is clean:

  1. Athena identifies a privileged operation
  2. Asks for a 6-digit TOTP code
  3. Verifies the code against the TOTP endpoint
  4. If valid, runs the admin script with sudo
  5. The admin script independently re-verifies the code using embedded Python with no HTTP dependency
  6. If still valid, checks the command against the allowlist and executes
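Step 5 needs nothing beyond the standard library. A minimal sketch of an offline check, assuming standard RFC 6238 parameters (HMAC-SHA1, 30-second steps, 6 digits, the defaults most authenticator apps use) and a base32-encoded secret. The function names and the ±1 step tolerance are illustrative choices, not taken from the real admin script.

```python
import base64
import hashlib
import hmac
import struct
import time


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the big-endian counter, dynamically truncated."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_totp(secret_b32, code, now=None, step=30, digits=6, window=1):
    """RFC 6238 TOTP check with no network access."""
    # Reject anything that is not exactly `digits` ASCII digits.
    if not (len(code) == digits and code.isascii() and code.isdigit()):
        return False
    # Base32 secrets are often stored without padding; restore it before decoding.
    cleaned = secret_b32.replace(" ", "").upper()
    key = base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))
    counter = int((time.time() if now is None else now) // step)
    # Accept the current step plus `window` steps on either side for clock drift.
    return any(
        hmac.compare_digest(hotp(key, counter + drift, digits), code)
        for drift in range(-window, window + 1)
        if counter + drift >= 0
    )
```

Because the result depends only on the secret and the clock, the admin script can run this with no HTTP dependency, which is what keeps the second check independent of the endpoint used in step 3. A production version would also remember the last accepted time step so the same code can’t be replayed within the window.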

Two independent verification steps. The TOTP secret is in a file only the user can read. The admin script is root-owned, so Athena can’t modify the allowlist. The whole chain works even if Overwatch is down.
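The allowlist half of step 6 can be sketched the same way. This assumes, hypothetically, that the admin script maps short command names to fixed argument vectors and runs them without a shell, so nothing the caller supplies is ever spliced into a command line. The entries below are placeholders, not Zeus’s real allowlist, and the verifier is passed in as a function so the sketch stands alone.

```python
import subprocess
from typing import Callable

# Hypothetical allowlist: command name -> exact argv. Callers cannot add arguments.
ALLOWLIST = {
    "restart-gateway": ["systemctl", "restart", "gateway.service"],
    "restart-overwatch": ["systemctl", "restart", "overwatch.service"],
    "list-containers": ["docker", "ps", "--format", "{{.Names}}"],
}


def resolve(command: str) -> list:
    """Return the fixed argv for an allowlisted command, or refuse."""
    if command not in ALLOWLIST:
        raise PermissionError(f"not on the allowlist: {command!r}")
    return list(ALLOWLIST[command])  # copy so callers cannot mutate the table


def run_privileged(command: str, code: str, verify: Callable[[str], bool]) -> int:
    """Re-verify the TOTP code first, then execute the allowlisted command."""
    if not verify(code):
        raise PermissionError("TOTP code rejected")
    argv = resolve(command)
    # Passing a list (shell=False by default) means no shell parsing or expansion.
    return subprocess.run(argv, check=True).returncode
```

Keeping the arguments inside the table, rather than accepting them from the caller, is what makes “no arbitrary execution” a mechanical property instead of a promise.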

The Session Recovery

The fourteen “missing” sessions broke down like this: twelve were subagent sessions from background coding tasks (internal workers that were never meant to appear in the dashboard), and two were rotated historical conversations. Nothing was lost. The fix was a single ownership change on the index file.

When something disappears, check permissions before panicking.
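In that spirit, a tiny diagnostic that reports who owns a file and whether the current process can write to it would have pointed straight at the root-owned index. A minimal sketch; `describe_access` and `explain` are hypothetical helper names, and the path you pass in is whatever your index file actually is.

```python
import os
import pwd


def describe_access(path):
    """Report owner, permission bits, and read/write access for this process."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid with no passwd entry
    return {
        "uid": st.st_uid,
        "owner": owner,
        "mode": oct(st.st_mode & 0o777),
        # os.access checks against the real uid/gid of this process.
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


def explain(path):
    """One-line summary suitable for a log message."""
    info = describe_access(path)
    if info["writable"]:
        return f"{path}: writable, owned by {info['owner']} ({info['mode']})"
    return f"{path}: owned by {info['owner']} ({info['mode']}), not writable by uid {os.getuid()}"
```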

What’s Left Standing

The security model that emerged from two days of building and retreating:

  • Privilege boundary: the root-owned admin script and the service-user-owned secrets directory, nothing else
  • Mechanical verification: TOTP with an independent double-check
  • Scope limitation: command allowlist, no arbitrary execution
  • Service isolation: a dedicated service user owns credentials; Overwatch runs as the regular user
  • Monitoring: docker group membership for Overwatch (effectively full Docker access), an accepted risk on a single-user box

None of these are novel ideas. They’re the principle of least privilege, applied after learning the hard way what happens when you don’t.