Safety Model
How SikkerKey prevents split credentials, data loss, and zombie agents.
The sync agent manages database credentials. A bug here does more than lose data: it can lock users out of their production database. The safety model is designed around one principle: the live secret must always match what the database accepts.
Two-Phase Rotation
SikkerKey never updates the live secret value until the database is confirmed to have the new password. This is the core safety guarantee.
Without two-phase (naive approach)
- SikkerKey rotates the password
- Agent polls, sees the change, tries to apply
- If the agent is down or the apply fails, the database has the old password but SikkerKey has the new one
- Every service reading from SikkerKey gets credentials that don't work
With two-phase (what SikkerKey does)
- SikkerKey generates a new password and holds it as pending
- Agent detects the pending rotation and applies it to the database
- Agent verifies the new password works by connecting as the managed user
- Agent confirms back to SikkerKey
- SikkerKey promotes the pending value to the live secret
- If any step fails, the live secret stays at the old (working) password
The key difference: the secret value that machines read via SDK/CLI is always the password that the database actually accepts. There is no window where they can disagree.
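The pending/live split above can be sketched as a small state holder. This is a minimal illustration, not SikkerKey's implementation: the class `ManagedSecret` and its method names are hypothetical, and the password generator is an assumed stand-in.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagedSecret:
    """Hypothetical model of a secret with a live and a pending value."""
    live_value: str                      # what SDK/CLI readers receive
    pending_value: Optional[str] = None  # generated, not yet confirmed

    def begin_rotation(self) -> None:
        # Phase 1: generate a new password and hold it as pending.
        if self.pending_value is not None:
            raise RuntimeError("a rotation is already pending")
        self.pending_value = secrets.token_urlsafe(32)

    def read(self) -> str:
        # Readers never see the pending value.
        return self.live_value

    def confirm(self) -> None:
        # Phase 2: the agent reported a verified apply, so promote.
        if self.pending_value is None:
            raise RuntimeError("no pending rotation")
        self.live_value = self.pending_value
        self.pending_value = None

    def reject(self) -> None:
        # Any failure: drop the pending value; the live secret is untouched.
        self.pending_value = None
```

In this sketch, `read()` keeps returning the old password until `confirm()` runs, which mirrors the guarantee that a failed step leaves the live secret at the working value.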
Verification After Apply
The agent does not trust that ALTER ROLE succeeded just because it didn't return an error. After applying the new credentials, the agent makes a separate connection to the database using the managed username and the new password. Only if this test connection succeeds does the agent confirm.
This catches edge cases like:
- The ALTER command returned success but a password policy rejected the value, so the new password was never actually set
- The connection pool returned a stale connection that didn't reflect the change
- A different process reverted the password between the ALTER and the verify
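A verification step along these lines could look like the sketch below. The `connect` callable is an assumption: it stands in for whatever database driver call the agent uses, and it is expected to open a brand-new connection (not one from a pool) and raise on authentication failure.

```python
from typing import Callable

# Hypothetical signature: (username, password) -> connection object.
# Must open a fresh connection and raise if the login is refused.
Connect = Callable[[str, str], object]


def verify_new_password(connect: Connect, username: str, new_password: str) -> bool:
    """Return True only if a fresh login with the new password succeeds."""
    try:
        conn = connect(username, new_password)
    except Exception:
        # Any failure to connect counts as "not verified".
        return False
    # Close the test connection if the driver object supports it.
    close = getattr(conn, "close", None)
    if callable(close):
        close()
    return True
```

Passing the connect function in as a parameter keeps the sketch driver-agnostic and makes the check easy to exercise without a live database.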
Rollback on Verify Failure
If the apply succeeds but verification fails (the new password doesn't work despite the ALTER returning success), the agent:
- Fetches the current live secret from SikkerKey (which is still the old password)
- Applies the old password back to the database
- Rejects the rotation with both the verify error and the rollback result
- SikkerKey clears the pending state and logs the error
If the rollback also fails, the error message includes both failures so the operator can intervene manually.
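The rollback sequence can be sketched as follows. All names here (`RotationOutcome`, `apply_with_rollback`, and the three callables) are hypothetical; the callables stand in for the agent's ALTER step, its test connection, and its fetch of the live secret.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RotationOutcome:
    confirmed: bool
    verify_error: Optional[str] = None
    rollback_error: Optional[str] = None


def apply_with_rollback(
    apply_password: Callable[[str], None],   # e.g. runs the ALTER statement
    verify_password: Callable[[str], None],  # raises if the test login fails
    fetch_live_secret: Callable[[], str],    # still returns the old password
    new_password: str,
) -> RotationOutcome:
    # A failure of the initial apply propagates to the caller; this sketch
    # only covers "apply succeeded, verification failed".
    apply_password(new_password)
    try:
        verify_password(new_password)
        return RotationOutcome(confirmed=True)
    except Exception as exc:
        outcome = RotationOutcome(confirmed=False, verify_error=str(exc))
    try:
        # Restore the password that the live secret still holds.
        apply_password(fetch_live_secret())
    except Exception as exc:
        # Report both failures so an operator can intervene manually.
        outcome.rollback_error = str(exc)
    return outcome
```

An outcome with `confirmed=False` would be sent as the reject, carrying the verify error and, if present, the rollback error.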
Agent Lock
Each managed secret is locked to exactly one machine. The first machine to fetch the sync config becomes the registered agent (agentMachineId is recorded). Subsequent requests from a different machine are rejected with HTTP 409 on sync-config fetch and HTTP 403 on confirm/reject.
This prevents:
- Two agents racing to apply the same rotation (double ALTER ROLE)
- A decommissioned machine's agent interfering with the replacement
To transfer agent ownership to a new machine, delete the managed secret and recreate it, or reset the agent lock from the employee portal.
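The lock rules can be modeled with a few lines of server-side logic. This is a sketch under assumptions: the `AgentLock` class and its methods are hypothetical, the 200 status for accepted requests is assumed, and a real server would need to make the first claim atomic (for example with a compare-and-set in the datastore).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentLock:
    """Hypothetical per-secret lock keyed on the agent's machine ID."""
    agent_machine_id: Optional[str] = None

    def on_sync_config_fetch(self, machine_id: str) -> int:
        # The first machine to fetch the sync config becomes the agent.
        if self.agent_machine_id is None:
            self.agent_machine_id = machine_id
            return 200
        # Any other machine is rejected with 409 on sync-config fetch.
        return 200 if machine_id == self.agent_machine_id else 409

    def on_confirm_or_reject(self, machine_id: str) -> int:
        # Only the registered agent may confirm or reject a rotation.
        return 200 if machine_id == self.agent_machine_id else 403

    def reset(self) -> None:
        # Models resetting the lock so a new machine can take over.
        self.agent_machine_id = None
```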
Zombie Prevention
The agent exits cleanly when the managed secret is deleted or access is revoked:
- Secret deleted (404 on sync-config poll): agent prints "Secret deleted or access revoked" and exits
- Access revoked (403 on sync-config poll): same behavior
- Agent disabled: returns 404 (config not found with enabled = true)
For agents running as system services (systemd/launchd), the exit triggers a restart. The restart immediately hits the same 404/403 and exits again. Under systemd, the start rate limit (StartLimitBurst within StartLimitIntervalSec) eventually stops the retries.
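The exit behavior can be sketched as a poll loop that maps the sync-config status code to an action. The function names, the "process"/"retry" actions, and the treatment of other status codes as retryable are assumptions made for illustration; only the 404/403 exit and the printed message come from the description above.

```python
import time
from typing import Callable

EXIT_STATUSES = (403, 404)  # secret deleted, agent disabled, or access revoked


def handle_poll_status(status: int) -> str:
    """Map a sync-config poll status to an agent action (hypothetical)."""
    if status in EXIT_STATUSES:
        return "exit"
    if status == 200:
        return "process"
    # Assumption: anything else (timeouts, 5xx) is treated as transient.
    return "retry"


def poll_loop(poll: Callable[[], int], interval_seconds: float = 5.0) -> int:
    """Poll until the server says to stop; returns the number of polls made."""
    polls = 0
    while True:
        polls += 1
        action = handle_poll_status(poll())
        if action == "exit":
            print("Secret deleted or access revoked")
            return polls
        # "process" would apply any pending rotation here; omitted in this sketch.
        time.sleep(interval_seconds)
```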
Pending Rotation Hold
SikkerKey will not generate a new pending rotation if:
- A rotation is already pending: the previous one must be confirmed or rejected first
- The agent is unhealthy: no heartbeat for 90+ seconds
- The agent status is "no_agent": no agent has ever connected
This prevents:
- Stacking multiple unconfirmed rotations
- Generating rotations that nobody will apply
- Wasting entropy on passwords that will never be used
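The three hold conditions reduce to a single predicate. In this sketch the `AgentState` fields and the function name are hypothetical; the 90-second threshold and the "no_agent" status value are taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_TIMEOUT_SECONDS = 90


@dataclass
class AgentState:
    agent_status: str                 # e.g. "healthy" or "no_agent"
    rotation_pending: bool
    last_heartbeat: Optional[float]   # Unix seconds, None if never seen


def may_start_rotation(state: AgentState, now: float) -> bool:
    """Return True only if none of the hold conditions applies."""
    if state.rotation_pending:
        return False  # previous rotation must be confirmed or rejected first
    if state.agent_status == "no_agent" or state.last_heartbeat is None:
        return False  # no agent has ever connected
    if now - state.last_heartbeat >= HEARTBEAT_TIMEOUT_SECONDS:
        return False  # agent is unhealthy: no heartbeat for 90+ seconds
    return True
```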
Idempotent Confirm
Each pending rotation has a unique rotationId (UUID). The confirm endpoint verifies the ID matches before promoting. If the agent sends a duplicate confirm (e.g. retry after network timeout), the second request is rejected with "Rotation ID mismatch" because the first confirm already cleared the pending state. The secret is not double-promoted.
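A duplicate confirm can be traced through a short sketch. The record type, function names, and exception class are hypothetical; the behavior shown (ID check, then clearing the pending state so a repeat is rejected) follows the description above.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


class RotationIdMismatch(Exception):
    """Raised when a confirm does not match the pending rotation."""


@dataclass
class SecretRecord:
    live_value: str
    pending_value: Optional[str] = None
    rotation_id: Optional[str] = None


def start_rotation(record: SecretRecord, new_value: str) -> str:
    # Each pending rotation gets its own UUID.
    record.pending_value = new_value
    record.rotation_id = str(uuid.uuid4())
    return record.rotation_id


def confirm(record: SecretRecord, rotation_id: str) -> None:
    # After the first confirm, rotation_id is None, so a retry cannot match.
    if record.rotation_id is None or rotation_id != record.rotation_id:
        raise RotationIdMismatch("Rotation ID mismatch")
    record.live_value = record.pending_value
    record.pending_value = None
    record.rotation_id = None
```

An agent that receives the mismatch on a retry can compare the live secret with the password it applied to decide whether its first confirm went through; that client-side check is a suggestion, not documented behavior.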
Retry After Failure
When a rotation fails, rotationStatus is set to failed with the error message. The next rotation timer tick generates a new pending rotation with a fresh password (the failed password is discarded). The user can also click "Retry rotation" in the dashboard to reset the status to idle, triggering a new rotation on the next cycle.
Failed passwords are never retried. This is intentional: a password that failed to apply should not be reused.
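The status transitions can be sketched as a small state machine. The `RotationState` type and function names are hypothetical, the lowercase status strings are assumed, and the password generator is a stand-in.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class RotationState:
    status: str = "idle"                 # "idle", "pending", or "failed"
    pending_value: Optional[str] = None
    error: Optional[str] = None


def mark_failed(state: RotationState, error: str) -> None:
    # The failed password is discarded, never kept for a retry.
    state.status = "failed"
    state.error = error
    state.pending_value = None


def retry_from_dashboard(state: RotationState) -> None:
    # Models "Retry rotation": reset to idle; the next tick rotates.
    if state.status == "failed":
        state.status = "idle"
        state.error = None


def timer_tick(state: RotationState) -> None:
    # A pending rotation blocks new ones; idle or failed starts a fresh one.
    if state.status == "pending":
        return
    state.pending_value = secrets.token_urlsafe(32)  # always a new password
    state.status = "pending"
    state.error = None
```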
What the Dashboard Shows
The dashboard surfaces agent health and rotation state for each managed secret:
- Agent status: Healthy, Unhealthy, Error, No Agent
- Rotation status: Idle (normal), Pending (waiting for agent), Failed (with error message and retry button)
- Last heartbeat: when the agent last checked in
- Last rotated: when the last rotation was confirmed (not attempted)
Status changes are audit-logged as agent_status_change. Rotation failures are logged as secret_rotate_denied.