66 changes: 42 additions & 24 deletions Cargo.lock

7 changes: 6 additions & 1 deletion architecture/sandbox.md
@@ -14,7 +14,12 @@ Each sandbox workload has two trust levels:
| Agent child | Runs as an unprivileged user with filesystem, process, and network restrictions applied. |

The supervisor keeps enough privilege to manage the sandbox, but the agent child
-loses that privilege before user code runs.
+loses that privilege before user code runs. On Linux, child setup clears the
+capability bounding set during privilege drop so later execs cannot regain
+container-granted capabilities. This is fail-closed: the supervisor retains
+`CAP_SETPCAP` solely to perform the clear, and spawning the workload or SSH shell
+aborts unless the bounding set ends up empty. A `setpcap` `EPERM` is tolerated
+only when the set is already empty; any other outcome fails the spawn.

## Startup Flow

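The fail-closed rule described in this hunk can be sketched as a small decision function. This is only an illustration under stated assumptions, not the supervisor's actual code: `ClearOutcome`, `SpawnError`, and `verify_bounding_set_cleared` are hypothetical names, and the sketch assumes the caller has already attempted the clear and counted the capabilities still left in the bounding set.

```rust
/// Hypothetical summary of the attempt to clear the capability bounding set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClearOutcome {
    /// The clear call reported success.
    Cleared,
    /// The clear call failed with EPERM.
    PermissionDenied,
    /// The clear call failed with some other errno.
    OtherError(i32),
}

/// Hypothetical error type: any variant aborts the spawn.
#[derive(Debug, PartialEq, Eq)]
pub enum SpawnError {
    BoundingSetNotEmpty { remaining: usize },
    ClearFailed { errno: i32 },
}

/// Decide whether the spawn may proceed.
///
/// `remaining_caps` is the number of capabilities still present in the
/// bounding set after the clear attempt (assumed to be probed by the caller).
pub fn verify_bounding_set_cleared(
    outcome: ClearOutcome,
    remaining_caps: usize,
) -> Result<(), SpawnError> {
    match outcome {
        // Assumption: an unexpected errno fails the spawn even if the set
        // happens to be empty, matching "any other outcome fails the spawn".
        ClearOutcome::OtherError(errno) => Err(SpawnError::ClearFailed { errno }),
        // Success, or EPERM with nothing left to drop, is acceptable.
        ClearOutcome::Cleared | ClearOutcome::PermissionDenied if remaining_caps == 0 => Ok(()),
        // Anything else leaves capabilities behind, so fail closed.
        _ => Err(SpawnError::BoundingSetNotEmpty { remaining: remaining_caps }),
    }
}
```

Keeping the decision separate from the syscalls makes the tolerated `EPERM` case easy to unit-test without needing `CAP_SETPCAP` in the test environment.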
9 changes: 6 additions & 3 deletions crates/openshell-driver-podman/README.md
@@ -42,7 +42,7 @@ The container spec in `container.rs` sets these security-critical fields:
|---|---|---|
| `user` | `0:0` | The supervisor needs root inside the container for namespace creation, proxy setup, Landlock, seccomp, and filesystem preparation. |
| `cap_drop` | Selected unneeded defaults | Podman's default capability set is already restricted. The driver drops capabilities the supervisor does not need. |
-| `cap_add` | `SYS_ADMIN`, `NET_ADMIN`, `SYS_PTRACE`, `SYSLOG`, `DAC_READ_SEARCH` | Grants supervisor-only capabilities required for namespace setup, process identity, and bypass diagnostics. |
+| `cap_add` | `SYS_ADMIN`, `NET_ADMIN`, `SYS_PTRACE`, `SYSLOG`, `DAC_READ_SEARCH`, `SETPCAP` | Grants supervisor-only capabilities required for namespace setup, process identity, bypass diagnostics, and child bounding-set cleanup. |
| `no_new_privileges` | `true` | Prevents privilege escalation after exec. |
| `seccomp_profile_path` | `unconfined` | The supervisor installs its own policy-aware BPF filter. A container-level profile can block Landlock/seccomp syscalls during setup. |
| `mounts` | Private tmpfs at `/run/netns` | Lets the supervisor create named network namespaces in rootless Podman. |
@@ -98,12 +98,15 @@ openshell sandbox create \
| `SYS_PTRACE` | Reading `/proc/<pid>/exe` and walking process ancestry for binary identity. |
| `SYSLOG` | Reading `/dev/kmsg` for bypass-detection diagnostics. |
| `DAC_READ_SEARCH` | Reading `/proc/<pid>/fd/` across UIDs so the proxy can resolve the binary responsible for a connection. |
+| `SETPCAP` | Clearing the restricted child process capability bounding set before exec. |

The driver intentionally keeps Podman's default `SETUID`, `SETGID`, `CHOWN`,
and `FOWNER` capabilities because the supervisor needs them to drop privileges
-and prepare writable sandbox directories. It drops unneeded defaults such as
+and prepare writable sandbox directories. It also keeps `SETPCAP` until child
+setup so `drop_privileges()` can clear the child capability bounding set before
+exec. It drops unneeded defaults such as
`DAC_OVERRIDE`, `FSETID`, `KILL`, `NET_BIND_SERVICE`, `NET_RAW`, `SETFCAP`,
-`SETPCAP`, and `SYS_CHROOT`.
+and `SYS_CHROOT`.

## Supervisor Sideloading

23 changes: 16 additions & 7 deletions crates/openshell-driver-podman/src/container.rs
@@ -877,8 +877,6 @@ pub fn build_container_spec_with_token_and_gpu_devices(
"NET_RAW".into(),
// Not needed: the supervisor does not manipulate file capabilities.
"SETFCAP".into(),
-// Not needed: the supervisor does not manage its own capability bounding set.
-"SETPCAP".into(),
// Not needed: the supervisor does not call chroot().
"SYS_CHROOT".into(),
],
@@ -899,13 +897,18 @@ pub fn build_container_spec_with_token_and_gpu_devices(
// Without it the proxy cannot determine which binary made each outbound
// connection and all traffic is denied.
"DAC_READ_SEARCH".into(),
+// Child setup clears the capability bounding set before exec, which
+// requires CAP_SETPCAP in the supervisor until drop_privileges().
+"SETPCAP".into(),
],
-// SETUID, SETGID, CHOWN, and FOWNER are intentionally kept from Podman's
-// default set and not dropped:
+// SETUID, SETGID, SETPCAP, CHOWN, and FOWNER are intentionally kept from
+// Podman's default set and not dropped:
// SETUID/SETGID – drop_privileges(): setuid()/setgid()/initgroups() to the
// sandbox user. In rootless Podman cap_drop:ALL removes them
// from the bounding set even though uid=0 owns the user
// namespace — so we keep them by not dropping them explicitly.
+// SETPCAP – drop_privileges(): clears the child capability
+// bounding set before the sandbox user execs.
// CHOWN – prepare_filesystem(): chown(path, uid, gid) on newly
// created read_write directories so the sandbox user can
// write to them.
@@ -1451,12 +1454,14 @@ mod tests {
added.contains(&"DAC_READ_SEARCH"),
"missing DAC_READ_SEARCH"
);
assert!(added.contains(&"SETPCAP"), "missing SETPCAP");

// SETUID and SETGID are NOT in cap_add — they remain available from the
// default bounding set because we no longer use cap_drop:ALL. Verify they
-// are also not explicitly dropped. Similarly CHOWN and FOWNER must not be
-// dropped because prepare_filesystem() calls chown() on newly created
-// read_write directories before the supervisor drops privileges.
+// are also not explicitly dropped. Similarly SETPCAP, CHOWN and FOWNER
+// must not be dropped because child setup clears the bounding set and
+// prepare_filesystem() calls chown() on newly created read_write
+// directories before the supervisor drops privileges.
let dropped: Vec<&str> = spec["cap_drop"]
.as_array()
.expect("cap_drop should be an array")
@@ -1473,6 +1478,10 @@ mod tests {
!dropped.contains(&"FOWNER"),
"FOWNER must not be dropped (needed for chown on non-owned files)"
);
+assert!(
+!dropped.contains(&"SETPCAP"),
+"SETPCAP must not be dropped (needed for child bounding-set clear)"
+);
assert!(
!dropped.contains(&"ALL"),
"must not use cap_drop:ALL in rootless Podman"
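The invariant these test assertions protect can be illustrated with a standalone check over the two capability lists. This is a rough sketch, not code from `container.rs`: `CAP_ADD`, `CAP_DROP`, `MUST_KEEP`, and `check_cap_lists` are hypothetical names, and the drop list is copied from the README's "such as" wording, so it may not be the complete list the driver uses.

```rust
/// Capabilities the README lists under `cap_add` after this change.
pub const CAP_ADD: &[&str] = &[
    "SYS_ADMIN",
    "NET_ADMIN",
    "SYS_PTRACE",
    "SYSLOG",
    "DAC_READ_SEARCH",
    "SETPCAP",
];

/// Defaults the README names as dropped (possibly incomplete; an assumption).
pub const CAP_DROP: &[&str] = &[
    "DAC_OVERRIDE",
    "FSETID",
    "KILL",
    "NET_BIND_SERVICE",
    "NET_RAW",
    "SETFCAP",
    "SYS_CHROOT",
];

/// Capabilities the supervisor needs until it drops privileges.
pub const MUST_KEEP: &[&str] = &["SETUID", "SETGID", "SETPCAP", "CHOWN", "FOWNER"];

/// Return an error message if the lists would leave the supervisor unable
/// to drop privileges or clear the child bounding set.
pub fn check_cap_lists(cap_add: &[&str], cap_drop: &[&str]) -> Result<(), String> {
    // A blanket drop removes SETUID/SETGID in rootless Podman.
    if cap_drop.contains(&"ALL") {
        return Err("must not use cap_drop:ALL".to_string());
    }
    for cap in MUST_KEEP {
        if cap_drop.contains(cap) {
            return Err(format!("{cap} must not be dropped"));
        }
    }
    if !cap_add.contains(&"SETPCAP") {
        return Err("missing SETPCAP in cap_add".to_string());
    }
    Ok(())
}
```

For example, `check_cap_lists(CAP_ADD, CAP_DROP)` returns `Ok(())`, while appending `"SETPCAP"` to the drop list produces an error, mirroring the regression the new assertion is meant to catch.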
1 change: 1 addition & 0 deletions crates/openshell-supervisor-process/Cargo.toml
@@ -35,6 +35,7 @@ libc = "0.2"
rustix = { workspace = true }

[target.'cfg(target_os = "linux")'.dependencies]
+capctl = "0.2.4"
landlock = "0.4"
seccompiler = "0.5"
tempfile = "3"