27 changes: 27 additions & 0 deletions .agents/skills/debug-openshell-cluster/SKILL.md
@@ -268,6 +268,33 @@ kubectl -n openshell get configmap openshell-config -o jsonpath='{.data.gateway\
kubectl -n <sandbox-namespace> get sandbox <sandbox-name> -o jsonpath='{.spec.template.spec.serviceAccountName}{"\n"}'
```

If `supervisor_topology = "sidecar"` is rendered, sandbox pods should have an
`openshell-network-init` init container running `--mode=network-init`, an
`agent` container running `openshell-sandbox --mode=process`, and an
`openshell-supervisor-network` container running `--mode=network`. The init
container owns nftables setup and should be the only sidecar topology container
with `NET_ADMIN`. It also needs `CHOWN`/`FOWNER` to hand shared emptyDir state
to `sidecar_proxy_uid`. The long-running network sidecar runs as
`sidecar_proxy_uid` with primary GID `0` so it can read the root-owned,
group-readable projected service-account token. In sidecar topology the
`openshell-sa-token` projected volume should render `defaultMode: 288` (`0440`);
if the proxy logs `failed to read K8s SA token`, verify this token mode and the
network sidecar security context. The process container should also publish the
workload entrypoint PID to `OPENSHELL_ENTRYPOINT_PID_FILE`
(`/run/openshell-sidecar/entrypoint.pid` by default), and the network sidecar
should read it for binary-scoped policy decisions; if traffic that the policy
should allow is denied, inspect that file and the network sidecar logs.
Inspect all three containers when sandbox registration or egress enforcement fails:

```bash
kubectl -n openshell get configmap openshell-config -o jsonpath='{.data.gateway\.toml}' | grep supervisor_topology
kubectl -n <sandbox-namespace> get pod <sandbox-pod> -o jsonpath='{range .spec.initContainers[*]}{.name}{" "}{.command}{"\n"}{end}'
kubectl -n <sandbox-namespace> get pod <sandbox-pod> -o jsonpath='{range .spec.containers[*]}{.name}{" "}{.command}{"\n"}{end}'
kubectl -n <sandbox-namespace> logs <sandbox-pod> -c openshell-network-init --tail=200
kubectl -n <sandbox-namespace> logs <sandbox-pod> -c openshell-supervisor-network --tail=200
kubectl -n <sandbox-namespace> logs <sandbox-pod> -c agent --tail=200
```
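
The `defaultMode: 288` value above looks odd because Kubernetes serializes file modes as decimal integers in JSON output, so octal `0440` appears as `288`. The following is a minimal Rust sketch for sanity-checking a rendered mode. The function names are hypothetical and are not part of OpenShell.

```rust
/// Format a Kubernetes `defaultMode` (decimal in JSON output) as 4-digit octal.
fn mode_as_octal(default_mode: u32) -> String {
    format!("{:04o}", default_mode)
}

/// True when the group-read bit (0o040) is set, which the non-root network
/// sidecar with primary GID 0 relies on to read the token.
fn group_readable(default_mode: u32) -> bool {
    default_mode & 0o040 != 0
}

/// True when "other" users have no permission bits at all.
fn other_has_no_access(default_mode: u32) -> bool {
    default_mode & 0o007 == 0
}

fn main() {
    // 288 is the value quoted in the text above.
    let rendered: u32 = 288;
    println!("defaultMode {} is octal {}", rendered, mode_as_octal(rendered));
    println!("group readable: {}", group_readable(rendered));
    println!("closed to others: {}", other_has_no_access(rendered));
}
```

If the rendered mode is not group-readable, the `failed to read K8s SA token` symptom described above would be the expected result for a non-root sidecar.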

### Step 6: Check VM-Backed Gateways

Use the VM driver logs and host diagnostics available in the user's environment. Verify:
19 changes: 17 additions & 2 deletions .agents/skills/helm-dev-environment/SKILL.md
@@ -60,9 +60,17 @@ mise run helm:skaffold:dev
mise run helm:skaffold:run
```

**Supervisor sidecar topology** (build once and leave running):
```bash
mise run helm:skaffold:run:sidecar
```

Both commands build the `gateway` and `supervisor` images and deploy the OpenShell Helm
-chart. The `pkiInitJob` hook (a pre-install Job that runs `openshell-gateway generate-certs`)
-generates mTLS secrets on first install. Envoy Gateway opt-in; see the Optional Add-ons section below.
chart. The sidecar profile renders an `openshell-network-init` init container for
nftables setup and a non-root `openshell-supervisor-network` runtime sidecar for
proxying. The `pkiInitJob` hook (a pre-install Job that runs `openshell-gateway
generate-certs`) generates mTLS secrets on first install. Envoy Gateway is opt-in;
see the Optional Add-ons section below.

The gateway Service uses ClusterIP. Access is via Envoy Gateway (port `8080`) or `kubectl port-forward`.

@@ -126,6 +134,12 @@ openshell sandbox list --gateway-endpoint https://localhost:8090
mise run helm:skaffold:delete
```

For a sidecar-profile deployment:

```bash
mise run helm:skaffold:delete:sidecar
```

### Delete the cluster entirely

```bash
@@ -250,6 +264,7 @@ for dependencies still declared in `Chart.yaml`.
| `deploy/helm/openshell/ci/values-gateway.yaml` | Envoy Gateway GRPCRoute + Gateway overlay |
| `deploy/helm/openshell/ci/values-high-availability.yaml` | HA test overlay (`replicaCount: 2` with external PostgreSQL Secret) |
| `deploy/helm/openshell/ci/values-keycloak.yaml` | Keycloak OIDC overlay |
| `deploy/helm/openshell/ci/values-sidecar.yaml` | Supervisor sidecar topology overlay for Kubernetes e2e/dev |
| `deploy/helm/openshell/ci/values-spire.yaml` | SPIFFE/SPIRE provider token grant overlay |
| `deploy/helm/openshell/ci/values-spire-stack.yaml` | SPIRE hardened chart values for local dev |
| `deploy/helm/openshell/ci/values-tls-disabled.yaml` | Lint-only: TLS + auth disabled (reverse-proxy edge termination) |
3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

9 changes: 5 additions & 4 deletions architecture/build.md
@@ -91,10 +91,11 @@ Runtime layout:
as a release artifact. Linux GNU VM driver binaries must not reference
`GLIBC_*` symbols newer than `GLIBC_2.28`; release workflows verify this
before publishing artifacts.
-- **Supervisor**: `scratch` base, static musl binary at `/openshell-sandbox`.
-Static linkage is required because the image is mounted/extracted into
-sandbox environments (Docker extraction, Podman image volumes, Kubernetes
-init-container copy-self) and cannot rely on a dynamic loader.
- **Supervisor**: Alpine base with `nftables`, static musl binary at
`/openshell-sandbox`. Static linkage keeps the binary usable when the image
is mounted/extracted into sandbox environments (Docker extraction, Podman
image volumes, Kubernetes init-container copy-self), while `nftables` supports
Kubernetes supervisor sidecar egress enforcement.

Gateway image builds bake the corresponding supervisor image tag into the
gateway binary so Docker sandboxes do not depend on `:latest` by default.
16 changes: 15 additions & 1 deletion architecture/compute-runtimes.md
@@ -81,14 +81,28 @@ The supervisor must be available inside each sandbox workload:
|---|---|
| Docker | Bind-mounted local supervisor binary, or a binary extracted from the configured supervisor image. |
| Podman | Read-only OCI image volume containing the supervisor binary. |
-| Kubernetes | Sandbox pod image or pod template configuration. |
| Kubernetes | Supervisor image side-loaded into the sandbox pod by image volume or init container. |
| VM | Embedded in the guest rootfs bundle. |
| Extension | Defined by the out-of-tree driver. |

Driver-controlled environment variables must override sandbox image or template
values for sandbox ID, sandbox name, gateway endpoint, relay socket path, TLS
paths, and command metadata.

Kubernetes can run the supervisor in the default combined topology or in a
sidecar topology. Combined mode keeps network and process supervision in the
agent container. Sidecar mode runs network enforcement, the proxy, and gateway
loopback forwarding in a dedicated sidecar, while the agent container runs only
the process-supervision leaf and launches the user workload after the sidecar
signals readiness. In sidecar mode, an init container performs the privileged
pod-network nftables setup with `NET_ADMIN` and hands shared state ownership to
the configured proxy UID; the long-running network sidecar runs as that UID and
does not keep `NET_ADMIN`. The agent container runs as the resolved sandbox
UID/GID with no added Linux capabilities. Sidecar mode preserves gateway session
and SSH behavior, but treats the process leaf as network-only: Landlock
filesystem policy, process privilege dropping, and process/binary identity
checks are not applied there.
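
The readiness handoff described above, where the agent container launches the workload only after the sidecar signals readiness, can be pictured as polling for a file on the shared volume. The Rust sketch below is an assumption about one way to do that, not the supervisor's actual implementation: `wait_for_ready_file`, the timeout values, and the temporary path are all hypothetical, and a thread stands in for the network sidecar.

```rust
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Poll until `path` exists or `timeout` elapses.
/// Returns true if the file appeared before the deadline.
fn wait_for_ready_file(path: &Path, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if path.exists() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}

fn main() {
    // Hypothetical ready-file location; a real pod would use the path from
    // OPENSHELL_SUPERVISOR_READY_FILE on the shared emptyDir.
    let ready = std::env::temp_dir().join(format!("openshell-ready-sketch-{}", std::process::id()));
    let _ = std::fs::remove_file(&ready);

    // Stand-in for the network sidecar: write the file after a short delay.
    let writer_path = ready.clone();
    let writer = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        std::fs::write(&writer_path, b"ready").expect("write ready file");
    });

    let ok = wait_for_ready_file(&ready, Duration::from_secs(2), Duration::from_millis(10));
    writer.join().expect("writer thread panicked");
    println!("sidecar ready: {ok}");

    let _ = std::fs::remove_file(&ready);
}
```

A bounded timeout matters in this design: if the sidecar never becomes ready, failing the process leaf is safer than starting the workload without network enforcement in place.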

## Images

The gateway image and Helm chart are built from this repository. Sandbox images
7 changes: 6 additions & 1 deletion crates/openshell-core/src/grpc_client.rs
@@ -167,9 +167,14 @@ async fn build_plain_channel(endpoint: &str) -> Result<Channel> {
.into_diagnostic()
.wrap_err_with(|| format!("failed to read client key from {key_path}"))?;

-let tls_config = ClientTlsConfig::new()
let mut tls_config = ClientTlsConfig::new()
.ca_certificate(Certificate::from_pem(ca_pem))
.identity(Identity::from_pem(cert_pem, key_pem));
if let Ok(server_name) = std::env::var(sandbox_env::GATEWAY_TLS_SERVER_NAME)
&& !server_name.is_empty()
{
tls_config = tls_config.domain_name(server_name);
}

ep = ep
.tls_config(tls_config)
56 changes: 56 additions & 0 deletions crates/openshell-core/src/sandbox_env.rs
@@ -29,6 +29,47 @@ pub const SANDBOX_COMMAND: &str = "OPENSHELL_SANDBOX_COMMAND";
/// Deployment-controlled telemetry toggle propagated to the sandbox supervisor.
pub const TELEMETRY_ENABLED: &str = "OPENSHELL_TELEMETRY_ENABLED";

/// Supervisor pod/runtime topology. Kubernetes sidecar mode sets this to
/// `"sidecar"`; the default combined supervisor path omits it.
pub const SUPERVISOR_TOPOLOGY: &str = "OPENSHELL_SUPERVISOR_TOPOLOGY";

/// Network enforcement backend selected by the compute driver.
pub const NETWORK_ENFORCEMENT_MODE: &str = "OPENSHELL_NETWORK_ENFORCEMENT_MODE";

/// Process enforcement mode selected by the compute driver.
///
/// The default when unset is `"full"`, where the process supervisor enforces
/// filesystem/process policy before spawning workloads. Kubernetes sidecar
/// topology sets this to `"network-only"` so the process wrapper can run as
/// the sandbox UID without Linux capabilities while preserving SSH/session
/// behavior.
pub const PROCESS_ENFORCEMENT_MODE: &str = "OPENSHELL_PROCESS_ENFORCEMENT_MODE";

/// Whether network policy evaluation must bind requests to the peer binary.
///
/// The default when unset is `"required"`. Kubernetes sidecar experiments may
/// set this to `"relaxed"` to enforce endpoint and L7 policy without per-binary
/// `/proc` identity binding.
pub const NETWORK_BINARY_IDENTITY: &str = "OPENSHELL_NETWORK_BINARY_IDENTITY";

/// File written by the network supervisor when sidecar networking is ready.
pub const SUPERVISOR_READY_FILE: &str = "OPENSHELL_SUPERVISOR_READY_FILE";

/// File written by the process supervisor with the workload entrypoint PID and
/// read by the network sidecar for process/binary-bound network policy checks.
pub const ENTRYPOINT_PID_FILE: &str = "OPENSHELL_ENTRYPOINT_PID_FILE";

/// Loopback address where the network sidecar forwards gateway gRPC traffic.
pub const GATEWAY_FORWARD_ADDR: &str = "OPENSHELL_GATEWAY_FORWARD_ADDR";

/// Optional TLS server name used when the process supervisor reaches the
/// gateway through a loopback TCP forward.
pub const GATEWAY_TLS_SERVER_NAME: &str = "OPENSHELL_GATEWAY_TLS_SERVER_NAME";

/// Directory where the network supervisor writes the proxy CA files consumed
/// by workload child processes.
pub const PROXY_TLS_DIR: &str = "OPENSHELL_PROXY_TLS_DIR";

/// Path to the CA certificate for mTLS communication with the gateway.
pub const TLS_CA: &str = "OPENSHELL_TLS_CA";

@@ -71,3 +112,18 @@ pub const K8S_SA_TOKEN_FILE: &str = "OPENSHELL_K8S_SA_TOKEN_FILE";
/// exchanges without using SPIFFE for gateway authentication.
pub const PROVIDER_SPIFFE_WORKLOAD_API_SOCKET: &str =
"OPENSHELL_PROVIDER_SPIFFE_WORKLOAD_API_SOCKET";

/// Resolved sandbox UID used to override `run_as_user` when the policy
/// specifies a numeric value instead of the hardcoded "sandbox" user name.
///
/// Set by compute drivers (Kubernetes, Docker, VM) from resolved config or
/// cluster autodetection. The supervisor reads this at startup and uses it
/// directly with `setuid()` / `chown()` without requiring an `/etc/passwd`
/// entry in the sandbox image.
pub const SANDBOX_UID: &str = "OPENSHELL_SANDBOX_UID";

/// Resolved sandbox GID paired with [`SANDBOX_UID`].
///
/// Used alongside UID for PVC init container `chown` operations and when the
/// supervisor drops privileges to a group other than the UID's primary group.
pub const SANDBOX_GID: &str = "OPENSHELL_SANDBOX_GID";
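
The doc comments on `PROCESS_ENFORCEMENT_MODE` and `NETWORK_BINARY_IDENTITY` define string values and unset defaults. A consumer might map them onto enums roughly as in the sketch below. The enum and function names are hypothetical, and two behaviors are assumptions rather than anything the diff shows: an empty value is treated like an unset one, and unknown values are rejected instead of falling back silently.

```rust
/// Hypothetical typed view of `OPENSHELL_PROCESS_ENFORCEMENT_MODE`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProcessEnforcement {
    Full,
    NetworkOnly,
}

/// Hypothetical typed view of `OPENSHELL_NETWORK_BINARY_IDENTITY`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BinaryIdentity {
    Required,
    Relaxed,
}

/// Unset (or empty, by assumption) means "full", as the doc comment states.
fn parse_process_enforcement(raw: Option<&str>) -> Result<ProcessEnforcement, String> {
    match raw {
        None | Some("") | Some("full") => Ok(ProcessEnforcement::Full),
        Some("network-only") => Ok(ProcessEnforcement::NetworkOnly),
        Some(other) => Err(format!("unknown process enforcement mode: {other}")),
    }
}

/// Unset (or empty, by assumption) means "required", as the doc comment states.
fn parse_binary_identity(raw: Option<&str>) -> Result<BinaryIdentity, String> {
    match raw {
        None | Some("") | Some("required") => Ok(BinaryIdentity::Required),
        Some("relaxed") => Ok(BinaryIdentity::Relaxed),
        Some(other) => Err(format!("unknown binary identity mode: {other}")),
    }
}

fn main() {
    // The variable names are the constants' values from the diff above.
    let process = std::env::var("OPENSHELL_PROCESS_ENFORCEMENT_MODE").ok();
    let identity = std::env::var("OPENSHELL_NETWORK_BINARY_IDENTITY").ok();
    println!("{:?}", parse_process_enforcement(process.as_deref()));
    println!("{:?}", parse_binary_identity(identity.as_deref()));
}
```

Rejecting unknown values is the conservative choice for a security toggle: a typo such as `network_only` would otherwise fall back to a mode the operator did not choose.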
1 change: 1 addition & 0 deletions crates/openshell-driver-kubernetes/Cargo.toml
@@ -16,6 +16,7 @@ path = "src/main.rs"

[dependencies]
openshell-core = { path = "../openshell-core", default-features = false }
openshell-policy = { path = "../openshell-policy" }

tokio = { workspace = true }
tonic = { workspace = true, features = ["transport"] }
24 changes: 21 additions & 3 deletions crates/openshell-driver-kubernetes/README.md
@@ -53,9 +53,27 @@ pods do not need direct external ingress for SSH.

## Container Security Context

-The driver grants the sandbox agent container the Linux capabilities the
-supervisor needs for namespace setup and policy enforcement. It can also request
-a Kubernetes AppArmor profile through `app_armor_profile`.
The default `combined` supervisor topology grants the sandbox agent container
the Linux capabilities the supervisor needs for namespace setup and process,
filesystem, and network policy enforcement.

The `sidecar` supervisor topology moves pod-level network setup into a root init
container and runs the long-lived network sidecar as a non-root UID with no
added Linux capabilities. The agent container also runs as the resolved sandbox
UID/GID with `allowPrivilegeEscalation: false` and `capabilities.drop: ["ALL"]`.
In this mode OpenShell preserves gateway session and SSH behavior, but the
process supervisor runs in network-only mode and does not apply Landlock
filesystem policy, process privilege dropping, or process/binary identity
checks. Network endpoint and L7 policy remain enforced by the network sidecar.

Sidecar mode uses the pod `fsGroup` to make the projected service-account token
and sandbox client TLS secret group-readable so the non-root process supervisor
can authenticate to the gateway. Treat the agent container as trusted with
respect to those in-pod gateway credentials until a narrower credential handoff
exists.
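
The sidecar-mode expectations above can be written down as simple checks over a container's security settings. This is an illustrative Rust sketch only: `ContainerSecurity` is a hypothetical stand-in rather than the Kubernetes API types or the driver's own structs, and the UID in `main` is made up.

```rust
/// Minimal stand-in for the parts of a container `securityContext` that the
/// paragraphs above describe. Not the real Kubernetes API types.
struct ContainerSecurity {
    run_as_user: u32,
    allow_privilege_escalation: bool,
    added_capabilities: Vec<String>,
    drops_all_capabilities: bool,
}

/// Checks stated for the long-lived network sidecar: non-root UID and no
/// added Linux capabilities.
fn long_running_violations(ctx: &ContainerSecurity) -> Vec<&'static str> {
    let mut out = Vec::new();
    if ctx.run_as_user == 0 {
        out.push("runs as root");
    }
    if !ctx.added_capabilities.is_empty() {
        out.push("adds Linux capabilities");
    }
    out
}

/// The agent container additionally sets `allowPrivilegeEscalation: false`
/// and `capabilities.drop: ["ALL"]`.
fn agent_violations(ctx: &ContainerSecurity) -> Vec<&'static str> {
    let mut out = long_running_violations(ctx);
    if ctx.allow_privilege_escalation {
        out.push("allows privilege escalation");
    }
    if !ctx.drops_all_capabilities {
        out.push("does not drop ALL capabilities");
    }
    out
}

fn main() {
    // Hypothetical values for an agent container in sidecar topology.
    let agent = ContainerSecurity {
        run_as_user: 1000,
        allow_privilege_escalation: false,
        added_capabilities: Vec::new(),
        drops_all_capabilities: true,
    };
    println!("agent violations: {:?}", agent_violations(&agent));
}
```

The root init container is deliberately outside these checks, since it is the one place where `NET_ADMIN` is expected in this topology.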

The driver can request a Kubernetes AppArmor profile through
`app_armor_profile`.

Supported values are `Unconfined`, `RuntimeDefault`, and
`Localhost/<profile-name>`. An empty or unset value omits