feat(kubernetes): add sidecar sandbox topology #2016

Draft
TaylorMutch wants to merge 7 commits into main from feat/kubernetes-sidecar-topology

@TaylorMutch

Collaborator

Summary

Add an opt-in Kubernetes supervisor sidecar topology that moves pod-level network enforcement and gateway forwarding into a dedicated network sidecar. The default combined topology remains unchanged, while sidecar mode lets the agent container run as the resolved sandbox UID/GID with runAsNonRoot, no privilege escalation, and all Linux capabilities dropped.

This is a draft for early feedback on the topology and security tradeoffs. Sidecar mode preserves gateway session and SSH behavior, but intentionally runs the process supervisor in network-only mode, so filesystem policy, process privilege dropping, and process/binary identity checks are not applied in sidecar mode.

Related Issue

References #1973.
References #1827.
References #981.
References #899.
References #1305.

Changes

  • Accept numeric sandbox UIDs and thread resolved UID/GID values through policy, supervisor, Docker/Podman, Kubernetes, and VM paths.
  • Resolve Kubernetes sandbox UID/GID from explicit config or OpenShift SCC namespace annotations, with non-OpenShift fallback to UID/GID 1000.
  • Add Kubernetes supervisor_topology / Helm supervisor.topology values for combined and sidecar modes.
  • Render sidecar-mode sandbox pods with a privileged network init container, non-root network sidecar, and unprivileged agent container.
  • Add process-supervisor network-only behavior for sidecar mode while keeping SSH/session relay behavior intact.
  • Add sidecar e2e Helm values and Skaffold profile support.
  • Document the topology choice, permission model, and network-only tradeoffs in Kubernetes and reference docs.
  • Update sandbox infrastructure/debugging docs for the new Helm/dev environment flow.
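
As a rough illustration of the topology switch and the agent container's permission model described in the summary, here is a minimal sketch. The type and function names (`SupervisorTopology`, `AgentSecurityContext`, `sidecar_agent_security_context`) are hypothetical and not taken from the PR's code; only the `combined` / `sidecar` values and the listed security properties come from the description above.

```rust
/// Hypothetical enum for the `supervisor_topology` / `supervisor.topology` value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SupervisorTopology {
    /// Default topology, unchanged by this PR.
    Combined,
    /// Opt-in topology with a dedicated network sidecar.
    Sidecar,
}

impl std::str::FromStr for SupervisorTopology {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "combined" => Ok(Self::Combined),
            "sidecar" => Ok(Self::Sidecar),
            other => Err(format!("unknown supervisor topology {:?}", other)),
        }
    }
}

/// Hypothetical summary of the agent container's securityContext in sidecar mode.
#[derive(Debug, PartialEq, Eq)]
struct AgentSecurityContext {
    run_as_user: u32,
    run_as_group: u32,
    run_as_non_root: bool,
    allow_privilege_escalation: bool,
    dropped_capabilities: Vec<&'static str>,
}

/// Builds the sidecar-mode agent context from the resolved sandbox UID/GID:
/// non-root, no privilege escalation, all Linux capabilities dropped.
fn sidecar_agent_security_context(uid: u32, gid: u32) -> AgentSecurityContext {
    AgentSecurityContext {
        run_as_user: uid,
        run_as_group: gid,
        run_as_non_root: true,
        allow_privilege_escalation: false,
        dropped_capabilities: vec!["ALL"],
    }
}
```

The sketch deliberately models only the agent container in sidecar mode, since the description does not spell out the security contexts of the init container or the network sidecar beyond "privileged" and "non-root".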

Testing

  • mise run pre-commit passes.
  • cargo check -p openshell-core -p openshell-supervisor-process -p openshell-sandbox -p openshell-driver-kubernetes passes.
  • cargo test -p openshell-driver-kubernetes --lib passes.
  • cargo test -p openshell-supervisor-process --lib passes.
  • cargo test -p openshell-sandbox --lib passes.
  • HELM_K3S_LB_HOST_PORT=18080 mise run e2e:kubernetes:sidecar passes.

Checklist

  • Follows Conventional Commits.
  • Commits are signed off (DCO).
  • Architecture docs updated (if applicable).

sjenning and others added 7 commits June 25, 2026 15:32
Allow run_as_user and run_as_group to be either the literal 'sandbox'
or a numeric UID/GID within [1000, 2_000_000_000]. This removes the
hard dependency on a baked-in 'sandbox' user in container images,
enabling compute drivers to inject resolved UIDs at sandbox creation.

Phase 1 of #1959.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
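
The validation rule in this commit message can be sketched as follows. This is an illustration only: `SandboxId` and `parse_sandbox_id` are hypothetical names, and the bounds are the inclusive range quoted above rather than values read from the openshell-policy source.

```rust
/// Inclusive bounds quoted in the commit message: [1000, 2_000_000_000].
const MIN_SANDBOX_UID: u32 = 1000;
const MAX_SANDBOX_UID: u32 = 2_000_000_000;

/// Hypothetical representation of a validated run_as_user / run_as_group value.
#[derive(Debug, PartialEq, Eq)]
enum SandboxId {
    /// The literal name "sandbox".
    Named,
    /// A numeric UID or GID inside the allowed range.
    Numeric(u32),
}

/// Accepts the literal "sandbox" or a numeric ID within the allowed range.
fn parse_sandbox_id(value: &str) -> Result<SandboxId, String> {
    if value == "sandbox" {
        return Ok(SandboxId::Named);
    }
    let id: u32 = value
        .parse()
        .map_err(|_| format!("expected 'sandbox' or a numeric ID, got {:?}", value))?;
    if (MIN_SANDBOX_UID..=MAX_SANDBOX_UID).contains(&id) {
        Ok(SandboxId::Numeric(id))
    } else {
        Err(format!(
            "ID {} is outside [{}, {}]",
            id, MIN_SANDBOX_UID, MAX_SANDBOX_UID
        ))
    }
}
```

For example, `parse_sandbox_id("1000")` yields `Ok(SandboxId::Numeric(1000))`, while `"999"` and `"root"` are rejected.
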
Allow run_as_user and run_as_group to be numeric UIDs/GIDs, removing
the hard dependency on a baked-in 'sandbox' user in container images.

Changes:
- validate_sandbox_user(): accepts numeric UIDs without passwd lookup
  (logs OCSF event); keeps passwd check for "sandbox" name; rejects
  non-numeric non-sandbox strings that fail passwd lookup
- prepare_filesystem(): passes numeric UIDs/GIDs directly to chown()
  instead of requiring a passwd entry
- drop_privileges(): resolves numeric UIDs/GIDs directly via Uid::from_raw
  / Gid::from_raw; skips initgroups when target uid matches current euid;
  uses guard conditions before setgid/setuid calls
- session_user_and_home(): falls back to ("{uid}", "/sandbox") for
  numeric UIDs, avoiding a passwd lookup that will fail

Re-exports MIN_SANDBOX_UID and MAX_SANDBOX_UID from openshell-policy
so callers have consistent range constants.

Phase 2 of #1959.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
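
The `session_user_and_home()` fallback described above might look roughly like this. The signature is an assumption: the passwd lookup is modeled as a caller-supplied closure so the sketch stays self-contained, whereas the real function presumably queries the system passwd database.

```rust
/// Hypothetical stand-in for a passwd entry: (login name, home directory).
type PasswdEntry = (String, String);

/// Returns the session user name and home directory for `run_as_user`.
/// Numeric UIDs bypass the passwd lookup and fall back to ("{uid}", "/sandbox").
fn session_user_and_home<F>(run_as_user: &str, lookup: F) -> Option<(String, String)>
where
    F: Fn(&str) -> Option<PasswdEntry>,
{
    if let Ok(uid) = run_as_user.parse::<u32>() {
        // A numeric UID may have no passwd entry in the image, so the lookup is skipped.
        return Some((uid.to_string(), "/sandbox".to_string()));
    }
    // Named users (for example "sandbox") still go through the lookup.
    lookup(run_as_user)
}
```

Passing the lookup as a parameter is purely a testing convenience in this sketch, not a claim about the PR's design.
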
…hift SCC annotations

Phase 3 of the numeric-UID plan: allow operators to specify explicit
sandbox_uid/sandbox_gid in Kubernetes driver config, auto-detect from
OpenShift SCC namespace annotations, and propagate resolved values to
supervisor container env vars and PVC init container securityContext.

Changes:
- Add sandbox_uid/sandbox_gid fields to KubernetesComputeConfig
- Add SANDBOX_UID/SANDBOX_GID env var constants to openshell-core
- Implement resolve_sandbox_identity() to fetch namespace annotations
  and auto-detect OpenShift SCC UID ranges (sa.scc.uid-range)
- Pass resolved UID/GID through SandboxPodParams to pod spec builder
- Inject SANDBOX_UID/SANDBOX_GID env vars into supervisor container
- Update PVC init container securityContext with resolved UID/GID
  instead of hard-coded root
- Add comprehensive unit tests for resolution logic and annotation
  parsing (resolve_sandbox_uid, resolve_sandbox_gid, OpenShift SCC
  annotation parsing)

Signed-off-by: Seth Jennings <sjenning@redhat.com>
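
The resolution order (explicit config, then the OpenShift SCC annotation, then the fallback of 1000 mentioned in the PR summary) can be sketched as below. Several details are assumptions: the annotation value is taken to use the `<start>/<size>` form (for example `1000680000/10000`), the first UID in the block is used, and a malformed annotation silently falls through to the fallback. The actual `resolve_sandbox_identity()` may behave differently on each point, and `parse_scc_uid_range_start` is a hypothetical helper.

```rust
/// Fallback for non-OpenShift clusters, per the PR summary.
const FALLBACK_SANDBOX_UID: u32 = 1000;

/// Parses the first UID of a block written as "<start>/<size>".
/// Returns None for malformed values or an empty block.
fn parse_scc_uid_range_start(annotation: &str) -> Option<u32> {
    let (start, size) = annotation.trim().split_once('/')?;
    let start: u32 = start.parse().ok()?;
    let size: u32 = size.parse().ok()?;
    if size == 0 {
        return None;
    }
    Some(start)
}

/// Explicit config wins, then the SCC annotation, then the fallback.
fn resolve_sandbox_uid(explicit: Option<u32>, scc_uid_range: Option<&str>) -> u32 {
    explicit
        .or_else(|| scc_uid_range.and_then(parse_scc_uid_range_start))
        .unwrap_or(FALLBACK_SANDBOX_UID)
}
```

With no explicit value and an annotation of `1000680000/10000`, this sketch resolves the UID to 1000680000; with neither input it returns 1000.
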
…mples

Phase 4 of the numeric-UID plan: replace hardcoded SANDBOX_UID (10001)
in VM rootfs preparation with configurable sandbox_uid/sandbox_gid fields.

Changes:
- Add sandbox_uid/sandbox_gid to VmDriverConfig with serde derives
- Pass resolved UID/GID through prepare_sandbox_rootfs_from_image_root
  to ensure_sandbox_guest_user which writes /etc/passwd/group/gshadow
- Update BYOC Dockerfile: remove groupadd/useradd, document runtime UID
  injection and the ability to skip baked-in sandbox user
- Update gateway-config.mdx: document sandbox_uid/sandbox_gid for both
  Kubernetes (with OpenShift SCC autodetection) and VM drivers
- Update sandbox-compute-drivers.mdx: add Sandbox User Identity section
  explaining numeric UID support across all compute drivers
- Update rootfs tests to use non-default UIDs, verify config passthrough

Signed-off-by: Seth Jennings <sjenning@redhat.com>
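
To make the guest-user step concrete, here is a small sketch of the kind of `/etc/passwd` and `/etc/group` entries a helper like `ensure_sandbox_guest_user` could write for a configured UID/GID. The field layout follows the standard passwd and group file formats; the `x` password placeholder, the `/bin/sh` shell, and the helper names `passwd_line` / `group_line` are assumptions, and the `/sandbox` home mirrors the fallback mentioned in the earlier commit.

```rust
/// Formats a passwd entry: name:password:UID:GID:GECOS:home:shell.
fn passwd_line(uid: u32, gid: u32) -> String {
    // Shell and password placeholder are assumptions for illustration.
    format!("sandbox:x:{}:{}::/sandbox:/bin/sh", uid, gid)
}

/// Formats a group entry: name:password:GID:members.
fn group_line(gid: u32) -> String {
    format!("sandbox:x:{}:", gid)
}
```
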
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 26, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

