Attribute remote SSH session WSFS activity via command origin #5728

Open
sbauersfeld wants to merge 2 commits into databricks:main from sbauersfeld:add-remote-ssh-command-origin
@sbauersfeld sbauersfeld commented Jun 25, 2026

Changes

The SSH server bootstrap notebook (experimental/ssh/internal/client/ssh-server-bootstrap.py) now writes RemoteSshServer to /Workspace/.proc/self/metadata/command_origin just before launching the SSH server.

Why

The bootstrap runs as a notebook job on the cluster, so without this, all workspace-file (WSFS) activity from a remote SSH session is attributed to the generic PythonDriver command origin. Writing a dedicated origin makes that activity attributable in WSFS logs.

Tests

  • python3 -m py_compile on the bootstrap script (syntax check).
  • The .proc/.../command_origin write path is exercised server-side by the WSFS TestMetadataCommandOrigin unit test, which confirms the file is writable and that the write updates the command's origin.

This pull request and its description were written by Isaac.

The SSH server bootstrap notebook writes "RemoteSshServer" to
/Workspace/.proc/self/metadata/command_origin so workspace-file activity
from a remote SSH session is attributed to its own WSFS command origin
instead of "PythonDriver". WSFS resolves each request to its leaf-most
registered ancestor, so the SSH server subprocess and the shells it spawns
inherit this origin. Best-effort: never blocks server startup if .proc is
unavailable.

Pairs with the WsfsOperation.CommandOrigin enum value
COMMAND_ORIGIN_REMOTE_SSH_SERVER added in databricks-eng/universe.

Signed-off-by: Scott Bauersfeld <scott.bauersfeld@databricks.com>
Co-authored-by: Isaac
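
The best-effort write described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `set_command_origin`, its parameters, and its boolean return value are hypothetical, and only the path, the `RemoteSshServer` string, and the `OSError` handling come from the discussion in this thread.

```python
COMMAND_ORIGIN_PATH = "/Workspace/.proc/self/metadata/command_origin"
REMOTE_SSH_ORIGIN = "RemoteSshServer"


def set_command_origin(path: str = COMMAND_ORIGIN_PATH,
                       origin: str = REMOTE_SSH_ORIGIN) -> bool:
    """Hypothetical helper: try to record the command origin.

    Returns True if the write succeeded, False if the path was unavailable.
    """
    try:
        with open(path, "w") as f:
            f.write(origin)
        return True
    except OSError:
        # Best-effort: a missing or read-only .proc must not block
        # SSH server startup, so the failure is swallowed here.
        return False
```

Returning a flag rather than raising keeps the caller free to log the outcome and carry on launching the SSH server either way.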
@sbauersfeld sbauersfeld force-pushed the add-remote-ssh-command-origin branch from 81e5f5a to cf9aa85 Compare June 25, 2026 15:15
@sbauersfeld sbauersfeld marked this pull request as ready for review June 25, 2026 15:15

@anton-107 anton-107 left a comment

Contributor

Reviewed alongside the backend enum PR (databricks-eng/universe#2127479). The mechanism is sound: WSFS registers command origin per-PID and resolves it by walking up the process ancestry, so the single write of RemoteSshServer at the bootstrap notebook PID propagates to the SSH server subprocess and all session shells — which the existing PR_SET_CHILD_SUBREAPER keeps within the bootstrap's process tree. The best-effort try/except OSError is the right call for clusters where the path isn't writable (serverless/shared), and it correctly avoids turning a metadata hiccup into a bootstrap failure.
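
To make the ancestry-walk resolution concrete, here is a toy model of it. It is an assumption-laden sketch, not WSFS code: the `resolve_origin` function, the in-memory `parents` and `registered` maps, and the choice to check the requesting process before its ancestors are all hypothetical stand-ins for whatever WSFS does internally.

```python
from typing import Dict, Optional


def resolve_origin(pid: int,
                   parents: Dict[int, int],
                   registered: Dict[int, str]) -> Optional[str]:
    """Return the origin of the nearest registered process, walking up from pid.

    parents maps a PID to its parent PID; registered maps a PID to the
    origin string written for it. Both are hypothetical in-memory tables.
    """
    seen = set()
    current: Optional[int] = pid
    while current is not None and current not in seen:
        if current in registered:
            return registered[current]
        seen.add(current)  # guard against cycles in the toy table
        current = parents.get(current)
    return None  # no registered process found on the path to the root


# Toy process tree: driver (1) -> bootstrap (100) -> sshd (200) -> shell (300)
parents = {300: 200, 200: 100, 100: 1}
registered = {1: "PythonDriver", 100: "RemoteSshServer"}
shell_origin = resolve_origin(300, parents, registered)
```

In this model the session shell (PID 300) resolves to `RemoteSshServer` because the bootstrap process is a closer registered ancestor than the driver, which is why a single write at the bootstrap PID is enough as long as the shells stay inside its process tree.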

Two non-blocking notes:

  • Coordinate merge timing with the proto PR: until COMMAND_ORIGIN_REMOTE_SSH_SERVER is regenerated and deployed to the WSFS logger, sessions log as COMMAND_ORIGIN_UNSPECIFIED. Backward-compatible and self-healing, so order doesn't strictly matter.
  • Worth one e2e confirmation when the proto is live: ssh connect, write a workspace file, and verify the WSFS log line shows commandOrigin: RemoteSshServer.

LGTM.

@github-actions
Contributor

An authorized user can trigger integration tests manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 5728
  • Commit SHA: 04fed1f621c6e71bed1408f42bc28bb54d8b081a

Checks will be approved automatically on success.
