Better observability for run times and worker utilization

We run self-hosted Spacelift and would like the following metrics exposed for observability.

Run Execution Metrics

End-to-end run duration (histogram): Distribution of total wall-clock time from run creation to terminal state. Enables tracking p50/p90/p99 durations, detecting regressions, and setting SLOs on deployment latency. Should support labels for stack, space, run type, and terminal state.
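To make the request concrete, here is a minimal sketch of the histogram semantics we have in mind, in plain Python. It is not Spacelift's implementation or API: the bucket boundaries, the `DurationHistogram` class, and the label values are all hypothetical, chosen only to illustrate per-label bucketed durations and coarse percentile estimates.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; real boundaries would need tuning.
BUCKETS = [30, 60, 120, 300, 600, 1800, 3600, float("inf")]


class DurationHistogram:
    """Bucketed histogram of run durations, keyed by a tuple of label values."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = list(buckets)
        self.counts = defaultdict(lambda: [0] * len(self.buckets))
        self.sums = defaultdict(float)

    def observe(self, labels, seconds):
        # Index of the first bucket whose upper bound is >= the observation.
        idx = bisect_left(self.buckets, seconds)
        self.counts[labels][idx] += 1
        self.sums[labels] += seconds

    def cumulative(self, labels):
        # Running totals per bucket, as in "less than or equal" style histograms.
        total, out = 0, []
        for count in self.counts[labels]:
            total += count
            out.append(total)
        return out

    def quantile_upper_bound(self, labels, q):
        # Smallest bucket bound covering at least fraction q of observations.
        cum = self.cumulative(labels)
        n = cum[-1]
        if n == 0:
            return None
        for bound, c in zip(self.buckets, cum):
            if c >= q * n:
                return bound


# Hypothetical label order: (stack, space, run type, terminal state).
hist = DurationHistogram()
labels = ("stack-a", "root", "TRACKED", "FINISHED")
for seconds in [20, 45, 50, 200, 4000]:
    hist.observe(labels, seconds)

p50_bound = hist.quantile_upper_bound(labels, 0.5)
```

With a structure like this, a p50/p90/p99 query only needs the cumulative bucket counts per label set, which is why a histogram is preferable to a single average.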

Worker Pool Metrics

Runs queued for workers (gauge): Count of runs currently waiting for a worker, queryable over time. Enables alerting on queue saturation and right-sizing worker pools. Should support labels for stack, space, and worker pool.

Per-run worker wait time (histogram): Distribution of time each run spends waiting for a worker before execution begins. Should support labels for stack, space, and worker pool.
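The two worker pool metrics above can be sketched as follows. This is an illustration under stated assumptions, not a description of how Spacelift tracks runs internally: the `RunRecord` type and its `queued_at` / `started_at` fields are hypothetical stand-ins for whatever timestamps mark when a run starts waiting for a worker and when a worker picks it up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    # Hypothetical record; field names are assumptions for illustration.
    stack: str
    space: str
    worker_pool: str
    queued_at: float                    # epoch seconds when the run began waiting
    started_at: Optional[float] = None  # epoch seconds when a worker picked it up


def queued_runs(runs, now):
    """Gauge: runs waiting for a worker at `now`, per (stack, space, worker_pool)."""
    depth = {}
    for r in runs:
        waiting = r.queued_at <= now and (r.started_at is None or r.started_at > now)
        if waiting:
            key = (r.stack, r.space, r.worker_pool)
            depth[key] = depth.get(key, 0) + 1
    return depth


def wait_times(runs):
    """Histogram input: (labels, wait in seconds) for each run that has started."""
    return [
        ((r.stack, r.space, r.worker_pool), r.started_at - r.queued_at)
        for r in runs
        if r.started_at is not None
    ]


runs = [
    RunRecord("stack-a", "root", "pool-1", queued_at=100, started_at=130),
    RunRecord("stack-a", "root", "pool-1", queued_at=110),
    RunRecord("stack-b", "dev", "pool-2", queued_at=120, started_at=125),
]
depth_now = queued_runs(runs, now=140)
waits = wait_times(runs)
```

The gauge answers "how many runs are waiting right now", while the wait-time observations feed a histogram, so a short queue with long individual waits (or the reverse) remains visible.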
