Better observability for run times and worker utilization

We run self-hosted Spacelift and would like the following metrics exposed for observability.

Run Execution Metrics

End-to-end run duration (histogram): Distribution of total wall-clock time from run creation to terminal state. Enables tracking p50/p90/p99 durations, detecting regressions, and setting SLOs on deployment latency. Should support labels for stack, space, run type, and terminal state.
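To make the percentile use case concrete, here is a rough sketch of how p50/p90/p99 could be estimated from a bucketed duration histogram, in the same spirit as Prometheus-style histograms. Everything named here is an assumption for illustration only: the bucket boundaries, the `observe` and `quantile` helpers, and the label tuple layout are hypothetical and not anything Spacelift exposes today.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; real boundaries would be the vendor's choice.
BUCKETS = [30, 60, 120, 300, 600, 1800, 3600, float("inf")]

# One list of per-bucket counts for each (stack, space, run_type, terminal_state) label set.
histograms = defaultdict(lambda: [0] * len(BUCKETS))


def observe(labels, duration_s):
    """Record one finished run in the first bucket whose upper bound is >= duration_s."""
    histograms[labels][bisect_left(BUCKETS, duration_s)] += 1


def quantile(labels, q):
    """Estimate the q-quantile by linear interpolation inside the matching bucket."""
    counts = histograms[labels]
    total = sum(counts)
    if total == 0:
        return None
    rank = q * total
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= rank:
            lower = BUCKETS[i - 1] if i > 0 else 0.0
            upper = BUCKETS[i]
            if upper == float("inf"):
                # No upper bound to interpolate towards; report the last finite bound.
                return float(lower)
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return None
```

With per-label histograms like this, a dashboard could track p90 duration per stack or per run type and alert when it drifts past an SLO threshold.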


Worker Pool Metrics

Runs queued for workers (gauge): Count of runs currently waiting for a worker, queryable over time. Enables alerting on queue saturation and right-sizing worker pools. Should support labels for stack, space, and worker pool.

Per-run worker wait time (histogram): Distribution of time each run spends waiting for a worker before execution begins. Should support labels for stack, space, and worker pool.
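To pin down what I mean by these two worker pool metrics, here is a minimal sketch that derives both from run lifecycle timestamps. The `Run` record and its field names (`worker_pool`, `queued_at`, `started_at`) are hypothetical and only illustrate the definitions; I don't know how Spacelift models this internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    worker_pool: str
    queued_at: float             # epoch seconds when the run became ready for a worker
    started_at: Optional[float]  # epoch seconds when a worker picked it up; None if still waiting


def queue_depth(runs: List[Run], pool: str, at: float) -> int:
    """Gauge value: runs in `pool` that were queued but not yet started at time `at`."""
    return sum(
        1
        for r in runs
        if r.worker_pool == pool
        and r.queued_at <= at
        and (r.started_at is None or r.started_at > at)
    )


def wait_times(runs: List[Run], pool: str) -> List[float]:
    """Histogram observations: seconds each started run in `pool` spent waiting for a worker."""
    return [
        r.started_at - r.queued_at
        for r in runs
        if r.worker_pool == pool and r.started_at is not None
    ]
```

The gauge answers "how saturated is the pool right now", while the wait times answer "how long did each run actually sit idle", which is why both are requested separately.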

Workaround
None to my knowledge, but happy to hear if there are any.
Problem
We want to set SLOs, understand the developer experience, and correctly size our worker pools.

Status

⬆️ Gathering votes

Board

💡 Feature Requests

Tags

Workers

Date

2 days ago
