Add opt-in option to automatically clean up failed worker pods

Currently, Spacelift WorkerPools retain failed pods indefinitely to allow for debugging. While this is useful for short-term analysis, in certain scenarios (e.g., when a stack’s Project root is removed in Git but drift detection remains active), these failed pods can accumulate rapidly and exhaust the worker pool.

Proposed Feature:
Introduce an optional configuration (e.g., keepFailedPods=false or a TTL-based cleanup) that allows customers to opt in to automatic cleanup of failed pods, similar to the existing keepSuccessfulPods flag. This would let customers balance log retention against cluster stability.

Value:
Prevents runaway resource consumption when jobs repeatedly fail at the workspace preparation stage.
Reduces the operational burden of manually cleaning up failed pods.
Maintains flexibility by keeping the current default (retain failed pods), but provides an escape hatch for customers prioritizing stability.
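
To make the TTL-based variant concrete, here is a minimal sketch of the selection logic it could use. It is an illustration only, not Spacelift's implementation: PodInfo, pods_to_delete, and failed_pod_ttl are hypothetical names, and the sketch assumes the controller knows each pod's phase and the time it finished.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class PodInfo:
    """Hypothetical, simplified view of a worker pod."""
    name: str
    phase: str             # Kubernetes pod phase, e.g. "Failed" or "Succeeded"
    finished_at: datetime  # when the pod reached its terminal phase


def pods_to_delete(
    pods: List[PodInfo],
    now: datetime,
    failed_pod_ttl: Optional[timedelta],
) -> List[str]:
    """Return the names of failed pods whose age has reached the TTL.

    failed_pod_ttl=None models the current default (retain failed pods
    indefinitely), so nothing is selected for deletion.
    """
    if failed_pod_ttl is None:
        return []
    return [
        p.name
        for p in pods
        if p.phase == "Failed" and now - p.finished_at >= failed_pod_ttl
    ]


# Example: with a one-hour TTL, only the older failed pod is selected.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pods = [
    PodInfo("run-a", "Failed", now - timedelta(hours=2)),
    PodInfo("run-b", "Failed", now - timedelta(minutes=15)),
    PodInfo("run-c", "Succeeded", now - timedelta(hours=3)),
]
expired = pods_to_delete(pods, now, timedelta(hours=1))
```

A TTL of zero would behave like keepFailedPods=false (every failed pod is removed as soon as it is seen), while leaving the TTL unset preserves today's behavior.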
