Add opt-in option to automatically clean up failed worker pods

Currently, Spacelift WorkerPools retain failed pods indefinitely to allow for debugging. This is useful for short-term analysis, but in some scenarios these failed pods can accumulate rapidly and exhaust the worker pool. For example, when a stack's Project root is removed in Git but drift detection remains active, every scheduled drift detection run fails during workspace preparation and leaves another failed pod behind.

Proposed Feature:
Introduce an optional configuration (e.g., keepFailedPods=false or a TTL-based cleanup) that lets customers opt in to automatic cleanup of failed pods, similar to the existing keepSuccessfulPods flag. This would let customers balance log retention against cluster stability.

Value:

  • Prevents runaway resource consumption when jobs repeatedly fail at the workspace preparation stage.

  • Reduces the operational burden of manually cleaning up failed pods.

  • Maintains flexibility by keeping the current default (retain failed pods), but provides an escape hatch for customers prioritizing stability.
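To make the two proposed variants concrete, the following Python sketch models the retention decision a cleanup pass might make for finished worker pods. It is not Spacelift's controller code or API: the `WorkerPod` type, the `pods_to_delete` function and the `keep_failed_pods` / `failed_pod_ttl` parameters are hypothetical stand-ins for the proposed setting; only `keep_successful_pods` mirrors the existing keepSuccessfulPods flag. Pod phases use Kubernetes naming (`Succeeded`, `Failed`).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


# Hypothetical, simplified model of a finished worker pod.
# Not Spacelift's actual data model.
@dataclass
class WorkerPod:
    name: str
    phase: str  # Kubernetes pod phase, e.g. "Succeeded" or "Failed"
    finished_at: datetime


def pods_to_delete(
    pods: list[WorkerPod],
    keep_successful_pods: bool,
    keep_failed_pods: bool = True,  # hypothetical option; True = today's default
    failed_pod_ttl: Optional[timedelta] = None,  # hypothetical TTL option
    now: Optional[datetime] = None,
) -> list[str]:
    """Return the names of pods a cleanup pass would delete."""
    now = now or datetime.now(timezone.utc)
    doomed = []
    for pod in pods:
        if pod.phase == "Succeeded" and not keep_successful_pods:
            # Mirrors the existing keepSuccessfulPods=false behavior.
            doomed.append(pod.name)
        elif pod.phase == "Failed":
            if not keep_failed_pods:
                # Opt-in variant 1: delete failed pods right away.
                doomed.append(pod.name)
            elif failed_pod_ttl is not None and now - pod.finished_at >= failed_pod_ttl:
                # Opt-in variant 2: keep failed pods for debugging, then expire them.
                doomed.append(pod.name)
        # Any other phase (e.g. "Running") is never touched.
    return doomed


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pods = [
    WorkerPod("run-ok", "Succeeded", now - timedelta(minutes=5)),
    WorkerPod("run-old-failure", "Failed", now - timedelta(hours=3)),
    WorkerPod("run-new-failure", "Failed", now - timedelta(minutes=10)),
]
print(pods_to_delete(pods, keep_successful_pods=False,
                     failed_pod_ttl=timedelta(hours=1), now=now))
# → ['run-ok', 'run-old-failure']
```

Defaulting `keep_failed_pods` to `True` with no TTL preserves the current behavior, so only customers who explicitly set one of the new options would see failed pods removed; the TTL variant still leaves a window for debugging recent failures.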

Workaround
-
Problem
-

Status: ⚙️ In Progress
Board: 💡 Feature Requests
Tags: Kubernetes
Date: 6 months ago