Agent-based stack refactor and cleanup

We have hundreds of stacks. More than half of them are in an error state. Most of those correspond to development/test infrastructure, and staffing a team to do the cleanup would be expensive.

Lots of those stacks are “someone manually upgraded the database, so we need to create a PR to reflect the new DB version” or “this resource got created manually, and now the apply is failing due to the duplicate resource creation attempt.”

In addition to failing stacks, some stacks are simply in disrepair. Missing imports, missing move blocks, and other defects mean that maintaining stack state is difficult.

Obviously teams using Terraform should be consistent at using Terraform, rather than performing manual operations, but sometimes that’s significantly more expensive/time-consuming than clicking a button in AWS for non-production systems.

There’s also a very large category of refactor-type work that’s very expensive to do manually, where you want to restructure some modules or move resources across stacks.

I’d like to propose that these tasks are exceptionally appropriate for LLM agent-based workflows. We already have the technology to restrict permissions for terraform plans to dramatically minimize the risk of malicious configuration being introduced at that phase. Spacelift already has sophisticated workflow approvals. Often, the thing I want is for something or someone to operate in a loop making configuration changes and manipulating the state until terraform plan produces a plan with no delta, e.g. “No changes. Your infrastructure matches the configuration.“

That’s an incredibly clear objective to give to an LLM! I’d be thrilled if Spacelift implemented tooling to help teams achieve better resource management.

Spacelift

Agent-based stack refactor and cleanup

Subscribe to post

Subscribe to post