When creating or updating a Coder workspace, you may see Terraform fail with an error like:
Failed to install provider Error while installing coder/coder vX.Y.Z: open /home/coder/.cache/coder/provisioner-N/tf/registry.terraform.io/<provider_org>/<provider_name>/X.Y.Z/linux_amd64/terraform-provider-<provider_name>_vX.Y.Z: text file busy
Why this happens
This error comes from the Linux kernel (errno ETXTBSY) and means a process tried to open an executable file for writing while that executable is still running. In the context of Terraform and Coder:
- Concurrent Terraform runs: Two or more Terraform processes attempt to install or use the same provider binary in the same cache path at the same time.
- Stale provider processes: A previous workspace build may not have exited cleanly, leaving the provider binary open. A new run that tries to overwrite that file will fail.
- Shared or global plugin cache: If multiple provisioner daemons or runners share a TF_PLUGIN_CACHE_DIR or a common ~/.cache volume, race conditions are likely.
- External file locks: In some environments, antivirus or filesystem security agents can hold temporary locks on provider binaries during install, leading to the same error.
Troubleshooting steps
When you encounter this error:
- Check if multiple workspace operations are running concurrently on the same provisioner. If so, wait for one to finish before starting another.
- Verify that the provisioner process does not have a shared TF_PLUGIN_CACHE_DIR. Coder manages per-runner caches for its internal provisioners under ~/.cache/coder/provisioner-N/tf, where N is the index of the provisioner.
- Confirm your template includes a committed .terraform.lock.hcl file. This reduces provider re-installs between runs.
- On the provisioner host, list active provider processes:
  ps aux | grep terraform-provider-
  Kill any stale processes that are still holding open provider binaries.
- If antivirus or endpoint protection is present, configure exclusions for the Coder provisioner cache directory.
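The first few checks above can be sketched as small shell helpers. This is a minimal sketch, not an official Coder tool: the function names are hypothetical, and the checks only see the environment of the shell that runs them, so they assume you run them as the same user and in the same container as the provisioner.

```shell
# Hypothetical helper names; run these in the provisioner's own environment.

# Report whether a shared plugin cache is configured in this environment.
check_plugin_cache_env() {
  if [ -n "${TF_PLUGIN_CACHE_DIR:-}" ]; then
    echo "shared cache configured: TF_PLUGIN_CACHE_DIR=$TF_PLUGIN_CACHE_DIR"
  else
    echo "TF_PLUGIN_CACHE_DIR is not set"
  fi
}

# Report whether a template directory ($1) contains a dependency lock file.
check_lock_file() {
  if [ -f "$1/.terraform.lock.hcl" ]; then
    echo "lock file present"
  else
    echo "lock file missing"
  fi
}

# List running provider processes. The [t] in the pattern keeps grep from
# matching its own command line.
list_provider_processes() {
  ps aux | grep '[t]erraform-provider-' || echo "no provider processes found"
}
```

For example, running check_lock_file against your template directory before uploading it shows whether the lock file would be included.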
Mitigation and prevention
- Run isolated caches: Ensure each provisioner daemon has a unique cache path. In the case of external provisioner daemons, using separate users and isolated home directories can reduce the likelihood of encountering this issue when running multiple daemons on a single node.
- Avoid global plugin caches: Do not set TF_PLUGIN_CACHE_DIR for provisioners.
- Set explicit versions for providers in templates: Pinning explicit provider versions lets Terraform reuse the cached provider instead of fetching a newer release, which could otherwise try to overwrite a binary that is in use.
- Lock providers for all OS/architectures in use: Use the terraform providers lock command and its -platform flag to record checksums in .terraform.lock.hcl for each platform's provider binary. This can prevent unexpected provider downloads on operating systems and machine architectures that were not locked.
- Lock files in templates: Always commit .terraform.lock.hcl by running terraform init locally before uploading the template to Coder so provider versions are pinned and stable. Reference: https://coder.com/docs/tutorials/best-practices/speed-up-templates#set-up-terraform-provider-caching
- Scale provisioners: If you need higher concurrency, deploy additional provisioner replicas so each has its own cache and lifecycle.
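The multi-platform locking step can be sketched as a small wrapper around the Terraform CLI. The function name lock_providers and the platform names in the usage line are illustrative assumptions; substitute the platforms your provisioners actually run on. The providers lock subcommand and the -platform and -chdir options are standard Terraform CLI features.

```shell
# Hypothetical helper: lock providers for every platform passed as an argument.
# Usage: lock_providers <template_dir> <platform> [<platform> ...]
lock_providers() {
  template_dir="$1"
  shift
  platform_flags=""
  for platform in "$@"; do
    platform_flags="$platform_flags -platform=$platform"
  done
  # platform_flags is intentionally unquoted so each flag becomes its own word.
  terraform -chdir="$template_dir" providers lock $platform_flags
}

# Example (assumed platforms):
# lock_providers ./my-template linux_amd64 linux_arm64
```

After the command finishes, commit the updated .terraform.lock.hcl alongside the template.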
How you can help
If you find terraform-provider-coder processes that hang instead of exiting when they should, you can help us debug the issue by collecting a core dump of the hung process so that we can examine why it is stuck.
Collecting a core dump involves:
- Getting a running shell inside the Coder server or external provisioner where the build is failing (e.g. via kubectl exec)
- Verifying that an errant terraform-provider-coder process is running there using the following command:
  ps aux | grep terraform-provider-
- Increasing its resource limits to allow it to dump a core file with prlimit
- Triggering the core dump with gdb or gcore
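The last two steps can be sketched as follows. This is a minimal sketch that assumes prlimit (util-linux) and gcore (shipped with gdb) are installed where the process runs and that you have permission to trace the process. The function name and the default output prefix are hypothetical.

```shell
# Hypothetical helper: raise the core size limit of a hung provider process,
# then write a core file named <out_prefix>.<pid>.
# Usage: dump_provider_core <pid> [<out_prefix>]
dump_provider_core() {
  pid="$1"
  out_prefix="${2:-/tmp/terraform-provider-coder-core}"
  # Allow the target process to produce a core file of any size.
  prlimit --pid "$pid" --core=unlimited:unlimited || return 1
  # Attach to the process and dump its memory without killing it.
  gcore -o "$out_prefix" "$pid"
}

# Example: dump the oldest matching process.
# pid="$(pgrep -o -f terraform-provider-coder)" && dump_provider_core "$pid"
```

Core files can contain secrets held in process memory, so review how you share them.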
If you can collect a core dump, attaching it to a GitHub issue or, for Coder customers, a support ticket would be much appreciated.
Another line of inquiry is the filesystem_mirror provider installation method. If you use a Terraform CLI configuration file (.terraformrc), attaching a sanitized copy of it to the ticket would be helpful as well.
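One way to prepare a sanitized copy is sketched below. It assumes the only secrets in the file are quoted token values, such as those inside credentials blocks. Both function names are hypothetical, and you should still review the output by hand before sharing it. On Linux and macOS, Terraform reads the path in TF_CLI_CONFIG_FILE if that variable is set, and ~/.terraformrc otherwise.

```shell
# Print the CLI configuration path Terraform would use on Linux/macOS.
terraform_cli_config_path() {
  echo "${TF_CLI_CONFIG_FILE:-$HOME/.terraformrc}"
}

# Replace quoted token values with a placeholder. Reads the files given as
# arguments, or standard input when called without arguments.
sanitize_terraform_cli_config() {
  sed -E 's/(token[[:space:]]*=[[:space:]]*)"[^"]*"/\1"REDACTED"/' "$@"
}

# Example:
# sanitize_terraform_cli_config "$(terraform_cli_config_path)" > terraformrc.sanitized
```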
Summary
The text file busy error is not specific to Coder, but arises from how Terraform installs and executes providers. By ensuring isolation of cache directories, committing lock files, and avoiding concurrent writes to provider binaries, you can prevent this error from disrupting workspace builds.