Speeding up Terraform caching with OverlayFS

Published Apr 13, 2025 by Ricard Bejarano

The Terraform plugin cache does not support concurrent terraform init runs.

This is a massive inefficiency for Terraform users past a certain scale: the only options are to disable caching, which hurts as the number of providers grows, or to serialize all inits, which hurts as the number of concurrent plan/apply operations grows.

From what I could find in Terraform’s circles, this is a known problem and there’s intent to fix it, but it’s complicated. Since a native solution seems a long way off, we had to get creative.

Introducing OverlayFS

OverlayFS is a Linux filesystem that combines the contents of multiple read-only directories with a writable layer on top into a single merged directory tree.

Kubernetes and Docker use it, CoreOS used it, live Linux distributions use it, etc.

But what brings a low-level tool like OverlayFS this high up the stack?

We’re going to use OverlayFS to give each of our concurrent Terraform inits the illusion that they’re sharing the same Terraform plugin cache, but redirect writes to their respective writable local layers on top, which we can then sync back if necessary.

Setup

The following steps need to be performed for each terraform init.

1. Before Terraform init

First, we’ll need the following directories:

mkdir --parents /path/to/cache .terraform/cache/{upperdir,workdir,mountdir}

Now, let’s mount our overlay:

mount -t overlay overlay -o 'lowerdir=/path/to/cache,upperdir=.terraform/cache/upperdir,workdir=.terraform/cache/workdir' .terraform/cache/mountdir

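This creates an OverlayFS using:

lowerdir (/path/to/cache): the central cache, which the overlay only reads from.
upperdir (.terraform/cache/upperdir): the local writable layer, where anything new gets written.
mountdir (.terraform/cache/mountdir): the merged view of both, which is what we’ll point Terraform at.

workdir is internal to OverlayFS and we shall not touch it.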
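
Since these steps run for every init, it can help to make the mount idempotent. Here is a minimal sketch, assuming a POSIX shell, util-linux mountpoint, and enough privileges for the actual mount. The function names overlay_opts and mount_cache_overlay are made up for illustration and are not part of any tool:

```shell
# Hypothetical helpers, not part of Terraform or OverlayFS tooling.

# Build the overlay mount options from the central cache path ($1)
# and the per-workspace base directory ($2).
overlay_opts() {
  printf 'lowerdir=%s,upperdir=%s/upperdir,workdir=%s/workdir' "$1" "$2" "$2"
}

# Create the directories and mount the overlay unless it is already mounted.
mount_cache_overlay() {
  central="$1"
  base="$2"
  mkdir -p "$central" "$base/upperdir" "$base/workdir" "$base/mountdir" || return 1
  # mountpoint exits 0 when the path is already a mount point.
  if mountpoint -q "$base/mountdir"; then
    return 0
  fi
  mount -t overlay overlay -o "$(overlay_opts "$central" "$base")" "$base/mountdir"
}

# Usage (requires privileges to mount):
#   mount_cache_overlay /path/to/cache .terraform/cache
```

The early return means running init again in the same workspace should not stack a second overlay on top of the first.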

2. During Terraform init

Now, let’s enable Terraform’s plugin cache:

export TF_PLUGIN_CACHE_DIR=".terraform/cache/mountdir"

If you don’t use .terraform.lock.hcl files, you also have to set:

export TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE="true"

These only need to be set for terraform init; all subsequent Terraform commands work from what they find in the .terraform directory.
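
One way to keep the variable scoped to init alone is to set it inline on the command. A small sketch, assuming terraform is on the PATH; tf_init_cached is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: the variable is set only for this one
# terraform init invocation, so later commands never see it.
tf_init_cached() {
  TF_PLUGIN_CACHE_DIR=".terraform/cache/mountdir" terraform init "$@"
}

# Usage:
#   tf_init_cached -input=false
```

The same inline form should work for TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE if you need it.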

3. After Terraform init

Once terraform init has completed, we need to sync any new providers that were downloaded into the local layer back to the central cache:

if find .terraform/cache/upperdir -type f -print -quit | grep -q .; then
  flock --timeout 60 /path/to/cache \
    rsync --archive --ignore-existing .terraform/cache/upperdir/ /path/to/cache/
fi

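Here’s a breakdown of what this does: find checks whether upperdir contains at least one file, that is, whether this init downloaded anything new. If it did, flock takes a lock on the central cache directory, waiting up to 60 seconds for it, and rsync copies the new files across, skipping anything that already exists there.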

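It’s okay to write to the central cache while other Terraform inits are running: the sync only ever adds files, and already-mounted overlay filesystems keep working off what they had. Strictly speaking, the kernel documentation calls changes to a lower layer under a live mount undefined behavior, so never modify or delete existing files there.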

You can tweak the flock timeout to your own threshold; that trade-off is yours to make, and 60 seconds is just the default I chose.

Note: if you’re really going to try this, email me. The code here is optimized for illustrating the mechanism rather than for performance, and there are better ways to do it.
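
For reference, the same sync can be wrapped in a function so that a lock timeout or an rsync failure surfaces as a non-zero exit status. This is only a sketch of the mechanism above, not one of the faster approaches. sync_back and its arguments are hypothetical names, and it assumes GNU find, util-linux flock and rsync are installed:

```shell
# Hypothetical helper: sync new files from the local writable layer ($1)
# into the central cache ($2), waiting up to $3 seconds (default 60) for the lock.
sync_back() {
  upper="$1"
  central="$2"
  lock_timeout="${3:-60}"
  # Nothing to do when the writable layer holds no regular files.
  if [ -z "$(find "$upper" -type f -print -quit)" ]; then
    return 0
  fi
  # flock serializes writers on the central cache directory;
  # --ignore-existing makes rsync skip files that are already there.
  flock --timeout "$lock_timeout" "$central" \
    rsync --archive --ignore-existing "$upper"/ "$central"/
}

# Usage:
#   sync_back .terraform/cache/upperdir /path/to/cache 60 || echo "sync failed" >&2
```

Because of --ignore-existing, files already present in the central cache are left untouched, so a second init syncing the same provider changes nothing.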

4. After Terraform plan/apply

Once you’re done with Terraform, make sure to clean up:

umount .terraform/cache/mountdir

This prevents “device or resource busy” errors when deleting .terraform.
You should do this in a trap; the code here is only meant to illustrate the mechanism.
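
A minimal sketch of such a trap, assuming everything runs from a single shell script in the workspace directory and that util-linux mountpoint is available; cleanup_overlay is a hypothetical name:

```shell
# Hypothetical cleanup hook: unmount the overlay when the script exits.
cleanup_overlay() {
  mnt=".terraform/cache/mountdir"
  # Skip the umount when nothing is mounted there (e.g. the mount failed).
  if mountpoint -q "$mnt"; then
    umount "$mnt"
  fi
}

# Run the cleanup when the script exits, whether it succeeded or failed.
trap cleanup_overlay EXIT

# ... mount the overlay, then run terraform init / plan / apply here ...
```

Registering the trap before mounting means a failed init or plan still leaves the workspace unmounted.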

Results

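This implements a sort of write-back Terraform plugin cache using OverlayFS: inits read from the shared central cache, write to their own local layer, and new providers are synced back afterwards.

If two concurrent inits are missing the same provider version, both will download it, but since the rsync is serialized and skips existing files, it is only written back to the central cache once.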
