Using an infrastructure-as-code (IAC) platform, such as Terraform or Pulumi, has quickly become one of the primary ways organizations realize the cloud’s promise of scale and efficiency. Entire global, multi-cloud deployments can be provisioned quickly from consistent, repeatable templates that have been checked into source control and validated through an enterprise change management process. There is no doubt this has led to improved efficiency, security, and a host of other operational benefits. However, it’s not without risks, and these risks can largely be addressed with a robust visibility strategy.
Changes can, and likely will, happen outside of your IAC pipeline, and the cloud infrastructure state in your repository doesn’t necessarily include all the attributes and context of your runtime infrastructure. It’s critical to account for these blind spots by adding visibility into the actual runtime cloud infrastructure to augment the IAC process. There is no better feedback into your IAC pipeline than information about what actually got deployed, what’s actually running, and whether all those things are configured correctly.
Terraform has some built-in ability to detect configuration drift, that is, changes made to cloud resources outside of Terraform. The command ‘terraform plan’ lets you preview the changes Terraform is going to make to your infrastructure the next time ‘terraform apply’ is run. This allows you to determine the effect of changes before submitting them to code review.
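If you want to run this check from a script or a scheduled job rather than by eye, one option is the ‘-detailed-exitcode’ flag of ‘terraform plan’, which signals through the exit code whether the plan contains changes. The following is a minimal sketch, not part of my original setup: the function names and the working directory argument are hypothetical, and it assumes the ‘terraform’ binary is on your PATH and the directory has already been initialized.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANINGS = {
    0: "no changes",       # plan succeeded and the diff is empty
    1: "error",            # the plan itself failed
    2: "changes present",  # plan succeeded and the diff is non-empty
}


def interpret_plan_exit_code(code: int) -> str:
    """Translate a `terraform plan -detailed-exitcode` exit code into a label."""
    return PLAN_EXIT_MEANINGS.get(code, "unknown")


def run_plan(workdir: str) -> str:
    """Run the plan in `workdir` (a hypothetical path) and return the label."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

A result of "changes present" only says that Terraform wants to change something. It does not distinguish between changes you wrote in code and drift it found in the cloud, so the plan output still needs to be read.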
Here’s what happens when I’m deploying changes through Terraform. In this case I added an additional policy to this IAM user. Next to the new resource’s configuration values is a ‘+’ which denotes something being created or added to your cloud footprint.
Now, here’s what happens when Terraform finds changes to resources made outside of Terraform. Next to this resource and its values is a ‘-’, which means Terraform will remove these out-of-band changes the next time ‘terraform apply’ is run. This is how Terraform natively handles drift. It’s extremely useful but not bulletproof.
In the AWS console I edited my Security Group inbound rules to allow TCP traffic on port 1521 to my RDS instance running Oracle.
Once this change is made, ‘terraform plan’ will show the following. Notice the ‘-’ next to my new inbound rule. The next time I run ‘terraform apply’, Terraform will remove the new inbound rule and restore the original inbound rule configuration.
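The same information can be read programmatically. If the plan is saved with ‘terraform plan -out=tfplan’ and rendered with ‘terraform show -json tfplan’, the JSON contains a ‘resource_changes’ list in which each entry carries the resource address and the planned actions. The sketch below groups addresses by action. The sample data is hand-written for illustration and the resource addresses in it are hypothetical; it is not output from my environment. Depending on how the rules are declared, a removed inbound rule may be reported as an update to the security group rather than as a delete.

```python
import json
from collections import defaultdict


def summarize_plan(plan: dict) -> dict:
    """Group resource addresses by planned action, skipping no-ops."""
    summary = defaultdict(list)
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions == ["no-op"]:
            continue
        # A replacement is reported as two actions, e.g. ["delete", "create"].
        summary["+".join(actions)].append(rc["address"])
    return dict(summary)


# Hypothetical excerpt in the shape of `terraform show -json` plan output.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_security_group.db", "change": {"actions": ["update"]}},
    {"address": "aws_iam_user.app", "change": {"actions": ["no-op"]}},
    {"address": "aws_iam_user_policy_attachment.extra", "change": {"actions": ["create"]}}
  ]
}
""")

print(summarize_plan(sample_plan))
```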
In this particular environment I have a VPC, internet gateway, security group, an IAM user, and an RDS instance all managed by Terraform. As I mentioned earlier, there are situations where changes made outside of Terraform are not captured by ‘terraform plan’. For example, and I’ve seen this happen many times, a new access key is added to the user and ‘terraform plan’ doesn’t detect the change.
Here I went into the AWS console and added a second access key to our IAM user.
However, when I run ‘terraform plan’, the output reports no changes and says my infrastructure matches the current configuration. We know this isn’t true, so what next?
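One plausible reason for the miss is that Terraform only refreshes the resources recorded in its state, and an access key created in the console is a separate object that Terraform was never told about. To illustrate the gap, here is a rough sketch that compares the access keys Terraform knows about with the ones that actually exist. It is an assumption-laden example, not how OpsCompass works: the function names are hypothetical, the state input is assumed to be the JSON from ‘terraform show -json’, only the root module is inspected, and the live lookup needs boto3 and AWS credentials.

```python
def state_access_key_ids(state: dict, user_name: str) -> set:
    """Collect IDs of aws_iam_access_key resources for a user from state JSON.

    Only the root module is inspected; child modules are ignored in this sketch.
    """
    resources = state.get("values", {}).get("root_module", {}).get("resources", [])
    return {
        r["values"]["id"]
        for r in resources
        if r.get("type") == "aws_iam_access_key"
        and r.get("values", {}).get("user") == user_name
    }


def live_access_key_ids(user_name: str) -> set:
    """Fetch the user's access key IDs from AWS (needs boto3 and credentials)."""
    import boto3  # imported lazily so the pure helpers work without it

    iam = boto3.client("iam")
    response = iam.list_access_keys(UserName=user_name)
    return {key["AccessKeyId"] for key in response["AccessKeyMetadata"]}


def unmanaged_keys(state_ids: set, live_ids: set) -> set:
    """Keys that exist in AWS but that Terraform state knows nothing about."""
    return live_ids - state_ids


# Hypothetical data: one key managed by Terraform, a second added by hand.
sample_state = {
    "values": {
        "root_module": {
            "resources": [
                {
                    "type": "aws_iam_access_key",
                    "values": {"id": "AKIAEXAMPLEMANAGED", "user": "app-user"},
                }
            ]
        }
    }
}
sample_live = {"AKIAEXAMPLEMANAGED", "AKIAEXAMPLEEXTRA"}

print(unmanaged_keys(state_access_key_ids(sample_state, "app-user"), sample_live))
```

A non-empty result is the kind of drift that ‘terraform plan’ stays silent about, because nothing in the configuration or state refers to the extra key.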
I can use the OpsCompass CLI to determine whether there has been any drift in my runtime configurations. If Terraform says there have been no changes but OpsCompass is telling me there have been changes, then I know that I need to investigate further.
Notice that OpsCompass is seeing 1 ‘open drift’ on our IAM user resource, which is at odds with our Terraform state. In addition to the open drift, I can also see that this user resource has been misconfigured and is failing compliance checks. From here I can continue my investigation into this resource, its history, and other important context.
Combining OpsCompass with ‘terraform plan’ helps close the visibility gap that’s inherent to IAC-driven environments and allows you to find problems and risks faster.