What are the most important Terraform best practices for beginners?

Start with three: use variables instead of hardcoded values, configure a remote backend with state locking, and pin your provider versions. These alone prevent the majority of beginner mistakes.

How do I manage Terraform state across multiple teams?

Use Terraform Cloud workspaces or separate S3 state files per team/environment with strict IAM policies. Never share a single state file across teams.

What is terraform fmt and why does it matter?

terraform fmt standardizes HCL formatting according to Terraform's style conventions. It eliminates formatting debates in code reviews and makes diffs meaningful rather than noisy.

Should I use Terraform modules for every resource?

Not necessarily — single-resource modules add overhead without value. Create modules when a pattern is reused across multiple environments or projects, typically grouping 3–10 related resources.

How do I prevent Terraform from destroying production databases?

Use lifecycle { prevent_destroy = true } on all stateful resources. Also enforce this via Sentinel or OPA policies that block destroy operations on tagged production resources without explicit override.

Terraform Best Practices: 15 Mistakes Costing 20+ Hours/Week | CloudOps AI

Your Terraform Is Technically Working. That's the Problem.

Most Terraform codebases don't fail dramatically. They decay slowly — one hardcoded value here, one skipped lock there, one "I'll refactor this later" module that never gets refactored. Six months in, your team spends more time fighting the codebase than shipping infrastructure.

We audited dozens of engineering teams and found a consistent pattern: the same 15 mistakes were responsible for the majority of wasted hours — debugging drift, untangling state corruption, re-doing work that should have been automated.

This article names every one of them — and for each, explains how to fix it permanently.

Mistake #1: Hardcoding Values Instead of Using Variables

Time wasted per week: 2–3 hours

# ❌ What most teams write
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.large"
  subnet_id     = "subnet-0bb1c79de3EXAMPLE"
}

Hardcoded AMI IDs, instance types, and subnet IDs turn your Terraform into a fragile, environment-specific mess. When you need to deploy to staging, update the AMI, or change regions, you're doing a full find-and-replace across dozens of files — with no safety net.

The fix:

# ✅ Parameterized, reusable
variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.large"
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
  subnet_id     = var.subnet_id
}

Every environment-specific value belongs in variables.tf with a type, description, and sensible default. Use terraform.tfvars files per environment, never inline literals.
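
The fixed resource above refers to data.aws_ami.ubuntu and var.subnet_id without defining them. A minimal sketch of those two definitions follows. The Ubuntu 22.04 name filter is an assumption about which image the example intends; 099720109477 is Canonical's AWS account ID.

```hcl
# Assumed input: the example references var.subnet_id but never declares it.
variable "subnet_id" {
  description = "ID of the subnet to launch the instance in"
  type        = string
}

# Look up the newest matching AMI at plan time instead of hardcoding an ID.
# The name pattern (Ubuntu 22.04, amd64) is an assumption for illustration.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
```

Because most_recent = true resolves to a newer image over time, a later plan may propose replacing the instance. Tighten the filter or pass the AMI ID in as a variable if that is not what you want.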

CloudOps AI prevents this by auto-extracting hardcoded values into variables when generating or importing Terraform code — so you start clean, not technical-debt-first.

Mistake #2: No Remote State Backend

Time wasted per week: 1–2 hours

Local terraform.tfstate is a single-engineer solution masquerading as a team workflow. The moment two people run terraform apply from different machines, you have state divergence — and debugging it is brutal.

The fix:

terraform {
  backend "s3" {
    bucket         = "my-tf-state-prod"
    key            = "infra/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

Use S3 + DynamoDB for state locking, or Terraform Cloud/Enterprise for a fully managed experience. This is non-negotiable for any team with more than one engineer.

CloudOps AI generates a ready-to-use backend.tf as part of every code export, with S3 backend, DynamoDB lock table, and encryption configured by default.
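
The backend block assumes the state bucket already exists. A rough sketch of how that bucket might be provisioned (usually from a small, separate bootstrap configuration) is shown here. It assumes AWS provider v4 or later, where versioning and public access settings are separate resources; only the bucket name comes from the backend example.

```hcl
# Bucket name matches the backend example; everything else is illustrative.
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-tf-state-prod"
}

# Keep every prior version of the state file so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State files often contain secrets, so block all public access.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```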

Mistake #3: Missing State Locking

Time wasted per week: 2–4 hours (when it breaks)

Even teams with remote backends often skip DynamoDB state locking. The result: two concurrent terraform apply runs corrupt the state file. Recovering from a corrupted state file can take an afternoon.

The fix:

Create the DynamoDB table once:

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Then reference it in every backend config. Always. No exceptions.
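
Newer Terraform releases can also lock state natively in S3, without a DynamoDB table. The sketch below assumes Terraform 1.10 or later, where the S3 backend accepts use_lockfile. Check the backend documentation for your version before retiring the DynamoDB table.

```hcl
terraform {
  backend "s3" {
    bucket       = "my-tf-state-prod"
    key          = "infra/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # S3-native lock file; assumes Terraform >= 1.10
  }
}
```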

Mistake #4: Not Using Modules

Time wasted per week: 3–4 hours

Copy-pasting the same VPC, security group, or ECS task configuration across five projects isn't reuse — it's five separate things to maintain. One security fix means five PRs. One breaking change in AWS means five broken configs.

The fix:

Structure reusable patterns as modules:

modules/
  vpc/
    main.tf
    variables.tf
    outputs.tf
  rds/
    main.tf
    variables.tf
    outputs.tf
environments/
  prod/
    main.tf   ← calls modules
  staging/
    main.tf   ← calls same modules, different vars

Consume them cleanly:

module "vpc" {
  # Local path source: the version argument is only valid for registry modules
  source = "../../modules/vpc"

  cidr_block   = var.vpc_cidr
  environment  = var.environment
  project_name = var.project_name
}

Use the Terraform Registry for battle-tested community modules before writing your own.
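
To make the module call above concrete, here is one possible shape for the files inside modules/vpc. The input names follow the call shown earlier; the resource body and the vpc_id output are assumptions for illustration, not the contents of any particular module.

```hcl
# modules/vpc/variables.tf (sketch)
variable "cidr_block" {
  description = "CIDR range for the VPC"
  type        = string
}

variable "environment" {
  description = "Deployment environment, e.g. dev, staging, prod"
  type        = string
}

variable "project_name" {
  description = "Project name used in resource names"
  type        = string
}

# modules/vpc/main.tf (sketch)
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  tags = {
    Name = "${var.project_name}-${var.environment}"
  }
}

# modules/vpc/outputs.tf (sketch)
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.this.id
}
```

Callers then read module.vpc.vpc_id instead of reaching into the module's resources.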

CloudOps AI organizes generated code into modules by default — VPC, compute, storage, and IAM are separated from day one.

Mistake #5: Skipping `terraform plan` Reviews

Time wasted per week: 1–3 hours

Treating terraform plan output as a formality (scrolling past it and hitting apply) is how production resources get accidentally destroyed. A line like # aws_db_instance.main must be replaced is easy to miss in 200 lines of diff.

The fix:

Make plan review a formal step:

Save plan output: terraform plan -out=tfplan
Review with your team in PRs (use terraform show tfplan)
Automate plan output as a PR comment in CI/CD
Set up Sentinel or OPA policies to block destructive changes without approval

Never run terraform apply without an explicit plan review, especially in production.

Mistake #6: Not Pinning Provider and Module Versions

Time wasted per week: 1–2 hours

# ❌ A breaking change will find you at the worst time
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

An unpinned AWS provider means that terraform init on a new machine in six months pulls a major version with breaking changes — and your pipeline breaks in ways that are hard to trace.

The fix:

# ✅ Explicit, reproducible
terraform {
  required_version = ">= 1.5.0, < 2.0.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

Pin modules to specific tags, not branches:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.2"   # ✅ pinned tag
}
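
The tag-versus-branch distinction matters most for modules pulled straight from Git, where the version argument is not available and the pin goes in the source URL instead. A sketch, with a hypothetical repository URL:

```hcl
# The repository URL and subdirectory are hypothetical.
module "vpc" {
  # ref=v1.2.0 pins to a tag; ref=main would track a moving branch
  source = "git::https://github.com/example-org/terraform-modules.git//vpc?ref=v1.2.0"

  # module inputs omitted
}
```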

Use Dependabot or Renovate to automate version bump PRs with controlled review.

Mistake #7: Secrets and Credentials in `.tf` Files

Time wasted per week: 2–5 hours when a breach occurs

# ❌ This gets committed to Git. Every time.
resource "aws_db_instance" "main" {
  username = "admin"
  password = "MyS3cur3P@ssw0rd"
}

Hardcoded passwords, API keys, and tokens in Terraform files eventually end up in Git history — even if you catch them and delete them. Git history is forever.
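
The checklist at the end of this article says secrets should come from Vault, SSM, or environment variables. One way that could look is sketched below. The SSM parameter name and the variable names are hypothetical, and the two options are alternatives, not a required pairing.

```hcl
# Option A: pass the secret in at apply time (for example through the
# TF_VAR_db_password environment variable) and mark it sensitive so it
# is redacted in plan output.
variable "db_password" {
  description = "Master password for the database"
  type        = string
  sensitive   = true
}

# Option B: read the secret from SSM Parameter Store at plan time.
# The parameter name is hypothetical.
data "aws_ssm_parameter" "db_password" {
  name            = "/prod/db/master_password"
  with_decryption = true
}

resource "aws_db_instance" "main" {
  # ... other arguments omitted
  username = "admin"
  password = data.aws_ssm_parameter.db_password.value # or var.db_password
}
```

Either way the value is still written to the state file, which is one more reason to keep state in an encrypted, access-controlled backend.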

Mistake #8: One Giant `main.tf` File

Time wasted per week: 1–2 hours

A 1,500-line main.tf containing your VPC, EC2 fleet, RDS cluster, IAM policies, CloudWatch alarms, and S3 buckets is not infrastructure as code. It's infrastructure as archaeology.

The fix:

Split by resource concern:

main.tf          ← provider config, backend, data sources
vpc.tf           ← VPC, subnets, route tables, NACLs
compute.tf       ← EC2, ASG, launch templates
database.tf      ← RDS, parameter groups, subnet groups
iam.tf           ← roles, policies, instance profiles
monitoring.tf    ← CloudWatch, SNS, alarms
outputs.tf       ← all outputs
variables.tf     ← all variables

Each file should be independently readable and under 200 lines where possible.

Mistake #9: No Tagging Strategy

Time wasted per week: 1–2 hours

Untagged resources are invisible resources. When your AWS bill spikes, untagged infrastructure makes cost attribution impossible. When an incident fires at 2am, untagged EC2 instances can't be traced to a team, project, or environment.

The fix:

Define a mandatory tagging baseline using default_tags at the provider level — so every resource inherits it automatically:

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "terraform"
      Team        = var.team_name
      CostCenter  = var.cost_center
    }
  }
}

This is better than tagging each resource individually — it's enforced, consistent, and automatic.
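
Resource-level tags still work alongside default_tags. A small sketch, reusing names from earlier examples, of adding a per-resource tag on top of the provider baseline:

```hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type

  # Merged with the provider's default_tags. A key defined here
  # overrides the same key coming from default_tags.
  tags = {
    Name = "${var.project_name}-web"
  }
}
```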

CloudOps AI applies default_tags to every generated provider block and prompts you for tag values during setup.

Mistake #10: Ignoring `terraform fmt` and `terraform validate`

Time wasted per week: 30 min–1 hour

Inconsistent formatting creates noisy diffs, slows down code reviews, and makes it harder to spot real changes. Skipping terraform validate means syntax errors reach CI/CD instead of being caught locally in seconds.

The fix:

Make both mandatory in your workflow:

# Run before every commit
terraform fmt -recursive
terraform validate

Add to pre-commit hooks:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.83.5
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint

Mistake #11: No CI/CD Pipeline for Terraform

Time wasted per week: 2–3 hours

Teams that apply Terraform manually from local machines can't audit who changed what, when, and why. They also can't enforce plan reviews, policy checks, or automated testing.

The fix:

A minimal Terraform CI/CD pipeline in GitHub Actions:

name: Terraform

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform fmt -check
      - run: terraform validate
      - run: terraform plan -out=tfplan
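      - name: Post plan to PR
        uses: actions/github-script@v7
        # ... comment tfplan output on PR
      # Hand the saved plan to the apply job; jobs do not share a filesystem
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production   # approval gate, if required reviewers are configured
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - uses: actions/download-artifact@v4
        with:
          name: tfplan
      - run: terraform apply tfplan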

Every apply to production should require a human approval gate.

Mistake #12: Not Using `terraform.tfvars` Per Environment

Time wasted per week: 1 hour

Using the same variable values across dev, staging, and production is how you accidentally deploy production-scale infrastructure to a dev sandbox — or worse, point dev workloads at production databases.

The fix:

Maintain per-environment var files:

environments/
  dev.tfvars
  staging.tfvars
  prod.tfvars

# Explicit, never ambiguous
terraform apply -var-file="environments/prod.tfvars"
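
As a sketch of what one of these files might hold (all values are illustrative, and the variable names assume the declarations used earlier in this article):

```hcl
# environments/prod.tfvars
environment   = "prod"
instance_type = "t3.large"
vpc_cidr      = "10.0.0.0/16"
```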

Use a locals block to derive environment-specific settings from a single environment input variable (for example var.environment) when values follow a pattern. This reduces the number of variables you need to maintain.
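
A minimal sketch of that locals pattern, assuming three environments named dev, staging, and prod. The sizing table and the multi_az flag are made up for illustration.

```hcl
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

locals {
  # Illustrative sizing table; adjust to your own workloads.
  instance_type_by_env = {
    dev     = "t3.micro"
    staging = "t3.medium"
    prod    = "t3.large"
  }

  # Derived values: one input, several consistent settings.
  instance_type = local.instance_type_by_env[var.environment]
  multi_az      = var.environment == "prod"
}
```

Resources then reference local.instance_type and local.multi_az, so each .tfvars file only needs to set environment.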

Mistake #13: Neglecting Resource Lifecycle Rules

Time wasted per week: 2–3 hours

Without lifecycle rules, Terraform can destroy and recreate stateful resources — databases, Elasticsearch clusters, S3 buckets — in ways that cause downtime or data loss. Terraform doesn't know the difference between "this is just a config server" and "this is your primary database."

The fix:

Use lifecycle rules to protect critical resources:

resource "aws_db_instance" "primary" {
  # ...

  lifecycle {
    prevent_destroy       = true   # block accidental deletion
    create_before_destroy = true   # zero-downtime replacements
    ignore_changes        = [
      snapshot_identifier,         # don't track snapshot drift
    ]
  }
}

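prevent_destroy = true on your RDS instance, Elasticsearch domain, and any stateful infrastructure is cheap insurance against an accidental terraform destroy. Terraform rejects any plan that would destroy the resource, including a forced replacement, so create_before_destroy only comes into play once you deliberately lift the guard. The guard lives in the configuration itself: deleting the whole resource block removes the protection along with it.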
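
Since prevent_destroy only exists inside Terraform, it can be paired with protection enforced by AWS itself. A sketch for RDS, where the snapshot name is hypothetical:

```hcl
resource "aws_db_instance" "primary" {
  # ... other arguments omitted

  deletion_protection       = true            # RDS refuses delete calls while this is set
  skip_final_snapshot       = false           # take a snapshot if the instance is ever deleted
  final_snapshot_identifier = "primary-final" # hypothetical snapshot name

  lifecycle {
    prevent_destroy = true
  }
}
```

The lifecycle rule stops Terraform from planning the destroy; deletion_protection also stops deletes issued from the console or CLI.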

Mistake #14: Not Running `tflint` or Security Scanning

Time wasted per week: 2–4 hours (incident cost)

Terraform validates HCL syntax but won't catch an S3 bucket with public read access, an unrestricted security group, or an unencrypted EBS volume. These don't fail terraform plan — they become security incidents.

The fix:

Add static analysis to your pipeline:

# tflint — catches resource-level misconfigurations
tflint --init && tflint

# tfsec — security scanning
brew install tfsec
tfsec .

# checkov — compliance-as-code
pip install checkov
checkov -d .

Configure rules to match your organization's security baseline. Fail the pipeline on high-severity findings.
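
For tflint, the rule configuration lives in a .tflint.hcl file at the repository root. A small sketch follows. The AWS ruleset version is only an example and should be replaced with a current release, and the choice of enabled rules is an assumption.

```hcl
# .tflint.hcl
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

plugin "aws" {
  enabled = true
  version = "0.27.0" # example version; pick a current release
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Opt in to an extra rule beyond the recommended preset.
rule "terraform_naming_convention" {
  enabled = true
}
```

Running tflint --init downloads the plugins declared here before the first scan.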

Mistake #15: No Documentation on Modules and Outputs

Time wasted per week: 1–2 hours

Undocumented Terraform modules are black boxes. Your colleague (or future you) has to read through 300 lines of HCL to understand what a module does, what it requires, and what it produces. Multiply that by ten modules and an onboarding engineer, and you've lost days.

The fix:

Use terraform-docs to auto-generate documentation from your code:

brew install terraform-docs
terraform-docs markdown . > README.md

This generates a formatted README.md from your variables.tf and outputs.tf descriptions — which means good variable descriptions become free documentation:

variable "instance_type" {
  description = "EC2 instance type. Use t3.micro for dev, t3.large for prod."
  type        = string
  default     = "t3.micro"

  validation {
    condition     = contains(["t3.micro", "t3.large", "m5.xlarge"], var.instance_type)
    error_message = "Must be an approved instance type."
  }
}
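
Outputs benefit from the same treatment, since terraform-docs reads their descriptions as well. A sketch, assuming the module contains the aws_instance.web resource from earlier examples:

```hcl
output "instance_id" {
  description = "ID of the EC2 instance created by this module."
  value       = aws_instance.web.id
}

output "private_ip" {
  description = "Private IP address of the instance, for use in security group rules."
  value       = aws_instance.web.private_ip
}
```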

CloudOps AI generates a README.md alongside every module — documenting inputs, outputs, dependencies, and example usage automatically.

The Compounding Cost of Getting This Wrong

Each mistake alone might cost an hour or two. Together, they compound:

Mistake Category                        Weekly Hours Lost
Hardcoded values / no variables         2–3 hrs
State management issues                 1–4 hrs
No modules (copy-paste sprawl)          3–4 hrs
Missing CI/CD and plan reviews          2–3 hrs
Security incidents from no scanning     2–4 hrs
Undocumented modules                    1–2 hrs
Formatting and validation gaps          1 hr
Missing tagging (cost attribution)      1–2 hrs
Total                                   13–23 hrs/week

That's a part-time engineer's worth of time, every single week, spent on avoidable friction.

How CloudOps AI Eliminates These Mistakes by Default

The reason most teams make these mistakes isn't carelessness — it's that setting up all of this correctly from scratch takes time that new projects never have. CloudOps AI changes the starting point.

When you generate or import infrastructure with CloudOps AI:

Variables are extracted automatically — no hardcoded values
A remote backend with state locking is configured out of the box
Code is organized into logical modules from day one
default_tags are applied to the provider block
Sensitive values are identified and replaced with variable references
A README.md is generated for every module
Output is clean, idiomatic HCL — ready for review, not cleanup

You still write Terraform. You just skip the part where you pay the technical debt tax for six months before getting there.


Quick Reference Checklist

Before shipping any Terraform codebase, run through this list:

[ ] All environment-specific values in variables.tf
[ ] Remote backend configured with state locking
[ ] Provider and module versions pinned
[ ] Resources organized into modules by concern
[ ] Secrets sourced from Vault, SSM, or environment variables — never in .tf files
[ ] default_tags on provider block
[ ] prevent_destroy on stateful resources
[ ] terraform fmt and terraform validate in pre-commit hooks
[ ] CI/CD pipeline with plan review and manual apply gate
[ ] tflint and tfsec in the pipeline
[ ] terraform-docs generating README.md for all modules
[ ] Per-environment .tfvars files
