Best practices when using Terraform

Question

I'm in the process of moving our infrastructure over to Terraform. What's the best practice for actually managing the Terraform files and state? I realize it's infrastructure as code, and I'll commit my .tf files into git, but do I commit tfstate as well? Should that reside somewhere like S3? I would eventually like CI to manage all of this, but that's a stretch and requires me to figure out the moving pieces for the files.

I'm really just looking to see how people out there actually use this kind of thing in production.

Answer

I am also in a state of migrating existing AWS infrastructure to Terraform so shall aim to update the answer as I develop.

I have been relying heavily on the official Terraform examples and a lot of trial and error to flesh out the areas I was uncertain about.

.tfstate files

Terraform config can be used to provision many boxes on different infrastructure, each of which could have a different state. As it can also be run by multiple people, this state should live in a centralised location (like S3), not in git.

This can be confirmed looking at the Terraform .gitignore.

Developer control

Our aim is to provide more control of the infrastructure to developers whilst maintaining a full audit (git log) and the ability to sanity check changes (pull requests). With that in mind the new infrastructure workflow I am aiming towards is:


  1. A base foundation of common AMIs that include reusable modules, e.g. Puppet.
  2. Core infrastructure provisioned by DevOps using Terraform.
  3. Developers change the Terraform configuration in Git as needed (number of instances, a new VPC, an additional region/availability zone, etc.).
  4. The configuration is pushed to Git and a pull request is submitted for a sanity check by a member of the DevOps squad.
  5. If approved, a webhook calls CI to build and deploy (I am unsure how to partition multiple environments at this point).

Edit 1 - Update on current state

Since starting this answer I have written a lot of TF code and feel more comfortable in our state of affairs. We have hit bugs and restrictions along the way but I accept this is a characteristic of using new, rapidly changing software.

Layout

We have a complicated AWS infrastructure with multiple VPCs, each with multiple subnets. The key to managing this easily was to define a flexible taxonomy that encompasses region, environment, service and owner, which we use to organise our infrastructure code (both Terraform and Puppet).

Modules

The next step was to create a single git repository to store our Terraform modules. The top-level directory structure for the modules looks like this:

tree -L 1 .

Result:

├── README.md
├── aws-asg
├── aws-ec2
├── aws-elb
├── aws-rds
├── aws-sg
├── aws-vpc
└── templates

Each one sets some sane defaults but exposes them as variables that can be overridden by our "glue".
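
For illustration, here is a minimal sketch of how a module such as aws-ec2 might set sane defaults while exposing them as overridable variables. The variable names, the output and the file split are assumptions rather than the contents of our actual module, and the syntax is Terraform 0.12+ (0.10-era configurations quoted types, e.g. type = "string").

```hcl
# aws-ec2/variables.tf (hypothetical module contents)
variable "ami_id" {
  description = "AMI to launch; no default, so every caller must choose one"
  type        = string
}

variable "subnet_id" {
  description = "Subnet to place the instances in"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type; a sane default the glue can override"
  type        = string
  default     = "t3.micro"
}

variable "instance_count" {
  description = "Number of identical instances to launch"
  type        = number
  default     = 1
}

# aws-ec2/main.tf
resource "aws_instance" "this" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = var.instance_type
  subnet_id     = var.subnet_id
}

# aws-ec2/outputs.tf: expose IDs so the glue can wire instances into other resources
output "instance_ids" {
  value = aws_instance.this[*].id
}
```

Leaving ami_id and subnet_id without defaults forces each environment to make those choices explicitly, while things like the instance type only need to be set where they differ.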

Glue

We have a second repository with our glue that makes use of the modules mentioned above. It is laid out in line with our taxonomy document:

.
├── README.md
├── clientA
│   ├── eu-west-1
│   │   └── dev
│   └── us-east-1
│       └── dev
├── clientB
│   ├── eu-west-1
│   │   ├── dev
│   │   ├── ec2-keys.tf
│   │   ├── prod
│   │   └── terraform.tfstate
│   ├── iam.tf
│   ├── terraform.tfstate
│   └── terraform.tfstate.backup
└── clientC
    ├── eu-west-1
    │   ├── aws.tf
    │   ├── dev
    │   ├── iam-roles.tf
    │   ├── ec2-keys.tf
    │   ├── prod
    │   ├── stg
    │   └── terraform.tfstate
    └── iam.tf

At the client level we have AWS-account-specific .tf files that provision global resources (like IAM roles); next is the region level, with EC2 SSH public keys; finally, the environment level (dev, stg, prod, etc.) is where our VPC setups, instance creation, peering connections and so on are stored.
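
A sketch of what an environment-level glue file might contain, assuming the modules live in a git repository (the URL below is a placeholder) and that aws-ec2 exposes ami_id, subnet_id, instance_type and instance_count variables and an instance_ids output (hypothetical names). The AMI and subnet IDs are placeholders too.

```hcl
# clientB/eu-west-1/prod/main.tf (hypothetical glue configuration)
provider "aws" {
  region = "eu-west-1"
}

module "web" {
  # "//aws-ec2" selects the module's subdirectory inside the repository;
  # "?ref=v1.2.0" pins a git tag so module changes never reach prod unreviewed.
  source = "git::https://git.example.com/infra/terraform-modules.git//aws-ec2?ref=v1.2.0"

  ami_id         = "ami-0123456789abcdef0"    # placeholder
  subnet_id      = "subnet-0123456789abcdef0" # placeholder
  instance_type  = "m5.large"                 # overrides the module default
  instance_count = 3
}

output "web_instance_ids" {
  value = module.web.instance_ids
}
```

Pinning ref to a tag means each environment only picks up a new module version when its glue is deliberately changed, which keeps the pull-request review meaningful.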

Side note: as you can see, I'm going against my own advice above by keeping terraform.tfstate in git. This is a temporary measure until I move to S3, but it suits me as I'm currently the only developer.

Next steps

This is still a manual process and not in Jenkins yet, but we're porting a rather large, complicated infrastructure and so far so good. Like I said, there have been a few bugs, but it's going well!

Edit 2 - Changes

It's been almost a year since I wrote this initial answer and the state of both Terraform and myself have changed significantly. I am now at a new position using Terraform to manage an Azure cluster and Terraform is now v0.10.7.

People have repeatedly told me state should not go in Git, and they are correct. We used this as an interim measure with a two-person team that relied on developer communication and discipline. With a larger, distributed team we are now fully leveraging remote state in S3, with locking provided by DynamoDB. Ideally this will be migrated to Consul, now that it has reached v1.0, so that it works across cloud providers.
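
A minimal sketch of an S3 backend with DynamoDB locking. The bucket, key and table names are placeholders; the bucket and table must already exist before terraform init (for example, created once by a small separate bootstrap configuration), and backend blocks cannot reference variables.

```hcl
# backend.tf in each environment directory (all names are placeholders)
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "prod/terraform.tfstate"  # one key per environment
    region         = "eu-west-1"
    encrypt        = true                      # server-side encryption of the state object
    dynamodb_table = "example-terraform-locks" # table used for state locking
  }
}

# bootstrap/main.tf: the lock table, applied once from a separate configuration.
# The S3 backend expects a string hash key named exactly "LockID".
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "example-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Newer Terraform releases can also lock using a lock file stored in the S3 bucket itself (use_lockfile), so it is worth checking the S3 backend documentation for the version you run.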

Modules

Previously we created and used internal modules. This is still the case, but with the advent and growth of the Terraform Registry we try to use registry modules at least as a base.
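
As an illustration of using a registry module as a base, the sketch below calls the community terraform-aws-modules/vpc/aws module with a pinned major version. The name, CIDR ranges and availability zones are placeholder values, and whether to call it directly or wrap it in an internal module is a judgment call.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # pin a major version; registry modules evolve independently

  name = "example-prod"
  cidr = "10.0.0.0/16"

  azs             = ["eu-west-1a", "eu-west-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
}
```

A version constraint matters more for registry modules than for internal ones, because upstream releases are outside your own review process.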

File structure

The new position has a much simpler taxonomy with only two infrastructure environments: dev and prod. Each has its own variables and outputs and reuses the modules described above. The remote_state provider also helps with sharing the outputs of created resources between environments. Our scenario is subdomains, each in a different Azure resource group, under a globally managed TLD.

├── main.tf
├── dev
│   ├── main.tf
│   ├── output.tf
│   └── variables.tf
└── prod
    ├── main.tf
    ├── output.tf
    └── variables.tf
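
A sketch of how dev/main.tf might read the root configuration's outputs through the terraform_remote_state data source and use them to create and delegate its own subdomain. It assumes the root state lives in an Azure Storage backend and exports tld_zone_name and tld_resource_group outputs; every resource, storage and output name here is hypothetical. The .outputs. syntax is Terraform 0.12+ (0.10 exposed outputs directly as attributes of the data source).

```hcl
# dev/main.tf (hypothetical)
provider "azurerm" {
  features {} # required by azurerm provider 2.x and later
}

# Read the outputs of the root configuration (placeholder storage names).
data "terraform_remote_state" "global" {
  backend = "azurerm"
  config = {
    resource_group_name  = "example-tfstate-rg"
    storage_account_name = "exampletfstate"
    container_name       = "tfstate"
    key                  = "global.terraform.tfstate"
  }
}

locals {
  tld    = data.terraform_remote_state.global.outputs.tld_zone_name
  tld_rg = data.terraform_remote_state.global.outputs.tld_resource_group
}

# The dev subdomain lives in its own resource group.
resource "azurerm_resource_group" "dev" {
  name     = "example-dev-dns"
  location = "westeurope"
}

resource "azurerm_dns_zone" "dev" {
  name                = "dev.${local.tld}"
  resource_group_name = azurerm_resource_group.dev.name
}

# Delegate dev.<tld> from the globally managed parent zone via NS records.
resource "azurerm_dns_ns_record" "dev_delegation" {
  name                = "dev"
  zone_name           = local.tld
  resource_group_name = local.tld_rg
  ttl                 = 300
  records             = tolist(azurerm_dns_zone.dev.name_servers)
}
```

Only outputs are shared this way, so the root configuration decides exactly what the environments may depend on.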

Plan

Again, with the extra challenges of a distributed team, we now always save the output of the terraform plan command to a plan file. We can inspect it and know exactly what will be run, without the risk of something changing between the plan and apply stages (locking also helps with this). Remember to delete this plan file, as it could contain plain-text "secret" variables.

Overall we are very happy with Terraform and continue to learn and improve with the new features added.
