Terraform gets stuck when instance_count is more than 2 while using the remote-exec provisioner
Problem description
- I am trying to provision multiple Windows EC2 instances with Terraform's remote-exec provisioner using null_resource.
$ terraform -v
Terraform v0.12.6
provider.aws v2.23.0
provider.null v2.1.2
- Originally, I was working with three remote-exec provisioners (two of them involved rebooting the instance) without null_resource and for a single instance, and everything worked absolutely fine.
- I then needed to increase the count and, based on several links, ended up using null_resource. I have since reduced the issue to the point where I cannot even run a single remote-exec provisioner for more than 2 Windows EC2 instances using null_resource.
Terraform template to reproduce the issue:
//VARIABLES
variable "aws_access_key" {
  default = "AK"
}

variable "aws_secret_key" {
  default = "SAK"
}

variable "instance_count" {
  default = "3"
}

variable "username" {
  default = "Administrator"
}

variable "admin_password" {
  default = "Password"
}

variable "instance_name" {
  default = "Testing"
}

variable "vpc_id" {
  default = "vpc-id"
}

//PROVIDERS
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "ap-southeast-2"
}

//RESOURCES
resource "aws_instance" "ec2instance" {
  count                  = "${var.instance_count}"
  ami                    = "Windows AMI"
  instance_type          = "t2.xlarge"
  key_name               = "ec2_key"
  subnet_id              = "subnet-id"
  vpc_security_group_ids = ["${aws_security_group.ec2instance-sg.id}"]

  tags = {
    Name = "${var.instance_name}-${count.index}"
  }
}

resource "null_resource" "nullresource" {
  count = "${var.instance_count}"

  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }

  provisioner "remote-exec" {
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}"
    ]
  }

  // provisioner "local-exec" {
  //   command = "powershell.exe Write-Host Instance_No=${count.index}"
  // }

  // provisioner "file" {
  //   source      = "testscript"
  //   destination = "D:/testscript"
  // }
}

resource "aws_security_group" "ec2instance-sg" {
  name   = "${var.instance_name}-sg"
  vpc_id = "${var.vpc_id}"

  // RDP
  ingress {
    from_port   = 3389
    to_port     = 3389
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }

  // WinRM access from the machine running TF to the instance
  ingress {
    from_port   = 5985
    to_port     = 5985
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }

  tags = {
    Name = "${var.instance_name}-sg"
  }
}

//OUTPUTS
output "private_ip" {
  value = "${aws_instance.ec2instance.*.private_ip}"
}
Observations:
- With one remote-exec provisioner, it works fine if count is set to 1 or 2. With count 3, it is unpredictable whether all the provisioners will run on all the instances every time. One thing is certain, however: Terraform never completes and does not show the output variables. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
- With the local-exec provisioner, everything works fine. Tested with count values of 1, 2 and 7.
- With the file provisioner, it works fine for 1, 2 and 3. For 7 it does not finish, although the file was copied to all 7 instances. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
- Also, in every attempt, the remote-exec provisioner is able to connect to the instances irrespective of count's value. It simply doesn't trigger the inline command on some of them, seemingly at random, and starts showing the "Still creating..." message.
- I have been stuck on this issue for quite some time now and couldn't find anything significant in the debug logs either. I know Terraform is not recommended as a config management tool. However, everything works fine, even with complex provisioning scripts, if the instance count is just 1 (even without null_resource), which suggests Terraform should easily be able to handle such a basic provisioning requirement.
- TF_DEBUG logs:
- count=2, TF completes successfully and shows Apply complete!.
- count=3, TF runs the remote-exec on all three instances but does not complete and does not show the output variables. Stuck at "Still creating..."
- count=3, TF runs the remote-exec on only two instances and skips nullresource[1], does not complete and does not show the output variables. Stuck at "Still creating..."
- Any pointers will be greatly appreciated!
Recommended answer
Update: what eventually did the trick was downgrading Terraform to v0.11.14, as per this issue comment.
A few things you can try:
- Inline the remote-exec provisioner in the aws_instance resource:
resource "aws_instance" "ec2instance" {
  count = "${var.instance_count}"
  # ...

  provisioner "remote-exec" {
    connection {
      # ...
    }

    inline = [
      # ...
    ]
  }
}
Now you can refer to self inside the connection block to get the instance's private IP.
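As a rough sketch of what that could look like, here is the inline variant with the connection block filled in from the question's template. The WinRM settings, variable names, and placeholder AMI are taken from the question. The `host` line using `self.private_ip` is the only addition, and the quoted interpolation style is kept so the snippet should parse on both 0.11 and 0.12. Treat it as an illustration rather than a verified fix for the hang.

```hcl
resource "aws_instance" "ec2instance" {
  count         = "${var.instance_count}"
  ami           = "Windows AMI" # placeholder from the question
  instance_type = "t2.xlarge"
  # ... remaining arguments as in the question's template

  provisioner "remote-exec" {
    connection {
      type     = "winrm"
      host     = "${self.private_ip}" # the instance this provisioner belongs to
      user     = "${var.username}"
      password = "${var.admin_password}"
      timeout  = "10m"
    }

    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}",
    ]
  }
}
```

With this layout the element(...) lookup over aws_instance.ec2instance.*.private_ip is no longer needed, since each instance provisions itself.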
- Add triggers to null_resource:
resource "null_resource" "nullresource" {
  # triggers is a map argument; Terraform 0.12 requires the "=" form
  triggers = {
    host    = "${element(aws_instance.ec2instance.*.private_ip, count.index)}" # Rerun when IP changes
    version = "${timestamp()}" # ...or rerun every time
  }
  # ...
}
You can use the triggers attribute to recreate null_resource and thus re-execute remote-exec.
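To show how the pieces fit together, the following sketch combines the triggers map with the count, connection, and provisioner blocks from the question's template. Everything here is taken from the question except the triggers map itself. Only the `host` trigger is kept, on the assumption that rerunning when the IP changes is enough.

```hcl
resource "null_resource" "nullresource" {
  count = "${var.instance_count}"

  # Recreate this null_resource (and rerun remote-exec) when the target IP changes
  triggers = {
    host = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
  }

  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }

  provisioner "remote-exec" {
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}",
    ]
  }
}
```

Adding a `version = "${timestamp()}"` entry to the map, as in the snippet above, would instead force the provisioner to rerun on every apply.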