Chef provisioner in Terraform hangs when provisioning more than one resource
Problem description
When using Terraform to provision multiple machines, and the Terraform Chef provisioner to configure them, I can only get it to work if a single "resource" is being cheffed in the Terraform run. Everything works perfectly when only one VM is targeted.
When more than one resource is provisioned, the chef run hangs at the Creating configuration files... step.
I have tried using modules, provisioning inside each resource, and most recently using null_resources to provision the VM resources after they've been created.
(The null_resource has proven very useful, as it allows me to iterate on the chef run quickly without having to re-spin the VM resource every time, as I did when the provisioner was inside the resource block.)
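For readers unfamiliar with this pattern, here is a minimal sketch of how a null_resource can be made to re-run its provisioners without touching the VM. It is an illustration, not the asker's configuration: the triggers key is hypothetical, the variable is declared only to keep the fragment self-contained, and a local-exec provisioner stands in for the chef provisioner.

```hcl
# Declared here only so the sketch is self-contained; the question's own
# configuration already defines this variable elsewhere.
variable "sql_run_list" {
  type    = list(string)
  default = []
}

resource "null_resource" "sql-chef" {
  # Hypothetical trigger: when the run list changes, Terraform replaces this
  # null_resource, which re-runs its provisioners. The VM resource is untouched.
  triggers = {
    run_list = join(",", var.sql_run_list)
  }

  # Stand-in for the chef provisioner shown in the question.
  provisioner "local-exec" {
    command = "echo re-provisioning"
  }
}
```

Without a trigger, running terraform taint on the null_resource forces the same re-provisioning on the next apply.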
This happened on TF 0.11, and continues in v0.12:
Terraform v0.12.8
+ provider.null v2.1.2
+ provider.vra7 v0.4.1
Provisioner inside the resource:
resource "vra7_deployment" "vra-vm" {
  ...
  resource_configuration = {
    "vSphere_Machine_1.name" = ""
    "vSphere_Machine_1.ip_address" = ""
    "vSphere_Machine_1.description" = "Terraform ICE SQL"
  }
  ...
  provisioner "chef" {
    # This is for TF to talk to the new node
    connection {
      host = self.resource_configuration["vSphere_Machine_1.ip_address"]
      type = "winrm"
      user = var.KT_USER
      password = var.KT_PASS
      insecure = true
    }
    # This is for TF to talk to the chef_server
    # Note! the version constraint doesn't work
    server_url = var.chef_server_url
    node_name = "ICE-SQL-${self.resource_configuration["vSphere_Machine_1.name"]}"
    run_list = var.sql_run_list
    recreate_client = true
    environment = "_default"
    ssl_verify_mode = ":verify_none"
    version = "~> 12"
    user_name = local.username
    user_key = file("${local.user_key_path}")
  }
}
Provisioner using a null_resource block:
resource "vra7_deployment" "ICE-SQL" {
  count = var.sql_count # will be 1/on or 0/off
  ...
  resource_configuration = {
    "vSphere_Machine_1.name" = ""
    "vSphere_Machine_1.ip_address" = ""
    "vSphere_Machine_1.description" = "Terraform ICE SQL"
  }
}
locals {
  sql_ip = vra7_deployment.ICE-SQL[0].resource_configuration["vSphere_Machine_1.ip_address"]
  sql_name = vra7_deployment.ICE-SQL[0].resource_configuration["vSphere_Machine_1.name"]
}
resource "null_resource" "sql-chef" {
  # we can use count to switch creating this on or off for testing
  count = 0
  provisioner "chef" {
    # This is for TF to talk to the new node
    connection {
      host = local.sql_ip
      type = "winrm"
      user = var.KT_USER
      password = var.KT_PASS
      insecure = true
    }
    # This is for TF to talk to the chef_server
    # Don't use the local var here, so TF knows to create the dependency
    server_url = var.chef_server_url
    node_name = "ICE-SQL-${vra7_deployment.ICE-SQL[0].resource_configuration["vSphere_Machine_1.name"]}"
    run_list = var.sql_run_list
    recreate_client = true
    environment = "_default"
    ssl_verify_mode = ":verify_none"
    version = "12"
    user_name = local.username
    user_key = file("${local.user_key_path}")
    client_options = var.chef_client_options
  }
}
Modules:
### main.tf
module "SQL" {
  source = "./modules/vra-chef"
  VRA_USER = var.VRA_USER
  VRA_PASS = var.VRA_PASS
  KT_USER = var.KT_USER
  KT_PASS = var.KT_PASS
  description = "ICE SQL"
  run_list = var.sql_run_list
}
### modules/vra-chef/main.tf
resource "vra7_deployment" "vra-chef" {
  count = var.server_count
  ...
  resource_configuration = {
    "vSphere_Machine_1.name" = var.resource_name
    "vSphere_Machine_1.ip_address" = var.resource_ip
    "vSphere_Machine_1.description" = "${var.description}-${count.index}"
  }
  provisioner "chef" {
    # This is for TF to talk to the new node
    connection {
      host = self.resource_configuration["vSphere_Machine_1.ip_address"]
      type = "winrm"
      user = var.KT_USER
      password = var.KT_PASS
      insecure = true
    }
    # This is for TF to talk to the chef_server
    server_url = var.chef_server_url
    node_name = self.resource_configuration["vSphere_Machine_1.name"]
    run_list = var.run_list
    recreate_client = true
    environment = "_default"
    ssl_verify_mode = ":verify_none"
    version = "~> 12"
    user_name = local.username
    user_key = file(local.user_key_path)
    client_options = [ "chef_license 'accept'" ]
    # pass custom attributes to the new node
    attributes_json = var.input_json
  }
}
Expected Results:
Chef configures all resources that it is applied to.
Actual Results:
The Terraform Chef provisioner connects to all resources that it is applied to and installs chef on the clients. When it gets to the Creating configuration files... step, it stops sending any more updates, and the Terraform run keeps updating the status every 10s with Still creating... for each resource.
vra7_deployment.ICE-REMOTE[0]: Still creating... [9m30s elapsed]
vra7_deployment.ICE-SQL[0]: Still creating... [9m30s elapsed]
vra7_deployment.ICE-MASTER[0]: Still creating... [9m30s elapsed]
vra7_deployment.ICE-MASTER[0]: Creation complete after 9m39s [id=feecf983-48d5-425e-b713-65a1a05fa3ba]
vra7_deployment.ICE-REMOTE[0]: Still creating... [9m40s elapsed]
vra7_deployment.ICE-SQL[0]: Still creating... [9m40s elapsed]
...
vra7_deployment.ICE-SQL[0]: Still creating... [12m10s elapsed]
vra7_deployment.ICE-REMOTE[0]: Still creating... [12m10s elapsed]
vra7_deployment.ICE-REMOTE[0]: Creation complete after 12m11s [id=df64f5ab-af12-4493-8e7d-d7debd93780d]
vra7_deployment.ICE-SQL[0]: Still creating... [12m20s elapsed]
...
vra7_deployment.ICE-SQL[0]: Still creating... [13m10s elapsed]
vra7_deployment.ICE-SQL[0]: Creation complete after 13m11s [id=08ec31f4-124d-470e-b2ba-1833a6f22792]
null_resource.sql-chef[0]: Creating...
null_resource.master-chef[0]: Creating...
null_resource.remote-chef[0]: Creating...
null_resource.sql-chef[0]: Provisioning with 'chef'...
null_resource.master-chef[0]: Provisioning with 'chef'...
null_resource.remote-chef[0]: Provisioning with 'chef'...
null_resource.master-chef[0] (chef): Connecting to remote host via WinRM...
null_resource.master-chef[0] (chef): Host: 10.12.235.61
null_resource.master-chef[0] (chef): Port: 5985
null_resource.master-chef[0] (chef): User: engineering
null_resource.master-chef[0] (chef): Password: true
null_resource.master-chef[0] (chef): HTTPS: false
null_resource.master-chef[0] (chef): Insecure: true
null_resource.master-chef[0] (chef): NTLM: false
null_resource.master-chef[0] (chef): CACert: false
null_resource.sql-chef[0] (chef): Connecting to remote host via WinRM...
null_resource.sql-chef[0] (chef): Host: 10.12.235.50
null_resource.sql-chef[0] (chef): Port: 5985
null_resource.sql-chef[0] (chef): User: engineering
null_resource.sql-chef[0] (chef): Password: true
null_resource.sql-chef[0] (chef): HTTPS: false
null_resource.sql-chef[0] (chef): Insecure: true
null_resource.sql-chef[0] (chef): NTLM: false
null_resource.sql-chef[0] (chef): CACert: false
null_resource.remote-chef[0] (chef): Connecting to remote host via WinRM...
null_resource.remote-chef[0] (chef): Host: 10.12.233.51
null_resource.remote-chef[0] (chef): Port: 5985
null_resource.remote-chef[0] (chef): User: engineering
null_resource.remote-chef[0] (chef): Password: true
null_resource.remote-chef[0] (chef): HTTPS: false
null_resource.remote-chef[0] (chef): Insecure: true
null_resource.remote-chef[0] (chef): NTLM: false
null_resource.remote-chef[0] (chef): CACert: false
null_resource.sql-chef[0] (chef): Connected!
null_resource.remote-chef[0] (chef): Connected!
null_resource.master-chef[0] (chef): Connected!
null_resource.remote-chef[0] (chef): Downloading Chef Client...
null_resource.sql-chef[0] (chef): Downloading Chef Client...
null_resource.remote-chef[0] (chef): Installing Chef Client...
null_resource.sql-chef[0] (chef): Installing Chef Client...
null_resource.remote-chef[0]: Still creating... [10s elapsed]
null_resource.master-chef[0]: Still creating... [10s elapsed]
null_resource.sql-chef[0]: Still creating... [10s elapsed]
null_resource.sql-chef[0] (chef): Creating configuration files...
null_resource.remote-chef[0] (chef): Creating configuration files...
null_resource.master-chef[0] (chef): Downloading Chef Client...
null_resource.master-chef[0] (chef): Installing Chef Client...
null_resource.master-chef[0] (chef): Creating configuration files...
null_resource.remote-chef[0]: Still creating... [20s elapsed]
null_resource.master-chef[0]: Still creating... [20s elapsed]
null_resource.sql-chef[0]: Still creating... [20s elapsed]
null_resource.remote-chef[0]: Still creating... [30s elapsed]
null_resource.sql-chef[0]: Still creating... [30s elapsed]
null_resource.master-chef[0]: Still creating... [30s elapsed]
null_resource.remote-chef[0]: Still creating... [40s elapsed]
null_resource.sql-chef[0]: Still creating... [40s elapsed]
null_resource.master-chef[0]: Still creating... [40s elapsed]
null_resource.remote-chef[0]: Still creating... [50s elapsed]
null_resource.sql-chef[0]: Still creating... [50s elapsed]
null_resource.master-chef[0]: Still creating... [50s elapsed]
null_resource.remote-chef[0]: Still creating... [1m0s elapsed]
null_resource.sql-chef[0]: Still creating... [1m0s elapsed]
null_resource.master-chef[0]: Still creating... [1m0s elapsed]
...loops waiting forever...
Other context:
I've logged this at Terraform's GitHub, with no response. My comments from there:
What I've found is that it seems not to like chef-provisioning more than one machine at a time. So far I've found cases where 1 out of 4 machines will provision perfectly, and the others just hang after they all print the Creating configuration files... status. Leaving the first one active, on the next run, the other three will all hang again at the same place. Finally, I tweaked the code to only re-provision one of the machines, and it worked perfectly. To be clear: the exact same code that hangs on a prior run will execute perfectly when run by itself. I think that's a critical clue to debugging this.
To reiterate: when it gets stuck, the chef provisioning always hangs at the Creating configuration files... step. If it gets past that, it always works.
Here is a gist of a chef run using null_resource on two resources, both of which hang: https://gist.github.com/mcascone/0b71948f50d52648389e661d00c8e31c
And this is one of a successful, 1-resource run: https://gist.github.com/mcascone/858855b5bd9d5d1cf655d5e10df67801
I keep thinking this is an issue with the same provisioner being called multiple times in the same main.tf file. I'm calling the chef provisioner 3+ times in one apply run. Could it be that the multiple instances of the provisioner are colliding with each other, or that there isn't actually support for multiple runs of the same provisioner, and they're all getting instantiated in the same instance and corrupting each other?
Recommended answer
It looks like, for now at least, we have to downgrade to v0.11 to get multiple provisioner runs to work. Please see this thread: "Terraform stucks when instance_count is more than 2 while using remote-exec provisioner".
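Since the question reports that the same code works when only one machine is provisioned at a time, another thing that may be worth trying is forcing the Chef runs to happen one after another. The following is an untested sketch of that idea, not a confirmed fix. It reuses the variables and locals from the question; local.master_ip, var.master_run_list and the node names are hypothetical, introduced only for illustration.

```hcl
# Untested sketch: serialize the Chef runs with depends_on.
# local.master_ip and var.master_run_list are hypothetical names.

resource "null_resource" "master-chef" {
  provisioner "chef" {
    connection {
      host     = local.master_ip
      type     = "winrm"
      user     = var.KT_USER
      password = var.KT_PASS
      insecure = true
    }
    server_url      = var.chef_server_url
    node_name       = "ICE-MASTER" # hypothetical node name
    run_list        = var.master_run_list
    recreate_client = true
    user_name       = local.username
    user_key        = file(local.user_key_path)
  }
}

resource "null_resource" "sql-chef" {
  # Do not start this Chef run until the master's run has finished.
  depends_on = [null_resource.master-chef]

  provisioner "chef" {
    connection {
      host     = local.sql_ip
      type     = "winrm"
      user     = var.KT_USER
      password = var.KT_PASS
      insecure = true
    }
    server_url      = var.chef_server_url
    node_name       = "ICE-SQL" # hypothetical node name
    run_list        = var.sql_run_list
    recreate_client = true
    user_name       = local.username
    user_key        = file(local.user_key_path)
  }
}
```

A blunter way to get the same one-at-a-time behaviour is to run terraform apply -parallelism=1, at the cost of also creating the VMs one at a time.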