How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Question

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We connect to them from GKE using a public IP and the Cloud SQL proxy container.

In order to simplify our setup, we wish to get rid of the proxy containers by moving to a private IP. I tried following the Terraform guide. While creating a single instance works fine, trying to create 5 instances simultaneously ends with 4 failed instances and one successful one.

The error that appears in the Google Cloud Console on the failed instances is "An Unknown Error occurred".

The following code reproduces it. Pay attention to the count = 5 line:

resource "google_compute_network" "private_network" {
  provider = "google-beta"

  name = "private-network"
}

resource "google_compute_global_address" "private_ip_address" {
  provider = "google-beta"

  name = "private-ip-address"
  purpose = "VPC_PEERING"
  address_type = "INTERNAL"
  prefix_length = 16
  network = "${google_compute_network.private_network.self_link}"
}

resource "google_service_networking_connection" "private_vpc_connection" {
  provider = "google-beta"

  network = "${google_compute_network.private_network.self_link}"
  service = "servicenetworking.googleapis.com"
  reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}

resource "google_sql_database_instance" "instance" {
  provider = "google-beta"
  count = 5

  name = "private-instance-${count.index}"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

provider "google-beta" {
  version = "~> 2.5"
  credentials = "credentials.json"
  project = "PROJECT_ID"
  region = "us-central1"
  zone = "us-central1-a"
}

I tried several alternatives:

  • Waiting a minute after creating the google_service_networking_connection and then creating all the instances simultaneously, but I got the same error.
  • Creating an address range and a google_service_networking_connection per instance, but I got an error saying that google_service_networking_connection resources cannot be created simultaneously.
  • Creating an address range per instance and a single google_service_networking_connection that links to all of them, but I got the same error.

Recommended answer

Found an ugly yet working solution. There is a bug in GCP: it does not prevent simultaneous creation of instances, even though that creation cannot complete. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.

One alternative is adding a dependency between the instances. This allows their creation to complete successfully. However, each instance takes several minutes to create, so creating them one after another adds up to a long wait. If we instead add an artificial delay of 60 seconds between instance creations, we manage to avoid the failures. Notes:

  • The number of seconds of delay needed depends on the instance tier. For example, for db-f1-micro, 30 seconds were enough. They were not enough for db-custom-1-3840.
  • I am not sure what the exact number of seconds needed for db-custom-1-3840 is. 30 seconds were not enough, 60 were.
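
For comparison, the first alternative mentioned above (a direct dependency between the instances) could be sketched roughly as follows. This is only an illustration under assumptions: the resource names instance_a and instance_b and their instance names are hypothetical, the syntax follows the same Terraform 0.11 style as the question, and the network and connection resources are the ones defined in the question's code. It serializes creation completely, which is why it is slower than the delay-based approach.

```hcl
# Sketch only: strictly serial creation via depends_on.
# instance_a / instance_b are hypothetical names, not from the original post.
# Assumes google_compute_network.private_network and
# google_service_networking_connection.private_vpc_connection are already
# defined as in the question.
resource "google_sql_database_instance" "instance_a" {
  provider = "google-beta"

  name = "private-instance-a"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

resource "google_sql_database_instance" "instance_b" {
  provider = "google-beta"

  name = "private-instance-b"
  database_version = "POSTGRES_9_6"

  # Creation of instance_b starts only after instance_a has finished.
  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "google_sql_database_instance.instance_a"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}
```

With five instances chained this way, the total time is roughly the sum of all five creation times, whereas the staggered-delay approach lets the creations overlap.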

The following code sample resolves the issue. It shows only 2 instances because, due to depends_on limitations, I could not use the count feature, and the full code for 5 instances would be very long. It works the same for 5 instances:

resource "google_compute_network" "private_network" {
  provider = "google-beta"

  name = "private-network"
}

resource "google_compute_global_address" "private_ip_address" {
  provider = "google-beta"

  name = "private-ip-address"
  purpose = "VPC_PEERING"
  address_type = "INTERNAL"
  prefix_length = 16
  network = "${google_compute_network.private_network.self_link}"
}

resource "google_service_networking_connection" "private_vpc_connection" {
  provider = "google-beta"

  network = "${google_compute_network.private_network.self_link}"
  service = "servicenetworking.googleapis.com"
  reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}

locals {
  db_instance_creation_delay_factor_seconds = 60
}

resource "null_resource" "delayer_1" {
  depends_on = ["google_service_networking_connection.private_vpc_connection"]

  provisioner "local-exec" {
    command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
  }
}

resource "google_sql_database_instance" "instance_1" {
  provider = "google-beta"

  name = "private-instance-delayed-1"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "null_resource.delayer_1"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

resource "null_resource" "delayer_2" {
  depends_on = ["google_service_networking_connection.private_vpc_connection"]

  provisioner "local-exec" {
    command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
  }
}

resource "google_sql_database_instance" "instance_2" {
  provider = "google-beta"

  name = "private-instance-delayed-2"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "null_resource.delayer_2"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

provider "google-beta" {
  version = "~> 2.5"
  credentials = "credentials.json"
  project = "PROJECT_ID"
  region = "us-central1"
  zone = "us-central1-a"
}

provider "null" {
  version = "~> 1.0"
}
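
Extending the pattern above to a third instance would presumably look like the sketch below. The names delayer_3, instance_3 and private-instance-delayed-3 are hypothetical and simply continue the answer's naming; the multiplier 2 continues the 0, 1 sequence used by delayer_1 and delayer_2. With the 60-second factor, five instances would then start after 0, 60, 120, 180 and 240 seconds respectively.

```hcl
# Sketch only: third instance following the same staggered-delay pattern.
# delayer_3 / instance_3 are hypothetical names continuing the series.
# Assumes the locals, network and connection resources defined above.
resource "null_resource" "delayer_3" {
  depends_on = ["google_service_networking_connection.private_vpc_connection"]

  provisioner "local-exec" {
    # Third instance waits 2 x the delay factor (120 seconds when the factor is 60).
    command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 2}"
  }
}

resource "google_sql_database_instance" "instance_3" {
  provider = "google-beta"

  name = "private-instance-delayed-3"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "null_resource.delayer_3"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}
```

Because every delayer depends only on the networking connection, all the sleeps start at about the same time, so each multiplier sets an offset from the start of the run rather than a gap after the previous instance finishes.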
