GKE Cluster can't pull (ErrImagePull) from GCR Registry in same project (GitLab Kubernetes Integration): Why?

Question

So after googling a little bit (which is polluted by people having trouble with Pull Secrets) I am posting this here — and to GCP Support (will update as I hear).

I created a Cluster from GitLab Kubernetes integration (docs: https://about.gitlab.com/solutions/kubernetes) within the same project as my GCR registry / images.

When I add a new service / deployment to this Cluster using Kubectl (which relies on a private image within the GCR Registry in this project) the pods in the GitLab created cluster fail to pull from GCR with: ErrImagePull.
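
For anyone reproducing this: the underlying registry error shows up in the pod's events, e.g.:

# Inspect the failing pod; the Events section at the bottom shows the
# image pull failure returned by the registry
kubectl describe pod [your-pod-name]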

To be clear — I am NOT pulling from a GitLab private registry, I am attempting to pull from a GCR Registry within the same project as the GKE cluster created from GitLab (which should not require a Pull Secret).

Other Clusters (created from GCP console) within this project can properly access the same image so my thinking is that there is some difference between Clusters created via an API (in this case from GitLab) vs Clusters created from the GCP console.

I am hoping someone has run into this in the past — or can explain the differences in the Service Accounts etc that could be causing the problem.

I am going to attempt to create a service account and manually grant it Project Viewer role to see if that solves the problem.

Update: a manually configured Service Account did not solve the issue.

Note: I am trying to pull the image into the cluster itself, NOT into the GitLab Runner running on the cluster. I.e. I want a separate service / deployment to run alongside my GitLab infrastructure.

Answer

TL;DR: Clusters created by the GitLab-CI Kubernetes Integration will not be able to pull an image from a GCR Registry in the same project as the cluster without modifying the Node permissions (scopes).

While you CAN manually modify the permissions on individual node machines to grant the Application Default Credentials (see: https://developers.google.com/identity/protocols/application-default-credentials) the proper scopes in real time, doing it this way means that if your node is re-created at some point in the future it WOULD NOT have your modified scopes and things would break.
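
For reference, the per-node change warned about above is roughly the sketch below. This is my own illustration using the generic gcloud compute commands, not something prescribed by the GitLab docs, and the instance has to be stopped before its scopes can be changed:

# Stop the node VM, swap its access scopes, then start it again.
# Any node the pool recreates later will NOT keep these scopes.
gcloud compute instances stop [node-instance-name] --zone [zone]
gcloud compute instances set-service-account [node-instance-name] --zone [zone] \
  --scopes gke-default,sql-admin
gcloud compute instances start [node-instance-name] --zone [zone]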

Instead of modifying the permissions manually — create a new Node pool that has the proper Scope(s) to access your required GCP services.

Here are some resources I used for reference:

  1. https://medium.com/google-cloud/updating-google-container-engine-vm-scopes-with-zero-downtime-50bff87e5f80
  2. https://adilsoncarvalho.com/changeing-a-running-kubernetes-cluster-permissions-aka-scopes-3e90a3b95636

Creating the properly scoped Node Pool generally looks like this:

gcloud container node-pools create [new pool name] \
  --cluster [cluster name] \
  --machine-type [your desired machine type] \
  --num-nodes [same-number-nodes] \
  --scopes [your new set of scopes]

If you aren't sure what the names of your required Scopes are — You can see a full list of Scopes AND Scope Aliases over here: https://cloud.google.com/sdk/gcloud/reference/container/node-pools/create

For me I did gke-default (same as my other cluster) and sql-admin. The reason for this is that I need to be able to access an SQL Database in Cloud SQL during part of my build and I don't want to have to connect to a public IP to do that. The gke-default alias expands to the following scopes:

  1. https://www.googleapis.com/auth/devstorage.read_only (allows you to pull)
  2. https://www.googleapis.com/auth/logging.write
  3. https://www.googleapis.com/auth/monitoring
  4. https://www.googleapis.com/auth/service.management.readonly
  5. https://www.googleapis.com/auth/servicecontrol
  6. https://www.googleapis.com/auth/trace.append
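
As a concrete example of the create command above using those two scope aliases, an invocation might look like the sketch below; every value in it is a placeholder I made up, so substitute your own pool, cluster, zone and machine type:

# Hypothetical names/values throughout - adjust to your own setup
gcloud container node-pools create gitlab-scoped-pool \
  --cluster my-gitlab-cluster \
  --zone us-central1-a \
  --machine-type n1-standard-2 \
  --num-nodes 3 \
  --scopes gke-default,sql-admin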

Contrast the above with the more locked-down permissions of a GitLab-CI created cluster, which gets ONLY these two: https://www.googleapis.com/auth/logging.write and https://www.googleapis.com/auth/monitoring.
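
If you want to confirm what scopes a node actually ended up with, one way (my own habit, not something from the GitLab docs) is to ask the GCE metadata server from the node or from any pod running on it:

# List the OAuth scopes granted to the node's service account
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"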

Obviously, configuring your cluster with ONLY the minimum permissions needed is for sure the way to go here. Once you figure out what that is and create your new properly scoped Node Pool...

List your nodes:

kubectl get nodes

The one you just created (most recent) has the new settings, while the older one is the default node pool created by GitLab that cannot pull from GCR.
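
If it isn't obvious which node belongs to which pool, GKE labels every node with its pool name, so a quick way to tell them apart is:

# Show each node together with the node pool it belongs to
kubectl get nodes -L cloud.google.com/gke-nodepool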

Then:

kubectl cordon [your-node-name-here]

After that you want to drain it:

kubectl drain [your-node-name-here] --force

I ran into issues here: having a GitLab Runner installed meant that I couldn't drain the pods normally, due to the local data / DaemonSet used to control it.
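
If you do want to drain rather than delete, kubectl can be told to skip those protections. A sketch (the exact flag name varies by kubectl version; newer releases call --delete-local-data --delete-emptydir-data):

# Evict everything, ignoring DaemonSet-managed pods and pods using local (emptyDir) data
kubectl drain [your-node-name-here] --force --ignore-daemonsets --delete-local-data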

For that reason, once I cordoned my Node I just deleted the node with kubectl (not sure if this will cause problems, but it was fine for me). Once your node is deleted you need to delete the 'default-pool' node pool created by GitLab.
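
For reference, removing the cordoned node object is just the following; the underlying VM goes away when the old node pool is deleted below:

# Remove the node object from the cluster
kubectl delete node [your-node-name-here]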

List your node pools:

gcloud container node-pools list --cluster [CLUSTER_NAME]

See the old scopes created by gitlab:

gcloud container node-pools describe default-pool \
    --cluster [CLUSTER_NAME]

Check to see if you have the correct new scopes (that you just added):

gcloud container node-pools describe [NEW_POOL_NAME] \
    --cluster [CLUSTER_NAME]

If your new Node Pool has the right scopes, your deployments can run on it and you can now delete the default pool with:

gcloud container node-pools delete default-pool \
   --cluster <YOUR_CLUSTER_NAME> --zone <YOUR_ZONE>
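
Once the old pool is gone, its pods get rescheduled onto the new, properly scoped nodes. As a sanity check (the pod name is a placeholder for one of your own), you can delete a previously stuck pod and watch its replacement pull the image successfully:

# Force the deployment to re-create the pod on the new pool, then watch it start
kubectl delete pod [your-pod-name]
kubectl get pods -w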

In my personal case I am still trying to figure out how to allow access to the private network (i.e. get to Cloud SQL via private IP), but I can pull my images now so I am halfway there.

I think that's it — hope it saved you a few minutes!
