Application running in Kubernetes cron job does not connect to database in same Kubernetes cluster

Problem description

I have a Kubernetes cluster running a PostgreSQL database, a Grafana dashboard, and a Python single-run application (built as a Docker image) that runs hourly inside a Kubernetes CronJob (see manifests below). Additionally, this is all being deployed using ArgoCD with Istio side-car injection.

The issue I'm having (as the title indicates) is that my Python application cannot connect to the database in the cluster. This is very strange to me since the dashboard can, in fact, connect to the database, so I'm not sure what might be different for the Python app.

Following are my manifests (with a few things changed to remove identifiable information):

Contents of database.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: database
  name: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  strategy: {}
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - image: postgres:12.5
        imagePullPolicy: ""
        name: database
        ports:
        - containerPort: 5432
        env:
          - name: POSTGRES_DB
            valueFrom:
              secretKeyRef:
                name: postgres-secret
                key: POSTGRES_DB
          - name: POSTGRES_USER
            valueFrom:
              secretKeyRef:
                name: postgres-secret
                key: POSTGRES_USER
          - name: POSTGRES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: postgres-secret
                key: POSTGRES_PASSWORD
        resources: {}
        readinessProbe:
          initialDelaySeconds: 30
          tcpSocket:
            port: 5432
      restartPolicy: Always
      serviceAccountName: ""
      volumes: null
status: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: database
  name: database
spec:
  ports:
  - name: "5432"
    port: 5432
    targetPort: 5432
  selector:
    app: database
status:
  loadBalancer: {}

Contents of dashboard.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dashboard
  name: dashboard
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dashboard
  strategy: {}
  template:
    metadata:
      labels:
        app: dashboard
    spec:
      containers:
      - image: grafana:7.3.3
        imagePullPolicy: ""
        name: dashboard
        ports:
          - containerPort: 3000
        resources: {}
        env:
          - name: POSTGRES_DB
            valueFrom:
              secretKeyRef:
                name: postgres-secret
                key: POSTGRES_DB
          - name: POSTGRES_USER
            valueFrom:
              secretKeyRef:
                name: postgres-secret
                key: POSTGRES_USER
          - name: POSTGRES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: postgres-secret
                key: POSTGRES_PASSWORD
        volumeMounts:
          - name: grafana-datasource
            mountPath: /etc/grafana/provisioning/datasources
        readinessProbe:
          initialDelaySeconds: 30
          httpGet:
            path: /
            port: 3000
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
        - name: grafana-datasource
          configMap:
            defaultMode: 420
            name: grafana-datasource
        - name: grafana-dashboard-provision
status: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: dashboard
  name: dashboard
spec:
  ports:
  - name: "3000"
    port: 3000
    targetPort: 3000
  selector:
    app: dashboard
status:
  loadBalancer: {}

Contents of cronjob.yaml:

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: python
spec:
  concurrencyPolicy: Replace
  # TODO: Go back to hourly when finished testing/troubleshooting
  # schedule: "@hourly"
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: python-tool:1.0.5
            imagePullPolicy: ""
            name: python
            args: []
            command:
              - /bin/sh
              - -c
              - >-
                echo "$(POSTGRES_USER)" > creds/db.creds;
                echo "$(POSTGRES_PASSWORD)" >> creds/db.creds;
                echo "$(SERVICE1_TOKEN)" > creds/service1.creds;
                echo "$(SERVICE2_TOKEN)" > creds/service2.creds;
                echo "$(SERVICE3_TOKEN)" > creds/service3.creds;
                python3 -u main.py;
                echo "Job finished with exit code $?";
            env:
              - name: POSTGRES_DB
                valueFrom:
                  secretKeyRef:
                    name: postgres-secret
                    key: POSTGRES_DB
              - name: POSTGRES_USER
                valueFrom:
                  secretKeyRef:
                    name: postgres-secret
                    key: POSTGRES_USER
              - name: POSTGRES_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: postgres-secret
                    key: POSTGRES_PASSWORD
              - name: SERVICE1_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: api-tokens-secret
                    key: SERVICE1_TOKEN
              - name: SERVICE2_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: api-tokens-secret
                    key: SERVICE2_TOKEN
              - name: SERVICE3_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: api-tokens-secret
                    key: SERVICE3_TOKEN
          restartPolicy: OnFailure
          serviceAccountName: ""
status: {}

Now, as I mentioned, Istio is also a part of this picture, so I have a VirtualService for the dashboard since it should be accessible outside of the cluster, but that's it.
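
A minimal sketch of what that VirtualService looks like (the external host and gateway names here are placeholders, not the real values):

---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: dashboard
spec:
  hosts:
  - "dashboard.example.com"  # placeholder external hostname
  gateways:
  - dashboard-gateway        # placeholder Istio Gateway name
  http:
  - route:
    - destination:
        host: dashboard      # the dashboard Service defined above
        port:
          number: 3000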

With all of that out of the way, here's what I've done to try to solve this myself:

  1. Confirm the CronJob is using the correct connection settings (i.e. host, database name, username, and password) for connecting to the database.

    For this, I added echo statements to the CronJob deployment showing the username and password (I know, I know) and they were the expected values. I also know those were the correct connection settings for the database because I used them verbatim to connect the dashboard to the database, which gave a successful connection.

    [Screenshot: the data source settings for the Grafana dashboard]

    [Screenshot: the error message from the Python application, shown in the ArgoCD logs for the container]

  2. Thinking Istio might be causing this problem, I tried disabling Istio side-car injection for the CronJob resource (by adding this annotation to the metadata.annotations section: sidecar.istio.io/inject: false), but the annotation never actually showed up in the Argo logs and no change was observed when the CronJob was running (see the placement note after this list).

  3. I tried kubectl execing into the CronJob container that was running the Python script to debug more, but was never actually able to since the container exited as soon as the connection error occurred (see the debugging sketch after this list).
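
A note on item 2: per the Istio docs, injection is toggled by a pod-level annotation, so on a CronJob it needs to sit on the pod template (spec.jobTemplate.spec.template.metadata.annotations) rather than on the CronJob's own metadata, and the value must be the quoted string "false". A minimal sketch of that placement:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: python
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            # Pod-template annotation; note the quoted string value
            sidecar.istio.io/inject: "false"
        spec:
          containers:
          - image: python-tool:1.0.5
            name: python
          restartPolicy: OnFailure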
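
On item 3, one workaround (a sketch, not something from the manifests above) is to temporarily swap the container command for a long sleep so the pod stays up long enough to get a shell in it:

            command:
              - /bin/sh
              - -c
              # Keep the container alive for an hour of debugging
              - sleep 3600

With that in place, kubectl exec -it <pod-name> -c python -- /bin/sh gets you a shell in the container, and you can try the database connection by hand from inside the pod.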

That said, I've been banging my head against a wall for long enough on this. Could anyone spot what I might be missing and point me in the right direction, please?

Solution

I think the problem is that your pod tries to connect to the database before the Istio sidecar is ready, so the connection can't be established.

Istio runs an init container that configures the pod's iptables rules so that all traffic is routed through the sidecar. So if the sidecar isn't running yet and the application container tries to connect to the database, no connection can be established.

There are two solutions.

First, your job could wait for, say, 30 seconds before calling main.py, using a sleep command.
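
Applied to the CronJob command above, that could look roughly like this (30 seconds is a guess; tune it to how long the sidecar actually takes to become ready):

            command:
              - /bin/sh
              - -c
              # Hypothetical fixed delay so the istio-proxy sidecar is up
              # before main.py tries to reach the database
              - >-
                sleep 30;
                python3 -u main.py;
                echo "Job finished with exit code $?";

A sturdier variant, if the image has curl, polls the sidecar's readiness endpoint instead of sleeping a fixed time: until curl -fsS http://localhost:15021/healthz/ready; do sleep 1; done; (port 15021 on recent Istio releases).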

Alternatively, you could enable holdApplicationUntilProxyStarts. With this, the main container will not start until the sidecar is running.
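
A sketch of the mesh-wide setting via an IstioOperator resource (the option exists from Istio 1.7 on; it can also be set per pod through the proxy.istio.io/config annotation on the pod template):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      # Injects the sidecar first in the container list and blocks the
      # app container's start until the proxy reports ready
      holdApplicationUntilProxyStarts: true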
