Application running in Kubernetes cron job does not connect to database in same Kubernetes cluster
I have a Kubernetes cluster running a PostgreSQL database, a Grafana dashboard, and a Python single-run application (built as a Docker image) that runs hourly inside a Kubernetes `CronJob` (see manifests below). Additionally, this is all deployed using ArgoCD with Istio sidecar injection.
The issue I'm having (as the title indicates) is that my Python application cannot connect to the database in the cluster. This is very strange to me since the dashboard, in fact, can connect to the database, so I'm not sure what might be different for the Python app.
Following are my manifests (with a few things changed to remove identifiable information):
Contents of `database.yaml`:

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: database
  name: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  strategy: {}
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - image: postgres:12.5
        imagePullPolicy: ""
        name: database
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_DB
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: POSTGRES_DB
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: POSTGRES_USER
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: POSTGRES_PASSWORD
        resources: {}
        readinessProbe:
          initialDelaySeconds: 30
          tcpSocket:
            port: 5432
      restartPolicy: Always
      serviceAccountName: ""
      volumes: null
status: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: database
  name: database
spec:
  ports:
  - name: "5432"
    port: 5432
    targetPort: 5432
  selector:
    app: database
status:
  loadBalancer: {}
```
Contents of `dashboard.yaml`:

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dashboard
  name: dashboard
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dashboard
  strategy: {}
  template:
    metadata:
      labels:
        app: dashboard
    spec:
      containers:
      - image: grafana:7.3.3
        imagePullPolicy: ""
        name: dashboard
        ports:
        - containerPort: 3000
        resources: {}
        env:
        - name: POSTGRES_DB
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: POSTGRES_DB
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: POSTGRES_USER
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: POSTGRES_PASSWORD
        volumeMounts:
        - name: grafana-datasource
          mountPath: /etc/grafana/provisioning/datasources
        readinessProbe:
          initialDelaySeconds: 30
          httpGet:
            path: /
            port: 3000
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
      - name: grafana-datasource
        configMap:
          defaultMode: 420
          name: grafana-datasource
      - name: grafana-dashboard-provision
status: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: dashboard
  name: dashboard
spec:
  ports:
  - name: "3000"
    port: 3000
    targetPort: 3000
  selector:
    app: dashboard
status:
  loadBalancer: {}
```
Contents of `cronjob.yaml`:

```yaml
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: python
spec:
  concurrencyPolicy: Replace
  # TODO: Go back to hourly when finished testing/troubleshooting
  # schedule: "@hourly"
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: python-tool:1.0.5
            imagePullPolicy: ""
            name: python
            args: []
            command:
            - /bin/sh
            - -c
            - >-
              echo "$(POSTGRES_USER)" > creds/db.creds;
              echo "$(POSTGRES_PASSWORD)" >> creds/db.creds;
              echo "$(SERVICE1_TOKEN)" > creds/service1.creds;
              echo "$(SERVICE2_TOKEN)" > creds/service2.creds;
              echo "$(SERVICE3_TOKEN)" > creds/service3.creds;
              python3 -u main.py;
              echo "Job finished with exit code $?";
            env:
            - name: POSTGRES_DB
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: POSTGRES_DB
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: POSTGRES_USER
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: POSTGRES_PASSWORD
            - name: SERVICE1_TOKEN
              valueFrom:
                secretKeyRef:
                  name: api-tokens-secret
                  key: SERVICE1_TOKEN
            - name: SERVICE2_TOKEN
              valueFrom:
                secretKeyRef:
                  name: api-tokens-secret
                  key: SERVICE2_TOKEN
            - name: SERVICE3_TOKEN
              valueFrom:
                secretKeyRef:
                  name: api-tokens-secret
                  key: SERVICE3_TOKEN
          restartPolicy: OnFailure
          serviceAccountName: ""
status: {}
```
Now, as I mentioned, Istio is also a part of this picture, so I have a VirtualService for the dashboard since it should be accessible outside of the cluster, but that's it.
With all of that out of the way, here's what I've done to try and solve this myself:

- Confirmed the `CronJob` is using the correct connection settings (i.e. host, database name, username, and password) for connecting to the database. For this, I added echo statements to the `CronJob` deployment showing the username and password (I know, I know), and they were the expected values. I also know those are the correct connection settings for the database because I used them verbatim to connect the dashboard to the database, which gave a successful connection. The data source settings for the Grafana dashboard:
  The error message from the Python application (shown in the ArgoCD logs for the container):
- Thinking Istio might be causing this problem, I tried disabling Istio sidecar injection for the `CronJob` resource (by adding the annotation `sidecar.istio.io/inject: false` to the `metadata.annotations` section), but the annotation never actually showed up in the Argo logs and no change was observed when the `CronJob` was running.
- I tried `kubectl exec`ing into the `CronJob` container that was running the Python script to debug more, but was never actually able to, since the container exited as soon as the connection error occurred.
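For reference, a common pitfall with the injection annotation on a `CronJob` (this is an assumption about what happened here, since the annotated manifest isn't shown) is that the injector only reads it from the pod template under `jobTemplate`, not from the `CronJob`'s own top-level metadata, and the value must be the quoted string `"false"`. A minimal sketch:

```yaml
# Sketch: disabling sidecar injection for the CronJob's pods.
# The annotation lives on the *pod* template inside jobTemplate;
# putting it on the CronJob's top-level metadata has no effect.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: python
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
          - image: python-tool:1.0.5
            name: python
          restartPolicy: OnFailure
```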
That said, I've been banging my head against a wall for long enough on this. Could anyone spot what I might be missing and point me in the right direction, please?
I think the problem is that your pod tries to connect to the database before the Istio sidecar is ready, so the connection can't be established.
Istio runs an init container that configures the pod's route table so that all traffic is routed through the sidecar. So if the sidecar isn't running yet when the application container tries to connect to the database, no connection can be established.
There are two solutions.
First, your job could wait for, e.g., 30 seconds before calling `main.py`, using a sleep command.
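A sketch of what that could look like in the `CronJob`'s command (the 30-second figure is an arbitrary guess; it only needs to exceed the sidecar's startup time, and polling the proxy's readiness endpoint, `localhost:15021/healthz/ready` in recent Istio releases, would be more robust):

```yaml
# Sketch: delay main.py until the Envoy sidecar has had time to start.
# 30s is a guess; tune it to your sidecar's observed startup time.
command:
- /bin/sh
- -c
- >-
  sleep 30;
  python3 -u main.py;
  echo "Job finished with exit code $?";
```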
Alternatively, you could enable `holdApplicationUntilProxyStarts`. With this option, the main container will not start until the sidecar is running.
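If I recall correctly, this flag can be set per pod through the `proxy.istio.io/config` annotation (or mesh-wide under `meshConfig.defaultConfig`), assuming Istio 1.7 or later, where the option was introduced. A sketch for this `CronJob`'s pod template:

```yaml
# Sketch: start the app container only after the Envoy sidecar is up
# (Istio 1.7+). Set on the pod template, here inside the CronJob.
jobTemplate:
  spec:
    template:
      metadata:
        annotations:
          proxy.istio.io/config: |
            holdApplicationUntilProxyStarts: true
```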