Kubernetes deployment cannot mount volume even though the equivalent gcloud attach/mount works fine
Problem description
I have a Kubernetes deployment where a pod should mount a PD.
Under spec.template.spec.containers.[*] I have this:
volumeMounts:
  - name: app-volume
    mountPath: /mnt/disk/app-pd
and under spec.template.spec this:
volumes:
  - name: app-volume
    gcePersistentDisk:
      pdName: app-pd
      fsType: ext4
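To show how the two fragments fit together, here is a minimal sketch that writes a complete manifest to a file. Only the volumeMounts and volumes entries come from the question. The apiVersion, the Deployment name, the labels, the container name, and the image are placeholders assumed for illustration, and the right apiVersion depends on the cluster version.

```shell
# Write a hypothetical Deployment manifest to deployment.yaml.
# Everything except the volumeMounts/volumes entries is a placeholder.
cat > deployment.yaml <<'EOF'
apiVersion: apps/v1        # assumption: depends on the cluster version
kind: Deployment
metadata:
  name: app                # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: example/app:latest   # placeholder image
          volumeMounts:
            - name: app-volume        # must match the volume name below
              mountPath: /mnt/disk/app-pd
      volumes:
        - name: app-volume
          gcePersistentDisk:
            pdName: app-pd
            fsType: ext4
EOF
```

The file could then be submitted with kubectl create -f deployment.yaml. The volumeMounts name and the volumes name have to be identical, which is the case in the fragments above.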
app-pd is a GCE persistent disk with a single ext4 file system (hence no partitions) on it. If I run kubectl create, I get these error messages from kubectl describe pod:
Warning FailedMount Unable to mount volumes for pod "<id>":
timeout expired waiting for volumes to attach/mount for pod"<id>"/"default".
list of unattached/unmounted volumes=[app-volume]
Warning FailedSync Error syncing pod, skipping:
timeout expired waiting for volumes to attach/mount for pod "<id>"/"default".
list of unattached/unmounted volumes=[app-volume]
On the VM instance that runs the pod, /var/log/kubelet.log contains repetitions of these error messages, which are presumably related to, or even causing, the above:
reconciler.go:179]
VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/<id>"
(spec.Name: "<id>") pod "<id>" (UID: "<id>")
goroutinemap.go:155]
Operation for "kubernetes.io/gce-pd/<id>" failed.
No retries permitted until <date> (durationBeforeRetry 2m0s).
error: Volume "kubernetes.io/gce-pd/<id>" (spec.Name: "<id>") pod "<id>" (UID: "<id>")
is not yet attached according to node status.
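The last message refers to the node status, so one way to narrow this down is to compare what the node object reports as attached with what GCE reports for the disk. The following is a sketch only: check_attachment is a hypothetical helper, and the node name, disk name, and zone are arguments you would have to supply.

```shell
# Hypothetical helper: compare the Kubernetes and GCE views of the attachment.
# Arguments: <node-name> <disk-name> <zone>
check_attachment() {
  node="$1"; disk="$2"; zone="$3"

  # Volumes the node object lists as attached (the "node status" in the log).
  kubectl get node "$node" -o jsonpath='{.status.volumesAttached}'
  echo

  # Instances that GCE lists as current users of the disk.
  gcloud compute disks describe "$disk" --zone "$zone" --format='value(users)'
}

# Example call (placeholder values):
#   check_attachment my-node app-pd us-central1-a
```

If the disk shows a user in GCE but the node status lists no attached volume, the mismatch would be consistent with the kubelet message above.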
However, if I attach the PD to the VM instance that runs the pod with gcloud compute instances attach-disk and then gcloud compute ssh into it, I can see that the following file has been created:
/dev/disk/by-id/google-persistent-disk-1
If I mount it (the PD) I can see and work with the expected files.
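The manual check described above could look roughly like the sketch below. manual_attach_check is a hypothetical helper, the mount point /mnt/manual-check is an assumed scratch directory, and the disk is mounted read-only here so the check cannot modify it. The instance name, disk name, and zone are placeholders passed as arguments.

```shell
# Hypothetical helper: attach the PD by hand, then inspect and mount it on the VM.
# Arguments: <instance-name> <disk-name> <zone>
manual_attach_check() {
  instance="$1"; disk="$2"; zone="$3"

  # Attach the persistent disk to the instance that runs the pod.
  gcloud compute instances attach-disk "$instance" --disk "$disk" --zone "$zone" || return 1

  # On the VM: list the device links, then mount the disk read-only and list its files.
  gcloud compute ssh "$instance" --zone "$zone" --command '
    ls -l /dev/disk/by-id/ &&
    sudo mkdir -p /mnt/manual-check &&
    sudo mount -o ro /dev/disk/by-id/google-persistent-disk-1 /mnt/manual-check &&
    ls /mnt/manual-check'
}

# Example call (placeholder values):
#   manual_attach_check my-instance app-pd us-central1-a
```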
How can I further diagnose this problem and ultimately resolve it?
Could the problem be that the file is called /dev/disk/google-persistent-disk-1 instead of /dev/disk/google-<id>, as would happen if I had mounted it from the Cloud Console UI?
UPDATE I've simplified the setup by formatting the disk with a single ext4 file system (hence no partitions) and edited the description above accordingly. I've also added more specific error indications from kubelet.log.
UPDATE The problem also remains if, before deployment, I manually add the PD (in the Cloud Console UI) to the instance VM that will host the pod. Both the PD and the instance VM are in the same zone.