Kubernetes Pods Terminated - Exit Code 137


Problem Description

I need some advice on an issue I am facing with k8s 1.14 and running GitLab pipelines on it. Many jobs are throwing exit code 137 errors, and I found that this means the container is being terminated abruptly.
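For context on the number itself: exit codes above 128 encode 128 + signal number, so 137 corresponds to signal 9 (SIGKILL), the signal used when a container is force-killed (e.g. during an eviction). A minimal shell illustration:

```shell
# Exit status 137 = 128 + 9 (SIGKILL). Simulate a process being
# force-killed, then inspect the exit status the parent shell sees.
sh -c 'kill -KILL $$'
echo "exit status: $?"   # prints: exit status: 137
```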

Cluster information:

Kubernetes version: 1.14
Cloud being used: AWS EKS
Node: c5.4xlarge

After digging in, I found the below logs:

```
kubelet: I0114 03:37:08.639450  4721 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 95% which is over the high threshold (85%). Trying to free 3022784921 bytes down to the low threshold (80%).
kubelet: E0114 03:37:08.653132  4721 kubelet.go:1282] Image garbage collection failed once. Stats initialization may not have completed yet: failed to garbage collect required amount of images. Wanted to free 3022784921 bytes, but freed 0 bytes
kubelet: W0114 03:37:23.240990  4721 eviction_manager.go:397] eviction manager: timed out waiting for pods runner-u4zrz1by-project-12123209-concurrent-4zz892_gitlab-managed-apps(d9331870-367e-11ea-b638-0673fa95f662) to be cleaned up
kubelet: W0114 00:15:51.106881  4781 eviction_manager.go:333] eviction manager: attempting to reclaim ephemeral-storage
kubelet: I0114 00:15:51.106907  4781 container_gc.go:85] attempting to delete unused containers
kubelet: I0114 00:15:51.116286  4781 image_gc_manager.go:317] attempting to delete unused images
kubelet: I0114 00:15:51.130499  4781 eviction_manager.go:344] eviction manager: must evict pod(s) to reclaim ephemeral-storage
kubelet: I0114 00:15:51.130648  4781 eviction_manager.go:362] eviction manager: pods ranked for eviction:
  1. runner-u4zrz1by-project-10310692-concurrent-1mqrmt_gitlab-managed-apps(d16238f0-3661-11ea-b638-0673fa95f662)
  2. runner-u4zrz1by-project-10310692-concurrent-0hnnlm_gitlab-managed-apps(d1017c51-3661-11ea-b638-0673fa95f662)
  3. runner-u4zrz1by-project-13074486-concurrent-0dlcxb_gitlab-managed-apps(63d78af9-3662-11ea-b638-0673fa95f662)
  4. prometheus-deployment-66885d86f-6j9vt_prometheus(da2788bb-3651-11ea-b638-0673fa95f662)
  5. nginx-ingress-controller-7dcc95dfbf-ld67q_ingress-nginx(6bf8d8e0-35ca-11ea-b638-0673fa95f662)
```

And then the pods get terminated, resulting in the exit code 137s.

Can anyone help me understand the reason and a possible solution to overcome this?

Thanks :)

Answer

Was able to solve the issue.

The nodes initially had a 20G EBS volume on a c5.4xlarge instance type. I increased the EBS volume to 50 and then 100G, but that did not help, as I kept seeing the below error:

"Disk usage on image filesystem is at 95% which is over the high threshold (85%). Trying to free 3022784921 bytes down to the low threshold (80%)."
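The 85%/80% figures in that message are the kubelet's image garbage-collection thresholds. On clusters where you control the kubelet configuration (not always possible with managed EKS node groups), they can be tuned so image GC kicks in earlier. A sketch, assuming a `KubeletConfiguration` file is in use; the values shown are the defaults quoted in the log:

```yaml
# Fragment of a kubelet config file (KubeletConfiguration v1beta1).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85   # image GC starts above this disk usage
imageGCLowThresholdPercent: 80    # GC frees space down to this level
```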

I then changed the instance type to c5d.4xlarge, which has 400GB of local NVMe instance storage, and gave it 300GB of EBS. This solved the error.

Some of the GitLab jobs were for Java applications that were consuming a lot of scratch space and writing a lot of logs.
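Independent of instance sizing, disk-hungry jobs like these can declare `ephemeral-storage` requests and limits, so the scheduler accounts for their scratch usage and the kubelet evicts the offending pod (rather than a neighbor such as the ingress controller) when it overruns. A minimal sketch; the pod name and image are illustrative, not from the original setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ci-job              # illustrative name
spec:
  containers:
  - name: build
    image: maven:3-jdk-11   # illustrative image
    resources:
      requests:
        ephemeral-storage: "2Gi"    # counted at scheduling time
      limits:
        ephemeral-storage: "10Gi"   # pod is evicted if it exceeds this
```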

