Google Compute Engine VM constantly crashes

Problem description

On a Compute Engine VM in us-west1-b, I run 16 vCPUs at close to 99% utilization. After a few hours the VM crashes on its own. This is not a one-time incident, and each time I have to restart the VM manually.

There are a few instances of CPU usage suddenly dropping to around 30%, then bouncing back to 99%.

There are no logs for the VM at the time of the crash. Is there any other way to get the error logs?

How do I prevent the VM from crashing?

CPU usage graph: https://i.stack.imgur.com/ZBHby.png

Solution

This could be the operating system telling you that your processes have run out of resources. You may want to look into kernel tuning, where you can raise the limits on the number of active processes on the VM and on the resources they may use. Alternatively, try a larger machine with more physical resources. In short, the machine is running short of resources, so to keep the OS up the process manager shuts processes down. SSH is one of those processes, which would explain why the VM looks crashed from the outside. Once you reset the machine, everything returns to normal.
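Before raising any limits, it can help to see what they currently are. The following is a minimal sketch using Python's standard `resource` module, assuming a Linux guest; the three limits listed are chosen for illustration, since the answer does not say which limit is actually being hit.

```python
import resource

# Limits picked for illustration only; the answer does not name a
# specific one. All three constants exist in the resource module on Linux.
LIMITS = {
    "max user processes": resource.RLIMIT_NPROC,
    "max open files": resource.RLIMIT_NOFILE,
    "max address space (bytes)": resource.RLIMIT_AS,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for each entry in LIMITS."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

These are the per-process limits inherited by the Python interpreter itself, so run the script as the same user and from the same kind of session as the workload to get comparable numbers.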

How the process manager or kernel decides to kill a process varies. It may simply be that a process has stayed up long enough to consume too many resources. Also note that the OS images used to create VMs on GCP are custom-hardened by Google so that the capabilities of malicious processes running on such machines can be limited.
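If the component killing processes is the Linux kernel's out-of-memory (OOM) killer, which is an assumption and not something the answer confirms, the kernel log usually records each kill. The sketch below scans log lines for such records. The regular expression is a guess at the common "Killed process <pid> (<name>)" wording, which differs between kernel versions, and the function names are hypothetical.

```python
import re

# Assumed wording of an OOM-kill record, e.g.
# "... Killed process 1234 (python3) ...". Adjust for your kernel.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, command) pairs for every matching line."""
    kills = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def find_oom_kills_in_file(path):
    """Scan a saved kernel log (hypothetical path) for OOM kills."""
    with open(path, errors="replace") as log:
        return find_oom_kills(log)
```

One way to produce a file to scan, assuming a systemd image with a persistent journal, is to save the previous boot's kernel messages with `journalctl -k -b -1`. The instance's serial console output, available through `gcloud compute instances get-serial-port-output`, is another place where such messages may appear.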

A good way to tackle this is:

• Increase the resources of your VM.
• Then go back to your code and check whether something in the process is leaking memory.
• If all else fails, you might do some kernel tuning to make sure your processes have higher priority than other system processes. This is a bad idea, though, since you could end up creating a zombie VM.
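To check the leak hypothesis from the second step, one option is to record how much memory is still available at regular intervals, so that the trend leading up to a crash survives the reset. This is a minimal sketch, assuming a Linux guest that exposes `MemTotal` and `MemAvailable` in `/proc/meminfo`; the output path, interval, and function names are illustrative.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_fraction(info):
    """Fraction of RAM the kernel estimates is still available."""
    return info["MemAvailable"] / info["MemTotal"]


def log_memory(path="memlog.txt", interval=60, samples=None):
    """Append 'timestamp fraction' to `path` every `interval` seconds.

    All three parameters are illustrative; samples=None means run
    until interrupted.
    """
    count = 0
    while samples is None or count < samples:
        with open("/proc/meminfo") as src:
            info = parse_meminfo(src.read())
        with open(path, "a") as out:
            out.write(f"{time.time():.0f} {available_fraction(info):.3f}\n")
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)
```

A value that falls steadily over the hours before each crash would point toward a leak. A value that stays flat suggests looking at a different resource.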
