Google Compute Engine VM constantly crashes
Question
CPU usage graph: https://i.stack.imgur.com/ZBHby.png
On a Compute Engine VM in us-west1-b, I run 16 vCPUs at close to 99% usage. After a few hours, the VM crashes on its own. This is not a one-time incident, and each time I have to restart the VM manually.
There are a few instances of CPU usage suddenly dropping to around 30%, then bouncing back to 99%.
There are no logs for the VM at the time of the crash. Is there any other way to get the error logs?
How do I prevent VMs from crashing?
Answer
This could be the operating system's process manager signalling that your processes have run out of resources. It is worth looking into kernel tuning, where you can raise the limits on the number of active processes on the VM and on the resources they may use. Alternatively, try a bigger machine with more physical resources. In short, the machine is running short of resources, so in order to keep the OS itself up, the process manager shuts processes down. SSH is one of those processes, which would explain why the VM looks crashed from the outside. Once you reset the machine, everything returns to normal.
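Before raising any limits, it helps to see what they currently are. The following is a minimal sketch, assuming a Unix-like guest OS and using Python's standard `resource` module; the choice of which three limits to print is an assumption, not something the answer specifies.

```python
import resource

# Limits that often matter on a heavily loaded machine.
# Which of them is relevant to a given workload is an assumption.
LIMITS = {
    "max processes (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
    "max open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
    "max address space (RLIMIT_AS)": resource.RLIMIT_AS,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the limits listed above."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

These are the limits of the process that runs the script and of anything it starts. System-wide kernel settings are separate and are not covered by this sketch.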
How the process manager or kernel decides which process to terminate varies. It could simply be that a process has stayed up for a long time and has consumed too many resources. Also note that the OS images used to create VMs on GCP are custom-hardened by Google to limit the malicious capabilities of processes running on those machines.
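One concrete case can be inspected directly, although the answer does not say which resource is exhausted or which OS is in use. If the VM runs Linux and memory is the scarce resource (both assumptions), the kernel's out-of-memory killer picks its victim using a per-process score exposed in `/proc/<pid>/oom_score`, where a higher score means the process is more likely to be chosen. A rough sketch that ranks processes by that score, with `oom_ranking` as a hypothetical helper name:

```python
import os


def oom_ranking(proc_root="/proc", top=5):
    """Return up to `top` (oom_score, pid, name) tuples, highest score first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or file not readable
        rows.append((score, int(entry), name))
    rows.sort(reverse=True)
    return rows[:top]
```

Calling `oom_ranking()` on the VM while it is under load shows which processes the kernel would be most likely to terminate first. If `sshd` ranks near the top, that would be consistent with SSH being one of the casualties.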
One of the best ways to tackle this is:
- Increase the resources of your VM.
- Then go back to your code and find out whether something in the process is leaking, such as memory.
- If all else fails, you might do some kernel tuning to make sure your processes have higher priority than other system processes. This is a bad idea, though, since you could end up creating a zombie VM.
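For the second step, a record of memory usage over time makes a leak much easier to spot, and because it is written to disk it is still there after a reset. Below is a minimal sketch, assuming a Linux guest where `/proc/meminfo` provides `MemTotal` and `MemAvailable`; the function names, the log path, and the sampling interval are all hypothetical.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def used_fraction(info):
    """Fraction of RAM not available for new allocations (0.0 to 1.0)."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def log_memory(path="memory.log", interval_s=60, samples=3):
    """Append one timestamped usage line to `path` per sample."""
    for i in range(samples):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        with open(path, "a") as out:
            out.write(f"{stamp} used={used_fraction(info):.1%}\n")
        if i < samples - 1:
            time.sleep(interval_s)
```

Running something like `log_memory(interval_s=60, samples=600)` alongside the workload gives a timeline to inspect afterwards. Usage that climbs steadily until the last entry before the crash points towards a leak, whereas flat usage suggests looking at a different resource.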