Is there a way to find out why instances get killed?

Question

I have a small Java app (2-10 qps) that is set to automatic scaling with F4_1G instances. Interestingly, although normally only one instance is really active, two instances are usually created. Sometimes, after a few hours, an instance disappears and is immediately replaced by one or two new instances, and the resulting instance loading spikes latency a lot. Is there any way to find out why an instance was shut down? I don't see any _ah/stop request (which I think is normal for automatic scaling), nor any messages about exceeding memory limits, being moved to another system, or any other errors; there are just large latencies when the change happens. The instances use around 250 MB of memory, which is a lot less than 1 GB, and latencies are otherwise very low (80 ms on average).

I also tried basic scaling, where there are fewer restarts, but some still happen. There I can see the _ah/stop request, but still no error message explaining why the instance was stopped (for example, I searched the log for "move", "exceed", and "memory").

From what I could find here on Stack Overflow, I could not really see where this would show up. It would be in the log, right? Any other ideas for how to figure out what the problem could be?

Recommended answer

I ran into the same issue a few months back. Instances were being shut down and new instances were spawned, even though CPU and memory usage were within bounds, there was no particular issue with the instance itself or its response latencies, and there were no traffic spikes. After much observation and research, I noticed that the instances were being restarted after having served 50000 requests (or very slightly more).

There seems to be an undocumented hard limit on the number of requests an instance serves before it is restarted, in my case 50000 (with F4 or F4_1G instances on App Engine Java standard). Others have come to the same conclusion (see here, for instance).
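One way to check this hypothesis on your own app is to have each instance count the requests it serves and log the running total periodically, so that the last logged value before an instance disappears approximates its lifetime total. The following is only a minimal sketch: the class name `InstanceRequestCounter` and its methods are hypothetical helpers, not part of any App Engine API, and it assumes one JVM per instance.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not an App Engine API): counts the requests served by
 * this JVM and logs the running total every `logEvery` requests.
 * Assumption: one JVM corresponds to one instance.
 */
public class InstanceRequestCounter {
    private final AtomicLong served = new AtomicLong();
    private final long logEvery;
    private final long startedAtMillis;

    public InstanceRequestCounter(long logEvery, long startedAtMillis) {
        if (logEvery <= 0) {
            throw new IllegalArgumentException("logEvery must be positive");
        }
        this.logEvery = logEvery;
        this.startedAtMillis = startedAtMillis;
    }

    /** Call once per handled request; returns the running total. */
    public long recordRequest() {
        long n = served.incrementAndGet();
        // Log every logEvery-th request so the total survives in the logs
        // even if the instance is stopped without any notification.
        if (n % logEvery == 0) {
            System.out.println(summary(System.currentTimeMillis()));
        }
        return n;
    }

    /** Number of requests recorded so far. */
    public long served() {
        return served.get();
    }

    /** One-line summary of requests served and uptime in whole seconds. */
    public String summary(long nowMillis) {
        long uptimeSeconds = (nowMillis - startedAtMillis) / 1000;
        return "instance served " + served.get()
                + " requests, uptime " + uptimeSeconds + "s";
    }
}
```

A single shared counter could be created at startup, for example `new InstanceRequestCounter(1000, System.currentTimeMillis())`, with `recordRequest()` called at the start of every request handler. If the last logged totals of the instances that vanish cluster around the same value, that supports the request-count explanation; if they vary widely, something else is likely responsible.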

This is probably too late for you after two years, but I hope it helps others who might end up here in the future.
