How to enable GC logging for Apache Storm workers, while preventing log file overwrites and capping disk space usage

Problem description

We recently decided to enable GC logging for Apache Storm workers on a number of clusters (exact version varies) as an aid to looking into topology-related memory and garbage collection problems. We want to do that for workers, but we also want to avoid two problems we know might happen:

  • Overwriting of the log file when a worker restarts for any reason
  • Logs using too much disk space, leading to disks getting filled (if the cluster runs long enough, log files will fill the disk unless they are managed)

When Java GC logging starts for a process it seems to replace the content of any file that has the same name. This means that unless you are careful, you will lose the GC logging, perhaps when you are most likely to need it.

Recommended answer

You can set JVM options for Storm workers via the worker.childopts property in storm.yaml (if you are managing Storm through Apache Ambari, look under Storm service > configs > advanced storm-site > worker.childopts). You will be adding additional JVM properties to that.

To enable GC logging to a file, you will need to add -verbose:gc -Xloggc:<log-file-location>.

You need to give the log file name special consideration to prevent overwrites. It seems like you need to have a unique name for every invocation. To achieve this, take advantage of some of the special "%" string replacements mentioned in the Storm code documentation. For uniqueness, %WORKER-ID% is sufficient; it is (quite likely) unique for each worker process. You may also want to be able to easily tell what topology the GC log is for. In that case add in %TOPOLOGY-ID% (you may need to say %ID% in some older versions of Storm); it may be long but will provide the name of the topology.

So far the JVM options are -verbose:gc -Xloggc:/var/log/storm/storm-worker-%TOPOLOGY-ID%-%WORKER-ID%-gc.log (the -%TOPOLOGY-ID% is optional, the path should match your Storm logging directory, and you can name the log file differently if you prefer).
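
To make the substitution concrete, here is a small Python sketch that mimics how the placeholders might expand. Storm performs the real replacement itself when it launches a worker; the `expand` helper and both IDs below are made-up values for illustration only, and the sketch assumes the worker ID changes on restart, as the answer expects.

```python
def expand(template: str, topology_id: str, worker_id: str) -> str:
    """Mimic Storm's placeholder replacement (illustration only, not Storm code)."""
    return (template
            .replace("%TOPOLOGY-ID%", topology_id)
            .replace("%WORKER-ID%", worker_id))


TEMPLATE = "/var/log/storm/storm-worker-%TOPOLOGY-ID%-%WORKER-ID%-gc.log"

# Hypothetical IDs; the real ones are assigned by Storm.
first_run = expand(TEMPLATE, "wordcount-1-1460000000", "worker-aaaa")
after_restart = expand(TEMPLATE, "wordcount-1-1460000000", "worker-bbbb")

print(first_run)
print(after_restart)
```

Because the two names differ, the restarted worker would write to a new file instead of truncating the old one.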

Now onto managing use of disk space. I'll be happy if there is a simpler way than what I have.

First, take advantage of Java's built-in GC log file rotation. -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M is an example of enabling this rotation, having up to 10 GC log files from the JVM, each of which is no more than 10MB in size. 10 x 10MB is 100MB max usage. Note that this is per worker instance.

With the GC log file rotation in place with up to 10 files, '.0', '.1', ... '.9' will be added to the file name you gave in Xloggc. .0 will be first and after it reaches .9 it will replace .0 and continue on in a round robin manner. In some versions of Java '.current' will be additionally put on the end of the name of the log file currently being written to.
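
The round-robin naming described above can be sketched in a few lines of Python. This only models the suffix sequence as the answer describes it; `rotated_names` is a hypothetical helper, and the optional '.current' suffix is left out.

```python
from typing import List


def rotated_names(base: str, number_of_files: int, rotations: int) -> List[str]:
    """Return the file names written to, in order, across the given number of rotations."""
    return [f"{base}.{i % number_of_files}" for i in range(rotations)]


names = rotated_names("gc.log", 10, 12)
print(names[0], names[9], names[10])  # gc.log.0 gc.log.9 gc.log.0
```

The eleventh file written reuses the .0 name, which is why the set never grows past ten files for one JVM.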

Due to the unique file naming we apparently have to add to avoid overwrites, this means you can have 100MB per worker process invocation, so this is not a total solution to managing disk space used by storm worker child GC logs. You will end up with a set of up to 10 GC log files for each process -- this can add up. The best solution (under *nix) to that would seem to be to use the logrotate utility to periodically clean up worker GC logs that have not been modified in the last N days.
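
The answer recommends logrotate for this cleanup. As a rough stand-in that shows the same "not modified in the last N days" rule, here is a Python sketch rather than a logrotate configuration. The function names, the `storm-worker-*gc.log*` pattern (chosen to match the file naming used above), and the directory are assumptions.

```python
import time
from pathlib import Path


def stale_gc_logs(log_dir, max_age_days, now=None):
    """List worker GC log files whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(log_dir).glob("storm-worker-*gc.log*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def remove_stale_gc_logs(log_dir, max_age_days):
    """Delete the stale GC logs and return the paths that were removed."""
    removed = stale_gc_logs(log_dir, max_age_days)
    for path in removed:
        path.unlink()
    return removed
```

For example, `remove_stale_gc_logs("/var/log/storm", 7)` run from a daily cron job would drop GC logs untouched for a week. Calling `stale_gc_logs` first is a cheap way to do a dry run.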

Be sure to do the math and make sure you will have enough disk space.
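
As a back-of-the-envelope version of that math, the sketch below multiplies the rotation settings by the number of worker launches whose logs are still on disk. The figure of 50 retained launches is an invented example; substitute your own worker count, restart rate, and cleanup window.

```python
def worst_case_gc_log_mb(files_per_worker, file_size_mb, worker_launches_retained):
    """Upper bound on GC log disk usage, in MB, on one host."""
    return files_per_worker * file_size_mb * worker_launches_retained


per_launch_mb = worst_case_gc_log_mb(10, 10, 1)   # one worker launch
host_total_mb = worst_case_gc_log_mb(10, 10, 50)  # hypothetical: 50 launches kept

print(per_launch_mb, host_total_mb)  # 100 5000
```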

People frequently want more details and context in their GC logs than the default, so consider adding in -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps.

All together, you will be adding something like the following to worker.childopts: -verbose:gc -Xloggc:/var/log/storm/storm-worker-%TOPOLOGY-ID%-%WORKER-ID%-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps, and then configuring logrotate.
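
If you maintain several clusters with different rotation limits, it may help to assemble the option string from parts instead of editing one long line by hand. The following Python sketch does that; `gc_childopts` and its parameters are hypothetical names, and the output still has to be pasted into worker.childopts yourself.

```python
GC_LOG_PATH = "/var/log/storm/storm-worker-%TOPOLOGY-ID%-%WORKER-ID%-gc.log"


def gc_childopts(log_path, num_files=10, file_size="10M"):
    """Join the GC logging flags from the answer into one option string."""
    opts = [
        "-verbose:gc",
        f"-Xloggc:{log_path}",
        "-XX:+UseGCLogFileRotation",
        f"-XX:NumberOfGCLogFiles={num_files}",
        f"-XX:GCLogFileSize={file_size}",
        "-XX:+PrintGCDetails",
        "-XX:+PrintGCTimeStamps",
        "-XX:+PrintGCDateStamps",
    ]
    return " ".join(opts)


print(gc_childopts(GC_LOG_PATH))
```

Append the result to whatever worker.childopts already contains (heap settings and so on) rather than replacing it.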

Finally, I should mention a couple other options for naming log files, though I don't see the advantage, at least for my use case:

  • In some versions of Java you can put %t in the GC log file name and Java will replace it with the current timestamp formatted as <YYYY>-<MM>-<DD>_<HH>-<MM>-<SS>. You can also put %p to get the current process ID.
  • Somebody told me that in some cases you can put backticked expressions such as `date +'%Y%m%d%H%M'` in some combinations of Storm and Java, at least if you use Ambari. He reported that this worked with Storm 0.10.0 and Java 1.7.0_95, but I was unable to get that behavior with Storm 0.9.3.2.2.0.0-2041 and Java 1.7.0_75.
