Output from Dataproc Spark job in Google Cloud Logging


Problem description


Is there a way to have the output from Dataproc Spark jobs sent to Google Cloud logging? As explained in the Dataproc docs the output from the job driver (the master for a Spark job) is available under Dataproc->Jobs in the console. There are two reasons I would like to have the logs in Cloud Logging as well:


  1. I would like to see the logs from the executors. Often the master log will just say "executor lost" with no further details, and it would be very useful to have more information about what the executor was up to.

  2. Cloud Logging has good filtering and search.


Currently the only output from Dataproc that shows up in Cloud Logging is log items from yarn-yarn-nodemanager-* and container_*.stderr. Output from my application code is shown in Dataproc->Jobs but not in Cloud Logging, and it's only the output from the Spark master, not the executors.

Recommended answer

TL;DR


This is not natively supported now but will be natively supported in a future version of Cloud Dataproc. That said, there is a manual workaround in the interim.

Workaround


Cloud Dataproc clusters use fluentd to collect and forward logs to Cloud Logging. The fluentd configuration is why you see some logs forwarded and not others. Therefore, the simple workaround (until Cloud Dataproc supports job details in Cloud Logging) is to modify the fluentd configuration. The configuration file for fluentd on a cluster is at:

/etc/google-fluentd/google-fluentd.conf


There are two easy ways to gather additional details:



  1. Add a new fluentd plugin based on your needs
  2. Add a new file to the list of existing files collected (line 56 has the files on my cluster)
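
For option 2, a minimal sketch of an extra `source` stanza that could be appended to `/etc/google-fluentd/google-fluentd.conf`. The paths, `pos_file`, and `tag` below are assumptions (YARN container logs often live under `/var/log/hadoop-yarn/userlogs/`), so verify them against the layout on your own cluster:

```
# Hypothetical extra source: tail the stdout/stderr of YARN containers
# (where Spark executors write). Paths, pos_file, and tag are
# assumptions -- check them against your cluster before using.
<source>
  type tail
  format none
  path /var/log/hadoop-yarn/userlogs/*/*/stdout,/var/log/hadoop-yarn/userlogs/*/*/stderr
  pos_file /var/tmp/google-fluentd-spark-executors.pos
  read_from_head true
  tag spark-executors
</source>
```

Once fluentd picks this up, entries from the tailed files should appear in Cloud Logging under the tag you chose.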


Once you edit the configuration, you'll need to restart the google-fluentd service:

/etc/init.d/google-fluentd restart


Finally, depending on your needs, you may or may not need to do this across all nodes on your cluster. Based on your use case, it sounds like you could probably just change your master node and be set.
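
If you do need the change on every node, one possible sketch is to push the edited config and restart the service over SSH. The cluster name, zone, worker count, and the `<cluster>-w-<n>` worker naming scheme below are assumptions; adjust them for your environment:

```shell
#!/bin/bash
# Hypothetical helper: copy an edited google-fluentd.conf to each
# worker and restart the agent. CLUSTER, ZONE, and NUM_WORKERS are
# placeholders -- set them for your own cluster.
CLUSTER=my-cluster
ZONE=us-central1-a
NUM_WORKERS=2

for i in $(seq 0 $((NUM_WORKERS - 1))); do
  node="${CLUSTER}-w-${i}"
  gcloud compute scp google-fluentd.conf "${node}:/tmp/" --zone "$ZONE"
  gcloud compute ssh "$node" --zone "$ZONE" --command \
    "sudo cp /tmp/google-fluentd.conf /etc/google-fluentd/google-fluentd.conf \
     && sudo /etc/init.d/google-fluentd restart"
done
```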

