How do I log from my Python Spark script?


Question

I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.

logging.info("This is an informative message.")
logging.debug("This is a debug message.")

I want to use the same logger that Spark is using, so that the log messages come out in the same format and the level is controlled by the same configuration files. How do I do this?

I've tried putting the logging statements in the code as shown, and I've tried starting out with a logging.getLogger(). In both cases I see Spark's log messages but not mine. I've been looking at the Python logging documentation, but haven't been able to figure it out from there.

I'm not sure whether this is something specific to scripts submitted to Spark or whether I just don't understand how logging works.

Recommended answer

You need to get the logger that Spark itself uses. A plain getLogger() call returns the root logger, and getLogger(__name__) returns the logger for your own module, so neither gives you Spark's logger. Try something like:

import logging

logger = logging.getLogger('py4j')
logger.info("My test info statement")

The logger name might also be 'pyspark' instead of 'py4j'.
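One reason an INFO message can still go missing is the logger's effective level: Python's root logger defaults to WARNING, and a named logger with no level of its own inherits that. The following is a minimal sketch using only the standard library; the helper name `get_spark_side_logger` is hypothetical, the 'py4j' name is taken from the answer above, and whether Spark's own configuration files govern this logger depends on your setup. The in-memory buffer is there only to make the effect of the level visible.

```python
import io
import logging


def get_spark_side_logger(name="py4j", level=logging.INFO):
    """Return the named logger with its level lowered so INFO records pass.

    Hypothetical helper: the name 'py4j' comes from the answer, not from
    any guarantee about how a given Spark version configures logging.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# Capture output in memory so the example does not depend on the console.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = get_spark_side_logger()
logger.addHandler(handler)

logger.info("My test info statement")
logger.debug("This is a debug message.")  # below INFO, so it is dropped

captured = buffer.getvalue()
```

If the message still does not appear in your real job, checking `logger.getEffectiveLevel()` and `logger.handlers` is a reasonable next step.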

If the function that you use in your Spark program (and which does some logging) is defined in the same module as the main function, it will raise a serialization error.

This is explained in the Spark user list thread at http://apache-spark-user-list.1001560.n3.nabble.com/using-Log4j-to-log-INFO-level-messages-on-workers-td6746.html, and the same person has also given an example.

I also tested this on Spark 1.3.1.

Edit:

To change logging from STDERR to STDOUT, you will have to remove the current StreamHandler and add a new one.

Find the existing StreamHandler (this print line can be removed once you are finished):

print(logger.handlers)
# will look like [<logging.StreamHandler object at 0x7fd8f4b00208>]

There will probably be only a single handler; if there are more, change the index below to match the position of the StreamHandler in the list.

logger.removeHandler(logger.handlers[0])

Add a new handler for sys.stdout:

import sys # Put at top if not already there
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
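The steps above can be collected into one function. This is a sketch under stated assumptions rather than a definitive implementation: the helper name `move_stream_handlers_to_stdout` is hypothetical, and it matches handlers by their stream instead of by list position so that it still works when more than one handler is attached.

```python
import logging
import sys


def move_stream_handlers_to_stdout(logger, level=logging.DEBUG):
    """Replace handlers that write to sys.stderr with one writing to sys.stdout.

    Hypothetical helper for illustration; handlers writing elsewhere
    (for example to files) are left untouched.
    """
    # Iterate over a copy because removeHandler mutates logger.handlers.
    for handler in list(logger.handlers):
        if isinstance(handler, logging.StreamHandler) and handler.stream is sys.stderr:
            logger.removeHandler(handler)

    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(level)
    logger.addHandler(stdout_handler)
    return stdout_handler


logger = logging.getLogger("py4j")
# Simulate the starting point: StreamHandler() with no argument uses sys.stderr.
logger.addHandler(logging.StreamHandler())

new_handler = move_stream_handlers_to_stdout(logger)
```

Matching on the stream avoids the `logger.handlers[0]` guess, at the cost of assuming the existing handler really points at the current `sys.stderr` object.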
