Saving result of DataFrame show() to string in pyspark

Question

I would like to capture the result of show in pyspark, similar to here and here. I was not able to find a solution with pyspark, only scala.

df.show()
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+

The ultimate purpose is to capture this as a string inside my logger.info call. I tried logger.info(df.show()), but that only displays the table on the console.

Recommended answer

You can build a helper function using the same approach as in the post you linked, Capturing the result of explain() in pyspark. Just examine the source code for show() and observe that it calls self._jdf.showString().

The answer depends on which version of spark you are using, as the number of arguments to show() has changed over time.

Spark Versions 2.3 and above

In version 2.3, the vertical argument was added.

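def getShowString(df, n=20, truncate=True, vertical=False):
    # truncate=True means the default column width of 20 characters.
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20, vertical)
    # Otherwise pass the width as an int (False becomes 0, i.e. no truncation).
    return df._jdf.showString(n, int(truncate), vertical)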

Spark Versions 1.5 through 2.2

Starting with version 1.5, the truncate argument was added.

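def getShowString(df, n=20, truncate=True):
    # truncate=True means the default column width of 20 characters.
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20)
    # Otherwise pass the width as an int (False becomes 0, i.e. no truncation).
    return df._jdf.showString(n, int(truncate))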

Spark Versions 1.3 through 1.4

The show function was first introduced in version 1.3.

def getShowString(df, n=20):
    return df._jdf.showString(n)
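
If the same code has to run against several Spark versions, the three variants above could be folded into one helper that picks the argument list from a version string. This is only a sketch: show_string_args and get_show_string are hypothetical names, the version string is passed in by the caller, and it assumes the showString() signature has not changed again after 2.3, which is not confirmed here.

```python
def show_string_args(spark_version, n=20, truncate=True, vertical=False):
    """Return the positional arguments for df._jdf.showString() on a given Spark version."""
    # Keep only the leading "major.minor" part, e.g. "2.4.0" -> (2, 4).
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    version = (major, minor)

    # True means "truncate to 20 characters"; False becomes 0, i.e. no truncation.
    width = 20 if isinstance(truncate, bool) and truncate else int(truncate)

    if version >= (2, 3):
        return (n, width, vertical)
    if version >= (1, 5):
        return (n, width)
    return (n,)


def get_show_string(df, spark_version, **kwargs):
    """Call the private showString() with the arguments matching spark_version."""
    return df._jdf.showString(*show_string_args(spark_version, **kwargs))
```

With a live session, the version string could come from spark.version, for example get_show_string(df, spark.version, n=5). Note that _jdf is a private attribute, so this may break in future releases.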


Now use the helper function as follows:

x = getShowString(df)  # default arguments
print(x)
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+

Or in your case:

logger.info(getShowString(df))
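
An alternative that avoids the private _jdf attribute is to capture whatever show() prints. The sketch below assumes show() writes through Python's sys.stdout, which should hold when show() simply prints the string returned by showString(). capture_show is a hypothetical helper name, and FakeDataFrame is a stand-in used only so the example runs without Spark.

```python
import contextlib
import io


def capture_show(df, *args, **kwargs):
    """Return whatever df.show(*args, **kwargs) prints, as a string."""
    buffer = io.StringIO()
    # Temporarily point sys.stdout at the buffer while show() runs.
    with contextlib.redirect_stdout(buffer):
        df.show(*args, **kwargs)
    return buffer.getvalue()


class FakeDataFrame:
    """Hypothetical stand-in for a Spark DataFrame, used only for this demo."""

    def show(self, n=20):
        print("+---+\n|age|\n+---+\n| 30|\n+---+")


text = capture_show(FakeDataFrame())
```

With a real DataFrame this would be used as logger.info(capture_show(df)). Because it goes through the public show() method, it takes the same arguments on every Spark version, but it will capture nothing if the output bypasses sys.stdout.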
