Is there a better way to display an entire Spark SQL DataFrame?


Question


I would like to display the entire Apache Spark SQL DataFrame with the Scala API. I can use the show() method:

myDataFrame.show(Int.MaxValue)


Is there a better way to display an entire DataFrame than using Int.MaxValue?

Answer


It is generally not advisable to display an entire DataFrame to stdout, because that means you need to pull the entire DataFrame (all of its values) to the driver (unless the DataFrame is already local, which you can check with df.isLocal).


Unless you know ahead of time that your dataset is small enough for the driver JVM process to hold all of its values in memory, it is not safe to do this. That is why the DataFrame API's show() displays only the first 20 rows by default.
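The overloads of show() already give some control over how much is printed without resorting to Int.MaxValue. A minimal sketch (the local-mode session and toy data below are assumptions for illustration, not from the original question):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session, just to make the example self-contained.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("show-demo")
  .getOrCreate()
import spark.implicits._

val df = (1 to 100).toDF("n")

df.show()                       // default: first 20 rows, cells truncated to 20 chars
df.show(5)                      // first 5 rows
df.show(100, truncate = false)  // all 100 rows, without truncating cell contents

spark.stop()
```

Passing truncate = false is often the more useful knob in practice, since wide columns are otherwise cut off at 20 characters.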


You could use df.collect, which returns Array[Row], and then iterate over each row and print it:

df.collect.foreach(println)


but you lose all the formatting implemented in df.showString(numRows: Int) (which show() uses internally).
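If the concern is driver memory rather than formatting, one hedged alternative (assuming Spark 2.x or later, where Dataset.toLocalIterator is available) is to stream rows to the driver one partition at a time instead of materializing the whole array at once:

```scala
import scala.collection.JavaConverters._

// toLocalIterator returns a java.util.Iterator[Row] and fetches one
// partition at a time, so the driver only needs enough memory for the
// largest partition, not for the entire DataFrame.
df.toLocalIterator().asScala.foreach(println)
```

This still prints unformatted Row objects, so it does not recover show()'s table layout either; it only reduces the peak memory needed on the driver.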


So no, I guess there is no better way.

