Is there a better way to display an entire Spark SQL DataFrame?
Question
I would like to display the entire Apache Spark SQL DataFrame with the Scala API. I can use the show() method:
myDataFrame.show(Int.MaxValue)
Is there a better way to display an entire DataFrame than using Int.MaxValue?
Answer
It is generally not advisable to display an entire DataFrame to stdout, because that means you need to pull the entire DataFrame (all of its values) to the driver (unless the DataFrame is already local, which you can check with df.isLocal).
Unless you know ahead of time that your dataset is small enough for the driver JVM process to hold all of its values in memory, doing this is not safe. That is why the DataFrame API's show() displays only the first 20 rows by default.
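For more control over how much is printed, show() also has overloads that take a row count and a truncation flag. A minimal sketch, assuming df is an existing DataFrame in a running Spark session:

```scala
df.show()          // first 20 rows, long values truncated to 20 characters
df.show(50)        // first 50 rows
df.show(50, false) // first 50 rows, without truncating long values
```

The truncate parameter is often more useful than raising the row count, since wide string columns are otherwise cut off in the output.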
You could use df.collect, which returns Array[T], and then iterate over each row and print it:
df.collect.foreach(println)
but you lose all of the formatting implemented in df.showString(numRows: Int) (which show() uses internally).
So no, I guess there is no better way.