How to get a string representation of DataFrame (as does Dataset.show)?
Question
I need a useful string representation of a Spark DataFrame. The one I get with df.show is great, but I can't get that output as a string because the internal showString method called by show is private. Is there some way to get similar output without writing a method that duplicates the same functionality?
Recommended answer
showString is simply private[sql], which means that any code accessing it has to be in the same package, i.e. org.apache.spark.sql.
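To make the access rule concrete, here is a minimal sketch in plain Scala with no Spark involved. The packages demo.sql and demo.app and the names Table, render and AccessRender are hypothetical stand-ins for org.apache.spark.sql, Dataset and showString; the sketch only shows how a package-qualified private member can be reached through a public wrapper that lives in the same package.

```scala
// Hypothetical packages and names, for illustration only (this is not Spark code).
package demo.sql {
  class Table(rows: Seq[Int]) {
    // Callable from anywhere inside package demo.sql, and nowhere else.
    private[sql] def render(): String = rows.mkString("|", "|", "|")
  }

  // Lives in demo.sql, so it may call render(); the wrapper itself is public.
  object AccessRender {
    def render(t: Table): String = t.render()
  }
}

package demo.app {
  import demo.sql.{AccessRender, Table}

  object Main {
    def main(args: Array[String]): Unit = {
      val t = new Table(Seq(1, 2, 3))
      // t.render() would not compile here: render is private[sql].
      println(AccessRender.render(t))
    }
  }
}
```

Compiled with scalac as a single source file, Main goes through the wrapper because a direct call to render() from demo.app is rejected by the compiler.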
The trick is to create a helper object that does belong to the org.apache.spark.sql package, while the single method it exposes is not private (at any level).
I usually mimic an instance method: the very first parameter is the target object, and the remaining parameters match those of the target method.
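package org.apache.spark.sql

// Public helper inside org.apache.spark.sql, so it can reach private[sql] members.
object AccessShowString {
  // Mirrors Dataset.showString, taking the target Dataset as the first parameter.
  def showString[T](
      df: Dataset[T],
      _numRows: Int,
      truncate: Int = 20,
      vertical: Boolean = false): String = {
    df.showString(_numRows, truncate, vertical)
  }
}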
TIP: Use :paste -raw to paste the code into spark-shell. Raw mode compiles the input as an ordinary source file, which is what allows the package declaration.
So let's use showString.
import org.apache.spark.sql.AccessShowString.showString
val df = spark.range(10)
scala> println(showString(df, 10))
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
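If you would rather not depend on a private[sql] method at all, a different technique is to capture what show prints. The sketch below is not the approach described above: CaptureOut and captureOut are hypothetical names, and it assumes that show writes through Scala's println (and therefore through Console.out), which may not hold for every Spark version or deployment mode.

```scala
import java.io.ByteArrayOutputStream

// Hypothetical helper (not part of Spark): runs `body` and returns whatever
// it printed through Scala's Console (println, print, ...).
object CaptureOut {
  def captureOut(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    // Console.out is redirected to `buffer` only while `body` runs.
    Console.withOut(buffer)(body)
    // Decode with the platform default charset, matching the PrintStream
    // that Console.withOut creates around the buffer.
    buffer.toString()
  }
}
```

In spark-shell this could be used as val s = CaptureOut.captureOut { df.show(10) }. The trade-off is that it relies on how show prints rather than on the signature of showString, and output written directly to System.out is not captured.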