How to get a string representation of DataFrame (as does Dataset.show)?
Question
I need a useful string representation of a Spark DataFrame. The one I get with df.show is great, but I can't get that output as a string because the internal showString method called by show is private. Is there some way to get similar output without writing a method that duplicates the same functionality?
Recommended answer
showString is simply private[sql], which means that any code accessing it has to be in the same package, i.e. org.apache.spark.sql.
The trick is to create a helper object that does belong to the org.apache.spark.sql package, while the single method it exposes is not private (at any level).
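The access rule itself can be tried without Spark. The following is a minimal sketch in which every name (demo.internal, Report, render, AccessRender, demo.app) is hypothetical and chosen only for illustration: a method marked private[internal] is reachable from a helper object in the same package, and the helper's public method makes it reachable from everywhere else.

```scala
// Hypothetical package and names, used only to illustrate private[pkg] access.
package demo.internal {

  class Report(rows: Seq[String]) {
    // Visible only to code inside demo.internal, much like
    // showString is visible only inside org.apache.spark.sql.
    private[internal] def render(limit: Int): String =
      rows.take(limit).mkString("\n")
  }

  // Lives in the same package, so it may call render; its own method is public.
  object AccessRender {
    def render(report: Report, limit: Int): String = report.render(limit)
  }
}

package demo.app {

  import demo.internal.{AccessRender, Report}

  object Main {
    def main(args: Array[String]): Unit = {
      val report = new Report(Seq("a", "b", "c"))
      // Calling report.render(2) here would not compile: render is private[internal].
      println(AccessRender.render(report, 2))
    }
  }
}
```

Compiled as an ordinary source file, Main prints the first two rows through the public helper, even though it sits outside demo.internal.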
I usually mimic the instance method: the very first input parameter is the target object, and the remaining input parameters match those of the target method.
package org.apache.spark.sql

object AccessShowString {
  def showString[T](
      df: Dataset[T],
      _numRows: Int,
      truncate: Int = 20,
      vertical: Boolean = false): String = {
    df.showString(_numRows, truncate, vertical)
  }
}
TIP: Use :paste -raw to copy and paste the code into spark-shell. Raw mode is needed because the snippet starts with a package declaration.
Then use showString.
import org.apache.spark.sql.AccessShowString.showString
val df = spark.range(10)
scala> println(showString(df, 10))
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
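If adding a class to the org.apache.spark.sql package is not an option, one alternative that is not part of the answer above is to capture what show prints. This is only a sketch: it assumes that Dataset.show writes its table through Scala's println (and therefore through Console.out), which may not hold for every Spark version or front end. ShowCapture and showToString are hypothetical names.

```scala
import java.io.ByteArrayOutputStream

import org.apache.spark.sql.{Dataset, SparkSession}

object ShowCapture {
  // Hypothetical helper: runs ds.show and returns whatever it printed.
  // Assumption: show prints via Scala's println, so Console.withOut can redirect it.
  def showToString[T](ds: Dataset[T], numRows: Int = 20, truncate: Boolean = true): String = {
    val buffer = new ByteArrayOutputStream()
    // Console.withOut redirects Scala's Console output for the duration of the block.
    Console.withOut(buffer) {
      ds.show(numRows, truncate)
    }
    // Decode with the platform default charset, matching how the bytes were written.
    buffer.toString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("show-capture")
      .getOrCreate()
    try {
      val text = showToString(spark.range(3))
      println(text)
    } finally {
      spark.stop()
    }
  }
}
```

Compared with the helper-object approach, this relies only on public API but depends on how show happens to print, so the captured text is best checked before relying on it.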