How to get a string representation of a DataFrame (as Dataset.show does)?


Question


I need a useful string representation of a Spark dataframe. The one I get with df.show is great -- but I can't get that output as a string because the internal showString method called by show is private. Is there some way I can get a similar output without writing a method to duplicate this same functionality?

Answer


showString is private[sql], which means that code accessing it has to be in that same package, i.e. org.apache.spark.sql.


The trick is to create a helper object that does belong to the org.apache.spark.sql package, but whose single method is not private (at any level).
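The same visibility rule can be seen without Spark. Below is a minimal pure-Scala sketch, assuming a hypothetical package tablelib that stands in for org.apache.spark.sql and a hypothetical method render standing in for showString. The second packaging of tablelib plays the role of the helper object you write yourself.

```scala
// Hypothetical package `tablelib` standing in for org.apache.spark.sql.
package tablelib {
  class Table(rows: Seq[String]) {
    // Package-private, analogous to Dataset.showString being private[sql].
    private[tablelib] def render(): String = rows.mkString("\n")
  }
}

// A second packaging of the same package (normally a separate file you add).
// Code here lives inside `tablelib`, so it may call render().
package tablelib {
  object AccessRender {
    // Public method: callable from any package.
    def render(t: Table): String = t.render()
  }
}

package client {
  object Demo {
    def main(args: Array[String]): Unit = {
      val t = new tablelib.Table(Seq("a", "b"))
      // t.render() would not compile here: render is private[tablelib].
      println(tablelib.AccessRender.render(t))
    }
  }
}
```

Running client.Demo prints the two rows, a and b, on separate lines, even though client itself cannot call render directly.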


I usually mimic an instance method: the first input parameter is the target (here the Dataset), and the remaining input parameters match those of the target method.

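package org.apache.spark.sql

// showString is private[sql]; declaring this object in the same package
// (org.apache.spark.sql) is what grants access to it.
object AccessShowString {
  // Public wrapper mirroring Dataset.showString. This matches the Spark 2.3+
  // signature (the `vertical` parameter was added in 2.3); check your version.
  def showString[T](df: Dataset[T],
      _numRows: Int, truncate: Int = 20, vertical: Boolean = false): String = {
    df.showString(_numRows, truncate, vertical)
  }
}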


TIP: Use :paste -raw to paste the code into spark-shell. Raw paste mode is what allows the package clause at the top.

Then use showString:

import org.apache.spark.sql.AccessShowString.showString
val df = spark.range(10)
scala> println(showString(df, 10))
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
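An alternative that is not part of the original answer, and that needs no code in Spark's package, is to capture what show prints. This sketch assumes that Dataset.show writes its table through Scala's println (i.e. Console.out), which is how it is implemented in the Spark 2.x/3.x sources; CaptureShow and captureStdout are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object CaptureShow {
  // Runs `body` with Scala's Console.out redirected to an in-memory buffer
  // and returns everything it printed as a String.
  def captureStdout(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    val out = new PrintStream(buffer, true, "UTF-8")
    // Console.withOut only affects the current thread for the duration of `body`.
    Console.withOut(out) {
      body
    }
    out.flush()
    buffer.toString("UTF-8")
  }
}
```

In spark-shell this would be used as val table = CaptureShow.captureStdout(df.show(10)). Note that the captured text keeps the trailing line separator added by println, and anything printed by other threads is not captured.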
