How can I print nulls when converting a DataFrame to JSON in Spark?

Question

I have a DataFrame that I read from a CSV file.

CSV:
name,age,pets
Alice,23,dog
Bob,30,dog
Charlie,35,

Reading this into a DataFrame called myData:
+-------+---+----+
|   name|age|pets|
+-------+---+----+
|  Alice| 23| dog|
|    Bob| 30| dog|
|Charlie| 35|null|
+-------+---+----+

Now I want to convert each row of this DataFrame to JSON using myData.toJSON. What I get is the following:

{"name":"Alice","age":"23","pets":"dog"}
{"name":"Bob","age":"30","pets":"dog"}
{"name":"Charlie","age":"35"}

I would like the JSON for the third row to include the null value, for example:

{"name":"Charlie","age":"35", "pets":null}

However, this doesn't seem to be possible. I stepped through the code in a debugger and saw that Spark's org.apache.spark.sql.catalyst.json.JacksonGenerator class has the following implementation:

  private def writeFields(
      row: InternalRow,
      schema: StructType,
      fieldWriters: Seq[ValueWriter]): Unit = {
    var i = 0
    while (i < row.numFields) {
      val field = schema(i)
      if (!row.isNullAt(i)) {
        gen.writeFieldName(field.name)
        fieldWriters(i).apply(row, i)
      }
      i += 1
    }
  }

This skips a field whenever its value is null: neither the field name nor a value is written. I am not sure why this is the default behavior, but is there a way to print null values in JSON using Spark's toJSON?
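To make the effect of that loop concrete, here is a minimal plain-Scala sketch of the same decision, with no Spark involved. The names NullFieldDemo, toJson and the ignoreNullFields flag are hypothetical and only for illustration; this is not Spark's code, and it does no JSON escaping.

```scala
// Simplified model of the null check in writeFields; NOT Spark's implementation.
object NullFieldDemo {
  // Renders one row as a JSON object. None stands in for a SQL null.
  // Values are assumed to need no escaping (illustration only).
  def toJson(fields: Seq[(String, Option[String])], ignoreNullFields: Boolean): String = {
    val parts = fields.flatMap {
      case (name, Some(value))           => Some("\"" + name + "\":\"" + value + "\"")
      case (_, None) if ignoreNullFields => None // skip the field entirely, as in the loop above
      case (name, None)                  => Some("\"" + name + "\":null") // emit an explicit JSON null
    }
    parts.mkString("{", ",", "}")
  }

  def main(args: Array[String]): Unit = {
    val charlie: Seq[(String, Option[String])] =
      Seq("name" -> Some("Charlie"), "age" -> Some("35"), "pets" -> None)
    println(toJson(charlie, ignoreNullFields = true))  // {"name":"Charlie","age":"35"}
    println(toJson(charlie, ignoreNullFields = false)) // {"name":"Charlie","age":"35","pets":null}
  }
}
```

With the flag set to true the "pets" key disappears, which matches the output shown in the question.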

I am using Spark 2.1.0.

Recommended answer

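To keep null columns in the JSON produced by Spark's toJSON, you can replace the nulls before converting. Note that na.fill("null") substitutes the literal string "null" in string columns, so the output contains "pets":"null" rather than a JSON null: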

myData.na.fill("null").toJSON

It gives the following result:

+-------------------------------------------+
|value                                      |
+-------------------------------------------+
|{"name":"Alice","age":"23","pets":"dog"}   |
|{"name":"Bob","age":"30","pets":"dog"}     |
|{"name":"Charlie","age":"35","pets":"null"}|
+-------------------------------------------+

Hope this helps!
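If a real JSON null is needed rather than the string "null", newer Spark versions offer another route. The sketch below assumes Spark 3.0 or later (not the 2.1.0 from the question), where the JSON generator accepts an ignoreNullFields option. The object name KeepNullsInJson and the helper toJsonKeepingNulls are hypothetical, and the DataFrame is built inline instead of being read from the CSV.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

object KeepNullsInJson {
  // Packs all columns into a struct and serializes it with null fields kept.
  // Assumes Spark 3.0+, where to_json honors the "ignoreNullFields" option.
  def toJsonKeepingNulls(df: DataFrame): DataFrame =
    df.select(
      to_json(struct(df.columns.map(col): _*), Map("ignoreNullFields" -> "false")).as("value")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("keep-nulls-in-json")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same rows as the CSV in the question; every column is a string.
    val myData = Seq(
      ("Alice", "23", Option("dog")),
      ("Bob", "30", Option("dog")),
      ("Charlie", "35", Option.empty[String])
    ).toDF("name", "age", "pets")

    toJsonKeepingNulls(myData).show(truncate = false)

    spark.stop()
  }
}
```

Under these assumptions the row for Charlie should serialize with "pets":null. In Spark 3.0+ the same behavior can reportedly be switched on session-wide with spark.conf.set("spark.sql.jsonGenerator.ignoreNullFields", "false"), after which myData.toJSON should keep null fields as well; check this against the documentation for your Spark version.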
