How can I print nulls when converting a dataframe to json in Spark


Question

I have a DataFrame that I read from a CSV file.

CSV:
name,age,pets
Alice,23,dog
Bob,30,dog
Charlie,35,

Read into a DataFrame called myData, it looks like this:
+-------+---+----+
|   name|age|pets|
+-------+---+----+
|  Alice| 23| dog|
|    Bob| 30| dog|
|Charlie| 35|null|
+-------+---+----+

Now I want to convert each row of this DataFrame to JSON using myData.toJSON. This is what I get:

{"name":"Alice","age":"23","pets":"dog"}
{"name":"Bob","age":"30","pets":"dog"}
{"name":"Charlie","age":"35"}

I would like the third row's JSON to include the null value, for example:

{"name":"Charlie","age":"35","pets":null}

However, this doesn't seem to be possible. I stepped through the code in a debugger and saw that Spark's org.apache.spark.sql.catalyst.json.JacksonGenerator class has the following implementation:

  private def writeFields(
      row: InternalRow, schema: StructType, fieldWriters: Seq[ValueWriter]): Unit = {
    var i = 0
    while (i < row.numFields) {
      val field = schema(i)
      // A null field is skipped entirely: neither its name nor a value is written.
      if (!row.isNullAt(i)) {
        gen.writeFieldName(field.name)
        fieldWriters(i).apply(row, i)
      }
      i += 1
    }
  }

This seems to skip a column whenever its value is null. I am not sure why this is the default behavior, but is there a way to print null values in JSON using Spark's toJSON?

I am using Spark 2.1.0.
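As a rough illustration of why the third row loses its pets field, here is a minimal plain-Scala model of the writeFields loop quoted above. It does not use Spark at all: the row representation (a Seq of field-name / Option[String] pairs, with None standing in for a SQL null) and the names NullAwareJson, quote and toJson are hypothetical, chosen only for this sketch. The skipNulls flag switches between the behavior in the excerpt and the output the question asks for.

```scala
// Hypothetical model, not Spark's code: a row is a list of (field name, value)
// pairs, with None standing in for a SQL null.
object NullAwareJson {
  // Quote a string as a JSON string literal, escaping ", \ and control characters.
  def quote(s: String): String =
    s.iterator.map {
      case '"'          => "\\\""
      case '\\'         => "\\\\"
      case c if c < ' ' => "\\u%04x".format(c.toInt)
      case c            => c.toString
    }.mkString("\"", "", "\"")

  // skipNulls = true mirrors the writeFields loop quoted above: a null field is
  // left out entirely. skipNulls = false writes "name":null for it instead.
  def toJson(fields: Seq[(String, Option[String])], skipNulls: Boolean): String =
    fields.collect {
      case (name, Some(value))        => s"${quote(name)}:${quote(value)}"
      case (name, None) if !skipNulls => s"${quote(name)}:null"
    }.mkString("{", ",", "}")
}
```

For the Charlie row, toJson(row, skipNulls = true) returns {"name":"Charlie","age":"35"}, matching the toJSON output above, while skipNulls = false returns {"name":"Charlie","age":"35","pets":null}.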

Answer

To print null values in JSON using Spark's toJSON method, you can use the following code:

myData.na.fill("null").toJSON

It gives the following result. Note that pets now holds the string "null" (quoted in the JSON) rather than a JSON null literal:

+-------------------------------------------+
|value                                      |
+-------------------------------------------+
|{"name":"Alice","age":"23","pets":"dog"}   |
|{"name":"Bob","age":"30","pets":"dog"}     |
|{"name":"Charlie","age":"35","pets":"null"}|
+-------------------------------------------+
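Be aware that na.fill("null") does not produce a JSON null: it replaces the SQL null with an ordinary string whose text is null, so the serializer quotes it like any other value. The plain-Scala sketch below models that two-step process on the Charlie row. It is not Spark code, and FillThenSerialize, fill and toJson are hypothetical names. Also note that DataFrameNaFunctions.fill(String) only replaces nulls in string columns; that is enough here because every column was read from the CSV as a string (the ages are quoted in the output).

```scala
// Plain-Scala model (not Spark) of myData.na.fill("null").toJSON on one row.
// A row is a list of (field name, value) pairs; None stands in for a SQL null.
object FillThenSerialize {
  // Quote a string as a JSON string literal, escaping ", \ and control characters.
  private def quote(s: String): String =
    s.iterator.map {
      case '"'          => "\\\""
      case '\\'         => "\\\\"
      case c if c < ' ' => "\\u%04x".format(c.toInt)
      case c            => c.toString
    }.mkString("\"", "", "\"")

  // Hypothetical stand-in for na.fill(replacement) on string columns:
  // every None becomes Some(replacement); existing values are kept.
  def fill(fields: Seq[(String, Option[String])], replacement: String): Seq[(String, Option[String])] =
    fields.map { case (name, value) => (name, Some(value.getOrElse(replacement))) }

  // Like toJSON in the question: null fields are skipped, other values are quoted.
  def toJson(fields: Seq[(String, Option[String])]): String =
    fields.collect { case (name, Some(value)) => s"${quote(name)}:${quote(value)}" }
      .mkString("{", ",", "}")
}
```

A consumer parsing the filled output gets the string "null" for pets, which it cannot distinguish from a value that genuinely contains the text "null".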

Hope this helps!
