How can I print nulls when converting a DataFrame to JSON in Spark
Problem description
I have a DataFrame that I read from a CSV file.
CSV:
name,age,pets
Alice,23,dog
Bob,30,dog
Charlie,35,
Reading this into a DataFrame called myData:
+-------+---+----+
|   name|age|pets|
+-------+---+----+
|  Alice| 23| dog|
|    Bob| 30| dog|
|Charlie| 35|null|
+-------+---+----+
Now I want to convert each row of this DataFrame to JSON using myData.toJSON. What I get is the following JSON:
{"name":"Alice","age":"23","pets":"dog"}
{"name":"Bob","age":"30","pets":"dog"}
{"name":"Charlie","age":"35"}
I would like the third row's JSON to include the null value, for example:
{"name":"Charlie","age":"35", "pets":null}
However, this doesn't seem to be possible. I debugged through the code and saw that Spark's org.apache.spark.sql.catalyst.json.JacksonGenerator class has the following implementation:
private def writeFields(
    row: InternalRow, schema: StructType, fieldWriters: Seq[ValueWriter]): Unit = {
  var i = 0
  while (i < row.numFields) {
    val field = schema(i)
    if (!row.isNullAt(i)) {
      gen.writeFieldName(field.name)
      fieldWriters(i).apply(row, i)
    }
    i += 1
  }
}
This seems to skip a column if its value is null. I am not sure why this is the default behavior, but is there a way to print null values in JSON using Spark's toJSON?
I am using Spark 2.1.0.
Recommended answer
To print the null values in JSON using Spark's toJSON method, you can use the following code:
myData.na.fill("null").toJSON
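It gives the following result. Note that the missing value is written as the string "null" rather than as a JSON null literal, and that na.fill with a string argument only fills string-typed columns: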
+-------------------------------------------+
|value                                      |
+-------------------------------------------+
|{"name":"Alice","age":"23","pets":"dog"}   |
|{"name":"Bob","age":"30","pets":"dog"}     |
|{"name":"Charlie","age":"35","pets":"null"}|
+-------------------------------------------+
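If upgrading is an option, newer Spark releases offer a way to get a real JSON null instead of the string "null". The sketch below assumes Spark 3.0 or later (not the 2.1.0 used in the question), where the JSON data source accepts an ignoreNullFields option that to_json also honors. The object name NullsInJson and the inline sample data are illustrative only, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_json}

// Hypothetical driver object; assumes Spark 3.0+ on the classpath.
object NullsInJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nulls-in-json")
      .getOrCreate()
    import spark.implicits._

    // Same rows as the CSV in the question, built inline so the sketch is self-contained.
    val myData = Seq[(String, String, Option[String])](
      ("Alice", "23", Some("dog")),
      ("Bob", "30", Some("dog")),
      ("Charlie", "35", None)
    ).toDF("name", "age", "pets")

    // Pack every column into one struct and serialize it,
    // asking the JSON generator not to drop null fields.
    val asJson = myData.select(
      to_json(
        struct(myData.columns.map(col): _*),
        Map("ignoreNullFields" -> "false")
      ).as("value")
    )

    asJson.show(false)
    spark.stop()
  }
}
```

On such a version the third row should then carry "pets":null. The same releases also expose a session setting, spark.sql.jsonGenerator.ignoreNullFields, which may be the simpler route if you want to keep calling toJSON directly.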
Hope this helps!
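On Spark 2.1.0 itself, another possible workaround is to bypass the built-in generator and render each row by hand. The following is a minimal plain-Scala sketch of that idea, not Spark's implementation: the names JsonWithNulls, escape, and toJson are hypothetical, and every value is treated as a nullable string, which matches the all-string schema in the question.

```scala
object JsonWithNulls {
  // Escape the characters that must not appear raw inside a JSON string.
  def escape(s: String): String = s.flatMap {
    case '"'  => "\\\""
    case '\\' => "\\\\"
    case '\n' => "\\n"
    case '\r' => "\\r"
    case '\t' => "\\t"
    case c if c < ' ' => "\\" + "u%04x".format(c.toInt)
    case c    => c.toString
  }

  // Render one record. A missing value (None) becomes the JSON literal null
  // instead of being skipped, which is the behavior the question asks for.
  def toJson(fields: Seq[(String, Option[String])]): String =
    fields.map { case (name, value) =>
      val rendered = value.map(v => "\"" + escape(v) + "\"").getOrElse("null")
      "\"" + escape(name) + "\":" + rendered
    }.mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    val charlie = Seq("name" -> Some("Charlie"), "age" -> Some("35"), "pets" -> None)
    println(toJson(charlie)) // prints {"name":"Charlie","age":"35","pets":null}
  }
}
```

To use it with a DataFrame, you would map each Row to a sequence of (column name, Option(value)) pairs and pass that to toJson; non-string columns would need their own rendering rules.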