How can I print nulls when converting a DataFrame to JSON in Spark
Question
I have a DataFrame that I read from a CSV file.
CSV:
name,age,pets
Alice,23,dog
Bob,30,dog
Charlie,35,
Reading this into a DataFrame called myData:
+-------+---+----+
|   name|age|pets|
+-------+---+----+
|  Alice| 23| dog|
|    Bob| 30| dog|
|Charlie| 35|null|
+-------+---+----+
Now I want to convert each row of this DataFrame to JSON using myData.toJSON. What I get is the following JSON:
{"name":"Alice","age":"23","pets":"dog"}
{"name":"Bob","age":"30","pets":"dog"}
{"name":"Charlie","age":"35"}
I would like the third row's JSON to include the null value, for example:
{"name":"Charlie","age":"35", "pets":null}
However, this doesn't seem to be possible. I debugged through the code and saw that Spark's org.apache.spark.sql.catalyst.json.JacksonGenerator class has the following implementation:
private def writeFields(
    row: InternalRow, schema: StructType, fieldWriters: Seq[ValueWriter]): Unit = {
  var i = 0
  while (i < row.numFields) {
    val field = schema(i)
    if (!row.isNullAt(i)) {
      gen.writeFieldName(field.name)
      fieldWriters(i).apply(row, i)
    }
    i += 1
  }
}
This seems to skip a column if its value is null. I am not sure why this is the default behavior, but is there a way to print null values in JSON using Spark's toJSON?
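To make the skipping behavior concrete, here is a minimal standalone sketch in plain Scala. It is not Spark's code: `NullFieldDemo`, `rowToJson`, and the `skipNulls` flag are hypothetical names invented for illustration, and it handles only string-valued columns. With `skipNulls = true` it mirrors the loop quoted above; with `skipNulls = false` it writes an explicit null, which is the output asked for.

```scala
object NullFieldDemo {
  // Escape one character for use inside a JSON string literal.
  private def escapeChar(c: Char): String = c match {
    case '"'  => "\\\""
    case '\\' => "\\\\"
    case '\n' => "\\n"
    case '\r' => "\\r"
    case '\t' => "\\t"
    case _ if c < ' ' => "\\" + "u%04x".format(c.toInt)
    case _ => c.toString
  }

  // Wrap a string in double quotes, escaping its contents.
  private def quote(s: String): String =
    "\"" + s.map(c => escapeChar(c)).mkString + "\""

  // Render one row of string columns as a JSON object.
  // skipNulls = true drops null fields, like the quoted writeFields loop;
  // skipNulls = false writes the field name followed by a JSON null.
  def rowToJson(names: Seq[String], values: Seq[String], skipNulls: Boolean): String = {
    val fields = names.zip(values).flatMap {
      case (_, null) if skipNulls => None
      case (name, null)           => Some(quote(name) + ":null")
      case (name, value)          => Some(quote(name) + ":" + quote(value))
    }
    fields.mkString("{", ",", "}")
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("name", "age", "pets")
    val charlie = Seq("Charlie", "35", null)
    println(rowToJson(names, charlie, skipNulls = true))  // {"name":"Charlie","age":"35"}
    println(rowToJson(names, charlie, skipNulls = false)) // {"name":"Charlie","age":"35","pets":null}
  }
}
```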
I am using Spark 2.1.0.
Recommended answer
To print the null values in JSON using Spark's toJSON method, you can use the following code:
myData.na.fill("null").toJSON
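It gives the following result. Note that the missing value is written as the string "null" rather than as a JSON null literal, and that na.fill with a string argument only fills string-typed columns: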
+-------------------------------------------+
|value |
+-------------------------------------------+
|{"name":"Alice","age":"23","pets":"dog"} |
|{"name":"Bob","age":"30","pets":"dog"} |
|{"name":"Charlie","age":"35","pets":"null"}|
+-------------------------------------------+
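If a true JSON null is needed rather than the string "null", the following is a hedged sketch of an alternative. It assumes Spark 3.0 or later (not the 2.1.0 used in the question), where the JSON writer accepts an `ignoreNullFields` option. The object name `ToJsonWithNulls`, the helper `keepNulls`, and the inline sample data are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

object ToJsonWithNulls {
  // Pack every column into one struct and serialize it to a JSON string,
  // asking the JSON generator to keep null fields instead of dropping them.
  // Assumption: Spark 3.0+, where the "ignoreNullFields" option exists.
  def keepNulls(df: DataFrame): DataFrame = {
    val allColumns = df.columns.map(name => col(name))
    df.select(
      to_json(struct(allColumns: _*), Map("ignoreNullFields" -> "false")).as("value")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("to-json-with-nulls")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the DataFrame read from the CSV in the question.
    val myData = Seq[(String, String, String)](
      ("Alice", "23", "dog"),
      ("Bob", "30", "dog"),
      ("Charlie", "35", null)
    ).toDF("name", "age", "pets")

    keepNulls(myData).show(false)
    spark.stop()
  }
}
```

On versions without that option, a row-level function that builds the JSON string by hand is one possible fallback.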
Hope this helps!