Retain keys with null values while writing JSON in Spark


Problem description

I am trying to write a JSON file using Spark. Some of the keys have null values. These show up just fine in the Dataset, but when I write the file the keys get dropped. How do I ensure they are retained?

Code that writes the file:

ddp.coalesce(20).write().mode("overwrite").json("hdfs://localhost:9000/user/dedupe_employee");

Part of the JSON data from the source:

"event_header": {
        "accept_language": null,
        "app_id": "App_ID",
        "app_name": null,
        "client_ip_address": "IP",
        "event_id": "ID",
        "event_timestamp": null,
        "offering_id": "Offering",
        "server_ip_address": "IP",
        "server_timestamp": 1492565987565,
        "topic_name": "Topic",
        "version": "1.0"
    }

Output:

"event_header": {
        "app_id": "App_ID",
        "client_ip_address": "IP",
        "event_id": "ID",
        "offering_id": "Offering",
        "server_ip_address": "IP",
        "server_timestamp": 1492565987565,
        "topic_name": "Topic",
        "version": "1.0"
    }

In the above example, the keys accept_language, app_name and event_timestamp have been dropped.
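
The behavior can be reproduced with a small, self-contained sketch (the case class, the values and the output path below are hypothetical, chosen only to illustrate the default behavior of Spark's JSON writer):

import spark.implicits._

// Hypothetical two-field record; app_name is deliberately left null.
case class Header(app_id: String, app_name: String)

val sample = Seq(Header("App_ID", null)).toDS()
sample.show()                                      // app_name is displayed as null
sample.write.mode("overwrite").json("/tmp/repro")  // output: {"app_id":"App_ID"} -- app_name is dropped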

Recommended answer

Apparently, Spark does not provide any option to keep null-valued fields when writing JSON, so the following custom solution should work: serialize each record to a JSON string with Jackson (which retains null fields) and write those strings out as plain text.

import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import com.fasterxml.jackson.databind.ObjectMapper

// Needed for .toDS(); the spark-shell imports this automatically.
import spark.implicits._

case class EventHeader(accept_language: String, app_id: String, app_name: String, client_ip_address: String, event_id: String, event_timestamp: String, offering_id: String, server_ip_address: String, server_timestamp: Long, topic_name: String, version: String)

val ds = Seq(EventHeader(null, "App_ID", null, "IP", "ID", null, "Offering", "IP", 1492565987565L, "Topic", "1.0")).toDS()

// Serialize each record with Jackson, which keeps null-valued fields,
// instead of letting Spark's JSON writer drop them.
val ds1 = ds.mapPartitions(records => {
  // Build the mapper inside the partition so it is not shipped from the driver.
  val mapper = new ObjectMapper with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  records.map(mapper.writeValueAsString(_))
})

ds1.coalesce(1).write.text("hdfs://localhost:9000/user/dedupe_employee")

Because the records are serialized by Jackson rather than by Spark's JSON writer, the null-valued keys are preserved, and the output looks like this:

{"accept_language":null,"app_id":"App_ID","app_name":null,"client_ip_address":"IP","event_id":"ID","event_timestamp":null,"offering_id":"Offering","server_ip_address":"IP","server_timestamp":1492565987565,"topic_name":"Topic","version":"1.0"}
