Spark Dataframe is saved to MongoDB in wrong format


Problem description

I am using Spark-MongoDB and I am trying to save a DataFrame into MongoDB:

val event = """{"Dev":[{"a":3},{"b":3}],"hr":[{"a":6}]}"""
val events = sc.parallelize(event :: Nil)
val df = sqlc.read.json(events)
val saveConfig = MongodbConfigBuilder(Map(Host -> List("localhost:27017"),
 Database -> "test", Collection -> "test", SamplingRatio -> 1.0, WriteConcern -> "normal",
 SplitSize -> 8, SplitKey -> "_id"))
df.saveToMongodb(saveConfig.build)
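
A quick way to see where the extra fields come from is to print the schema Spark infers for this JSON (a minimal sketch using the df defined above; the commented output assumes the default JSON schema inference):

// The two objects inside "Dev" are merged into a single struct type,
// so every array element carries both fields, with null for the one it lacks.
df.printSchema()
// root
//  |-- Dev: array (nullable = true)
//  |    |-- element: struct (containsNull = true)
//  |    |    |-- a: long (nullable = true)
//  |    |    |-- b: long (nullable = true)
//  |-- hr: array (nullable = true)
//  |    |-- element: struct (containsNull = true)
//  |    |    |-- a: long (nullable = true)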

I'm expecting the data to be saved as the input string, but what is actually saved is:

{ "_id" : ObjectId("57cedf4bd244c56e8e783a45"), "Dev" : [ { "a" : NumberLong(3), "b" : null }, { "a" : null, "b" : NumberLong(3) } ], "hr" : [ { "a" : NumberLong(6) } ] }

{ "_id" : ObjectId("57cedf4bd244c56e8e783a45"), "Dev" : [ { "a" : NumberLong(3), "b" : null }, { "a" : null, "b" : NumberLong(3) } ], "hr" : [ { "a" : NumberLong(6) } ] }

I want to avoid those null values and duplicates. Any idea?

Recommended answer

Have you tried defining event as below, using backslashes:

val event = "{\"Dev\":[{\"a\":3},{\"b\":3}],\"hr\":[{\"a\":6}]}"
