How to resolve com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast... Java Spark
Problem description
Hi, I am new to Java Spark and have been looking for a solution for a couple of days.
I am loading MongoDB data into a Hive table; however, saveAsTable fails with this error:
com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast STRING into a StructType(StructField(oid,StringType,true)) (value: BsonString{value='54d3e8aeda556106feba7fa2'})
I've tried increasing the sampleSize and different mongo-spark-connector versions, but none of the solutions worked.
I can't figure out what the root cause is, or what gaps in between still need to be closed.
The most confusing part is that I have similar sets of data going through the same flow without issue.
The MongoDB data schema is a nested struct and array:
root
|-- sample: struct (nullable = true)
| |-- parent: struct (nullable = true)
| | |-- expanded: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- distance: integer (nullable = true)
| | | | |-- id: struct (nullable = true)
| | | | | |-- oid: string (nullable = true)
| | | | |-- keys: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | | | |-- name: string (nullable = true)
| | | | |-- parent_id: array (nullable = true)
| | | | | |-- element: struct (containsNull = true)
| | | | | | |-- oid: string (nullable = true)
| | | | |-- type: string (nullable = true)
| | |-- id: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- oid: string (nullable = true)
Sample data
"sample": {
"expanded": [
{
"distance": 0,
"type": "domain",
"id": "54d3e17b5cf737074d4065b0",
"parent_id": [
"54d3e1775cf737074d406599"
],
"name": "level2"
},
{
"distance": 1,
"type": "domain",
"id": "54d3e1775cf737074d406599",
"name": "level1"
}
],
"id": [
"54d3e17b5cf737074d4065b0"
]
}
Sample code
public static void main(final String[] args) throws InterruptedException {
    // spark session read mongodb
    SparkSession mongo_spark = SparkSession.builder()
            .master("local")
            .appName("MongoSparkConnectorIntro")
            .config("mongo_spark.master", "local")
            .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test_db.test_collection")
            .enableHiveSupport()
            .getOrCreate();

    // Create a JavaSparkContext using the SparkSession's SparkContext object
    JavaSparkContext jsc = new JavaSparkContext(mongo_spark.sparkContext());

    // Load data and infer schema, disregard toDF() name as it returns Dataset
    Dataset<Row> implicitDS = MongoSpark.load(jsc).toDF();
    implicitDS.printSchema();
    implicitDS.show();

    // createOrReplaceTempView to see if the data being read
    // implicitDS.createOrReplaceTempView("my_table");
    // implicitDS.printSchema();
    // implicitDS.show();

    // saveAsTable
    implicitDS.write().saveAsTable("my_table");
    mongo_spark.sql("SELECT * FROM my_table limit 1").show();

    mongo_spark.stop();
}
If anyone has some thoughts, I would very much appreciate it. Thanks.
Answer
After I increased the sample size appropriately, this problem no longer occurred.
How to configure the Java Spark SparkSession sample size
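For reference, a minimal sketch of what raising the sample size can look like when building the SparkSession, assuming the connector's spark.mongodb.input.sampleSize read option; the value 50000 and the class name SampleSizeExample are illustrative assumptions, not taken from the original answer:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import com.mongodb.spark.MongoSpark;

public class SampleSizeExample {
    public static void main(final String[] args) {
        // Build the session with a larger sampleSize so schema inference scans
        // enough documents to settle on one consistent type per field.
        SparkSession mongo_spark = SparkSession.builder()
                .master("local")
                .appName("MongoSparkConnectorIntro")
                .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test_db.test_collection")
                .config("spark.mongodb.input.sampleSize", 50000)   // assumed value; tune to the collection size
                .enableHiveSupport()
                .getOrCreate();

        JavaSparkContext jsc = new JavaSparkContext(mongo_spark.sparkContext());

        // Same flow as the question: infer the schema, then persist to Hive.
        Dataset<Row> implicitDS = MongoSpark.load(jsc).toDF();
        implicitDS.write().saveAsTable("my_table");

        mongo_spark.stop();
    }
}

The likely reason this helps is that the connector infers the schema from a sample of documents; if the sample misses the documents where a field (such as id) is stored as a different BSON type, the inferred StructType will not match every record and the cast fails at write time, so sampling more documents lets the inferred schema agree with the actual data.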