Spark union fails with nested JSON dataframe
Question
I have the following two JSON files:
{
  "name" : "Agent1",
  "age" : "32",
  "details" : [{
    "d1" : 1,
    "d2" : 2
  }]
}

{
  "name" : "Agent2",
  "age" : "42",
  "details" : []
}
I read them in Spark:
val jsonDf1 = spark.read.json(pathToJson1)
val jsonDf2 = spark.read.json(pathToJson2)
Two dataframes are created with the following schemas:
root
|-- age: string (nullable = true)
|-- details: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- d1: long (nullable = true)
| | |-- d2: long (nullable = true)
|-- name: string (nullable = true)

root
|-- age: string (nullable = true)
|-- details: array (nullable = true)
| |-- element: string (containsNull = true)
|-- name: string (nullable = true)
When I try to perform a union of these two dataframes I get this error:
jsonDf1.union(jsonDf2)
org.apache.spark.sql.AnalysisException: unresolved operator 'Union;;
'Union
:- LogicalRDD [age#0, details#1, name#2]
+- LogicalRDD [age#7, details#8, name#9]
How can I resolve this? The JSON files the Spark job loads will sometimes contain empty arrays, but it still has to union them, which shouldn't be a problem since the schema of the JSON files is the same.
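The root cause is that schema inference types the empty `details` array in the second file as `array<string>`, which no longer matches the `array<struct>` of the first. A minimal sketch of one workaround (not from the original post; it assumes the first file contains the full schema) is to reuse the inferred schema of the first dataframe when reading the second:

// Sketch: read the first file normally, then force its schema
// onto the second read so the empty `details` array is still
// typed as array<struct> instead of array<string>.
val jsonDf1 = spark.read.json(pathToJson1)
val jsonDf2 = spark.read.schema(jsonDf1.schema).json(pathToJson2)

// With identical schemas, the union now resolves.
val unioned = jsonDf1.union(jsonDf2)

This relies on the first file actually exercising every field; if the "complete" file can vary, an explicitly constructed schema is safer.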
Answer
polomarcus's answer led me to this solution: I couldn't read all the files at once because I receive a list of files as input, and I thought Spark had no API that accepts a list of paths, but apparently in Scala it is possible:
val files = List("path1", "path2", "path3")
val dataframe = spark.read.json(files: _*)
This way I got one dataframe containing all three files.
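Reading all the paths in one call works here because inference then sees the non-empty arrays and picks the struct element type for every row. If none of the files in a given batch is guaranteed to contain a non-empty `details` array, a hedged variant (the paths are placeholders and the schema below is my own reconstruction from the files shown above, not something stated in the answer) is to combine the varargs expansion with an explicit schema, which sidesteps inference entirely:

import org.apache.spark.sql.types._

// Hypothetical explicit schema matching the JSON files above, so an
// empty `details` array is still typed as array<struct<d1,d2>>.
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", StringType),
  StructField("details", ArrayType(StructType(Seq(
    StructField("d1", LongType),
    StructField("d2", LongType)
  ))))
))

val files = List("path1", "path2", "path3")
val dataframe = spark.read.schema(schema).json(files: _*)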