How do I use a from_json() dataframe in Spark?
Question
I'm trying to create a dataset from a json-string within a dataframe in Databricks 3.5 (Spark 2.2.1). In the code block below 'jsonSchema' is a StructType with the correct layout for the json-string which is in the 'body' column of the dataframe.
val newDF = oldDF.select(from_json($"body".cast("string"), jsonSchema))
This returns a dataframe where the root object is
jsontostructs(CAST(body AS STRING)):struct
followed by the fields in the schema (looks correct). When I try another select on the newDF
val transform = newDF.select($"propertyNameInTheParsedJsonObject")
it throws an exception:
org.apache.spark.sql.AnalysisException: cannot resolve '`columnName`' given
input columns: [jsontostructs(CAST(body AS STRING))];;
I'm apparently missing something. I hoped from_json would return a dataframe I could manipulate further.
My ultimate objective is to cast the json-string within the oldDF body-column to a dataset.
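For reference, a minimal self-contained repro of the setup described above, runnable in spark-shell; the sample JSON and schema are hypothetical stand-ins for the real 'body' payload and 'jsonSchema':

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for oldDF: a "body" column holding a JSON string.
val oldDF = Seq("""{"propertyNameInTheParsedJsonObject":"value"}""").toDF("body")

// Hypothetical jsonSchema matching the payload above.
val jsonSchema = StructType(Seq(
  StructField("propertyNameInTheParsedJsonObject", StringType)
))

// Without an alias, the parsed fields end up nested inside a single
// generated struct column -- hence the AnalysisException on a plain select.
val newDF = oldDF.select(from_json($"body".cast("string"), jsonSchema))
newDF.printSchema()
```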
Answer
from_json returns a struct (or array<struct<...>>) column. It means it is a nested object. If you've provided a meaningful name:
val newDF = oldDF.select(from_json($"body".cast("string"), jsonSchema) as "parsed")
and the schema describes a plain struct, you could use standard methods like
newDF.select($"parsed.propertyNameInTheParsedJsonObject")
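Putting it together, a sketch of the struct case (sample data and schema are hypothetical); note that a star expansion on the alias also promotes every parsed field to a top-level column at once:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("struct-case").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for oldDF and jsonSchema.
val oldDF = Seq("""{"propertyNameInTheParsedJsonObject":"hello"}""").toDF("body")
val jsonSchema = StructType(Seq(
  StructField("propertyNameInTheParsedJsonObject", StringType)
))

val newDF = oldDF.select(from_json($"body".cast("string"), jsonSchema) as "parsed")

// Select a single nested field by its dotted path...
newDF.select($"parsed.propertyNameInTheParsedJsonObject").show()

// ...or flatten the whole struct into top-level columns.
newDF.select($"parsed.*").show()
```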
otherwise please follow the instructions for accessing arrays.
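For the array case, a common pattern is to parse with an ArrayType schema and use explode to turn each element into its own row; the payload and schema below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("array-case").getOrCreate()
import spark.implicits._

// Hypothetical body holding a JSON *array* of objects.
val oldDF = Seq("""[{"name":"a"},{"name":"b"}]""").toDF("body")
val arraySchema = ArrayType(StructType(Seq(StructField("name", StringType))))

val parsed = oldDF.select(from_json($"body".cast("string"), arraySchema) as "parsed")

// explode yields one row per array element, each a struct column...
val exploded = parsed.select(explode($"parsed") as "item")

// ...whose fields can then be selected as usual.
exploded.select($"item.name").show()
```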