How to refer to a map column in a spark-sql query?


Problem Description


scala> val map1 = spark.sql("select map('p1', 's1', 'p2', 's2')")

map1: org.apache.spark.sql.DataFrame = [map(p1, s1, p2, s2): map<string,string>]

scala> map1.show()

+--------------------+
| map(p1, s1, p2, s2)|
+--------------------+
|[p1 -> s1, p2 -> s2]|
+--------------------+

scala> spark.sql("select element_at(map1, 'p1')")

org.apache.spark.sql.AnalysisException: cannot resolve 'map1' given input columns: []; line 1 pos 18; 'Project [unresolvedalias('element_at('map1, p1), None)]

How can we reuse the dataframe map1 in the second SQL query?

Solution

map1 is a dataframe with a single column of map type. That column is named map(p1, s1, p2, s2), and the second spark.sql call fails because map1 is only a local variable in the shell, not a registered table or view, so the query has no input columns to resolve. The dataframe can be queried, for example, with selectExpr:

map1.selectExpr("element_at(`map(p1, s1, p2, s2)`, 'p1')").show()

prints

+-----------------------------------+
|element_at(map(p1, s1, p2, s2), p1)|
+-----------------------------------+
|                                 s1|
+-----------------------------------+
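If the auto-generated column name is awkward to quote with backticks, an alternative sketch (the names map2 and m here are made up for illustration) is to rename the column and use the element_at function from the DataFrame API instead of a SQL expression string:

import org.apache.spark.sql.functions.element_at

// rename the single auto-generated column to a short name
val map2 = map1.toDF("m")

// look up the key via the DataFrame API rather than a SQL string
map2.select(element_at(map2("m"), "p1")).show()

This prints the same value, s1.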

Another option is to register the dataframe as a temporary view and then use a SQL query:

map1.createOrReplaceTempView("map1")
spark.sql("select element_at(`map(p1, s1, p2, s2)`, 'p1') from map1").show()

which prints the same result.
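The backticks can also be avoided in the view-based approach by aliasing the map column when it is built. A minimal sketch, assuming the alias m and view name map2 (both names are illustrative):

// alias the map column as m so later queries can use a plain column name
val map2 = spark.sql("select map('p1', 's1', 'p2', 's2') as m")
map2.createOrReplaceTempView("map2")
spark.sql("select element_at(m, 'p1') from map2").show()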
