SparkSQL sql syntax for nth item in array


Question

I have a json object that has an unfortunate combination of nesting and arrays, so it's not totally obvious how to query it with spark sql.

Here is an example object:

{
  "stuff": [
    {"a": 1, "b": 2, "c": 3}
  ]
}

So, in javascript, to get the value for c, I'd write myData.stuff[0].c.

And in my spark sql query, if that array weren't there, I'd be able to use dot notation:

SELECT stuff.c FROM blah
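
For context, here is a minimal sketch of that flat case (assuming a sqlContext and sc as in the answer below), where dot notation works because there is no array wrapper:

# Hypothetical flat variant of the data: the struct is not wrapped
# in an array, so plain dot notation resolves the nested field.
df = sqlContext.read.json(sc.parallelize(['{"stuff": {"a": 1, "b": 2, "c": 3}}']))
df.registerTempTable("blah")

sqlContext.sql("SELECT stuff.c FROM blah").show()
## +---+
## |  c|
## +---+
## |  3|
## +---+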

but I can't, because the innermost object is wrapped in an array.

I have tried:

SELECT stuff.0.c FROM blah    -- FAIL
SELECT stuff.[0].c FROM blah  -- FAIL

So, what is the magical way to select that data? Or is that even supported yet?

Answer

It is not clear what you mean by JSON object, so let's consider two different cases:

  1. An array of structs

import tempfile

# Write a one-line JSON file and let Spark infer the schema from it.
path = tempfile.mktemp()
with open(path, "w") as fw:
    fw.write('''{"stuff": [{"a": 1, "b": 2, "c": 3}]}''')
df = sqlContext.read.json(path)
df.registerTempTable("df")

df.printSchema()
## root
##  |-- stuff: array (nullable = true)
##  |    |-- element: struct (containsNull = true)
##  |    |    |-- a: long (nullable = true)
##  |    |    |-- b: long (nullable = true)
##  |    |    |-- c: long (nullable = true)

sqlContext.sql("SELECT stuff[0].a FROM df").show()

## +---+
## |_c0|
## +---+
## |  1|
## +---+
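
The same lookup can also be written with the DataFrame API instead of raw SQL; a minimal sketch against the same df (getItem and getField are standard Column methods):

from pyspark.sql.functions import col

# Index into the array column, then pull the struct field out by name.
df.select(col("stuff").getItem(0).getField("c").alias("c")).show()

## +---+
## |  c|
## +---+
## |  3|
## +---+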

  2. An array of maps

    # Note: schema inference from dictionaries has been deprecated
    # don't use this in practice
    df = sc.parallelize([{"stuff": [{"a": 1, "b": 2, "c": 3}]}]).toDF()
    df.registerTempTable("df")
    
    df.printSchema()
    ## root
    ##  |-- stuff: array (nullable = true)
    ##  |    |-- element: map (containsNull = true)
    ##  |    |    |-- key: string
    ##  |    |    |-- value: long (valueContainsNull = true)
    
    sqlContext.sql("SELECT stuff[0]['a'] FROM df").show()
    ## +---+
    ## |_c0|
    ## +---+
    ## |  1|
    ## +---+
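
If you need the map variant without relying on the deprecated dictionary inference, one option is an explicit schema; a sketch, assuming the same sqlContext (the schema and sample row here are illustrative):

    from pyspark.sql.types import (ArrayType, LongType, MapType,
                                   StringType, StructField, StructType)

    # Declare stuff as an array of string -> long maps up front,
    # rather than letting Spark infer the type from Python dicts.
    schema = StructType([
        StructField("stuff", ArrayType(MapType(StringType(), LongType())))
    ])
    df = sqlContext.createDataFrame([([{"a": 1, "b": 2, "c": 3}],)], schema)
    df.registerTempTable("df")

    sqlContext.sql("SELECT stuff[0]['c'] FROM df").show()
    ## +---+
    ## |_c0|
    ## +---+
    ## |  3|
    ## +---+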
    

See also: Querying Spark SQL DataFrame with complex types
