SparkSQL SQL syntax for the nth item in an array
Question
I have a JSON object with an unfortunate combination of nesting and arrays, so it's not obvious how to query it with Spark SQL.
Here is a sample object:
{
stuff: [
{a:1,b:2,c:3}
]
}
So, in JavaScript, to get the value of c, I'd write myData.stuff[0].c
In my Spark SQL query, if that array weren't there, I could use dot notation:
SELECT stuff.c FROM blah
But I can't, because the innermost object is wrapped in an array.
I've tried:
SELECT stuff.0.c FROM blah // FAIL
SELECT stuff.[0].c FROM blah // FAIL
So, what is the magic way to select that data? Or is that even supported yet?
Recommended answer
It is not clear what you mean by a JSON object, so let's consider two different cases:
- An array of structs
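import tempfile

# Write the sample record to a temporary file (one JSON object per line)
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fw:
    fw.write('{"stuff": [{"a": 1, "b": 2, "c": 3}]}')
    path = fw.name

# sqlContext is the Spark 1.x entry point already available in the shell
df = sqlContext.read.json(path)
# registerTempTable is the Spark 1.x call; Spark 2.0+ uses createOrReplaceTempView
df.registerTempTable("df")
df.printSchema()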
## root
##  |-- stuff: array (nullable = true)
##  |    |-- element: struct (containsNull = true)
##  |    |    |-- a: long (nullable = true)
##  |    |    |-- b: long (nullable = true)
##  |    |    |-- c: long (nullable = true)
sqlContext.sql("SELECT stuff[0].a FROM df").show()
## +---+
## |_c0|
## +---+
## |  1|
## +---+
- An array of maps
# Note: schema inference from dictionaries has been deprecated
# don't use this in practice
df = sc.parallelize([{"stuff": [{"a": 1, "b": 2, "c": 3}]}]).toDF()
df.registerTempTable("df")
df.printSchema()
## root
##  |-- stuff: array (nullable = true)
##  |    |-- element: map (containsNull = true)
##  |    |    |-- key: string
##  |    |    |-- value: long (valueContainsNull = true)
sqlContext.sql("SELECT stuff[0]['a'] FROM df").show()
## +---+
## |_c0|
## +---+
## |  1|
## +---+
See also: Querying Spark SQL DataFrame with complex types