Spark SQL search inside an array for a struct

Question

My data structure is defined approximately as follows:

schema = StructType([
    # ... fields skipped
    StructField("extra_features",
        ArrayType(StructType([
            StructField("key", StringType(), False),
            StructField("value", StringType(), True)
        ])), nullable=False)
])

Now, I'd like to search for entries in a data frame where a struct {"key": "somekey", "value": "somevalue"} exists in the array column. How do I do this?

Recommended answer

Spark has a function array_contains that can be used to check the contents of an ArrayType column, but unfortunately it doesn't seem like it can handle arrays of complex types. It is possible to do it with a UDF (User Defined Function) however:

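from pyspark.sql import Row, SparkSession
from pyspark.sql.types import ArrayType, BooleanType, StringType, StructField, StructType
import pyspark.sql.functions as F

# In the pyspark shell `spark` already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()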

schema = StructType([StructField("extra_features", ArrayType(StructType([
    StructField("key", StringType(), False),
    StructField("value", StringType(), True)])),
    False)])

df = spark.createDataFrame([
    Row([{'key': 'a', 'value': '1'}]),
    Row([{'key': 'b', 'value': '2'}])], schema)

# UDF to check whether {'key': 'a', 'value': '1'} is in an array
# The actual data of a (nested) StructType value is a Row
contains_keyval = F.udf(lambda extra_features: Row(key='a', value='1') in extra_features, BooleanType())

df.where(contains_keyval(df.extra_features)).collect()

The result is:

[Row(extra_features=[Row(key=u'a', value=u'1')])]

You can also use the UDF to add another column that indicates whether the key-value pair is present:

df.withColumn('contains_it', contains_keyval(df.extra_features)).collect()

Which results in:

[Row(extra_features=[Row(key=u'a', value=u'1')], contains_it=True),
 Row(extra_features=[Row(key=u'b', value=u'2')], contains_it=False)]
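
The UDF above hardcodes the pair ('a', '1'). As a minimal sketch, the same check can be parametrized with a small factory. The names make_contains_pair and as_boolean_udf are hypothetical helpers, not part of PySpark. The sketch assumes each struct reaches the Python UDF as a Row, which is a tuple subclass, with fields in the order (key, value) as in the schema above.

```python
def make_contains_pair(key, value):
    """Return a plain-Python predicate that checks whether the pair
    (key, value) occurs in an array of structs."""
    target = (key, value)

    def contains_pair(extra_features):
        # Assumption: each element is a Row (a tuple subclass) or a plain
        # tuple whose fields are ordered (key, value), as in the schema.
        if extra_features is None:
            # Defensive: the column is declared non-nullable, but a null
            # array should simply not match.
            return False
        return any(tuple(item) == target for item in extra_features)

    return contains_pair


def as_boolean_udf(predicate):
    """Wrap a predicate as a Spark UDF returning BooleanType.

    PySpark is imported lazily so the predicate itself can be exercised
    without a Spark installation.
    """
    import pyspark.sql.functions as F
    from pyspark.sql.types import BooleanType
    return F.udf(predicate, BooleanType())


# Usage with the DataFrame from the answer (assumes `df` already exists):
# contains_a1 = as_boolean_udf(make_contains_pair('a', '1'))
# df.where(contains_a1(df.extra_features)).collect()
```

Keeping the predicate separate from the UDF wrapper means the matching logic can be checked with ordinary tuples before it is run on a cluster.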
