How to check if a Spark DataFrame array of structs contains a specific value
Question
I have a data frame with the schema shown below.
My requirement is to filter the rows where a given field, such as city, matches in any of the address array elements. I can access individual fields like loyaltyMember.address[0].city, but I have to check all address array elements to see whether any match exists. How can I achieve that in Spark SQL? I couldn't use the array_contains function because the array is of a complex type.
root
 |-- loyaltyMember: struct (nullable = true)
 |    |-- Name: string (nullable = true)
 |    |-- address: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- addressType: string (nullable = true)
 |    |    |    |-- city: string (nullable = true)
 |    |    |    |-- countryCode: string (nullable = true)
 |    |    |    |-- postalCode: string (nullable = true)
 |    |    |    |-- street: string (nullable = true)
Recommended answer
I believe you can still use array_contains as follows (in PySpark):
from pyspark.sql.functions import col, array_contains
df.filter(array_contains(col('loyaltyMember.address.city'), 'Prague'))
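This keeps only the rows whose array of city values contains the element 'Prague'. Selecting loyaltyMember.address.city through the array of structs yields an array of strings, which is why array_contains can be applied to it.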