How to check if a Spark DataFrame array of structs contains a specific value

Question

I have a DataFrame with the schema shown below.

My requirement is to filter the rows in which any element of the address array matches a given field value, such as city. I can access individual fields, like loyaltyMember.address[0].city, but I have to check all elements of the address array to see if any match exists. How can I achieve that in Spark SQL? I couldn't use the array_contains function directly, since the array elements are of a complex (struct) type.

root
 |-- loyaltyMember: struct (nullable = true)
 |    |-- Name: string (nullable = true)
 |    |-- address: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- addressType: string (nullable = true)
 |    |    |    |-- city: string (nullable = true)
 |    |    |    |-- countryCode: string (nullable = true)
 |    |    |    |-- postalCode: string (nullable = true)
 |    |    |    |-- street: string (nullable = true)

Recommended answer

I believe you can still use array_contains as follows (in PySpark):

from pyspark.sql.functions import col, array_contains

df.filter(array_contains(col('loyaltyMember.address.city'), 'Prague'))

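This keeps every row whose address array has at least one element with city equal to 'Prague'. It works because selecting loyaltyMember.address.city on an array of structs yields an array of the city strings, and array_contains can search an array of strings.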
