How to filter a column on values in a list in PySpark?

Problem description

I have a DataFrame of raw data, dfRawData, and I need to filter column X so that only rows with the values CB, CI, or CR remain. I used the following code:

df = dfRawData.filter(col("X").between("CB","CI","CR"))

But I get the following error:

between() takes exactly 3 arguments (4 given)

Please let me know how I can resolve this issue.

Recommended answer

between checks whether a value lies between two bounds: it takes a lower bound and an upper bound, both inclusive. That is why passing three values fails (the count of 4 in the error message includes the column object itself). It cannot be used to check whether a column value is in a list. For that, use isin:

import pyspark.sql.functions as f

# isin accepts the values as separate arguments or as a single list or set
df = dfRawData.where(f.col("X").isin(["CB", "CI", "CR"]))
