How do I use multiple conditions with pyspark.sql.functions.when() from a dict?


Question

I want to generate a when clause based on values in a dict. It's very similar to what's being done in How do I use multiple conditions with pyspark.sql.functions.when()?

The difference is that I want to pass a dict of columns and values.

Say I have a dict:

{
  'employed': 'Y',
  'athlete': 'N'
}

I want to use that dict to generate the equivalent of:

df.withColumn("call_person", when((col("employed") == "Y") & (col("athlete") == "N"), "Y"))

So the end result would be:

+---+-----------+--------+-------+
| id|call_person|employed|athlete|
+---+-----------+--------+-------+
|  1|     Y     |    Y   |   N   |
|  2|     N     |    Y   |   Y   |
|  3|     N     |    N   |   N   |
+---+-----------+--------+-------+

Note: part of the reason I want to do this programmatically is that my dicts vary in length (i.e. in the number of conditions).

Recommended answer

Use the reduce() function:

from functools import reduce
from pyspark.sql.functions import when, col

# dictionary
d = {
  'employed': 'Y',
  'athlete': 'N'
}

# set up the condition; multiple conditions are merged with `&`
cond = reduce(lambda x, y: x & y, [col(c) == v for c, v in d.items() if c in df.columns])

# set up the new column
df.withColumn("call_person", when(cond, "Y").otherwise("N")).show()
+---+--------+-------+-----------+
| id|employed|athlete|call_person|
+---+--------+-------+-----------+
|  1|       Y|      N|          Y|
|  2|       Y|      Y|          N|
|  3|       N|      N|          N|
+---+--------+-------+-----------+
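The same reduce pattern can be sketched in plain Python, without a Spark session, to see how a variable number of conditions collapses into a single boolean. The `call_person` helper and the row dicts below are illustrative stand-ins for the DataFrame, not part of the original answer:

```python
from functools import reduce
import operator

# condition dict, same as in the answer above
d = {'employed': 'Y', 'athlete': 'N'}

def call_person(row, conditions):
    # reduce(operator.and_, ...) plays the role of `lambda x, y: x & y`
    # on Column objects: all key/value checks are AND-ed together
    matched = reduce(operator.and_, (row[c] == v for c, v in conditions.items()))
    return "Y" if matched else "N"

# rows mirroring the example DataFrame
rows = [
    {'id': 1, 'employed': 'Y', 'athlete': 'N'},
    {'id': 2, 'employed': 'Y', 'athlete': 'Y'},
    {'id': 3, 'employed': 'N', 'athlete': 'N'},
]
print([call_person(r, d) for r in rows])  # → ['Y', 'N', 'N']
```

Because reduce folds the list pairwise, the dict can hold any number of conditions; the same holds for the PySpark version above.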
