How do I use multiple conditions with pyspark.sql.functions.when() from a dict?
Question
I want to generate a when clause based on values in a dict. It's very similar to what's being done in How do I use multiple conditions with pyspark.sql.functions.when()?
Only I want to pass a dict of columns and values.
Say I have a dict:
{
'employed': 'Y',
'athlete': 'N'
}
I want to use that dict to generate the equivalent of:
df.withColumn("call_person", when((col("employed") == "Y") & (col("athlete") == "N"), "Y").otherwise("N"))
So the end result would be:
+---+-----------+--------+-------+
| id|call_person|employed|athlete|
+---+-----------+--------+-------+
|  1|          Y|       Y|      N|
|  2|          N|       Y|      Y|
|  3|          N|       N|      N|
+---+-----------+--------+-------+
Note: part of the reason I want to do this programmatically is that my dicts have different lengths (different numbers of conditions).
Answer
Use the reduce() function:
from functools import reduce
from pyspark.sql.functions import when, col

# dictionary of column -> required value
d = {
    'employed': 'Y',
    'athlete': 'N'
}

# build the combined condition: each (column, value) pair becomes an
# equality test, and reduce chains the tests together with `&`
cond = reduce(lambda x, y: x & y, [col(c) == v for c, v in d.items() if c in df.columns])

# add the new column
df.withColumn("call_person", when(cond, "Y").otherwise("N")).show()
+---+--------+-------+-----------+
| id|employed|athlete|call_person|
+---+--------+-------+-----------+
| 1| Y| N| Y|
| 2| Y| Y| N|
| 3| N| N| N|
+---+--------+-------+-----------+
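To see why this handles dicts of any length, here is the same reduce pattern illustrated with plain Python booleans instead of Spark Column expressions (a minimal sketch that needs no Spark session; `build_condition` and the sample rows are hypothetical names for illustration, and `and` stands in for the Column `&` operator):

```python
from functools import reduce

def build_condition(d, row):
    # Each (column, value) pair becomes one boolean test on `row`
    # (a plain dict standing in for a DataFrame row), and reduce
    # chains the tests together, exactly as `&` chains Columns above.
    return reduce(lambda x, y: x and y, [row.get(c) == v for c, v in d.items()])

d = {'employed': 'Y', 'athlete': 'N'}
print(build_condition(d, {'employed': 'Y', 'athlete': 'N'}))  # True  -> call_person = "Y"
print(build_condition(d, {'employed': 'Y', 'athlete': 'Y'}))  # False -> call_person = "N"
```

Adding a third key to `d` simply adds a third test to the chain, so no code changes are needed as the number of conditions grows.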