PySpark: multiple conditions in when clause

Question

I would like to fill in the cells of a DataFrame column (Age) that are currently blank, but only for rows where another column (Survived) has the value 0. If Survived is 1 and Age is blank, I want to keep Age as null.

I tried to use the && operator, but it didn't work. Here is my code:

tdata.withColumn("Age",  when((tdata.Age == "" && tdata.Survived == "0"), mean_age_0).otherwise(tdata.Age)).show()

Any suggestions on how to handle this? Thanks.

Error message:

SyntaxError: invalid syntax
  File "<ipython-input-33-3e691784411c>", line 1
    tdata.withColumn("Age",  when((tdata.Age == "" && tdata.Survived == "0"), mean_age_0).otherwise(tdata.Age)).show()
                                                    ^

Recommended answer

You get a SyntaxError because Python has no && operator. It has and and &, and the latter is the correct choice for building boolean expressions on a Column (| is used for logical disjunction and ~ for logical negation). The and keyword cannot be overloaded, so it cannot combine two Column expressions into a new one.
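To illustrate why & works where and does not, here is a minimal pure-Python sketch. FakeColumn is a hypothetical stand-in written for this example, not part of PySpark; it only assumes that a Column overloads ==, & and | to build expressions and refuses to be converted to a plain bool, which is how the real class behaves as far as I know.

```python
class FakeColumn:
    """Hypothetical stand-in for a Spark Column (illustration only)."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # Comparison builds a new expression instead of returning True/False.
        return FakeColumn(f"({self.expr} = {other})")

    def __and__(self, other):
        # & can be overloaded, so it can combine two expressions.
        return FakeColumn(f"({self.expr} AND {other.expr})")

    def __or__(self, other):
        return FakeColumn(f"({self.expr} OR {other.expr})")

    def __bool__(self):
        # The and keyword calls bool() on its left operand, which has no
        # sensible answer for a whole column.
        raise ValueError("cannot convert column into bool; use & or |")


age = FakeColumn("Age")
survived = FakeColumn("Survived")

# Works: & dispatches to __and__ and returns a combined expression.
combined = (age == "") & (survived == "0")
print(combined.expr)

# Fails: and needs a truth value for (age == "") first.
try:
    (age == "") and (survived == "0")
    and_error = None
except ValueError as exc:
    and_error = str(exc)
print(and_error)
```

The point of the sketch is only the dispatch: & goes through a method the class controls, while and goes through bool(), so it can never return a combined column expression.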

The condition you wrote is also invalid because it ignores operator precedence. In Python, & has higher precedence than ==, so each comparison has to be parenthesized:

from pyspark.sql.functions import col

(col("Age") == "") & (col("Survived") == "0")
## Column<b'((Age = ) AND (Survived = 0))'>
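The precedence rule can be checked without Spark by asking Python's own parser how it groups the two spellings. This is a small sketch using only the standard library ast module; the names age and survived are placeholders and are never evaluated.

```python
import ast

# Without parentheses, & binds first, so this is a chained comparison:
#   age == ("" & survived) == "0"
unparenthesized = ast.parse('age == "" & survived == "0"', mode="eval").body

# With parentheses, the top-level operation is the & of two comparisons.
parenthesized = ast.parse('(age == "") & (survived == "0")', mode="eval").body

print(type(unparenthesized).__name__)                 # Compare
print(type(unparenthesized.comparators[0]).__name__)  # BinOp
print(type(parenthesized).__name__)                   # BinOp
print(type(parenthesized.left).__name__)              # Compare
```

In the unparenthesized form the first thing evaluated would be "" & survived, which is not the intended condition.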

As a side note, the when function is equivalent to a CASE expression, not a WHEN clause. The same rules still apply. Conjunction:

df.where((col("foo") > 0) & (col("bar") < 0))

Disjunction:

df.where((col("foo") > 0) | (col("bar") < 0))

You can of course define the conditions separately to avoid the extra parentheses:

cond1 = col("Age") == "" 
cond2 = col("Survived") == "0"

cond1 & cond2
