How to use AND or OR condition in when in Spark
Question

I wanted to evaluate two conditions in when like this:
import pyspark.sql.functions as F
df = df.withColumn(
    'trueVal',
    F.when(df.value < 1 OR df.value2 == 'false', 0).otherwise(df.value))
For this I get 'invalid syntax' for using 'OR'.
I even tried using nested when statements:
df = df.withColumn(
'v',
F.when(df.value < 1,(F.when( df.value =1,0).otherwise(df.value))).otherwise(df.value)
)
For this I get 'keyword can't be an expression' for the nested when statements.
How could I use multiple conditions in when? Any workaround?
Answer
pyspark.sql.DataFrame.where takes a Boolean Column as its condition. When using PySpark, it's often useful to think "Column Expression" when you read "Column".
Logical operations on PySpark columns use the bitwise operators:

& for and
| for or
~ for not
When combining these with comparison operators such as <, parentheses are often needed.
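The reason is plain Python operator precedence, which can be seen without Spark at all: & binds more tightly than <, so an unparenthesized expression compares against the wrong thing. A minimal sketch with ordinary integers:

```python
# & binds tighter than <, so the comparison happens last, not first.
unparenthesized = 0 < 1 & 0          # parsed as 0 < (1 & 0), i.e. 0 < 0
parenthesized = (0 < 1) & (1 > 0)    # parentheses force both comparisons first

print(unparenthesized)  # False
print(parenthesized)    # True
```

PySpark columns overload the same operators, so the same parenthesization rule applies to column expressions.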
In your case, the correct statement is:
import pyspark.sql.functions as F
df = df.withColumn('trueVal',
F.when((df.value < 1) | (df.value2 == 'false'), 0).otherwise(df.value))
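As for the 'keyword can't be an expression' error in the nested attempt: it comes from writing df.value =1 (assignment) instead of df.value == 1 inside the call. A quick pure-Python check (no Spark required) shows the parser itself rejects it; newer Python versions word the message differently, but it is still a SyntaxError:

```python
# Python refuses '=' inside a function-call argument; this is what produced
# the "keyword can't be an expression" error in the nested when() attempt.
try:
    compile("F.when(df.value = 1, 0)", "<snippet>", "eval")
    is_syntax_error = False
except SyntaxError:
    is_syntax_error = True

print(is_syntax_error)  # True: use '==' for comparison inside when()
```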
See also: SPARK-8568