How to use AND or OR condition in when in Spark

Question

I wanted to evaluate two conditions in when like this:
import pyspark.sql.functions as F
df = df.withColumn(
    'trueVal', F.when(df.value < 1 OR df.value2 == 'false', 0).otherwise(df.value))
For this I get 'invalid syntax' for using 'OR'.

I even tried using nested when statements:
df = df.withColumn(
    'v',
    F.when(df.value < 1, F.when(df.value = 1, 0).otherwise(df.value)).otherwise(df.value)
)
For this I get 'keyword can't be an expression' for the nested when statements.

How could I use multiple conditions in when? Is there any workaround?
Answer
pyspark.sql.DataFrame.where takes a Boolean Column as its condition. When using PySpark, it's often useful to think "Column Expression" when you read "Column".
Logical operations on PySpark columns use the bitwise operators:

- & for and
- | for or
- ~ for not
When combining these with comparison operators such as <, parentheses are often needed.
In your case, the correct statement is:
import pyspark.sql.functions as F
df = df.withColumn('trueVal',
F.when((df.value < 1) | (df.value2 == 'false'), 0).otherwise(df.value))
See also: SPARK-8568