How to use AND or OR condition in when in Spark


Question

I wanted to evaluate two conditions in when like this:

import pyspark.sql.functions as F

df = df.withColumn(
    'trueVal', F.when(df.value < 1 OR df.value2  == 'false' , 0 ).otherwise(df.value)) 

For this I get 'invalid syntax' for using 'OR'.

I even tried using nested when statements:

df = df.withColumn(
    'v', 
    F.when(df.value < 1,(F.when( df.value =1,0).otherwise(df.value))).otherwise(df.value)
) 

For this I get 'keyword can't be an expression' for nested when statements.

How could I use multiple conditions in when? Is there any workaround?

Answer

pyspark.sql.DataFrame.where takes a Boolean Column as its condition. When using PySpark, it's often useful to think "Column Expression" when you read "Column".

Logical operations on PySpark columns use the bitwise operators:

  • & for and
  • | for or
  • ~ for not

When combining these with comparison operators such as <, parentheses are often needed.
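The parentheses matter because Python's bitwise operators bind more tightly than comparisons. A minimal sketch with plain Python integers (no Spark needed, values chosen for illustration) shows how the unparenthesized form is parsed:

```python
# Python precedence: | binds tighter than <, so
#   a < 1 | b   parses as   a < (1 | b), not (a < 1) | b.
a, b = 5, 2
unparenthesized = a < 1 | b          # a < (1 | 2) -> 5 < 3 -> False
parenthesized = (a < 1) | (b == 2)   # False | True -> True
print(unparenthesized, parenthesized)  # False True
```

With PySpark columns the unparenthesized version typically fails outright, because `1 | df.value2` is not a valid operation on a Column.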

In your case, the correct statement is:

import pyspark.sql.functions as F
df = df.withColumn('trueVal',
    F.when((df.value < 1) | (df.value2 == 'false'), 0).otherwise(df.value))

See also: SPARK-8568

