How to use AND or OR condition in when in Spark


Problem Description

I wanted to evaluate two conditions in when like this:

import pyspark.sql.functions as F

df = df.withColumn(
    'trueVal', F.when(df.value < 1 OR df.value2  == 'false' , 0 ).otherwise(df.value)) 

For this I get 'invalid syntax' for using 'OR'.

I even tried using nested when statements:

df = df.withColumn(
    'v', 
    F.when(df.value < 1,(F.when( df.value =1,0).otherwise(df.value))).otherwise(df.value)
) 

For this I get 'keyword can't be an expression' for nested when statements.

How could I use multiple conditions in when? Is there any workaround?

Recommended Answer

pyspark.sql.DataFrame.where takes a Boolean Column as its condition. When using PySpark, it's often useful to think "Column Expression" when you read "Column".

Logical operations on PySpark columns use the bitwise operators:

  • & for and
  • | for or
  • ~ for not
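The reason Python's own and/or keywords can't be used here is that they force their operands through __bool__, which cannot be overloaded to build up an expression tree, so PySpark's Column raises an error there and overloads the bitwise operators instead. A minimal plain-Python sketch of that design (FakeColumn is a hypothetical stand-in, not PySpark's actual Column class):

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column, holding an SQL-like expression string."""

    def __init__(self, expr):
        self.expr = expr

    def __bool__(self):
        # This is what `and` / `or` / `if col:` trigger -- there is no way
        # to overload them to return a new column, so we raise instead.
        raise ValueError("Cannot convert column to bool; use '&', '|', '~' instead.")

    def __and__(self, other):   # col1 & col2
        return FakeColumn(f"({self.expr} AND {other.expr})")

    def __or__(self, other):    # col1 | col2
        return FakeColumn(f"({self.expr} OR {other.expr})")

    def __invert__(self):       # ~col
        return FakeColumn(f"(NOT {self.expr})")


a = FakeColumn("value < 1")
b = FakeColumn("value2 = 'false'")

print((a | b).expr)   # the bitwise operator builds a combined expression
try:
    a or b            # `or` calls bool(a) and fails, just like a real Column
except ValueError as e:
    print(e)
```

The real Column behaves the same way: the bitwise operators return new Columns, while boolean coercion raises an error.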

When combining these with comparison operators such as <, parentheses are often needed.
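The parentheses matter because in Python the bitwise operators bind more tightly than comparisons, so without them the expression groups the wrong way. A small plain-Python illustration of the precedence (no PySpark needed):

```python
# `|` binds tighter than `<`, so the comparison happens last:
assert (2 < 1 | 4) == (2 < (1 | 4))   # parsed as 2 < 5, which is True

# With explicit parentheses the comparison happens first:
assert ((2 < 1) | 4) == 4             # False | 4 evaluates to 4

# This is why each comparison in a PySpark condition needs its own parentheses:
#     (df.value < 1) | (df.value2 == 'false')
# and not:
#     df.value < 1 | df.value2 == 'false'
```

On real Columns, leaving the parentheses out makes `|` apply to the middle operands first, producing either the wrong condition or an error.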

In your case, the correct statement is:

import pyspark.sql.functions as F
df = df.withColumn('trueVal',
    F.when((df.value < 1) | (df.value2 == 'false'), 0).otherwise(df.value))

See also: SPARK-8568
