PySpark: conditions on multiple columns and returning a new column
Problem description
I am using Spark 2.1 and scripting in PySpark. Please help me with this, as I am stuck here.
Problem statement: create a new column based on conditions on multiple columns.
The input dataframe is below:
FLG1 FLG2 FLG3
T F T
F T T
T T F
Now I need to create one new column, FLG. The condition is: if FLG1 == T && (FLG2 == F || FLG2 == T), then FLG has to be T; otherwise F.
Consider the dataframe to be DF.
Below is the code snippet I tried:
DF.withColumn("FLG",DF.select(when(FLG1=='T' and (FLG2=='F' or FLG2=='T','F').otherwise('T'))).show()
It didn't work; I was getting the error: name 'when' is not defined.
Please help me get past this hurdle.
Recommended answer
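Try the following; it should work. The "name 'when' is not defined" error means that when was never imported from pyspark.sql.functions. Column conditions must also be combined with & and | rather than Python's and / or, with each comparison wrapped in parentheses.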
from pyspark.sql.functions import col, when, lit
# FLG is 'T' when the condition holds and 'F' otherwise, as the problem statement requires.
DF.withColumn("FLG", when((col("FLG1") == 'T') & ((col("FLG2") == 'F') | (col("FLG2") == 'T')), lit('T')).otherwise(lit('F'))).show()
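As a quick cross-check of the rule stated in the question, the expected FLG values for the three sample rows can be worked out in plain Python, without a Spark session. This is only a sketch: the helper name expected_flg is hypothetical, and it encodes the rule exactly as written in the problem statement (FLG is T when FLG1 is T and FLG2 is either F or T, otherwise F).

```python
# Sample rows from the question, as (FLG1, FLG2, FLG3) tuples.
rows = [("T", "F", "T"), ("F", "T", "T"), ("T", "T", "F")]


def expected_flg(flg1, flg2):
    """Hypothetical helper: the rule as stated in the problem statement."""
    # Plain Python values can use and/or; Spark Column objects need & and | instead.
    return "T" if flg1 == "T" and (flg2 == "F" or flg2 == "T") else "F"


# FLG3 takes no part in the rule, so it is ignored here.
expected = [expected_flg(flg1, flg2) for flg1, flg2, _ in rows]
print(expected)  # ['T', 'F', 'T']
```

Comparing this list with the FLG column printed by .show() is a cheap way to catch swapped 'T' / 'F' branch values in the when(...).otherwise(...) expression.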