Multiple if-else conditions in a pandas DataFrame to derive multiple columns
Question
I have a DataFrame like the one below.
import pandas as pd
import numpy as np

raw_data = {'student': ['A', 'B', 'C', 'D', 'E'],
            'score': [100, 96, 80, 105, 156],
            'height': [7, 4, 9, 5, 3],
            'trigger1': [84, 95, 15, 78, 16],
            'trigger2': [99, 110, 30, 93, 31],
            'trigger3': [114, 125, 45, 108, 46]}
df2 = pd.DataFrame(raw_data, columns=['student', 'score', 'height', 'trigger1', 'trigger2', 'trigger3'])
print(df2)
I need to derive a Flag column based on multiple conditions.
I need to compare the score and height columns with the trigger1 to trigger3 columns.
Flag column:
If score is greater than or equal to trigger1 and height is less than 8, then Red
If score is greater than or equal to trigger2 and height is less than 8, then Yellow
If score is greater than or equal to trigger3 and height is less than 8, then Orange
If height is greater than 8, then leave it blank
How do I write if-else conditions on a pandas DataFrame and derive columns from them?
Expected output
  student  score  height  trigger1  trigger2  trigger3    Flag
0       A    100       7        84        99       114  Yellow
1       B     96       4        95       110       125     Red
2       C     80       9        15        30        45     NaN
3       D    105       5        78        93       108  Yellow
4       E    156       3        16        31        46  Orange
For another column, Text1, from my original question, I tried the function below, but the integer columns are not converted to strings by astype(str) during concatenation. Is there another approach?
def text_df(df):
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger1'].astype(str) + " and less than height 5"
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger2'].astype(str) + " and less than height 5"
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger3'].astype(str) + " and less than height 5"
    elif (df['height'] > 8):
        return np.nan
Recommended answer
You need a chained comparison with a lower and an upper bound for each range.
def flag_df(df):
    # df is a single row here, because apply is called with axis=1
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return 'Red'
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return 'Yellow'
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return 'Orange'
    elif (df['height'] > 8):
        return np.nan

df2['Flag'] = df2.apply(flag_df, axis=1)
  student  score  height  trigger1  trigger2  trigger3    Flag
0       A    100       7        84        99       114  Yellow
1       B     96       4        95       110       125     Red
2       C     80       9        15        30        45     NaN
3       D    105       5        78        93       108  Yellow
4       E    156       3        16        31        46  Orange
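The question also asked about a Text1 column. The sketch below is one possible way to build it with the same row-wise pattern, not part of the original answer. It assumes the difficulty comes from each `row['score']` being a single value rather than a Series inside a row-wise `apply`, so it formats the values with an f-string instead of calling `astype(str)`. It also assumes that rows matching no condition should be left blank.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D', 'E'],
    'score': [100, 96, 80, 105, 156],
    'height': [7, 4, 9, 5, 3],
    'trigger1': [84, 95, 15, 78, 16],
    'trigger2': [99, 110, 30, 93, 31],
    'trigger3': [114, 125, 45, 108, 46],
})

def text_df(row):
    # row is one DataFrame row, so every row[...] lookup is a scalar
    if not row['height'] < 8:
        return np.nan  # assumption: anything not below 8 is left blank
    if row['trigger1'] <= row['score'] < row['trigger2']:
        trigger = row['trigger1']
    elif row['trigger2'] <= row['score'] < row['trigger3']:
        trigger = row['trigger2']
    elif row['trigger3'] <= row['score']:
        trigger = row['trigger3']
    else:
        return np.nan  # assumption: score below trigger1 is left blank
    # f-strings convert the numbers to text; the wording is taken from the question
    return f"{row['student']} score {row['score']} greater than {trigger} and less than height 5"

df2['Text1'] = df2.apply(text_df, axis=1)
print(df2[['student', 'Text1']])
```

Calling the built-in `str()` on each scalar would work equally well here; the f-string just keeps the message in one piece.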
Note: You can also do this with a deeply nested np.where, but I prefer applying a function when there are multiple if-else branches.
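For comparison, here is a minimal column-wise sketch of the same flag logic. It uses boolean masks with `DataFrame.loc` rather than nested `np.where` or a row-wise function, so it is a swapped-in technique and not the answer's method. The mask names (`short`, `red`, `yellow`, `orange`) are made up for illustration. Note that a chained comparison such as `a <= b < c` cannot be used on whole columns, so each range is written as two comparisons joined with `&`.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D', 'E'],
    'score': [100, 96, 80, 105, 156],
    'height': [7, 4, 9, 5, 3],
    'trigger1': [84, 95, 15, 78, 16],
    'trigger2': [99, 110, 30, 93, 31],
    'trigger3': [114, 125, 45, 108, 46],
})

# Element-wise masks: one boolean per row
short = df2['height'] < 8
red = short & (df2['trigger1'] <= df2['score']) & (df2['score'] < df2['trigger2'])
yellow = short & (df2['trigger2'] <= df2['score']) & (df2['score'] < df2['trigger3'])
orange = short & (df2['trigger3'] <= df2['score'])

# Start with an all-NaN object column, then fill in the matching rows
df2['Flag'] = pd.Series(np.nan, index=df2.index, dtype=object)
df2.loc[red, 'Flag'] = 'Red'
df2.loc[yellow, 'Flag'] = 'Yellow'
df2.loc[orange, 'Flag'] = 'Orange'

print(df2)
```

Because the three ranges do not overlap, the order of the `.loc` assignments does not matter in this sketch.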
Answering @Cecilia's questions
- The returned object is not a string but a computation; for example, for the first condition we want to return df['height']*2
Not sure what you tried, but you can return a derived value instead of a string like this:
def flag_df(df):
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
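When every branch returns a number, the same branching could also be written column-wise with `np.select`, which takes a list of conditions, a list of matching values, and a default. This is only a sketch of an alternative, not the answer's approach; the output column name `Value` is hypothetical.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D', 'E'],
    'score': [100, 96, 80, 105, 156],
    'height': [7, 4, 9, 5, 3],
    'trigger1': [84, 95, 15, 78, 16],
    'trigger2': [99, 110, 30, 93, 31],
    'trigger3': [114, 125, 45, 108, 46],
})

short = df2['height'] < 8
conditions = [
    short & (df2['trigger1'] <= df2['score']) & (df2['score'] < df2['trigger2']),
    short & (df2['trigger2'] <= df2['score']) & (df2['score'] < df2['trigger3']),
    short & (df2['trigger3'] <= df2['score']),
]
choices = [df2['height'] * 2, df2['height'] * 3, df2['height'] * 4]

# Rows matching no condition get NaN, so the result is a float column
df2['Value'] = np.select(conditions, choices, default=np.nan)
print(df2[['student', 'height', 'Value']])
```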
- If some columns contain 'NaN' values and I want to use df['xxx'] is None as the condition, the code does not seem to work
Again, not sure what code you tried, but using pandas isnull would do the trick:
def flag_df(df):
    if pd.isnull(df['height']):
        return df['height']
    elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
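A short check of why an `is None` test does not catch NaN may help here. The snippet is a standalone illustration with a made-up variable name (`value`), not code from the question: NaN is a float, so it is not the `None` object, and every ordering comparison against it is False, which is why a NaN row would otherwise fall through all the branches.

```python
import numpy as np
import pandas as pd

value = np.nan  # stands in for a missing cell such as df['height']

is_none = value is None           # NaN is a float, not the None object
equals_nan = value == np.nan      # NaN never compares equal, even to itself
below_8 = value < 8               # ordering comparisons with NaN are False
above_8 = value > 8
detected = pd.isnull(value)       # isnull recognizes NaN
none_detected = pd.isnull(None)   # and it recognizes None as well

print(is_none, equals_nan, below_8, above_8, detected, none_detected)
```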