Multiple if else conditions in a pandas dataframe to derive multiple columns

Problem description

I have a dataframe like the one below.

import pandas as pd
import numpy as np

raw_data = {'student': ['A', 'B', 'C', 'D', 'E'],
            'score': [100, 96, 80, 105, 156],
            'height': [7, 4, 9, 5, 3],
            'trigger1': [84, 95, 15, 78, 16],
            'trigger2': [99, 110, 30, 93, 31],
            'trigger3': [114, 125, 45, 108, 46]}

df2 = pd.DataFrame(raw_data, columns=['student', 'score', 'height', 'trigger1', 'trigger2', 'trigger3'])

print(df2)

I need to derive a Flag column based on multiple conditions.

I need to compare the score and height columns with the trigger1 to trigger3 columns.

Flag column:

  1. If score is greater than or equal to trigger1 and height is less than 8, then Red
  2. If score is greater than or equal to trigger2 and height is less than 8, then Yellow
  3. If score is greater than or equal to trigger3 and height is less than 8, then Orange
  4. If height is greater than 8, leave it blank

How do I write if else conditions on a pandas dataframe and derive columns from them?

Expected output

  student  score  height  trigger1  trigger2  trigger3    Flag
0       A    100       7        84        99       114  Yellow
1       B     96       4        95       110       125     Red
2       C     80       9        15        30        45     NaN
3       D    105       5        78        93       108  Yellow
4       E    156       3        16        31        46  Orange

For the other column in my original question, Text1, I tried the function below, but the integer columns are not converted to strings when I concatenate them using astype(str). Is there another approach?

def text_df(df):

    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger1'].astype(str) + " and less than height 5"
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger2'].astype(str) + " and less than height 5"
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger3'].astype(str) + " and less than height 5"
    elif (df['height'] > 8):
        return np.nan
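
A likely cause of the astype(str) problem: with apply(..., axis=1) the function receives one row at a time, so df['score'] is a single scalar rather than a Series, and a plain integer has no astype method. A minimal sketch of one way around it, formatting the scalars with an f-string instead. The name text_row is hypothetical, and the message wording is copied from the attempt above.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D', 'E'],
    'score': [100, 96, 80, 105, 156],
    'height': [7, 4, 9, 5, 3],
    'trigger1': [84, 95, 15, 78, 16],
    'trigger2': [99, 110, 30, 93, 31],
    'trigger3': [114, 125, 45, 108, 46],
})

def text_row(row):
    # row holds one record, so every field is a scalar, not a Series
    if row['height'] < 8:
        if row['trigger1'] <= row['score'] < row['trigger2']:
            trigger = row['trigger1']
        elif row['trigger2'] <= row['score'] < row['trigger3']:
            trigger = row['trigger2']
        elif row['trigger3'] <= row['score']:
            trigger = row['trigger3']
        else:
            return np.nan  # assumption: scores below trigger1 get no text
        # f-strings convert the scalar values to text without astype
        return f"{row['student']} score {row['score']} greater than {trigger} and less than height 5"
    return np.nan  # assumption: height of 8 or more is left blank

df2['Text1'] = df2.apply(text_row, axis=1)
print(df2[['student', 'Text1']])
```

Calling str(row['score']) and concatenating with + would work equally well; the f-string is only a matter of taste.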

Recommended answer

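You need chained comparisons with both a lower and an upper bound. As the conditions are written in the question they overlap: in the sample data any score that reaches trigger2 also reaches trigger1, so each colour needs its own band.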

def flag_df(df):
    # df is a single row here because apply is called with axis=1
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return 'Red'
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return 'Yellow'
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return 'Orange'
    elif (df['height'] > 8):
        return np.nan
    # any other row (for example height equal to 8) falls through and returns None

df2['Flag'] = df2.apply(flag_df, axis = 1)

    student score   height  trigger1    trigger2    trigger3    Flag
0   A       100     7       84          99          114         Yellow
1   B       96      4       95          110         125         Red
2   C       80      9       15          30          45          NaN
3   D       105     5       78          93          108         Yellow
4   E       156     3       16          31          46          Orange

Note: You can do this with a deeply nested np.where, but I prefer to apply a function when there are multiple if-else branches.
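
For comparison, here is a vectorized sketch that avoids both the row-wise apply and the nested np.where. It uses plain boolean masks, not the approach from the answer, and assigns the bands from lowest to highest so that a later assignment overwrites an earlier one. The variable names short and flag are made up for this example.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D', 'E'],
    'score': [100, 96, 80, 105, 156],
    'height': [7, 4, 9, 5, 3],
    'trigger1': [84, 95, 15, 78, 16],
    'trigger2': [99, 110, 30, 93, 31],
    'trigger3': [114, 125, 45, 108, 46],
})

# Rows that are eligible for any flag at all
short = df2['height'] < 8

# Start with an all-NaN object column so strings can be stored in it
flag = pd.Series(np.nan, index=df2.index, dtype=object)

# Lowest band first; higher bands overwrite lower ones
flag[short & (df2['score'] >= df2['trigger1'])] = 'Red'
flag[short & (df2['score'] >= df2['trigger2'])] = 'Yellow'
flag[short & (df2['score'] >= df2['trigger3'])] = 'Orange'

df2['Flag'] = flag
print(df2)
```

Because the masks are applied in ascending order, no upper bound is needed. This relies on trigger1 < trigger2 < trigger3 in every row, which holds for the sample data.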

Answering @Cecilia's questions

  1. The returned object is not a string but the result of a calculation; for example, for the first condition we want to return df['height']*2

Not sure what you tried, but you can return a derived value instead of a string using

def flag_df(df):

    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
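
Since the title asks about deriving several columns, one possible extension (an assumption, not part of the original answer) is to return a pd.Series from the row function, so that apply produces one output column per key. The function name flag_and_scaled and the column name Scaled are hypothetical.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D', 'E'],
    'score': [100, 96, 80, 105, 156],
    'height': [7, 4, 9, 5, 3],
    'trigger1': [84, 95, 15, 78, 16],
    'trigger2': [99, 110, 30, 93, 31],
    'trigger3': [114, 125, 45, 108, 46],
})

def flag_and_scaled(row):
    # Defaults cover tall rows and rows that match no band
    flag, scaled = np.nan, np.nan
    if row['height'] < 8:
        if row['trigger1'] <= row['score'] < row['trigger2']:
            flag, scaled = 'Red', row['height'] * 2
        elif row['trigger2'] <= row['score'] < row['trigger3']:
            flag, scaled = 'Yellow', row['height'] * 3
        elif row['trigger3'] <= row['score']:
            flag, scaled = 'Orange', row['height'] * 4
    # Each key of the returned Series becomes a column in the result
    return pd.Series({'Flag': flag, 'Scaled': scaled})

derived = df2.apply(flag_and_scaled, axis=1)
result = df2.join(derived)
print(result)
```

Returning a Series per row is convenient but slower than returning a scalar, so for large frames it may be worth computing each column separately.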

  2. If some columns contain 'NaN' values and I want to use df['xxx'] is None as the condition, the code doesn't seem to work

Again, not sure what code you tried, but using pandas isnull would do the trick. NaN is a float value rather than None, so a test of the form df['xxx'] is None never matches it.

def flag_df(df):

    if pd.isnull(df['height']):
        return df['height']
    elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
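
A small check of why the is None test fails and pd.isnull succeeds, followed by the same kind of function run on a frame that contains a missing height. The two-row frame and the name scaled_height are made up for this illustration.

```python
import numpy as np
import pandas as pd

value = np.nan
print(value is None)     # False: NaN is a float, not None
print(pd.isnull(value))  # True
print(pd.isnull(None))   # True: isnull treats None as missing as well

# Hypothetical frame with one missing height
df = pd.DataFrame({
    'score': [100, 96],
    'height': [7, np.nan],
    'trigger1': [84, 95],
    'trigger2': [99, 110],
    'trigger3': [114, 125],
})

def scaled_height(row):
    # Handle the missing value first so later comparisons never see NaN
    if pd.isnull(row['height']):
        return np.nan
    elif (row['trigger1'] <= row['score'] < row['trigger2']) and (row['height'] < 8):
        return row['height'] * 2
    elif (row['trigger2'] <= row['score'] < row['trigger3']) and (row['height'] < 8):
        return row['height'] * 3
    elif (row['trigger3'] <= row['score']) and (row['height'] < 8):
        return row['height'] * 4
    return np.nan

result = df.apply(scaled_height, axis=1)
print(result)
```

Comparisons against NaN evaluate to False, so without the explicit isnull branch the row would simply fall through every condition.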
