Using conditional if/else logic with pandas dataframe columns


Question

My dataframe called pw2 looks something like this, where I have two columns, pw1 and pw2, which are probability of wins. I'd like to perform some conditional logic to create another column called WINNER based off pw1 and pw2.

+-------------------------+-------------+-----------+-------------+
|          Name1          |     pw1     |   Name2   |     pw2     |
+-------------------------+-------------+-----------+-------------+
| Seaking                 | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn              | 0.172510623 | Quagsire  | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy                 | 0.28681284  | NaN       | NaN         |
+-------------------------+-------------+-----------+-------------+

I want to do this conditionally in a function but I'm having some trouble.


  • if pw1 > pw2, populate with Name1
  • if pw2 > pw1, populate with Name2
  • if pw1 is populated but pw2 isn't, populate with Name1
  • if pw2 is populated but pw1 isn't, populate with Name2

But my function isn't working - for some reason checking if a value is null isn't working.

def final_winner(df):
    # If PW1 is missing and PW2 is populated, Pokemon 1 wins
    if df['pw1'] = None and df['pw2'] != None:
        return df['Number1']
    # If it's the same thing but the other way around, Pokemon 2 wins
    elif df['pw2'] = None and df['pw1'] != None:
        return df['Number2']
    # If pw2 is greater than pw1, then Pokemon 2 wins
    elif df['pw2'] > df['pw1']:
        return df['Number2']
    else
        return df['Number1']

pw2['Winner'] = pw2.apply(final_winner, axis=1)


Recommended answer

Do not use apply here; it calls the function once per row and is very slow. Use np.where instead:

import numpy as np
# Treat a missing pw2 as -inf so that it always loses the comparison
pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)

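Since NaNs always lose, you can just fillna() pw2 with -np.inf to get the same logic. A missing pw1 needs no filling: any comparison with NaN is False, so np.where falls through to Name2.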
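To see this on the sample data from the question, here is a minimal, self-contained sketch. The dataframe is rebuilt by hand from the table above, and the variable name pw2_filled is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# A missing pw2 becomes -inf, so any populated pw1 beats it
pw2_filled = df["pw2"].fillna(-np.inf)

# Element-wise choice: Name1 where pw1 is larger, otherwise Name2
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])

print(df[["Name1", "Name2", "winner"]])
```

The Flaaffy row has no opponent, so its pw2 is replaced by -inf and Name1 is selected.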

Looking at your code, we can point out several problems. First, you are comparing with df['pw1'] = None, which is invalid Python syntax: = is assignment, and comparisons use the == operator. For None specifically, the recommended test is `is`, as in if variable is None: (...). But you are in a pandas/numpy environment, where there are actually several kinds of null values (None, NaN, NaT, etc.).

So, it is preferable to check for nullability using pd.isnull() or df.isnull().
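As a small illustration of why equality checks against None are unreliable here, the sketch below runs pd.isnull() over several kinds of missing value. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Three different "null" values plus one real number
values = [None, np.nan, pd.NaT, 0.5]
null_flags = [bool(pd.isnull(v)) for v in values]
print(null_flags)

# NaN is not None and does not even compare equal to itself,
# so == and `is None` both miss it
nan_equals_itself = (np.nan == np.nan)
nan_is_none = (np.nan is None)
print(nan_equals_itself, nan_is_none)

# The same check works element-wise on a whole column
s = pd.Series([0.517184213, np.nan])
mask = s.isnull().tolist()
print(mask)
```

pd.isnull() flags all three missing values, while the equality and identity checks on NaN both come back False.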

Just to illustrate, this is what your code should look like:

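import pandas as pd

def final_winner(df):
    # With axis=1, apply passes one row at a time, so df here is a single row
    # Only pw2 is populated: Name2 wins
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']
    # Only pw1 is populated: Name1 wins
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    # Both populated: the higher probability wins
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)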

But again, definitely use np.where.
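If you prefer to keep the null checks explicit rather than relying on fillna(), the same rules can be written with boolean masks. This is only a sketch: the last row of the sample data (missing pw1) is made up for illustration and does not appear in the question, and the mask names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Flaaffy", None],
    "pw1": [0.517184213, 0.172510623, 0.28681284, np.nan],
    "Name2": ["Lickitung", "Quagsire", None, "Hitmonlee"],  # last row is invented
    "pw2": [0.189236181, 0.260884258, np.nan, 0.694069408],
})

pw1_missing = df["pw1"].isnull()
pw2_missing = df["pw2"].isnull()

# Name2 wins when only pw1 is missing, or when pw2 is strictly larger.
# Comparisons involving NaN evaluate to False, so they never pick Name2 by accident.
name2_wins = (pw1_missing & ~pw2_missing) | (df["pw2"] > df["pw1"])

df["winner"] = np.where(name2_wins, df["Name2"], df["Name1"])
print(df[["pw1", "pw2", "winner"]])
```

Like the row-wise function, this falls back to Name1 on ties and when both probabilities are missing.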