Pandas replace, multi column criteria


Question

I'm trying to replace values in a pandas DataFrame based on criteria that span multiple columns. For a criterion on a single column, this can be done very elegantly with a dictionary (see, e.g., "Remap values in pandas column with a dict"):

import pandas as pd

df = pd.DataFrame({'col1': {0:1, 1:1, 2:2}, 'col2': {0:10, 1:20, 2:20}})

rdict = {1:'a', 2:'b'}
df2 = df.replace({"col1": rdict})

Input df:

   col1  col2
0     1    10
1     1    20
2     2    20

Resulting df2:

  col1  col2
0    a    10
1    a    20
2    b    20

I'm trying to extend this to criteria over multiple columns (e.g. where col1==1 and col2==10 -> replace). For a single criterion this can be done like:

df3=df.copy()
df3.loc[((df['col1']==1)&(df['col2']==10)), 'col1'] = 'c'

This results in df3:

  col1  col2
0    c    10
1    1    20
2    2    20 

My real-life problem has a large number of criteria, which would require a large number of df3.loc[((criteria1)&(criteria2)), column] = value calls. That is far less elegant than the replacement using a dictionary as a "lookup table". Is it possible to extend the elegant solution (df2 = df.replace({"col1": rdict})) to a setup where values in one column are replaced according to criteria based on multiple columns?

An example of what I'm trying to achieve (although in my real-life case the number of criteria is a lot larger):

df = pd.DataFrame({'col1': {0:1, 1:1, 2:2, 3:2}, 'col2': {0:10, 1:20, 2:10, 3:20}})

df3=df.copy()
df3.loc[((df['col1']==1)&(df['col2']==10)), 'col1'] = 'a'
df3.loc[((df['col1']==1)&(df['col2']==20)), 'col1'] = 'b'
df3.loc[((df['col1']==2)&(df['col2']==10)), 'col1'] = 'c'
df3.loc[((df['col1']==2)&(df['col2']==20)), 'col1'] = 'd'

Input df:

   col1  col2
0     1    10
1     1    20
2     2    10
3     2    20

Resulting df3:

  col1  col2
0    a    10
1    b    20
2    c    10
3    d    20

Recommended answer

We can use merge.

Suppose your df looks like

df = pd.DataFrame({'col1': {0:1, 1:1, 2:2, 3:2, 4:2, 5:1}, 'col2': {0:10, 1:20, 2:10, 3:20, 4: 20, 5:10}})

    col1 col2
0   1    10
1   1    20
2   2    10
3   2    20
4   2    20
5   1    10

And your conditional replacement can be represented as another dataframe:

df_replace

  col1  col2    val
0   1   10      a
1   1   20      b
2   2   10      c
3   2   20      d

(As OP (Bart) pointed out, you can save this in a csv file.)
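
The answer shows df_replace only as a printed table. A minimal sketch of how it might be built follows, either directly or from CSV. The CSV text here is an assumed stand-in for a real file, and the name df_replace_from_csv is introduced only for illustration:

```python
import io

import pandas as pd

# Lookup table: each row maps a (col1, col2) pair to its replacement value.
df_replace = pd.DataFrame({
    "col1": [1, 1, 2, 2],
    "col2": [10, 20, 10, 20],
    "val": ["a", "b", "c", "d"],
})

# The same table loaded from CSV text (an assumed stand-in for a file on disk;
# with a real file you would pass its path to pd.read_csv instead).
csv_text = "col1,col2,val\n1,10,a\n1,20,b\n2,10,c\n2,20,d\n"
df_replace_from_csv = pd.read_csv(io.StringIO(csv_text))
```

Keeping the table in a file means new criteria can be added without touching the code.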

Then you can just use

df = df.merge(df_replace, on=["col1", "col2"], how="left")

    col1    col2    val
0   1       10      a
1   1       20      b
2   2       10      c
3   2       20      d
4   2       20      d
5   1       10      a

Then you just need to drop col1.
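
One possible way to finish this step is sketched below, assuming you want the replacement values to end up in a column named col1 again, as in the question's df3. The names merged and result are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 2, 2, 1],
                   "col2": [10, 20, 10, 20, 20, 10]})
df_replace = pd.DataFrame({"col1": [1, 1, 2, 2],
                           "col2": [10, 20, 10, 20],
                           "val": ["a", "b", "c", "d"]})

# A left merge keeps every row of df, in its original order.
merged = df.merge(df_replace, on=["col1", "col2"], how="left")

# Drop the old col1, rename val to col1, and restore the column order.
result = (
    merged.drop(columns="col1")
          .rename(columns={"val": "col1"})[["col1", "col2"]]
)
```

This assumes every (col1, col2) pair in df has an entry in df_replace; unmatched rows would get NaN in the new col1.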

As MaxU pointed out, there could be rows that do not get replaced, which results in NaN. We can use a line like

df["val"] = df["val"].combine_first(df["col1"])

to fill in values from col1 if the resulting value after the merge is NaN.
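
To see the fallback at work, here is a small sketch with one row that has no match in the lookup table. The data is made up for illustration and is not from the original question:

```python
import pandas as pd

# The last row has the pair (3, 30), which is absent from the lookup table.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})
df_replace = pd.DataFrame({"col1": [1, 2], "col2": [10, 20], "val": ["a", "d"]})

merged = df.merge(df_replace, on=["col1", "col2"], how="left")

# val is NaN for the unmatched row; fall back to the original col1 value there.
merged["val"] = merged["val"].combine_first(merged["col1"])
```

Note that val then mixes strings and the original integers, so it is no longer a purely string column.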
