根据来自另一个数据帧的多个列条件创建列 [英] Create column based on multiple column conditions from another dataframe

查看:57
本文介绍了根据来自另一个数据帧的多个列条件创建列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有两个数据框 - 条件和数据.

Suppose I have two dataframes - conditions and data.

import pandas as pd

conditions = pd.DataFrame({'class': [1,2,3,4,4,5,5,4,4,5,5,5],
                           'primary_lower': [0,0,0,160,160,160,160,160,160,160,160,800],
                           'primary_upper':[9999,9999,9999,480,480,480,480,480,480,480,480,4000],
                           'secondary_lower':[0,0,0,3500,6100,3500,6100,0,4800,0,4800,10],
                           'secondary_upper':[9999,9999,9999,4700,9999,4700,9999,4699,6000,4699,6000,3000],
                           'group':['A','A','A','B','B','B','B','C','C','C','C','C']})

data = pd.DataFrame({'class':[1,1,4,4,5,5,2],
                     'primary':[2000,9100,1100,170,300,210,1000],
                     'secondary':[1232,3400,2400,380,3600,4800,8600]})

我想在数据"表中生成一个新列(组),根据条件"表中提供的条件为每一行分配一个组.

I'd like to generate a new column (group) in the "data" table that assigns a group to each row given the conditions provided in the "conditions" table.

条件表的结构使得每个组中的行由OR"连接,列由AND"连接.例如,要分配组B":

The conditions table is structured so that rows within each group are joined by "OR"s and columns are joined by "AND"s. For example, to be assigned group "B":

(class = 4 AND 160<=primary<=480 AND 3500<=secondary<=4700)

(class = 4 AND 160<=primary<=480 AND 3500<=secondary<=4700)

(class = 4 AND 160<=primary<=480 AND 6100<=secondary<=9999)

(class = 4 AND 160<=primary<=480 AND 6100<=secondary<=9999)

(class = 5 AND 160<=primary<=480 AND 3500<=secondary<=4700)

(class = 5 AND 160<=primary<=480 AND 3500<=secondary<=4700)

(class = 5 AND 160<=primary<=480 AND 6100<=secondary<=9999)

(class = 5 AND 160<=primary<=480 AND 6100<=secondary<=9999)

任何不符合任何条件的行都将被分配组其他".因此,最终的数据帧将如下所示:

Any rows that don't match any of the conditions will get assigned group "Other". So, the final dataframe would then look like this:

+-------+---------+-----------+-------+
| class | primary | secondary | group |
+-------+---------+-----------+-------+
|     1 |    2000 |      1232 | A     |
|     1 |    9100 |      3400 | A     |
|     4 |    1100 |      2400 | Other |
|     4 |     170 |       380 | C     |
|     5 |     300 |      3600 | B     |
|     5 |     210 |      4800 | C     |
|     2 |    1000 |      8600 | A     |
+-------+---------+-----------+-------+

推荐答案

您可以迭代 GroupBy 对象并获取每个组中掩码的并集:

You can iterate a GroupBy object and take the union of the masks within each group:

for key, grp in conditions.groupby('group'):

    cols = ['class', 'primary_lower', 'primary_upper',
            'secondary_lower', 'secondary_upper']

    masks = (data['class'].eq(cls) & \
             data['primary'].between(prim_lower, prim_upper) & \
             data['secondary'].between(sec_lower, sec_upper) \
             for cls, prim_lower, prim_upper, sec_lower, sec_upper in \
             grp[cols].itertuples(index=False))

    data.loc[pd.concat(masks, axis=1).any(1), 'group'] = key

data['group'] = data['group'].fillna('Other')

结果:

print(data)

   class  primary  secondary  group
0      1     2000       1232      A
1      1     9100       3400      A
2      4     1100       2400  Other
3      4      170        380      C
4      5      300       3600      C
5      5      210       4800      C
6      2     1000       8600      A

注意 index=4 与您想要的输出有不同的结果,因为有多个条件满足数据.

Note index=4 has a different result to your desired output since there are multiple conditions which satisfy the data.

这篇关于根据来自另一个数据帧的多个列条件创建列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆