How to filter a dataframe by splitting the categories of a column into sets?

Problem description

I have a dataframe:

Prop_ID    Unit_ID      Prop_Usage                     Unit_Usage
1          1            RESIDENTIAL                    RESIDENTIAL
1          2            RESIDENTIAL                    COMMERCIAL
1          3            RESIDENTIAL                    INDUSTRIAL
1          4            RESIDENTIAL                    RESIDENTIAL
2          1            COMMERCIAL                     RESIDENTIAL
2          2            COMMERCIAL                     COMMERCIAL
2          3            COMMERCIAL                     COMMERCIAL
3          1            INDUSTRIAL                     INDUSTRIAL
3          2            INDUSTRIAL                     COMMERCIAL
4          1            RESIDENTIAL - COMMERCIAL       RESIDENTIAL
4          2            RESIDENTIAL - COMMERCIAL       COMMERCIAL
4          3            RESIDENTIAL - COMMERCIAL       INDUSTRIAL
5          1            COMMERCIAL / RESIDENTIAL       RESIDENTIAL
5          2            COMMERCIAL / RESIDENTIAL       COMMERCIAL
5          3            COMMERCIAL / RESIDENTIAL       INDUSTRIAL
5          4            COMMERCIAL / RESIDENTIAL       COMMERCIAL
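
For anyone who wants to reproduce the question, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes every value is a plain string (or integer for the ID columns) and that the frame uses the default integer index, which matches the index labels shown in the answer's output.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Prop_ID": [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5],
    "Unit_ID": [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3, 4],
    "Prop_Usage": (
        ["RESIDENTIAL"] * 4
        + ["COMMERCIAL"] * 3
        + ["INDUSTRIAL"] * 2
        + ["RESIDENTIAL - COMMERCIAL"] * 3
        + ["COMMERCIAL / RESIDENTIAL"] * 4
    ),
    "Unit_Usage": [
        "RESIDENTIAL", "COMMERCIAL", "INDUSTRIAL", "RESIDENTIAL",   # Prop_ID 1
        "RESIDENTIAL", "COMMERCIAL", "COMMERCIAL",                  # Prop_ID 2
        "INDUSTRIAL", "COMMERCIAL",                                 # Prop_ID 3
        "RESIDENTIAL", "COMMERCIAL", "INDUSTRIAL",                  # Prop_ID 4
        "RESIDENTIAL", "COMMERCIAL", "INDUSTRIAL", "COMMERCIAL",    # Prop_ID 5
    ],
})
```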

One property may have more than one unit, so units are a subcategory of properties. I want to keep only the rows where Unit_Usage does not match Prop_Usage. Some values in the Prop_Usage column are combined categories: for RESIDENTIAL - COMMERCIAL, a Unit_Usage of either RESIDENTIAL or COMMERCIAL counts as a match. The same applies to COMMERCIAL / RESIDENTIAL.

Expected output:

Prop_ID    Unit_ID      Prop_Usage                   Unit_Usage
1          2            RESIDENTIAL                  COMMERCIAL
1          3            RESIDENTIAL                  INDUSTRIAL
2          1            COMMERCIAL                   RESIDENTIAL
3          2            INDUSTRIAL                   COMMERCIAL
4          3            RESIDENTIAL - COMMERCIAL     INDUSTRIAL
5          3            COMMERCIAL / RESIDENTIAL     INDUSTRIAL

Recommended answer

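Use the in operator inside DataFrame.apply. Both columns hold strings, so this is a substring test: a row is dropped when Unit_Usage appears anywhere in Prop_Usage.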

df = df[~df.apply(lambda x: x['Unit_Usage'] in x['Prop_Usage'], axis=1)]

Or use zip in a list comprehension:

df = df[[a not in b for a, b in zip(df['Unit_Usage'], df['Prop_Usage'])]]

print(df)
    Prop_ID  Unit_ID                Prop_Usage   Unit_Usage
1         1        2               RESIDENTIAL   COMMERCIAL
2         1        3               RESIDENTIAL   INDUSTRIAL
4         2        1                COMMERCIAL  RESIDENTIAL
8         3        2                INDUSTRIAL   COMMERCIAL
11        4        3  RESIDENTIAL - COMMERCIAL   INDUSTRIAL
14        5        3  COMMERCIAL / RESIDENTIAL   INDUSTRIAL
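
The substring test above gives the expected result for this data, but it would also treat a unit usage as matching whenever it happens to be part of a longer label. If that is a concern, a variant closer to the question's title is to split Prop_Usage into an actual set of categories and test membership. The following is a minimal sketch, not part of the original answer: the helper name usage_mismatch is hypothetical, and it assumes the only separators are " - " and " / " surrounded by whitespace, as in the sample data.

```python
import re

import pandas as pd

# Assumed separators: "-" or "/" with whitespace on both sides,
# as in "RESIDENTIAL - COMMERCIAL" and "COMMERCIAL / RESIDENTIAL".
_SEPARATOR = re.compile(r"\s+[-/]\s+")


def usage_mismatch(frame, prop_col="Prop_Usage", unit_col="Unit_Usage"):
    """Return the rows whose unit usage is not among the property's usages."""
    # One set of allowed categories per row.
    allowed = [set(_SEPARATOR.split(value.strip())) for value in frame[prop_col]]
    # True where the unit usage is NOT one of the allowed categories.
    mask = [unit not in usages for unit, usages in zip(frame[unit_col], allowed)]
    return frame[mask]


# Small demonstration using values taken from the question.
demo = pd.DataFrame({
    "Prop_Usage": ["RESIDENTIAL - COMMERCIAL"] * 3
                  + ["COMMERCIAL / RESIDENTIAL"] * 3
                  + ["RESIDENTIAL"] * 2,
    "Unit_Usage": ["RESIDENTIAL", "COMMERCIAL", "INDUSTRIAL",
                   "RESIDENTIAL", "COMMERCIAL", "INDUSTRIAL",
                   "RESIDENTIAL", "COMMERCIAL"],
})
result = usage_mismatch(demo)
print(result)
```

In this demonstration only the INDUSTRIAL units of the two combined properties and the COMMERCIAL unit of the RESIDENTIAL property are kept. Exact set membership is stricter than the substring test, so labels with inconsistent spacing or casing would need normalizing first.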
