Get column name based on condition in pandas

Problem description

I have a dataframe as below:

For each row, I want to get the names of the columns that contain 1 in that row.

For example:

For Row 1: (blank)
For Row 2: Manufacturing
For Row 3: Manufacturing
For Row 4: Manufacturing
For Row 5: Social, Finance, Analytics, Advertising

Right now I can only get the complete row:

# Return every row of `sectors` whose category_list equals primary_sector
primary_sectors = lambda primary_sector: sectors[
    sectors["category_list"] == primary_sector
]

Please help me get the column names from the above DataFrame.

I tried this code:

primary_sectors("3D").filter(items=["0"])

It gives me 1 as the output, but I need Manufacturing as the output.

Recommended answer

Firstly

Your question is very ambiguous, and I recommend reading the link in @sammywemmy's comment. If I understand your problem correctly, let's talk about this mask first:

df.columns[
    (df == 1)        # boolean mask: True where a cell equals 1
    .any(axis=0)     # one bool per column: True if that column holds any 1
]

What's happening? Let's work our way outward, starting from within df.columns[**HERE**]:

  1. (df == 1) makes a boolean mask of df, with True/False (1/0) in each cell.
  2. .any(axis=0) collapses that mask column by column. As per the docs:

"Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent".

This gives us a handy boolean Series, indexed by column name, to mask the column names with.
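To see the two steps side by side, here is a minimal sketch on a small made-up frame (the column names and values are illustrative only). `axis=0` collapses each column to a single boolean, which is what `df.columns[...]` needs; `axis=1` would instead collapse each row.

```python
import pandas as pd

# Small made-up frame with 0/1 flags per column
df = pd.DataFrame(
    {"foo": [0, 0], "bar": [0, 1], "spam": [0, 1]},
    index=["x", "y"],
)

mask = df == 1                 # boolean DataFrame, same shape as df
per_column = mask.any(axis=0)  # one bool per column: does it contain a 1 anywhere?
per_row = mask.any(axis=1)     # one bool per row: does it contain a 1 anywhere?

print(per_column.to_list())    # [False, True, True]
print(per_row.to_list())       # [False, True]

# Boolean indexing on the column Index keeps only the labels flagged True
print(df.columns[per_column].to_list())  # ['bar', 'spam']
```

Note that `axis=0` answers "which columns contain a 1 anywhere in the frame", not "which columns contain a 1 in this particular row"; that is why the solution applies the mask to one row at a time.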

We'll use this mask as the basis for the automated solution below.

To automate this and get output of the form (<row index>, [<col name>, <col name>, ...]) for every row that contains a 1, you can do the following. Although this will be slower on large datasets, it should do the trick:

import pandas as pd

data = {'foo':[0,0,0,0], 'bar':[0, 1, 0, 0], 'baz':[0,0,0,0], 'spam':[0,1,0,1]}
df = pd.DataFrame(data, index=['a','b','c','d'])

print(df)

   foo  bar  baz  spam
a    0    0    0     0
b    0    1    0     1
c    0    0    0     0
d    0    0    0     1

# group df by its index label and build a dict {label: sub-DataFrame}; with a unique index, each value is a one-row DataFrame
df_dict = dict(
    list(
        df.groupby(df.index)
    )
)
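If the `dict(list(df.groupby(df.index)))` line looks opaque: iterating a `GroupBy` object yields `(group key, sub-DataFrame)` pairs, so with a unique index each value is simply that row as a one-row DataFrame. A quick check on the same sample data:

```python
import pandas as pd

data = {'foo': [0, 0, 0, 0], 'bar': [0, 1, 0, 0], 'baz': [0, 0, 0, 0], 'spam': [0, 1, 0, 1]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# Iterating a GroupBy yields (group key, sub-DataFrame) pairs
df_dict = dict(list(df.groupby(df.index)))

print(sorted(df_dict))      # ['a', 'b', 'c', 'd']
print(df_dict['b'].shape)   # (1, 4): one row, all four columns
print(df_dict['b'])         # same content as df.loc[['b']]
```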

The next step is a for loop that iterates over each one-row DataFrame in df_dict, checks it with the mask we created earlier, and prints the intended results:

for k, v in df_dict.items():               # k: index label, v: one-row DataFrame
    check = v.columns[(v == 1).any()]
    if len(check) > 0:
        print((k, check.to_list()))

('b', ['bar', 'spam'])
('d', ['spam'])
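The loop above skips rows that contain no 1, whereas the question's expected output lists Row 1 as blank. As an alternative (a swapped-in technique, not part of the original answer), `DataFrame.apply` with `axis=1` returns one entry for every row, including empty lists. It still runs a Python function per row, so it is not fully vectorized, but it avoids building the intermediate dict. `result_type="reduce"` asks pandas to return a Series of lists rather than trying to expand them into columns.

```python
import pandas as pd

data = {'foo': [0, 0, 0, 0], 'bar': [0, 1, 0, 0], 'baz': [0, 0, 0, 0], 'spam': [0, 1, 0, 1]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# For each row (a Series indexed by column name), keep the labels whose value is 1
cols_with_one = df.apply(
    lambda row: row.index[row == 1].to_list(),
    axis=1,
    result_type="reduce",
)

for label, cols in cols_with_one.items():
    print(label, cols)
# a []
# b ['bar', 'spam']
# c []
# d ['spam']
```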

Side note:

You see how I generated sample data that can be easily reproduced? In the future, please post reproducible sample data with your question. It helps you understand your problem better, and it makes it easier for us to answer.
