Get column name based on condition in pandas


Problem description

I have a dataframe as below:

For a particular row, I want to get the name of each column that contains 1 in that row.

e.g.

For Row 1: Blanks,
For Row 2: Manufacturing,
For Row 3: Manufacturing,
For Row 4: Manufacturing,
For Row 5: Social, Finance, Analytics, Advertising,

Right now I am able to get the complete row only:

primary_sectors = lambda primary_sector: sectors[
    sectors["category_list"] == primary_sector
]

Please help me to get the name of the column in the above dataframe.

I tried this code:

primary_sectors("3D").filter(items=["0"])

It gives me output as 1 but I need output as Manufacturing

Solution

Firstly

Your question is quite ambiguous, and I recommend reading the link in @sammywemmy's comment. If I understand your problem correctly, let's look at this mask first:

df.columns[
    (df == 1)        # boolean mask: True where the value equals 1
    .any(axis=0)     # reduce per column: True if the column has any True
]

What's happening? Let's work our way outward, starting from within df.columns[**HERE**]:

  1. (df == 1) makes a boolean mask of the df, with True where the value is 1 and False elsewhere
  2. .any(axis=0) reduces that mask to one value per column; as per the docs:

"Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent".

This gives us a handy Series to mask the column names with.
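To make the two steps concrete, here is a minimal sketch that applies the mask to the sample data used later in this answer. The intermediate names `mask`, `col_has_one`, and `cols` are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Sample data borrowed from the example further down in this answer.
data = {'foo': [0, 0, 0, 0], 'bar': [0, 1, 0, 0], 'baz': [0, 0, 0, 0], 'spam': [0, 1, 0, 1]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

mask = (df == 1)                  # DataFrame of booleans, same shape as df
col_has_one = mask.any(axis=0)    # Series indexed by column name
cols = df.columns[col_has_one].tolist()

print(col_has_one)
print(cols)  # ['bar', 'spam']
```

Note that this gives the columns containing a 1 anywhere in the frame; narrowing it down per row is what the rest of the answer does.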

We will use this mask to automate the solution below.


Next:

Automate this to get output of the form (<row index>, [<col name>, <col name>, ...]) for every row that contains a 1. Although this will be slower on large datasets, it should do the trick:

import pandas as pd

data = {'foo':[0,0,0,0], 'bar':[0, 1, 0, 0], 'baz':[0,0,0,0], 'spam':[0,1,0,1]}
df = pd.DataFrame(data, index=['a','b','c','d'])

print(df)

   foo  bar  baz  spam
a    0    0    0     0
b    0    1    0     1
c    0    0    0     0
d    0    0    0     1

# group df by index label and build a dict mapping each label to its sub-DataFrame
df_dict = dict(
    list(
        df.groupby(df.index)
    )
)
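As a quick check of what that dict holds, the sketch below rebuilds it on the same sample data and inspects one entry. It assumes unique index labels, in which case each value is a one-row DataFrame.

```python
import pandas as pd

data = {'foo': [0, 0, 0, 0], 'bar': [0, 1, 0, 0], 'baz': [0, 0, 0, 0], 'spam': [0, 1, 0, 1]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# Iterating a groupby yields (label, sub-DataFrame) pairs, which dict() accepts.
df_dict = dict(list(df.groupby(df.index)))

keys = list(df_dict)
print(keys)           # ['a', 'b', 'c', 'd']
print(df_dict['b'])   # the single row labelled 'b', still as a DataFrame
```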

The next step is a for loop that iterates over each df in df_dict, checks it with the mask we created earlier, and prints the intended results:

for k, v in df_dict.items():               # k: name of index, v: is a df
    check = v.columns[(v == 1).any()]
    if len(check) > 0:
        print((k, check.to_list()))

('b', ['bar', 'spam'])
('d', ['spam'])
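Since the groupby approach may be slow on large frames, here is one possible alternative that builds the boolean mask once and walks its rows directly. This is a sketch rather than part of the original answer; the name `result` is illustrative, and because it collects into a dict it assumes the index labels are unique.

```python
import pandas as pd

data = {'foo': [0, 0, 0, 0], 'bar': [0, 1, 0, 0], 'baz': [0, 0, 0, 0], 'spam': [0, 1, 0, 1]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# 2-D NumPy array of booleans: True where the cell equals 1.
mask = df.eq(1).to_numpy()

# For every row with at least one True, keep the matching column labels.
result = {
    label: df.columns[row].tolist()
    for label, row in zip(df.index, mask)
    if row.any()
}

print(result)  # {'b': ['bar', 'spam'], 'd': ['spam']}
```

Rows without any 1 (such as 'a' and 'c') are skipped, matching the output of the loop above.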

Side note:

You see how I generated sample data that can be easily reproduced? In the future, please try to ask questions with posted sample data that can be reproduced. This way it helps you understand your problem better and it is easier for us to answer it for you.
