Add columns to pandas dataframe containing max of each row, AND corresponding column name

Problem description

My system

Windows 7, 64 bit

python 3.5.1

Challenge

I've got a pandas dataframe, and I would like to know the maximum value for each row and append that as a new column. I would also like to add another column to the existing dataframe containing the name of the column where that maximum value is found.

Reproducible example

In[1]:
import pandas as pd

# Make pandas dataframe
df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})

# Calculate the max of each row
my_series = df.max(numeric_only=True, axis=1)
my_series.name = "maxval"

# Include maxval in df
df = df.join(my_series)
df

Out[1]:
    a  b  c  maxval
0   1  0  0  1
1   0  0  0  0
2   0  1  0  1
3   1  0  0  1
4   3  1  0  3

So far so good. Now for the part that adds another column to the existing dataframe containing the name of the column:

In[2]:
?
?
?


# This is what I'd like to accomplish:
Out[2]:
        a  b  c  maxval maxcol
    0   1  0  0  1      a
    1   0  0  0  0      a,b,c       
    2   0  1  0  1      b
    3   1  0  0  1      a
    4   3  1  0  3      a

Notice that I'd like to return all column names if multiple columns contain the same maximum value. Also notice that the column maxval is not included in maxcol, since that would not make much sense. Thanks in advance if anyone out there finds this interesting.

Recommended answer

You can compare the df against maxval using eq with axis=0, then use apply with a lambda that turns each row of the resulting boolean mask into the matching column names and joins them:

In [183]:
# Compare columns a..c with maxval row by row, then join the names of the matching columns
df['maxcol'] = df.loc[:, :'c'].eq(df['maxval'], axis=0).apply(lambda x: ','.join(df.columns[:3][x==x.max()]), axis=1)
df

Out[183]:
   a  b  c  maxval maxcol
0  1  0  0       1      a
1  0  0  0       0  a,b,c
2  0  1  0       1      b
3  1  0  0       1      a
4  3  1  0       3      a
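
The same idea can also be written as a self-contained script. This is only a minimal sketch, not the answer's original code: it assumes the value columns are exactly a, b and c, and the names value_cols and is_max are introduced here purely for illustration. Selecting the value columns by name, rather than by position, keeps maxval out of the comparison no matter where it ends up in the frame.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})

# Hypothetical helper list: the columns to search for the row maximum.
# Fixing it up front means derived columns such as maxval are never compared.
value_cols = ['a', 'b', 'c']

# Row-wise maximum over the value columns only.
df['maxval'] = df[value_cols].max(axis=1)

# Boolean mask: True where a cell equals the maximum of its row.
is_max = df[value_cols].eq(df['maxval'], axis=0)

# For each row, join the names of all columns flagged True (ties give several names).
df['maxcol'] = is_max.apply(lambda row: ','.join(row.index[row.to_numpy()]), axis=1)

print(df)
```

Because ties are resolved by the mask rather than by idxmax, a row such as row 1 (all zeros) should report every tied column, which is the behaviour asked for in the question.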
