Add columns to pandas dataframe containing max of each row, AND corresponding column name
Question
My system
Windows 7, 64 bit
python 3.5.1
The challenge
I've got a pandas dataframe, and I would like to know the maximum value for each row and append that info as a new column. I would also like to add another column to the existing dataframe containing the name of the column where that maximum value is found.
Reproducible example
In[1]:
import pandas as pd
# Make pandas dataframe
df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
# Calculate max
my_series = df.max(numeric_only=True, axis = 1)
my_series.name = "maxval"
# Include maxval in df
df = df.join(my_series)
df
Out[1]:
a b c maxval
0 1 0 0 1
1 0 0 0 0
2 0 1 0 1
3 1 0 0 1
4 3 1 0 3
So far so good. Now for the part where another column, containing the name of the column that holds the maximum, is added to the existing dataframe:
In[2]:
?
?
?
# This is what I'd like to accomplish:
Out[2]:
a b c maxval maxcol
0 1 0 0 1 a
1 0 0 0 0 a,b,c
2 0 1 0 1 b
3 1 0 0 1 a
4 3 1 0 3 a
Notice that I'd like to return all column names if multiple columns contain the same maximum value. Also notice that the column maxval is not included in maxcol, since that would not make much sense. Thanks in advance if anyone out there finds this interesting.
Recommended answer
You can compare the df against maxval using eq with axis=0, which gives a boolean mask of the cells that equal their row's maximum. Then use apply with a lambda over each row of that mask to select the matching column names and join them:
In [183]:
# .ix was removed in pandas 1.0; .loc does the same label-based slice here
df['maxcol'] = df.loc[:,:'c'].eq(df['maxval'], axis=0).apply(lambda x: ','.join(df.columns[:3][x==x.max()]),axis=1)
df
Out[183]:
a b c maxval maxcol
0 1 0 0 1 a
1 0 0 0 0 a,b,c
2 0 1 0 1 b
3 1 0 0 1 a
4 3 1 0 3 a
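For reference, here is a self-contained sketch of the same eq-and-join idea that does not depend on column positions. The list name value_cols is an assumption introduced for illustration; it is not part of the original answer, and it simply names the columns that should take part in the comparison so that maxval itself is never considered.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})

# Hypothetical helper name: the columns to search for the maximum
value_cols = ['a', 'b', 'c']

# Row-wise maximum over the chosen columns
df['maxval'] = df[value_cols].max(axis=1)

# Boolean mask: True where a cell equals its row's maximum
is_max = df[value_cols].eq(df['maxval'], axis=0)

# For each row, keep the column labels where the mask is True and join them
df['maxcol'] = is_max.apply(lambda row: ','.join(row.index[row.values]), axis=1)

print(df)
```

Naming the columns explicitly means the slice :'c' and the positional df.columns[:3] are no longer needed, so the snippet should keep working if columns are added or reordered.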