pandas: return the column name for each row with apply
Problem description
I am working on a pandas dataset. For a 2D DataFrame, I am trying to return/append one column that gives, for each row, the name of the first column whose value is over 0.95.
import pandas as pd
import numpy as np

Exp_day_list = ["EXP_DAY_1", "EXP_DAY_2", "EXP_DAY_3", "EXP_DAY_4", "EXP_DAY_5", "EXP_DAY_6", "EXP_DAY_7", "EXP_DAY_8", "EXP_DAY_9", "EXP_DAY_10", "EXP_GT_DAY_10"]
# raw_databased is the full dataset (not shown in the question)
test = raw_databased.head()
Exp_day_percentage = test[Exp_day_list]

def over_95_percent(x):
    # x is one row of the frame, passed in as a Series
    for column in x:
        if x[column] > 0.95:
            return column
            break

Exp_day_percentage.apply(over_95_percent, axis=1)
I checked Exp_day_percentage and the result is what I need:
Exp_day_percentage
Out[2]:
   EXP_DAY_1  EXP_DAY_2  EXP_DAY_3  EXP_DAY_4  EXP_DAY_5  EXP_DAY_6  EXP_DAY_7  EXP_DAY_8  EXP_DAY_9  EXP_DAY_10  EXP_GT_DAY_10
0        0.0        0.0       0.52       0.94       0.94        1.0        1.0        1.0        1.0         1.0            0.0
1        0.0        0.0       0.00       0.66       1.00        1.0        1.0        1.0        1.0         1.0            0.0
2        0.0        1.0       1.00       1.00       1.00        1.0        1.0        1.0        1.0         1.0            0.0
3        0.0        0.0       0.92       1.00       1.00        1.0        1.0        1.0        1.0         1.0            0.0
4        0.0        0.0       0.95       0.97       1.00        1.0        1.0        1.0        1.0         1.0            0.0
But when I run the apply function on that DataFrame, I get the following error:
TypeError: ("cannot do label indexing on <class 'pandas.indexes.base.Index'> with these indexers [0.0] of <type 'numpy.float64'>", u'occurred at index 0')
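The error comes from the loop itself: x is one row as a Series indexed by column name, and iterating over a Series yields its values, not its labels. So column is a float such as 0.0, and x[column] becomes a label lookup with that float, which fails. A minimal corrected sketch of the same row-wise function iterates over x.items() to get (label, value) pairs instead; the data is the five-row sample shown above, and the explicit return None for rows without a match is an addition, not part of the original code.

```python
import pandas as pd

# The five-row sample from the question.
cols = [f"EXP_DAY_{i}" for i in range(1, 11)] + ["EXP_GT_DAY_10"]
rows = [
    [0.0, 0.0, 0.52, 0.94, 0.94, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.00, 0.66, 1.00, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.00, 1.00, 1.00, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.92, 1.00, 1.00, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.95, 0.97, 1.00, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
]
Exp_day_percentage = pd.DataFrame(rows, columns=cols)


def over_95_percent(x):
    # .items() yields (column label, value) pairs for the row Series,
    # whereas iterating the Series directly yields only the values.
    for column, value in x.items():
        if value > 0.95:
            return column  # first column whose value exceeds 0.95
    return None  # no value in this row exceeds 0.95


result = Exp_day_percentage.apply(over_95_percent, axis=1)
print(result.tolist())
# ['EXP_DAY_6', 'EXP_DAY_5', 'EXP_DAY_2', 'EXP_DAY_4', 'EXP_DAY_4']
```

Because the function returns at the first match, each row gets the first qualifying column, the same labels the idxmax approach in the answer produces. The row-wise apply is easy to read but runs Python code per row, so the vectorized version is usually faster on large frames.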
The desired result is:
   EXP_DAY_1  EXP_DAY_2  EXP_DAY_3  EXP_DAY_4  EXP_DAY_5  EXP_DAY_6  EXP_DAY_7  EXP_DAY_8  EXP_DAY_9  EXP_DAY_10  EXP_GT_DAY_10     Column
0        0.0        0.0       0.52       0.94       0.94        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_5
1        0.0        0.0       0.00       0.66       1.00        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_5
2        0.0        1.0       1.00       1.00       1.00        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_2
3        0.0        0.0       0.92       1.00       1.00        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_4
4        0.0        0.0       0.95       0.97       1.00        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_3
If anyone can help me with this, I would much appreciate it. I searched all over the internet and could not find anything similar. Thank you.
Recommended answer
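Use pd.DataFrame.idxmax (here df is the question's Exp_day_percentage). df.gt(.95) gives a boolean frame, and idxmax(1) returns, for each row, the label of the first occurrence of the maximum, i.e. the first column that is True. The extra column zip5=1 is a fallback: in a row where no value exceeds 0.95, it is the only maximum, so that row gets "zip5" instead of a misleading first column.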
df.assign(Column=df.gt(.95).assign(zip5=1).idxmax(1))
   EXP_DAY_1  EXP_DAY_2  EXP_DAY_3  EXP_DAY_4  EXP_DAY_5  EXP_DAY_6  EXP_DAY_7  EXP_DAY_8  EXP_DAY_9  EXP_DAY_10  EXP_GT_DAY_10     Column
0        0.0        0.0       0.52       0.94       0.94        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_6
1        0.0        0.0       0.00       0.66       1.00        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_5
2        0.0        1.0       1.00       1.00       1.00        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_2
3        0.0        0.0       0.92       1.00       1.00        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_4
4        0.0        0.0       0.95       0.97       1.00        1.0        1.0        1.0        1.0         1.0            0.0  EXP_DAY_4
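Note that gt(.95) is a strict comparison, so row 4 gets EXP_DAY_4: its EXP_DAY_3 value is exactly 0.95. With ge(.95) it would get EXP_DAY_3, as in the desired output. Row 0 gets EXP_DAY_6 under either comparison, since its EXP_DAY_5 value is 0.94. The sketch below is a variant of the same idxmax idea that replaces the zip5 sentinel with NaN via where(mask.any(axis=1)); the helper name first_column_over and its inclusive flag are hypothetical, not part of pandas.

```python
import pandas as pd

# The five-row sample from the question.
cols = [f"EXP_DAY_{i}" for i in range(1, 11)] + ["EXP_GT_DAY_10"]
rows = [
    [0.0, 0.0, 0.52, 0.94, 0.94, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.00, 0.66, 1.00, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.00, 1.00, 1.00, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.92, 1.00, 1.00, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.95, 0.97, 1.00, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
]
df = pd.DataFrame(rows, columns=cols)


def first_column_over(frame, threshold, inclusive=False):
    """Hypothetical helper: per row, the label of the first column passing threshold."""
    # Boolean frame: True where the value passes the threshold (>= or >).
    mask = frame.ge(threshold) if inclusive else frame.gt(threshold)
    # On a boolean row, idxmax returns the first True column. An all-False row
    # would return the first column, so where() turns those rows into NaN.
    return mask.idxmax(axis=1).where(mask.any(axis=1))


# Assign to a new frame so df keeps only numeric columns for later calls.
out = df.assign(Column=first_column_over(df, 0.95))
print(out["Column"].tolist())
# ['EXP_DAY_6', 'EXP_DAY_5', 'EXP_DAY_2', 'EXP_DAY_4', 'EXP_DAY_4']
print(first_column_over(df, 0.95, inclusive=True).tolist())
# ['EXP_DAY_6', 'EXP_DAY_5', 'EXP_DAY_2', 'EXP_DAY_4', 'EXP_DAY_3']
print(first_column_over(df, 1.0).isna().all())  # no value exceeds 1.0
# True
```

Using NaN rather than a sentinel label keeps the result column free of a fake column name, and rows without a match can then be found with isna().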