Get column name where value is something in pandas dataframe
Question
I'm trying to find, at each timestamp, the column name in a dataframe for which the value matches with the one in a timeseries at the same timestamp.
Here is my dataframe:
>>> df
col5 col4 col3 col2 col1
1979-01-01 00:00:00 1181.220328 912.154923 648.848635 390.986156 138.185861
1979-01-01 06:00:00 1190.724461 920.767974 657.099560 399.395338 147.761352
1979-01-01 12:00:00 1193.414510 918.121482 648.558837 384.632475 126.254342
1979-01-01 18:00:00 1171.670276 897.585930 629.201469 366.652033 109.545607
1979-01-02 00:00:00 1168.892579 900.375126 638.377583 382.584568 132.998706
>>> df.to_dict()
{'col4': {<Timestamp: 1979-01-01 06:00:00>: 920.76797370744271, <Timestamp: 1979-01-01 00:00:00>: 912.15492332839756, <Timestamp: 1979-01-01 18:00:00>: 897.58592995700656, <Timestamp: 1979-01-01 12:00:00>: 918.1214819496729}, 'col5': {<Timestamp: 1979-01-01 06:00:00>: 1190.7244605667831, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 1171.6702763228691, <Timestamp: 1979-01-01 12:00:00>: 1193.4145103184442}, 'col2': {<Timestamp: 1979-01-01 06:00:00>: 399.39533771666561, <Timestamp: 1979-01-01 00:00:00>: 390.98615646597591, <Timestamp: 1979-01-01 18:00:00>: 366.65203285812231, <Timestamp: 1979-01-01 12:00:00>: 384.63247469269874}, 'col3': {<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 648.84863460462293, <Timestamp: 1979-01-01 18:00:00>: 629.20146872682449, <Timestamp: 1979-01-01 12:00:00>: 648.55883747413225}, 'col1': {<Timestamp: 1979-01-01 06:00:00>: 147.7613518219286, <Timestamp: 1979-01-01 00:00:00>: 138.18586102094068, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}}
And here is the time series with the values I want to match at each timestamp:
>>> ts
1979-01-01 00:00:00 1181.220328
1979-01-01 06:00:00 657.099560
1979-01-01 12:00:00 126.254342
1979-01-01 18:00:00 109.545607
Freq: 6H
>>> ts.to_dict()
{<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}
Then the result would be:
>>> df_result
value Column
1979-01-01 00:00:00 1181.220328 col5
1979-01-01 06:00:00 657.099560 col3
1979-01-01 12:00:00 126.254342 col1
1979-01-01 18:00:00 109.545607 col1
I hope my question is clear enough. Does anyone have an idea how to get df_result?
Thanks,
Greg
Recommended answer
Here is one, perhaps inelegant, way to do it:
df_result = ts.to_frame(name='value')
Set up a function that grabs the name of the column containing the value (from ts):
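def get_col_name(row):
    # row.name is the timestamp; compare that row of df with the target value
    b = (df.loc[row.name] == row['value'])
    # argmax gives the position of the first True; map it back to the column label
    return b.index[b.argmax()]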
For each row, this tests which elements equal the value and extracts the column name of a True entry.
Then apply it (row-wise):
In [3]: df_result.apply(get_col_name, axis=1)
Out[3]:
1979-01-01 00:00:00 col5
1979-01-01 06:00:00 col3
1979-01-01 12:00:00 col1
1979-01-01 18:00:00 col1
i.e. use df_result['Column'] = df_result.apply(get_col_name, axis=1).
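Putting the steps above together, here is a minimal self-contained sketch of the same approach. It assumes a recent pandas version (so it uses .loc and idxmax rather than .ix), and it rebuilds df and ts from the rounded values displayed in the question rather than the full-precision ones.

```python
import pandas as pd

# Sample data copied from the question (rounded as displayed there).
idx = pd.to_datetime([
    "1979-01-01 00:00", "1979-01-01 06:00", "1979-01-01 12:00",
    "1979-01-01 18:00", "1979-01-02 00:00",
])
df = pd.DataFrame(
    {
        "col5": [1181.220328, 1190.724461, 1193.414510, 1171.670276, 1168.892579],
        "col4": [912.154923, 920.767974, 918.121482, 897.585930, 900.375126],
        "col3": [648.848635, 657.099560, 648.558837, 629.201469, 638.377583],
        "col2": [390.986156, 399.395338, 384.632475, 366.652033, 382.584568],
        "col1": [138.185861, 147.761352, 126.254342, 109.545607, 132.998706],
    },
    index=idx,
)
# The series only covers the first four timestamps, as in the question.
ts = pd.Series([1181.220328, 657.099560, 126.254342, 109.545607], index=idx[:4])


def get_col_name(row):
    # Boolean Series over df's columns for this row's timestamp
    matches = df.loc[row.name] == row["value"]
    # idxmax returns the label of the first True entry
    return matches.idxmax()


df_result = ts.to_frame(name="value")
df_result["Column"] = df_result.apply(get_col_name, axis=1)
print(df_result)
```

Note that idxmax (like argmax) still returns the first column when nothing matches, so this sketch relies on every value in ts actually being present in the corresponding row of df.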
Note: there is quite a lot going on in get_col_name, so perhaps it warrants some further explanation:
In [4]: row = df_result.iloc[0]  # an example row to pass to get_col_name
In [5]: row
Out[5]:
value 1181.220328
Name: 1979-01-01 00:00:00
In [6]: row.name # use to get rows of df
Out[6]: <Timestamp: 1979-01-01 00:00:00>
In [7]: df.loc[row.name]
Out[7]:
col5 1181.220328
col4 912.154923
col3 648.848635
col2 390.986156
col1 138.185861
Name: 1979-01-01 00:00:00
In [8]: b = (df.loc[row.name] == row['value'])
# checks whether each element equals row['value'] = 1181.220328
In [9]: b
Out[9]:
col5 True
col4 False
col3 False
col2 False
col1 False
Name: 1979-01-01 00:00:00
In [10]: b.argmax() # position of the first True value
Out[10]: 0
In [11]: b.index[b.argmax()] # the index value (column name)
Out[11]: 'col5'
There may be more efficient ways to do this...
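One possible vectorized alternative is sketched below. It is not part of the original answer: match_columns is a hypothetical helper name, and the demo data is made up for illustration. The sketch assumes every timestamp in ts also appears in df's index, and it compares values exactly, as the row-wise version does.

```python
import pandas as pd


def match_columns(df, ts):
    """Hypothetical helper: for each timestamp in ts, return the first
    column of df whose value equals ts at that timestamp (NaN if none)."""
    aligned = df.loc[ts.index]          # keep only the rows that ts covers
    mask = aligned.eq(ts, axis=0)       # row-wise comparison against ts
    cols = mask.idxmax(axis=1)          # label of the first True in each row
    # idxmax returns the first column for all-False rows, so blank those out
    return cols.where(mask.any(axis=1))


# Made-up demo data; the last value in ts_demo deliberately has no match.
idx = pd.to_datetime(["1979-01-01 00:00", "1979-01-01 06:00", "1979-01-01 12:00"])
df_demo = pd.DataFrame(
    {"col2": [390.0, 399.5, 384.5], "col1": [138.0, 147.5, 126.25]},
    index=idx,
)
ts_demo = pd.Series([390.0, 147.5, -1.0], index=idx)

result = match_columns(df_demo, ts_demo)
print(result)
```

This avoids calling a Python function once per row. If the values in ts were computed separately from df rather than copied from it, exact equality may be too strict, and a tolerance-based comparison such as numpy.isclose could be swapped in for eq.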