Attach a calculated column to an existing dataframe
Problem description
I am starting to learn pandas. I was following the question here and could not get the proposed solution to work for me; I get an indexing error instead. This is what I have:
import pandas as pd
from pandas import Series, DataFrame

d = {'L1': Series(['X', 'X', 'Z', 'X', 'Z', 'Y', 'Z', 'Y', 'Y']),
     'L2': Series([1, 2, 1, 3, 2, 1, 3, 2, 3]),
     'L3': Series([50, 100, 15, 200, 10, 1, 20, 10, 100])}
df = DataFrame(d)
df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
which outputs the following (I am using IPython):
L1
X   3    0.571429
    1    0.857143
    0    1.000000
Y   8    0.900901
    7    0.990991
    5    1.000000
Z   6    0.444444
    2    0.777778
    4    1.000000
dtype: float64
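Then I try to append the cumulative calculation under the label "new", as suggested in the post:
df["new"] = df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())
I get this:
   2196                 value = value.reindex(self.index).values
   2197             except:
-> 2198                 raise TypeError('incompatible index of inserted column '
   2199                                 'with frame index')
   2200
TypeError: incompatible index of inserted column with frame index
Does anybody know what the problem is? How can I reinsert the calculated values into the dataframe so that it shows the values in order (descending by "new" for each label X, Y, Z)?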
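The problem is, as the error message says, that the index of the calculated column you want to insert is incompatible with the index of df.
The index of df is a simple index:
In [8]: df.index
Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')
while the index of the calculated column is a MultiIndex (as you can already see in the output), supposing we call it new_column:
In [15]: new_column.index
Out[15]:
MultiIndex
[(u'X', 3), (u'X', 1), (u'X', 0), (u'Y', 8), (u'Y', 7), (u'Y', 5), (u'Z', 6), (u'Z', 2), (u'Z', 4)]
For this reason, you cannot insert it into the frame. However, this is a bug in 0.12, as this does work in 0.13 (for which the answer in the linked question was tested), and the keyword as_index=False should ensure that the column L1 is not added to the index.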
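The mismatch between a single-level and a two-level index can be checked directly, without running the groupby. The sketch below is only an illustration: it uses a shortened frame and builds a stand-in for new_column by hand (both are assumptions, not the output shown in the question), then compares the number of index levels.

```python
import pandas as pd

# Shortened version of the frame from the question (plain integer index).
df = pd.DataFrame({"L1": ["X", "X", "Z", "X"], "L3": [50, 100, 15, 200]})

# Hand-built stand-in for the groupby/apply result (an assumption for
# illustration): the original row labels nested under the group key.
new_column = pd.Series(
    [200 / 350, 300 / 350, 1.0, 1.0],
    index=pd.MultiIndex.from_tuples([("X", 3), ("X", 1), ("X", 0), ("Z", 2)]),
)

frame_levels = df.index.nlevels           # one level: the row labels
column_levels = new_column.index.nlevels  # two levels: group key + row label

# The inner level still carries the frame's own labels, just reordered.
inner_labels = sorted(new_column.index.get_level_values(1).tolist())
print(frame_levels, column_levels, inner_labels == df.index.tolist())
```

The inner level holds exactly the labels of df, which is why removing the outer level is enough to make the two indexes compatible again.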
SOLUTION for 0.12:
Remove the first level of the MultiIndex, so you get back the original index:
In [13]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
In [14]: df["new"] = new_column.reset_index(level=0, drop=True)
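The reset_index step can be tried in isolation. This is a minimal sketch, assuming a shortened frame and a new_column that is constructed manually rather than produced by groupby; it shows that once the outer level is dropped, the assignment aligns on the original row labels.

```python
import pandas as pd

# Shortened version of the frame from the question (plain integer index).
df = pd.DataFrame({"L1": ["X", "X", "Z", "X"], "L3": [50, 100, 15, 200]})

# Hand-built stand-in for the groupby/apply result (an assumption for
# illustration), indexed by (group key, original row label).
new_column = pd.Series(
    [200 / 350, 300 / 350, 1.0, 1.0],
    index=pd.MultiIndex.from_tuples([("X", 3), ("X", 1), ("X", 0), ("Z", 2)]),
)

# Drop the outer (group key) level so only the original row labels remain.
flat = new_column.reset_index(level=0, drop=True)

# Assignment aligns on index labels, so the row order of `flat` does not matter.
df["new"] = flat
print(df)
```

Because the alignment is by label, the value computed for row 3 ends up in row 3 of the frame even though it comes first in the series.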
In pandas 0.13 (in development) this is fixed (https://github.com/pydata/pandas/pull/4670). This is why as_index=False is used in the groupby call: the column L1 (on which you group) is then not added to the index (which would create a MultiIndex), so the original index is retained and the result can be appended to the original frame. But it seems the as_index keyword is ignored in 0.12 when using apply.
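In later pandas releases, pd.expanding_sum and DataFrame.sort are no longer available. As a rough sketch, assuming a recent pandas version, the same cumulative share can be computed with sort_values, cumsum and transform. All three keep the original row labels, so no MultiIndex arises. This is an alternative formulation, not the method tested in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

# Sort by L3 descending, then take a running sum within each L1 group.
# The result keeps the original row labels (in sorted order).
ordered = df.sort_values("L3", ascending=False)
running = ordered.groupby("L1")["L3"].cumsum()

# Per-group total, broadcast back to every row of the original frame.
totals = df.groupby("L1")["L3"].transform("sum")

# Both series are indexed by the original row labels, so the division
# and the assignment align label by label.
df["new"] = running / totals

# Show each group with its rows ordered by L3 descending.
print(df.sort_values(["L1", "L3"], ascending=[True, False]))
```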