Attach a calculated column to an existing dataframe

Problem description

I am starting to learn pandas. I was following the question here, but I could not get the proposed solution to work for me: I get an indexing error. This is what I have:

import pandas as pd

d = {'L1': pd.Series(['X', 'X', 'Z', 'X', 'Z', 'Y', 'Z', 'Y', 'Y']),
     'L2': pd.Series([1, 2, 1, 3, 2, 1, 3, 2, 3]),
     'L3': pd.Series([50, 100, 15, 200, 10, 1, 20, 10, 100])}
df = pd.DataFrame(d)
df.groupby('L1', as_index=False).apply(lambda x: pd.expanding_sum(x.sort('L3', ascending=False)['L3']) / x['L3'].sum())

which outputs the following (I am using IPython):

L1   
X   3    0.571429
    1    0.857143
    0    1.000000
Y   8    0.900901
    7    0.990991
    5    1.000000
Z   6    0.444444
    2    0.777778
    4    1.000000
dtype: float64

Then I try to add the cumulative calculation as a column labelled "new", as suggested in the post:

df["new"] = df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())

I get this:

   2196                         value = value.reindex(self.index).values
   2197                     except:
-> 2198                         raise TypeError('incompatible index of inserted column '
   2199                                         'with frame index')
   2200 
TypeError: incompatible index of inserted column with frame index

Does anybody know what the problem is? How can I reinsert the calculated values into the dataframe so that it shows them in order (descending by "new" for each label X, Y, Z)?

Solution

The problem is, as the error message says, that the index of the calculated column you want to insert is incompatible with the index of df.

The index of df is a simple index:

In [8]: df.index
Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')

while the index of the calculated column is a MultiIndex (as you can already see in the output). Supposing we call it new_column:

In [15]: new_column.index
Out[15]: 
MultiIndex
[(u'X', 3), (u'X', 1), (u'X', 0), (u'Y', 8), (u'Y', 7), (u'Y', 5), (u'Z', 6), (u'Z', 2), (u'Z', 4)]

For this reason, you cannot insert it into the frame. However, this is a bug in 0.12: the same code does work in 0.13 (the version against which the answer in the linked question was tested), and the keyword as_index=False should ensure that the column L1 is not added to the index.

SOLUTION for 0.12:
Remove the first level of the MultiIndex, so you get back the original index:

In [13]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
In [14]: df["new"] = new_column.reset_index(level=0, drop=True)
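The same flattening step can be sketched for current pandas, where `pd.expanding_sum` and `DataFrame.sort` no longer exist. This is a minimal illustration rather than the answerer's code: it swaps in `Series.sort_values` and `Series.cumsum`, and the names `cumulative_share` and `flat` are hypothetical. It also assumes `group_keys=True`, so that the group label is prepended to the result index and there is a level to drop.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})


def cumulative_share(values):
    # Hypothetical helper: running total of the values in descending order,
    # divided by the group total.
    return values.sort_values(ascending=False).cumsum() / values.sum()


# group_keys=True asks pandas to prepend the group label to the result index,
# which yields a two-level index (L1, original row label).
new_column = df.groupby("L1", group_keys=True)["L3"].apply(cumulative_share)
print(new_column.index.nlevels)

# Drop the outer (group label) level so only the original row labels remain,
# then let the assignment align the values on those labels.
flat = new_column.reset_index(level=0, drop=True)
df["new"] = flat
print(df)
```

Because the assignment aligns on row labels, the order in which `flat` lists its rows does not matter.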


In pandas 0.13 (in development) this is fixed (https://github.com/pydata/pandas/pull/4670). This is why as_index=False is used in the groupby call: the column L1 (by which you group) is not added to the index (which would create a MultiIndex), so the original index is retained and the result can be appended to the original frame. In 0.12, however, the as_index keyword seems to be ignored when using apply.
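If keeping the original index is the goal, one alternative that is not part of the answer above is to avoid apply altogether. The sketch below assumes a current pandas release and uses `GroupBy.cumsum` and `GroupBy.transform`, both of which return results labelled with the original rows. The names `running`, `totals` and `ordered` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

# Sort once by L3 (descending), then take a running total within each group.
# The result keeps the original row labels, only in a different order.
running = df.sort_values("L3", ascending=False).groupby("L1")["L3"].cumsum()

# Broadcast each group's total back onto every row of that group.
totals = df.groupby("L1")["L3"].transform("sum")

# Division and assignment both align on the row labels.
df["new"] = running / totals

# Display the rows grouped by label, largest L3 first within each label.
ordered = df.sort_values(["L1", "L3"], ascending=[True, False])
print(ordered)
```

Here `ordered` lists the rows in the same order as the grouped output shown in the question, while `df` itself keeps its original row order.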
