TypeError: incompatible index of inserted column with frame index when applying a custom function


Problem description

I want to apply a function to each group of a data frame and store the function's output as a new column.

Here is the function that I wrote:

import numpy as np
import pandas as pd

def get_centroids(sample):
    # Ideally, re = complex_function(sample), which returns a 1-D array
    # with the same length as sample.
    # For simplicity, use np.random.rand(len(sample)) instead.
    re = pd.DataFrame({'B': np.random.rand(len(sample))})
    print(re)
    print(re.index)
    return re

The function prints:

   B
0  0.176083
1  0.984371

RangeIndex(start=0, stop=2, step=1)

Let's look at this data frame. For simplicity, it has only one group 'a'.

df = pd.DataFrame({'A': 'a a'.split(),
                   'B': [1,43],
                   'C': [4,2]})

    A   B   C
0   a   1   4
1   a   43  2

print(df.index)
RangeIndex(start=0, stop=2, step=1)

When I apply the function as below,

df['test'] = df.groupby('A')[['B']].apply(get_centroids)

it throws "TypeError: incompatible index of inserted column with frame index", even though df and re have the same type of index. Any help would be appreciated.

Solution

While I was playing around with the suggestions, I realised that df.groupby('A')[['B']].apply(get_centroids) works fine on its own; it is the assignment that raises the error.

In other words, df cannot accept the result of df.groupby('A')[['B']].apply(get_centroids) as a new column. I then checked df.groupby('A')[['B']].apply(get_centroids).index, which is

MultiIndex([('a', 0),
            ('a', 1)],
           names=['A', None])

The index of df is RangeIndex(start=0, stop=2, step=1). Therefore, the RangeIndex vs MultiIndex mismatch caused the issue.
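The mismatch can be reproduced in isolation. The following is only a minimal sketch, assuming a reasonably recent pandas version. The two-group frame is a hypothetical example chosen for illustration (the frame in the question has a single group), and the print calls from the original function are left out.

```python
import numpy as np
import pandas as pd

def get_centroids(sample):
    # Same stand-in as in the question: the returned frame gets a fresh
    # RangeIndex (0..n-1) instead of the row labels of `sample`.
    return pd.DataFrame({'B': np.random.rand(len(sample))})

# Hypothetical two-group frame, used here only for illustration.
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': [1, 43, 5, 7]})

result = df.groupby('A')[['B']].apply(get_centroids)

print(type(df.index).__name__)      # flat index of the original frame
print(type(result.index).__name__)  # group key + per-group position
print(result.index.names)
```

The outer level of result.index holds the group key and the inner, unnamed level holds the position within each group, which is why the result cannot be aligned directly with the flat index of df.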

This can be solved by resetting the index of df.groupby('A')[['B']].apply(get_centroids) and then setting it to the inner, unnamed level (which reset_index() names level_1), as below.

df['test'] = df.groupby('A')[['B']].apply(get_centroids).reset_index().set_index('level_1').drop('A',axis=1)

The same solution has been proposed here https://stackoverflow.com/a/66709059/8561039.
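With more than one group, the inner level restarts at 0 for every group, so it may no longer match the row labels of df. As a hedged alternative (not part of the original answer), the function can return its result indexed by the group's own row labels, or transform can be used when the function returns a plain 1-D array per group. The names get_centroids_aligned, test2 and the two-group frame below are hypothetical and introduced only for this sketch.

```python
import numpy as np
import pandas as pd

def get_centroids_aligned(sample):
    # Hypothetical variant of get_centroids: reuse the group's row labels
    # so the output lines up with the rows it was computed from.
    values = np.random.rand(len(sample))  # stand-in for complex_function(sample)
    return pd.DataFrame({'test': values}, index=sample.index)

# Hypothetical frame with two interleaved groups.
df = pd.DataFrame({'A': ['a', 'b', 'a', 'b'],
                   'B': [1, 43, 5, 7],
                   'C': [4, 2, 3, 1]})

# group_keys=False keeps the group label out of the result index,
# so the result carries the same row labels as df.
out = df.groupby('A', group_keys=False)[['B']].apply(get_centroids_aligned)
df['test'] = out['test']

# Alternative: transform wraps a same-length 1-D array per group
# with that group's row labels.
df['test2'] = df.groupby('A')['B'].transform(lambda s: np.random.rand(len(s)))

print(df)
```

Both assignments align on the row labels of df, so no reset_index/set_index step is needed in this sketch.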
