TypeError: incompatible index of inserted column with frame index when applying a custom function
I want to apply a function on groups of a data frame and get the function output as a new column.
Here is the function that I wrote:
import numpy as np
import pandas as pd

def get_centroids(sample):
    # Ideally, re = complex_function(sample), which returns a 1d array
    # with the same length as sample.
    # For simplicity, let's use np.random.rand(len(sample)).
    re = pd.DataFrame({'B': np.random.rand(len(sample))})
    print(re)
    print(re.index)
    return re
The function prints,
          B
0  0.176083
1  0.984371
RangeIndex(start=0, stop=2, step=1)
Let's look at this data frame. For simplicity, it has only one group 'a'.
df = pd.DataFrame({'A': 'a a'.split(),
                   'B': [1, 43],
                   'C': [4, 2]})

   A   B  C
0  a   1  4
1  a  43  2

print(df.index)
RangeIndex(start=0, stop=2, step=1)
When I apply the function as below,
df['test'] = df.groupby('A')[['B']].apply(get_centroids)
it throws "TypeError: incompatible index of inserted column with frame index", even though df and re have the same type of index. Any help would be appreciated.
While I was playing around with the suggestions, I realised that df.groupby('A')[['B']].apply(get_centroids) alone works fine, and that the assignment causes the error.
In other words, df cannot accept the result of df.groupby('A')[['B']].apply(get_centroids). I then checked df.groupby('A')[['B']].apply(get_centroids).index, which is
MultiIndex([('a', 0),
            ('a', 1)],
           names=['A', None])
The index of df is RangeIndex(start=0, stop=2, step=1). Therefore, the RangeIndex vs MultiIndex mismatch caused the issue.
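The mismatch can be reproduced without random data. The following is a minimal sketch, assuming a recent pandas version: np.arange is only a deterministic stand-in for np.random.rand, and group_keys=True is passed explicitly (an assumption on my part, not something in the original call) so that the group label is prepended to the result index as in the output above.

```python
import numpy as np
import pandas as pd


def get_centroids(sample):
    # Deterministic stand-in for np.random.rand(len(sample)); like the
    # original, it returns a frame with a fresh 0..n-1 index.
    return pd.DataFrame({"B": np.arange(len(sample), dtype=float)})


df = pd.DataFrame({"A": ["a", "a"], "B": [1, 43], "C": [4, 2]})

# group_keys=True is spelled out so the group label becomes the first
# level of the result index.
result = df.groupby("A", group_keys=True)[["B"]].apply(get_centroids)

# Compare the two indexes side by side: one level vs. two levels.
print(type(df.index).__name__, df.index.nlevels)
print(type(result.index).__name__, result.index.nlevels)
print(list(result.index.names))
```

The frame has a single-level index while the apply result has two levels, so pandas has no way to line the rows up during assignment.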
This can be solved by resetting and then setting the index of df.groupby('A')[['B']].apply(get_centroids), as below.
df['test'] = df.groupby('A')[['B']].apply(get_centroids).reset_index().set_index('level_1').drop('A',axis=1)
The same solution has been proposed here: https://stackoverflow.com/a/66709059/8561039
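A possible alternative, not taken from the linked answer, is to avoid creating the extra index level in the first place. The sketch below assumes the function can label its output with sample.index and that group_keys=False is acceptable for the call. get_centroids_keep_index is a hypothetical name, and np.arange again stands in for complex_function(sample). With more than one group, the fresh 0..n-1 labels returned by the original function would repeat across groups and no longer identify rows of df uniquely, which is why this variant keeps the original row labels.

```python
import numpy as np
import pandas as pd


def get_centroids_keep_index(sample):
    # Hypothetical variant of get_centroids: same kind of placeholder
    # output, but labelled with the rows of the group it came from.
    values = np.arange(len(sample), dtype=float)  # stand-in for complex_function(sample)
    return pd.DataFrame({"B": values}, index=sample.index)


# Two interleaved groups, to show that rows still line up.
df = pd.DataFrame({"A": ["a", "b", "a", "b"],
                   "B": [1, 43, 5, 7],
                   "C": [4, 2, 9, 8]})

# group_keys=False keeps the group label out of the result index, so the
# result carries only the original row labels.
result = df.groupby("A", group_keys=False)[["B"]].apply(get_centroids_keep_index)

# Assign the single column as a Series; pandas aligns it on the row labels.
df["test"] = result["B"]
print(df)
```

Because the result is indexed by the original row labels, the assignment aligns by label regardless of the order in which the groups were processed.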