Error when calling a groupby object inside a Pandas DataFrame
Question
I have this dataframe:
person_code #CNAE growth size
0 231 32 0.54 32
1 233 43 0.12 333
2 432 32 0.44 21
3 431 56 0.32 23
4 654 89 0.12 89
5 764 32 0.20 211
6 434 32 0.82 90
I need to create a new column called "top3growth". For each row, I need to look up that row's #CNAE and add a column holding the 3 persons with the highest growth in that CNAE, so each cell would contain a DataFrame nested inside df. To build those top-3 frames ("top3dfs") I'm using this groupby:
a=sql2.groupby('#CNAE',group_keys=False).apply(pd.DataFrame.nlargest,n=3,columns='growth')
(This solution came out of this question.)
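A minimal sketch of the same top-3 step, assuming the sample data above is rebuilt by hand as `df`. It swaps in a different idiom from the question's `groupby(...).apply(pd.DataFrame.nlargest, ...)`: sort by growth, then keep the first 3 rows of each #CNAE group with `groupby(...).head(3)`. This avoids `groupby.apply`, whose treatment of the grouping column has been changing in recent pandas releases; the stable `kind="mergesort"` sort keeps tied rows in their original order.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "person_code": [231, 233, 432, 431, 654, 764, 434],
    "#CNAE": [32, 43, 32, 56, 89, 32, 32],
    "growth": [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    "size": [32, 333, 21, 23, 89, 211, 90],
})

# Sort all rows by growth (highest first), then keep the first 3 rows of each
# #CNAE group; groupby.head preserves the sorted order inside each group.
top3 = (
    df.sort_values("growth", ascending=False, kind="mergesort")
      .groupby("#CNAE")
      .head(3)
)
print(top3[top3["#CNAE"] == 32])
```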
It should look like this:
person_code #CNAE growth size top3growth ...
0 231 32 0.54 32 [df_top3_type_32]
1 233 43 0.12 333 [df_top3_type_43]
2 432 32 0.44 21 [df_top3_type_32]
3 431 56 0.32 23 [df_top3_type_56]
4 654 89 0.12 89 [df_top3_type_89]
5 764 32 0.20 211 [df_top3_type_32]
6 434 32 0.82 90 [df_top3_type_32]
...
df_top3_type_32 should look like this (for example):
person_code #CNAE growth size
6 434 32 0.82 90
0 231 32 0.54 32
2 432 32 0.44 21
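If what you ultimately need is to get at frames like df_top3_type_32, one option (an assumption about your use case, not the only design) is to keep the per-CNAE top-3 frames in a plain dict keyed by #CNAE instead of nesting them inside `df`. The name `top3_by_cnae` is hypothetical, and `top3` is built with the sort-then-`head(3)` idiom rather than the question's `nlargest` call.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "person_code": [231, 233, 432, 431, 654, 764, 434],
    "#CNAE": [32, 43, 32, 56, 89, 32, 32],
    "growth": [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    "size": [32, 333, 21, 23, 89, 211, 90],
})

# Top 3 rows by growth within each #CNAE group.
top3 = (
    df.sort_values("growth", ascending=False, kind="mergesort")
      .groupby("#CNAE")
      .head(3)
)

# Iterating a groupby yields (key, sub-DataFrame) pairs; store each
# sub-frame under its #CNAE value.
top3_by_cnae = {cnae: group for cnae, group in top3.groupby("#CNAE")}

df_top3_type_32 = top3_by_cnae[32]
print(df_top3_type_32)
```

Looking up `top3_by_cnae[row_cnae]` then replaces the per-row cell, and the DataFrames never have to live inside `df`.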
I'm trying to solve my problem by using:
import numpy as np

df['top3growth']=np.nan
for i in df.index:
    df['top3growth'].loc[i]=a[a['#CNAE'] == df['#CNAE'].loc[i]]
But I get:
ValueError: Incompatible indexer with DataFrame
Does anyone know what's going on? Is there a more efficient way of doing this (not using a for loop)?
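The ValueError most likely comes from assigning a whole DataFrame to a single cell of a column that `np.nan` made float64: pandas tries to align the DataFrame against the indexer as values instead of storing it as one object. A loop-free sketch that sidesteps storing DataFrames in cells is to compute each #CNAE's top-3 person codes once as a list, then `Series.map` every row's #CNAE onto that result. Storing lists of person codes instead of full DataFrames is a swapped-in simplification; `top3` and `top3_codes` are hypothetical names.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "person_code": [231, 233, 432, 431, 654, 764, 434],
    "#CNAE": [32, 43, 32, 56, 89, 32, 32],
    "growth": [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    "size": [32, 333, 21, 23, 89, 211, 90],
})

# Top 3 rows by growth within each #CNAE group.
top3 = (
    df.sort_values("growth", ascending=False, kind="mergesort")
      .groupby("#CNAE")
      .head(3)
)

# Series indexed by #CNAE; each value is the list of that group's top-3
# person codes, in descending order of growth.
top3_codes = top3.groupby("#CNAE")["person_code"].agg(list)

# map() looks up every row's #CNAE in top3_codes' index, with no Python loop.
df["top3growth"] = df["#CNAE"].map(top3_codes)
print(df)
```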
Answer
One way is to convert each #CNAE group of a into a dict, then map those dicts back onto df by #CNAE:
#a=df.groupby('#CNAE',group_keys=False).apply(pd.DataFrame.nlargest,n=3,columns='growth')
df['top3growth']=df['#CNAE'].map(a.groupby('#CNAE').apply(lambda x : x.to_dict()))
df
Out[195]:
person_code #CNAE growth size \
0 231 32 0.54 32
1 233 43 0.12 333
2 432 32 0.44 21
3 431 56 0.32 23
4 654 89 0.12 89
5 764 32 0.20 211
6 434 32 0.82 90
top3growth
0 {'person_code': {0: 231, 2: 432, 6: 434}, 'gro...
1 {'person_code': {1: 233}, 'growth': {1: 0.12},...
2 {'person_code': {0: 231, 2: 432, 6: 434}, 'gro...
3 {'person_code': {3: 431}, 'growth': {3: 0.32},...
4 {'person_code': {4: 654}, 'growth': {4: 0.12},...
5 {'person_code': {0: 231, 2: 432, 6: 434}, 'gro...
6 {'person_code': {0: 231, 2: 432, 6: 434}, 'gro...
After creating the new column, you can convert a single cell back to a DataFrame:
pd.DataFrame(df.top3growth[0])
Out[197]:
#CNAE growth person_code size
0 32 0.54 231 32
2 32 0.44 432 21
6 32 0.82 434 90
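A sketch of the answer's dict approach that builds the per-CNAE dicts by iterating the groupby instead of calling `groupby(...).apply`, assuming `top3` (built with `sort_values` plus `groupby.head`) stands in for the question's `a`. `DataFrame.to_dict()` with its default orient returns `{column: {index: value}}`, which `pd.DataFrame(...)` accepts back, so the round trip from the answer still works. The name `as_dicts` is hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "person_code": [231, 233, 432, 431, 654, 764, 434],
    "#CNAE": [32, 43, 32, 56, 89, 32, 32],
    "growth": [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    "size": [32, 333, 21, 23, 89, 211, 90],
})

# Top 3 rows by growth within each #CNAE group (stand-in for `a`).
top3 = (
    df.sort_values("growth", ascending=False, kind="mergesort")
      .groupby("#CNAE")
      .head(3)
)

# One {column: {index: value}} dict per #CNAE group.
as_dicts = {cnae: group.to_dict() for cnae, group in top3.groupby("#CNAE")}

# Map each row's #CNAE to its group's dict.
df["top3growth"] = df["#CNAE"].map(as_dicts)

# Turn the dict stored in row 0 back into a DataFrame.
restored = pd.DataFrame(df.loc[0, "top3growth"])
print(restored)
```

The row order of `restored` may differ between pandas versions, as the sorted index in the answer's output suggests, so sort it explicitly if order matters.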