Converting each grouped column in a DataFrameGroupBy object to a list
Problem description
Here is the data:
import pandas as pd
df = pd.DataFrame({
    'date': [1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    'request': [2, 2, 2, 3, 3, 2, 3, 3, 3, 3],
    'users': [1, 3, 7, 1, 7, 3, 4, 9, 7, 9],
    'count': [1, 1, 2, 3, 1, 3, 1, 2, 1, 1]
})
df
count date request users
0 1 1 2 1
1 1 1 2 3
2 2 2 2 7
3 3 2 3 1
4 1 2 3 7
5 3 3 2 3
6 1 3 3 4
7 2 3 3 9
8 1 4 3 7
9 1 5 3 9
The idea is to group by date and request, and convert each of the remaining columns into a list of its grouped values. I thought this would be as simple as calling dfgp.agg, but it is not.
This is what I am trying to get:
date request count users
0 1 2 [1, 1] [1, 3]
1 2 2 [2] [7]
2 2 3 [3, 1] [1, 7]
3 3 2 [3] [3]
4 3 3 [1, 2] [4, 9]
5 4 3 [1] [7]
6 5 3 [1] [9]
Here is my approach:
grouped_df = df.groupby(['date', 'request'])
df_new = pd.DataFrame({ 'count' : grouped_df['count'].apply(list), 'users' : grouped_df['users'].apply(list) }).reset_index()
It works, but I believe there has to be a better way... one that works on all columns of the grouped object. For example, if I grouped by just date, the solution should still work. My solution relies on hardcoding the columns, which I dislike doing, so it would fail in that case.
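The asker's own apply(list) idea can be generalised so that it never names the value columns. A minimal sketch, assuming every column that is not a grouping key should be collected into a list; the helper name lists_per_group is hypothetical and not part of pandas:

```python
import pandas as pd


def lists_per_group(df, keys):
    """Hypothetical helper: collect every non-key column of each group into a list."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    grouped = df.groupby(keys)
    # Every column that is not a grouping key becomes a list-valued column
    value_cols = [col for col in df.columns if col not in keys]
    return pd.DataFrame(
        {col: grouped[col].apply(list) for col in value_cols}
    ).reset_index()


df = pd.DataFrame({
    'date': [1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    'request': [2, 2, 2, 3, 3, 2, 3, 3, 3, 3],
    'users': [1, 3, 7, 1, 7, 3, 4, 9, 7, 9],
    'count': [1, 1, 2, 3, 1, 3, 1, 2, 1, 1],
})

by_date_request = lists_per_group(df, ['date', 'request'])
by_date = lists_per_group(df, 'date')  # no column names need to change
print(by_date_request)
print(by_date)
```

Because the value columns are derived from whatever is left after removing the keys, switching from ['date', 'request'] to just 'date' requires no other change.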
This is something that has been bothering me. There should be an obvious solution, but I cannot find it. Is there a better way?
Calling all my pandas MVPs...
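Recommended answer
Better answer
Find where the duplicates are, then split and filter accordingly. This relies on the rows of each group being contiguous, as they are in the sample data.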
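import numpy as np
# Assumes the rows of each (date, request) group are contiguous, as in the sample df
dups = df.duplicated(['date', 'request'], keep='last').values  # False only at each group's last row
i = np.where(~dups[:-1])[0] + 1  # split positions: just after each group's last row
d, r, c, u = (df[col].values for col in ['date', 'request', 'count', 'users'])
d1 = pd.DataFrame({'date': d[~dups], 'request': r[~dups]})
# np.split yields ragged arrays, so build list columns directly instead of np.column_stack
d2 = pd.DataFrame({'count': [a.tolist() for a in np.split(c, i)],
                   'users': [a.tolist() for a in np.split(u, i)]})
d1.join(d2)
date request count users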
0 1 2 [1, 1] [1, 3]
1 2 2 [2] [7]
2 2 3 [3, 1] [1, 7]
3 3 2 [3] [3]
4 3 3 [1, 2] [4, 9]
5 4 3 [1] [7]
6 5 3 [1] [9]
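The split answer above hardcodes both the key columns and the value columns, and it assumes each group's rows are already adjacent. A hedged sketch of a more general version, assuming any list of key columns and that all remaining columns should become lists; split_to_lists is a hypothetical name, and sorting first is an addition not present in the answer:

```python
import numpy as np
import pandas as pd


def split_to_lists(df, keys):
    """Hypothetical generalisation of the duplicated/np.split answer to any keys."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    # np.split needs each group's rows to be contiguous, so sort by the keys first
    df = df.sort_values(keys, kind='mergesort').reset_index(drop=True)
    last = ~df.duplicated(keys, keep='last').to_numpy()  # True at each group's last row
    cuts = np.flatnonzero(last[:-1]) + 1  # positions just after each group's last row
    out = df.loc[last, keys].reset_index(drop=True)
    for col in [c for c in df.columns if c not in keys]:
        out[col] = [part.tolist() for part in np.split(df[col].to_numpy(), cuts)]
    return out


df = pd.DataFrame({
    'date': [1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    'request': [2, 2, 2, 3, 3, 2, 3, 3, 3, 3],
    'users': [1, 3, 7, 1, 7, 3, 4, 9, 7, 9],
    'count': [1, 1, 2, 3, 1, 3, 1, 2, 1, 1],
})

print(split_to_lists(df, ['date', 'request']))
print(split_to_lists(df, 'date'))
```

The sort costs O(n log n), but it removes the contiguity assumption; if the data is known to be pre-sorted, that line can be dropped.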
Answer I feel good about!
Yay! defaultdict
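from collections import defaultdict
d = defaultdict(list)
# Stack into one value per (date, request, column), then collect the values for each key
s = df.set_index(['date', 'request']).stack()
for k, v in s.items():  # Series.iteritems() was removed in pandas 2.0
    d[k].append(v)
pd.Series(d).unstack().rename_axis(['date', 'request']).reset_index()
date request count users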
0 1 2 [1, 1] [1, 3]
1 2 2 [2] [7]
2 2 3 [3, 1] [1, 7]
3 3 2 [3] [3]
4 3 3 [1, 2] [4, 9]
5 4 3 [1] [7]
6 5 3 [1] [9]
Old answer
f = lambda x: pd.Series(x.values.T.tolist(), x.columns)
df.groupby(['request', 'date'])[['count', 'users']].apply(f).reset_index()
request date count users
0 2 1 [1, 1] [1, 3]
1 2 2 [2] [7]
2 2 3 [3] [3]
3 3 2 [3, 1] [1, 7]
4 3 3 [1, 2] [4, 9]
5 3 4 [1] [7]
6 3 5 [1] [9]
Frustration Answer!
Shoehorning agg
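from ast import literal_eval
# Serialise each group's values to a string so agg sees a scalar, then parse it back
df.groupby(['request', 'date']).agg(
    lambda x: str(list(x))
).applymap(literal_eval).reset_index()  # applymap is deprecated since pandas 2.1; DataFrame.map replaces it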
request date count users
0 2 1 [1, 1] [1, 3]
1 2 2 [2] [7]
2 2 3 [3] [3]
3 3 2 [3, 1] [1, 7]
4 3 3 [1, 2] [4, 9]
5 3 4 [1] [7]
6 3 5 [1] [9]
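The str/literal_eval round trip exists because agg refused list-returning functions in the pandas version the question was written against. In more recent pandas releases, as far as I know, the built-in list can be passed to agg directly; the minimal sketch below assumes such a version (the exact release where this changed is not stated here):

```python
import pandas as pd

df = pd.DataFrame({
    'date': [1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    'request': [2, 2, 2, 3, 3, 2, 3, 3, 3, 3],
    'users': [1, 3, 7, 1, 7, 3, 4, 9, 7, 9],
    'count': [1, 1, 2, 3, 1, 3, 1, 2, 1, 1],
})

# agg(list) turns every non-key column into one list per group,
# without naming the value columns
by_date_request = df.groupby(['date', 'request']).agg(list).reset_index()
by_date = df.groupby('date').agg(list).reset_index()
print(by_date_request)
print(by_date)
```

If this raises an error on an older pandas, the string round trip above or the apply-based answers remain the fallback.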