Pandas groupby with dict
Question
Is it possible to use a dict to group on elements of a column?
For example:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
   ...:                    'B': np.random.randn(8)})
In [4]: df
Out[4]:
       A         B
0    one  0.751612
1    one  0.333008
2    two  0.395667
3  three  1.636125
4    two  0.916435
5    two  1.076679
6    one -0.992324
7  three -0.593476
In [5]: d = {'one':'Start', 'two':'Start', 'three':'End'}
In [6]: grouped = df[['A','B']].groupby(d)
This (and other variations) returns an empty groupby object, and my attempts with .apply all fail too.
I'd like to match the values of column A to the keys of the dictionary and put the rows into the groups defined by the dictionary's values. The output would look something like this:
Start:
       A         B
0    one  0.751612
1    one  0.333008
2    two  0.395667
4    two  0.916435
5    two  1.076679
6    one -0.992324
End:
       A         B
3  three  1.636125
7  three -0.593476
Recommended answer
According to the docs, a dict passed to groupby has to map from labels to group names, so this will work if you put 'A' into the index:
grouped2 = df.set_index('A').groupby(d)
for group_name, data in grouped2:
    print(group_name)
    print('---------')
    print(data)
# Output:
End
---------
              B
A
three -1.234795
three  0.239209
Start
---------
            B
A
one -1.924156
one  0.506046
two -1.681980
two  0.605248
two -0.861364
one  0.800431
Column names and row indices are both labels, whereas before you put 'A' into the index, the elements of 'A' are values.
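The label/value distinction can be checked directly. The following is a minimal sketch, not part of the original answer: it assumes fixed values for 'B' (instead of random ones) so the result is reproducible, and the names default_labels, indexed, mapped_labels and sizes are illustrative only. It translates the index labels through the dict, which is roughly what a dict grouper does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'B': np.arange(8.0),  # assumption: fixed values instead of random ones
})
d = {'one': 'Start', 'two': 'Start', 'three': 'End'}

# With the default index the labels are the integers 0..7.
# None of them is a key of d, so every label maps to NaN.
default_labels = pd.Series(df.index).map(d)
print(default_labels.isna().all())  # True

# After set_index('A') the labels are 'one', 'two' and 'three',
# which the dict can translate into group names.
indexed = df.set_index('A')
mapped_labels = pd.Series(indexed.index).map(d)
print(mapped_labels.tolist())

# Grouping the re-indexed frame by the dict now yields two groups.
sizes = indexed.groupby(d).size()
print(sizes)
```

Here sizes should report six rows for 'Start' and two for 'End', matching the grouping requested in the question.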
If the index holds other information that makes set_index() awkward, you can create a grouping column with map() instead:
df['group'] = df['A'].map(d)
grouped3 = df.groupby('group')
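A minimal sketch of the map() approach with an aggregation follows, again assuming fixed values for 'B' so the sums are reproducible. The second variant, passing the mapped Series straight to groupby without adding a column, is not from the original answer; it relies on groupby accepting a Series aligned with the frame's index as the grouping key. The names keys, by_series, with_column and by_column are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'B': np.arange(8.0),  # assumption: fixed values instead of random ones
})
d = {'one': 'Start', 'two': 'Start', 'three': 'End'}

# Variant 1 (alternative, not in the answer): group by the mapped Series
# directly, leaving the frame's columns and index untouched.
keys = df['A'].map(d)
by_series = df.groupby(keys)['B'].sum()

# Variant 2 (from the answer): store the mapping as a column, then group by it.
with_column = df.copy()
with_column['group'] = with_column['A'].map(d)
by_column = with_column.groupby('group')['B'].sum()

print(by_column)

# The original integer index survives in each group.
end_rows = with_column.groupby('group').get_group('End')
print(end_rows.index.tolist())  # [3, 7]
```

Both variants keep the original row index, which is what the desired output in the question shows. Values of 'A' that are missing from the dict map to NaN and are dropped from the groups by default.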