Replicating GROUP_CONCAT for pandas.DataFrame

Question

I have a pandas DataFrame df:

+------+---------+  
| team | user    |  
+------+---------+  
| A    | elmer   |  
| A    | daffy   |  
| A    | bugs    |  
| B    | dawg    |  
| A    | foghorn |  
| B    | speedy  |  
| A    | goofy   |  
| A    | marvin  |  
| B    | pepe    |  
| C    | petunia |  
| C    | porky   |  
+------+---------+

I want to find or write a function that returns the DataFrame I would get in MySQL from the following query:

SELECT
  team,
  GROUP_CONCAT(user)
FROM
  df
GROUP BY
  team

to get the following result:

+------+---------------------------------------+  
| team | group_concat(user)                    |  
+------+---------------------------------------+  
| A    | elmer,daffy,bugs,foghorn,goofy,marvin |  
| B    | dawg,speedy,pepe                      |  
| C    | petunia,porky                         |  
+------+---------------------------------------+  

I can think of nasty ways to do this by iterating over the rows and adding to a dictionary, but there has to be a better way.

Recommended answer

Do the following:

df.groupby('team').apply(lambda x: ','.join(x.user))

to get a Series of strings, or

df.groupby('team').apply(lambda x: list(x.user))

to get a Series of lists of strings.

Here is what the results look like:

In [33]: df.groupby('team').apply(lambda x: ', '.join(x.user))
Out[33]:
team
A       elmer, daffy, bugs, foghorn, goofy, marvin
B                               dawg, speedy, pepe
C                                   petunia, porky
dtype: object

In [34]: df.groupby('team').apply(lambda x: list(x.user))
Out[34]:
team
A       [elmer, daffy, bugs, foghorn, goofy, marvin]
B                               [dawg, speedy, pepe]
C                                   [petunia, porky]
dtype: object
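
The question asks for a DataFrame rather than a Series, so here is a minimal, self-contained sketch of one way to get there. It rebuilds the example frame and selects the user column before aggregating, which is a variation on the apply call above and not the answer's own code. The helper name group_concat and the output column name are made up for illustration.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})


def group_concat(frame, by, col, sep=","):
    """Hypothetical helper: join the values of `col` within each `by` group."""
    return (
        frame.groupby(by)[col]
        .agg(sep.join)  # str.join consumes each group's values in row order
        .reset_index(name=f"group_concat({col})")
    )


result = group_concat(df, "team", "user")
print(result)
```

Selecting the column first keeps the grouping key out of the function being applied, and reset_index turns the team index back into an ordinary column, which is closer to the shape of the MySQL result.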

Note that, in general, any further operations on a Series like these (in particular one that holds lists) will be slow, and they are generally discouraged. If there is another way to aggregate that avoids putting a list inside a Series, you should consider using that approach instead.
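
As one illustration of that advice, suppose the eventual goal were the number of users per team. That goal is an assumption made for this sketch, not something the question asks for. The snippet below compares going through a Series of lists with aggregating directly.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})

# Roundabout route: build one Python list per team, then process the lists
# one by one with a Python-level function.
users_per_team = df.groupby("team")["user"].agg(list)
sizes_via_lists = users_per_team.apply(len)

# Direct route: ask the groupby for the aggregate itself, so no intermediate
# list objects are stored in the Series.
sizes_direct = df.groupby("team")["user"].count()

print(sizes_direct)
```

Both routes give the same counts here. The direct one stays within pandas' built-in aggregations, which is the kind of alternative the note above has in mind.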
