pandas groupby + list

Problem description

New to pandas, so sorry if this is old hat. What I'm trying to accomplish is similar to the question "grouping rows in list in pandas groupby", but I have more than two columns and can't figure out how to display all of my columns along with the grouped value. Here's what I'm trying to do:

import pandas as pd

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
        {'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
        {'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]

In [2]: df = pd.DataFrame(data)
In [3]: df
Out[3]:
            ip  make          model
0  192.168.1.1  Dell  UltraServ9000
1  192.168.1.3  Dell       MiniServ
2  192.168.1.5  Dell  UltraServ9000
3  192.168.1.6    HP    Thinger3000
4  192.168.1.8    HP    Thinger3000    

<magic>

Out[?]:
                         ip  make          model
0  192.168.1.1, 192.168.1.5  Dell  UltraServ9000
1               192.168.1.3  Dell       MiniServ
3  192.168.1.6, 192.168.1.8    HP    Thinger3000

Thanks in advance :)

Recommended answer

groupby takes a parameter, by, which accepts a list of the columns you want to group on. So the answer to that question can be modified as follows (a and c are the grouping columns, b is the column collected into lists):

df.groupby(by = ["a", "c"])["b"].apply(list).reset_index()
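Applied to the data from the question, the same idea might look like the sketch below. Mapping the answer's a and c to make and model, and b to ip, is an assumption based on the desired output in the question. The second variant, which joins the addresses into one comma-separated string and passes sort=False to keep the groups in order of first appearance, is likewise only one possible way to get closer to that output.

```python
import pandas as pd

data = [
    {"ip": "192.168.1.1", "make": "Dell", "model": "UltraServ9000"},
    {"ip": "192.168.1.3", "make": "Dell", "model": "MiniServ"},
    {"ip": "192.168.1.5", "make": "Dell", "model": "UltraServ9000"},
    {"ip": "192.168.1.6", "make": "HP", "model": "Thinger3000"},
    {"ip": "192.168.1.8", "make": "HP", "model": "Thinger3000"},
]
df = pd.DataFrame(data)

# Group on every column that should stay as-is and collect the IPs into lists.
# By default groupby sorts the group keys.
grouped = df.groupby(by=["make", "model"])["ip"].apply(list).reset_index()

# Variant (assumption: a comma-separated string is wanted instead of a list).
# sort=False keeps the groups in the order in which they first appear.
joined = (
    df.groupby(by=["make", "model"], sort=False)["ip"]
    .agg(", ".join)
    .reset_index()
)

print(grouped)
print(joined)
```

Note that the grouping columns come first in the result and the collected ip column comes last.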

Looking at your comment: since all columns other than a have the same values within each group, you can simply list them in the by parameter, because they won't affect the result. To save you from typing out all the names, you could do something like this:

df.groupby(by=[col for col in df.columns if col != "b"])["b"].apply(list).reset_index()

Alternatively, you could use the agg function and pass it a dictionary that takes the max of every other column and returns a list for b:

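# Take the max of every column other than the key "a" and the list column "b"
aggregate_functions = {x: "max" for x in df.columns if x != "a" and x != "b"}
# Collect the values of "b" in each group into a list
aggregate_functions["b"] = list
df.groupby(by="a").agg(aggregate_functions)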
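As a rough illustration on the question's data, the dictionary approach could be sketched as follows. Choosing model as the grouping key (the answer's a) and ip as the list column (the answer's b) is an assumption. It only gives a sensible result because make is constant within each model in this sample.

```python
import pandas as pd

data = [
    {"ip": "192.168.1.1", "make": "Dell", "model": "UltraServ9000"},
    {"ip": "192.168.1.3", "make": "Dell", "model": "MiniServ"},
    {"ip": "192.168.1.5", "make": "Dell", "model": "UltraServ9000"},
    {"ip": "192.168.1.6", "make": "HP", "model": "Thinger3000"},
    {"ip": "192.168.1.8", "make": "HP", "model": "Thinger3000"},
]
df = pd.DataFrame(data)

# Every column except the grouping key and the list column is reduced with max.
# Because "make" is constant within each model here, max just returns that value.
agg_spec = {col: "max" for col in df.columns if col not in ("model", "ip")}
# The IPs of each group are collected into a list.
agg_spec["ip"] = list

# sort=False keeps the groups in the order in which they first appear.
result = df.groupby(by="model", sort=False).agg(agg_spec).reset_index()
print(result)
```

If the other columns were not constant within a group, max would silently pick one value, so grouping on all of those columns is the safer choice in that case.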

Which one you prefer is up to you; the latter is probably more readable.
