pandas groupby + list
Question
New to pandas, so sorry if this is old hat. What I'm trying to accomplish is similar to the question "grouping rows in list in pandas groupby", but I have more than two columns and can't figure out how to get all of my columns displayed along with the grouped value. Here's what I'm trying to do.
data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
{'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
{'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
{'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
{'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]
In [2]: df = pd.DataFrame(data)
In [3]: df
Out[3]:
ip make model
0 192.168.1.1 Dell UltraServ9000
1 192.168.1.3 Dell MiniServ
2 192.168.1.5 Dell UltraServ9000
3 192.168.1.6 HP Thinger3000
4 192.168.1.8 HP Thinger3000
<magic>
Out[?]:
ip make model
0 192.168.1.1, 192.168.1.5 Dell UltraServ9000
1 192.168.1.3 Dell MiniServ
3 192.168.1.6, 192.168.1.8 HP Thinger3000
Thanks in advance :)
Recommended answer
groupby takes a parameter, by, through which you can specify a list of variables you want to group over. So the answer to that question can be modified as follows:
df.groupby(by = ["a", "c"])["b"].apply(list).reset_index()
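The one-liner above uses the column names a, b and c from the linked question. As a rough sketch of how it might map onto the data in this question, assuming ip plays the role of b and make and model are the grouping columns (sort=False and the final join step are additions for illustration, not part of the original answer):

```python
import pandas as pd

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
        {'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
        {'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]
df = pd.DataFrame(data)

# Group on the columns that should stay as they are and collect the ips
# of each group into a list. sort=False keeps groups in order of first
# appearance instead of sorting them by key.
result = df.groupby(by=["make", "model"], sort=False)["ip"].apply(list).reset_index()

# Optional: turn each list into a comma-separated string, as in the
# desired output shown in the question.
result["ip_joined"] = result["ip"].apply(", ".join)
print(result)
```

Note that the grouping columns come first in the result and the row index is renumbered, so the layout differs slightly from the output sketched in the question.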
Looking at your comment: since all columns other than a have the same values, you can easily list them in the by parameter because they won't affect the result. To save you from typing out all the names, you could do something like this:
df.groupby(by = list(set(df.columns) - set(["b"])))["b"].apply(list).reset_index()
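One caveat with the set-based version is that a set does not preserve column order, so the grouping columns may come out in a different order from run to run. A possible variant, assuming the column to collect is ip as in the question (list_col and group_cols are hypothetical names used only here), builds the list with a comprehension instead:

```python
import pandas as pd

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
        {'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
        {'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]
df = pd.DataFrame(data)

list_col = "ip"  # the column whose values are collected into lists

# Every other column becomes a grouping key; the comprehension keeps
# the original column order, unlike a set difference.
group_cols = [c for c in df.columns if c != list_col]

result = df.groupby(by=group_cols, sort=False)[list_col].apply(list).reset_index()
print(result)
```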
Alternatively, you could exploit the agg function by passing a dictionary which will take the max for all other columns and return a list for b:
aggregate_functions = {x: max for x in df.columns if x != "a" and x != "b"}
aggregate_functions["b"] = lambda x: list(x)
df.groupby(by = "a").agg(aggregate_functions)
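For the data in the question, the same idea might look like the sketch below. It assumes model is the single grouping key (the role of a) and ip is the column to collect (the role of b); it also passes the string "max" and the built-in list rather than the callables used above, which is a stylistic substitution and not the original answer's code.

```python
import pandas as pd

data = [{'ip': '192.168.1.1', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.3', 'make': 'Dell', 'model': 'MiniServ'},
        {'ip': '192.168.1.5', 'make': 'Dell', 'model': 'UltraServ9000'},
        {'ip': '192.168.1.6', 'make': 'HP', 'model': 'Thinger3000'},
        {'ip': '192.168.1.8', 'make': 'HP', 'model': 'Thinger3000'}]
df = pd.DataFrame(data)

# Take the max of every column that is neither the key nor the list column.
# Within one model the make is constant here, so max simply returns it.
aggregate_functions = {c: "max" for c in df.columns if c not in ("model", "ip")}

# Collect the ips of each group into a list.
aggregate_functions["ip"] = list

result = df.groupby(by="model", sort=False).agg(aggregate_functions).reset_index()
print(result)
```

This only gives a sensible result when the non-key columns really are constant within each group; otherwise max silently picks one value per group.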
Which you prefer is up to you; the latter is probably more readable.