Aggregate column values in pandas GroupBy as a dict
Question
This is a question I was asked in a past interview.
Our input data has the following columns:
language, product id, shelf id, rank
For instance, the input would have the following format:
English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81
We would like to group by the language and shelf id columns and, within each group, sort the products by rank in descending order, so that each output record has the following format:
Language, shelf_id, {product_id:rank1, product_id:rank2 ....}
For the given input, the output would be the following:
English, 4560, {6000075389352:49, 742005:10.2}
French, 4560, {899883993:32}
French, 7868, {731317391:81}
I solved this problem by building a dictionary whose key combines the language and shelf id, and inserting each product id and rank under the matching key.
My method worked, but it looks like there is an easier way to do this with the Python pandas library. I've read some references, but I'm still not sure whether there is a better method than what I did (building a key from the language and shelf id and keeping a dictionary under that key).
Any help would be appreciated.
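For reference, the dictionary-based approach described in the question might look roughly like the following in plain Python. This is a minimal sketch, not the asker's actual code: the function name `group_products` and the use of a `(language, shelf_id)` tuple as the combined key are assumptions made for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (language, product_id, shelf_id, rank).
rows = [
    ("English", 742005, 4560, 10.2),
    ("English", 6000075389352, 4560, 49),
    ("French", 899883993, 4560, 32),
    ("French", 731317391, 7868, 81),
]


def group_products(rows):
    """Map (language, shelf_id) -> {product_id: rank}, highest rank first.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for language, product_id, shelf_id, rank in rows:
        groups[(language, shelf_id)].append((product_id, rank))
    # Dicts keep insertion order (Python 3.7+), so sorting each group's
    # pairs by rank before building the dict fixes the output order.
    return {
        key: dict(sorted(pairs, key=lambda pair: pair[1], reverse=True))
        for key, pairs in groups.items()
    }


result = group_products(rows)
for (language, shelf_id), mapping in result.items():
    print(language, shelf_id, mapping)
```

Using a tuple as the key avoids having to build and later split a combined string, which is one reasonable way to "combine" the two columns.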
Recommended answer
Setup
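import pandas as pd
# The sample file has no header row, so read it without one and name the columns.
df = pd.read_csv('file.csv', header=None)
df.columns = ['Lang', 'product_id', 'shelf_id', 'rank_id']
df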
      Lang     product_id  shelf_id  rank_id
0  English         742005      4560     10.2
1  English  6000075389352      4560     49.0
2   French      899883993      4560     32.0
3   French      731317391      7868     81.0
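You can use df.groupby to group by Lang and shelf_id, then use GroupBy.apply to build a dictionary of {product_id: rank_id} for each group. Sorting by rank_id in descending order first makes each dictionary list its products from highest to lowest rank, as the question asks:
# Sort first: groupby keeps the row order within each group.
(df.sort_values('rank_id', ascending=False)
   .groupby(['Lang', 'shelf_id'])
   # Build one {product_id: rank_id} dict per (Lang, shelf_id) group.
   .apply(lambda x: dict(zip(x['product_id'], x['rank_id'])))
   .reset_index(name='mapping'))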
      Lang  shelf_id                              mapping
0  English      4560  {6000075389352: 49.0, 742005: 10.2}
1   French      4560                    {899883993: 32.0}
2   French      7868                    {731317391: 81.0}
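To try this without a file on disk, the following self-contained sketch reads the question's sample rows from an in-memory string and sorts by rank before grouping, so each dictionary is built in descending rank order. The `io.StringIO` input, the `skipinitialspace=True` option, and the variable names `csv_text` and `result` are assumptions added for illustration; they are not part of the original answer.

```python
import io

import pandas as pd

# The question's sample rows, inlined so the example needs no file on disk.
csv_text = """English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["Lang", "product_id", "shelf_id", "rank_id"],
    skipinitialspace=True,  # the sample has a space after each comma
)

result = (
    # Sort first: groupby keeps the row order within each group.
    df.sort_values("rank_id", ascending=False)
    .groupby(["Lang", "shelf_id"])
    # One {product_id: rank_id} dict per (Lang, shelf_id) group.
    .apply(lambda g: dict(zip(g["product_id"], g["rank_id"])))
    .reset_index(name="mapping")
)

# Print one line per group, similar to the format requested in the question.
for row in result.itertuples(index=False):
    print(row.Lang, row.shelf_id, row.mapping)
```

Because Python dictionaries preserve insertion order, sorting before grouping is enough to control the order of the keys inside each mapping.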