Aggregate column values in pandas GroupBy as a dict


Question

This is a question I was asked in a past interview.

The input data has the following columns:

language, product id, shelf id, rank

For instance, the input would have the following format:

English, 742005, 4560, 10.2 
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81

We would like to group by the language and shelf id columns and, within each group, sort the products by "rank" in descending order, which would produce output in the following format:

Language, shelf_id, {product_id:rank1, product_id:rank2 ....}

for each record.

For the given input, the output would be the following:

English, 4560, {6000075389352:49, 742005:10.2}
French, 4560, {899883993:32}
French, 7868, {731317391:81}

I solved this by building a dictionary whose keys combine the language and shelf id, and inserting each product id and rank under the matching key.
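The dictionary-based approach described above might look roughly like the following. This is a minimal sketch, not the asker's actual code: the names `rows`, `grouped`, and `result` are illustrative, and the sample rows are inlined as tuples rather than read from a file.

```python
from collections import defaultdict

# Sample rows from the question: (language, product id, shelf id, rank).
rows = [
    ("English", 742005, 4560, 10.2),
    ("English", 6000075389352, 4560, 49),
    ("French", 899883993, 4560, 32),
    ("French", 731317391, 7868, 81),
]

# Key each inner dict by the (language, shelf id) pair.
grouped = defaultdict(dict)
for language, product_id, shelf_id, rank in rows:
    grouped[(language, shelf_id)][product_id] = rank

# Rebuild each inner dict with its products ordered by rank, descending.
# Plain dicts keep insertion order in Python 3.7+.
result = {
    key: dict(sorted(products.items(), key=lambda item: item[1], reverse=True))
    for key, products in grouped.items()
}

for (language, shelf_id), products in result.items():
    print(language, shelf_id, products)
```

Using a tuple as the key avoids concatenating the language and shelf id into a string, so the two parts can be recovered without parsing.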

My method worked, but it looks like there is an easier way to do this with the pandas library. I have read some references, but I am still not sure whether there is a better method than mine (building a key from the language and shelf id and keeping a dictionary under that key).

Any help would be appreciated.

Recommended answer

Setup

import pandas as pd
# file.csv has no header row, so name the columns explicitly
df = pd.read_csv('file.csv', header=None)
df.columns = ['Lang', 'product_id', 'shelf_id', 'rank_id']

df
      Lang     product_id  shelf_id  rank_id
0  English         742005      4560     10.2
1  English  6000075389352      4560     49.0
2   French      899883993      4560     32.0
3   French      731317391      7868     81.0

You can sort by rank_id in descending order, use df.groupby to group by Lang and shelf_id, and then use GroupBy.apply to build a {product_id: rank_id} dictionary for each group. groupby preserves the order of rows within each group, so each dictionary comes out ordered by rank:

(df.sort_values('rank_id', ascending=False)
   .groupby(['Lang', 'shelf_id'])
   .apply(lambda x: dict(zip(x['product_id'], x['rank_id'])))
   .reset_index(name='mapping'))

      Lang  shelf_id                              mapping
0  English      4560  {6000075389352: 49.0, 742005: 10.2}
1   French      4560                    {899883993: 32.0}
2   French      7868                    {731317391: 81.0}
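
For a version that can be run without file.csv, the sketch below inlines the sample rows with io.StringIO and builds the same table by iterating over the groups instead of calling apply. It is one possible variant under those assumptions, not the answer's own code; `csv_text`, `ordered`, `records`, and `result` are illustrative names.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the example is self-contained.
csv_text = """English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=['Lang', 'product_id', 'shelf_id', 'rank_id'],
    skipinitialspace=True,  # the sample has a space after each comma
)

# Sort once by rank (descending); groupby keeps row order within each group.
ordered = df.sort_values('rank_id', ascending=False)

# Build one record per (Lang, shelf_id) group with its product -> rank dict.
records = [
    {
        'Lang': lang,
        'shelf_id': shelf_id,
        'mapping': dict(zip(group['product_id'], group['rank_id'])),
    }
    for (lang, shelf_id), group in ordered.groupby(['Lang', 'shelf_id'])
]
result = pd.DataFrame(records)
print(result)
```

Iterating over the groups keeps the dictionary construction in plain Python, which can be easier to debug than a lambda passed to apply.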
