Construct sequences from a dataframe using dictionaries in Python


Problem description


I would like to construct sequences of users' purchase histories using dictionaries in Python. I would like these sequences to be ordered by date.

I have 3 columns in my dataframe:

users        items         date

1             1            date_1 
1             2            date_2
2             1            date_3
2             3            date_1
4             5            date_2
4             1            date_5
4             3            date_3

And the result should be like this :

{1: [[1, date_1], [2, date_2]], 2: [[3, date_1], [1, date_3]], 4: [[5, date_2], [3, date_3], [1, date_5]]}

My code is :

df_sub = df[['uid', 'nid', 'date']] 
dic3 = df_sub.set_index('uid').T.to_dict('list')

And my results are :

{36864: [258509L, '2014-12-03'], 548873: [502105L, '2015-09-08'], 42327: [492268L, '2015-01-29'], 548873: [370049L, '2015-02-18'], 36864: [258909L, '2016-01-13'] ... }

But I would like to group by users :

{36864: [[258509L, '2014-12-03'], [258909L, '2016-01-13']], 548873: [[502105L, '2015-09-08'], [370049L, '2015-02-18']], 42327: [[492268L, '2015-01-29']]}

Some help, please!

Solution

First, set users as the index and group by it. Then pass a function that sorts each group by its date column and extracts the underlying array with .values.

Call .tolist() on that array to get the equivalent nested list, which is the required format. Finally, call .to_dict() to get the final output as a dictionary.

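# Sort one user's rows by date and return them as a list of [item, date] pairs.
fnc = lambda x: x.sort_values('date').values.tolist()
# Group on the users index, apply fnc to each user's rows, and convert the result to a dict.
df.set_index('users').groupby(level=0).apply(fnc).to_dict()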

produces:

{1: [[1, 'date_1'], [2, 'date_2']],
 2: [[3, 'date_1'], [1, 'date_3']],
 4: [[5, 'date_2'], [3, 'date_3'], [1, 'date_5']]}
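
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It is an assumption-laden variant rather than the answer's exact code: the ISO date strings stand in for date_1 through date_5, the helper name build_sequences is made up for illustration, and it sorts the whole frame once and then groups instead of sorting inside apply.

```python
import pandas as pd

# Sample data mirroring the question's table. The ISO date strings are
# illustrative stand-ins for date_1 ... date_5 (an assumption, not real data).
df = pd.DataFrame({
    "users": [1, 1, 2, 2, 4, 4, 4],
    "items": [1, 2, 1, 3, 5, 1, 3],
    "date": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-01",
             "2015-01-02", "2015-01-05", "2015-01-03"],
})


def build_sequences(frame):
    """Return {user: [[item, date], ...]} with each list ordered by date.

    build_sequences is a hypothetical helper name used only for this sketch.
    """
    # Sort once; groupby keeps the row order inside each group.
    ordered = frame.sort_values("date", kind="stable")
    return {
        user: group[["items", "date"]].values.tolist()
        for user, group in ordered.groupby("users")
    }


sequences = build_sequences(df)
print(sequences)
```

Sorting the strings works here only because ISO-formatted dates sort chronologically as text. If the dates are in another format, converting the column with pd.to_datetime before sorting is probably the safer choice.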
