Construct sequences from a dataframe using dictionaries in Python
I would like to construct sequences of users' purchasing histories using dictionaries in Python, with each sequence ordered by date.
I have 3 columns in my dataframe:
users items date
1 1 date_1
1 2 date_2
2 1 date_3
2 3 date_1
4 5 date_2
4 1 date_5
4 3 date_3
And the result should look like this:
{1: [[1, date_1], [2, date_2]], 2: [[3, date_1], [1, date_3]], 4: [[5, date_2], [3, date_3], [1, date_5]]}
My code is:
df_sub = df[['uid', 'nid', 'date']]
dic3 = df_sub.set_index('uid').T.to_dict('list')
And my results are:
{36864: [258509L, '2014-12-03'], 548873: [502105L, '2015-09-08'], 42327: [492268L, '2015-01-29'], 548873: [370049L, '2015-02-18'], 36864: [258909L, '2016-01-13'] ... }
But I would like to group by users:
{36864: [[258509L, '2014-12-03'],[258909L, '2016-01-13']], 548873: [[502105L, '2015-09-08'],[370049L, '2015-02-18']], 42327: [492268L, '2015-01-29'] }
Some help, please!
First, set users as the index and group by it. Then pass a function that sorts each group by its date column and extracts the underlying array with .values. Call .tolist on that array to get the equivalent nested list, which is the required format. Finally, use .to_dict to return the result as a dictionary.
fnc = lambda x: x.sort_values('date').values.tolist()
df.set_index('users').groupby(level=0).apply(fnc).to_dict()
produces:
{1: [[1, 'date_1'], [2, 'date_2']],
2: [[3, 'date_1'], [1, 'date_3']],
4: [[5, 'date_2'], [3, 'date_3'], [1, 'date_5']]}
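For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same steps. The variable names sort_group and sequences are only illustrative, and the placeholder strings date_1 ... date_5 sort lexicographically; with real data you would probably want ISO-formatted strings or a column parsed with pd.to_datetime so the order is chronological.

```python
import pandas as pd

# Sample data copied from the question (dates are placeholder strings).
df = pd.DataFrame({
    "users": [1, 1, 2, 2, 4, 4, 4],
    "items": [1, 2, 1, 3, 5, 1, 3],
    "date": ["date_1", "date_2", "date_3", "date_1",
             "date_2", "date_5", "date_3"],
})

def sort_group(group):
    # Sort one user's rows by date and return them as [item, date] pairs.
    return group.sort_values("date").values.tolist()

# Group on the index level so that only items and date remain in each group.
sequences = (
    df.set_index("users")
      .groupby(level=0)
      .apply(sort_group)
      .to_dict()
)

print(sequences)
```

The printed dictionary should match the output shown above, with one date-ordered list of [item, date] pairs per user.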