Convert dictionary to sparse matrix


Problem description

I have a dictionary whose keys are user_ids and whose values are lists of the movie_ids liked by that user, with #unique_users = 573000 and #unique_movies = 16000.

{1: [51, 379, 552, 2333, 2335, 4089, 4484], 2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484], 5: [783, 909, 1052, 1138, 1147, 2676], 7: [171, 321, 959], 9: [3193], 10: [959], 11: [131,567,897,923],..........}

Now I want to convert this into a matrix with user_ids as rows and movie_ids as columns, holding a 1 for each movie the user has liked, i.e. a 573000 × 16000 matrix.

Ultimately I have to multiply this matrix by its transpose to get a co-occurrence matrix with dimensions (#unique_movies, #unique_movies).

Also, what is the time complexity of the X'*X operation, where X has a shape of roughly (500000, 12000)?

Recommended answer

I think you can construct an empty dok_matrix and fill in the values, then transpose it and convert it to a csr_matrix for efficient matrix multiplication.

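import numpy as np
import scipy.sparse as sp

# Sample of the dictionary: user_id -> list of liked movie_ids
d = {1: [51, 379, 552, 2333, 2335, 4089, 4484], 2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484], 5: [783, 909, 1052, 1138, 1147, 2676], 7: [171, 321, 959], 9: [3193], 10: [959], 11: [131,567,897,923]}

# Rows are users, columns are movies. This assumes every id is smaller than the
# corresponding dimension; otherwise map the ids to 0-based indices first.
# int8 keeps storage small, but cast to a wider dtype before multiplying,
# because co-occurrence counts above 127 overflow int8.
mat = sp.dok_matrix((573000, 16000), dtype=np.int8)

# Set a 1 for every movie the user liked
for user_id, movie_ids in d.items():
    mat[user_id, movie_ids] = 1

# Transpose to (movies, users) and convert to CSR for fast multiplication
mat = mat.transpose().tocsr()
print(mat.shape)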
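
The question also asks for the co-occurrence matrix and for the cost of X'*X, so here is a minimal sketch of both, not a definitive implementation. It builds the user-by-movie matrix X directly from index arrays instead of filling a dok_matrix in a Python loop, which is usually faster for large inputs. The names X, C, n_users, n_movies and work are illustrative, the dictionary is a shortened sample of the one in the question, and the code assumes that the ids are valid 0-based indices and that no movie is repeated within a user's list (repeated entries would be summed).

```python
import numpy as np
import scipy.sparse as sp

# Shortened sample of the dictionary from the question (illustrative only).
d = {1: [51, 379, 552], 2: [51, 379, 1674], 5: [783, 909],
     7: [171, 321, 959], 10: [959]}

# Assumed upper bounds on the ids, taken from the sizes in the question.
n_users, n_movies = 573000, 16000

# One (row, col) pair per (user, liked movie); both generators walk the
# dictionary in the same order, so the pairs line up.
rows = np.fromiter((u for u, ms in d.items() for _ in ms), dtype=np.int64)
cols = np.fromiter((m for ms in d.values() for m in ms), dtype=np.int64)
data = np.ones(len(rows), dtype=np.int32)  # int32 so counts do not overflow

# X has users as rows and movies as columns.
X = sp.csr_matrix((data, (rows, cols)), shape=(n_users, n_movies))

# Co-occurrence matrix: C[i, j] = number of users who liked both i and j.
C = (X.T @ X).tocsr()
print(C.shape)

# Rough work estimate for the sparse product: each user with k likes
# contributes about k * k multiply-adds.
likes_per_user = np.diff(X.indptr).astype(np.int64)
work = int((likes_per_user ** 2).sum())
print(work)
```

On complexity: a dense X'*X with X of shape (500000, 12000) needs on the order of 500000 × 12000² ≈ 7.2 × 10^13 multiply-adds. For the sparse product the work is instead roughly proportional to the sum over users of the squared number of liked movies, which is what work estimates above, so it depends on how many movies each user has liked rather than on the full matrix dimensions.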
