Convert dictionary to sparse matrix
Question
I have a dictionary whose keys are user_ids and whose values are lists of the movie_ids liked by that user, with #unique_users = 573000 and #unique_movies = 16000.
{1: [51, 379, 552, 2333, 2335, 4089, 4484], 2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484], 5: [783, 909, 1052, 1138, 1147, 2676], 7: [171, 321, 959], 9: [3193], 10: [959], 11: [131,567,897,923],..........}
Now I want to convert this into a matrix with user_ids as rows and movie_ids as columns, with the value 1 for each movie a user has liked, i.e. a 573000 x 16000 matrix.
Ultimately I have to multiply this matrix by its transpose to get a co-occurrence matrix with dimensions (#unique_movies, #unique_movies).
Also, what is the time complexity of the X'*X operation, where X has a shape of roughly (500000, 12000)?
Recommended answer
I think you can construct an empty dok_matrix and fill in the values. Then transpose it and convert it to csr_matrix for efficient matrix multiplication.
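import numpy as np
import scipy.sparse as sp

# Sample of the user_id -> liked movie_ids dictionary from the question
d = {1: [51, 379, 552, 2333, 2335, 4089, 4484], 2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484], 5: [783, 909, 1052, 1138, 1147, 2676], 7: [171, 321, 959], 9: [3193], 10: [959], 11: [131, 567, 897, 923]}

# dok_matrix supports incremental assignment; ids are used directly as
# indices, so every id must be smaller than the matching dimension
mat = sp.dok_matrix((573000, 16000), dtype=np.int8)
for user_id, movie_ids in d.items():
    mat[user_id, movie_ids] = 1

# Transpose to (movies, users) and convert to CSR for fast multiplication
mat = mat.transpose().tocsr()
print(mat.shape)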
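As a possible alternative to filling a dok_matrix element by element, the matrix can also be assembled in one step from coordinate lists, followed by the X'*X product the question asks about. The sketch below is only an illustration under some assumptions: the helper name dict_to_csr and the id-to-index mappings are hypothetical (not part of the original answer), ids are remapped to contiguous 0-based positions because the sample ids are not contiguous, and int32 is used instead of int8 on the assumption that co-occurrence counts may exceed the int8 range.

```python
import numpy as np
import scipy.sparse as sp

def dict_to_csr(likes):
    """Build a binary user-by-movie CSR matrix from {user_id: [movie_id, ...]}.

    Hypothetical helper: ids are remapped to contiguous 0-based positions.
    """
    user_ids = sorted(likes)
    movie_ids = sorted({m for movies in likes.values() for m in movies})
    user_pos = {u: i for i, u in enumerate(user_ids)}
    movie_pos = {m: j for j, m in enumerate(movie_ids)}

    rows, cols = [], []
    for u, movies in likes.items():
        # set() avoids counting a movie twice if it is repeated in a list
        for m in set(movies):
            rows.append(user_pos[u])
            cols.append(movie_pos[m])

    data = np.ones(len(rows), dtype=np.int32)
    X = sp.csr_matrix((data, (rows, cols)),
                      shape=(len(user_ids), len(movie_ids)))
    return X, user_ids, movie_ids

# Small sample taken from the question
d = {1: [51, 379, 552, 2333, 2335, 4089, 4484],
     2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484],
     5: [783, 909, 1052, 1138, 1147, 2676],
     7: [171, 321, 959], 9: [3193], 10: [959],
     11: [131, 567, 897, 923]}

X, user_ids, movie_ids = dict_to_csr(d)

# Movie-by-movie co-occurrence: entry (i, j) counts users who liked both
C = (X.T @ X).tocsr()
print(X.shape, C.shape)
```

Here X keeps users as rows, so the co-occurrence matrix is X.T @ X; with the transposed matrix from the answer above, the equivalent product would be mat @ mat.T. Because both operands are sparse, the cost of the product should depend on the number of stored nonzeros rather than on the full dense shape.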