Create a Sparse Matrix from a Data Frame
Question
I'm working on an assignment where I'm trying to build a collaborative filtering model for the Netflix Prize data. The data is in a CSV file, which I easily imported into a data frame. What I need to do now is create a sparse matrix with the users as rows and the movies as columns, where each cell holds the corresponding rating. To map the values from the data frame, I currently run a loop over every row of the data frame, which takes a lot of time in R. Can anyone suggest a better approach? Here is the sample code and data:
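library(Matrix)

buildUserMovieMatrix <- function(trainingData)
{
    # Start from an all-zero sparse matrix: one row per user, one column per movie
    UIMatrix <- Matrix(0, nrow = max(trainingData$UserID), ncol = max(trainingData$MovieID), sparse = TRUE)
    # Fill in one rating per iteration (this is the slow part)
    for (i in 1:nrow(trainingData))
    {
        UIMatrix[trainingData$UserID[i], trainingData$MovieID[i]] <- trainingData$Rating[i]
    }
    return(UIMatrix)
}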
Sample of the data in the data frame from which the sparse matrix is being created:
  MovieID UserID Rating
1       1      2      3
2       2      3      3
3       2      4      4
4       2      6      3
5       2      7      3
So in the end I want something like this, where the columns are the movie IDs and the rows are the user IDs:
1 2 3 4 5 6 7
1 0 0 0 0 0 0 0
2 3 0 0 0 0 0 0
3 0 3 0 0 0 0 0
4 0 4 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 3 0 0 0 0 0
7 0 3 0 0 0 0 0
So the interpretation is: user 2 rated movie 1 with 3 stars, user 3 rated movie 2 with 3 stars, and so on for the other users and movies. There are about 8,500,000 rows in my data frame, and my code takes about 30-45 minutes to create this user-item matrix. I would appreciate any suggestions.
Recommended answer
The Matrix package has a constructor made especially for your type of data:
library(Matrix)
UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating)
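As a quick check, here is a minimal sketch that applies the same call to the five sample rows from the question. The explicit dims argument is an addition for illustration (it is not part of the call above); it pins the shape to max(UserID) rows by max(MovieID) columns, which is what sparseMatrix would infer here anyway.

```r
library(Matrix)

# The five sample rows from the question
trainingData <- data.frame(
  MovieID = c(1L, 2L, 2L, 2L, 2L),
  UserID  = c(2L, 3L, 4L, 6L, 7L),
  Rating  = c(3, 3, 4, 3, 3)
)

# Rows are users, columns are movies, values are ratings.
# dims is given explicitly here only to make the resulting shape obvious.
UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating,
                         dims = c(max(trainingData$UserID),
                                  max(trainingData$MovieID)))

print(UIMatrix)        # empty cells are shown as "."
print(UIMatrix[2, 1])  # rating that user 2 gave movie 1
```

One thing to keep in mind: by default, sparseMatrix adds up the x values of repeated (i, j) pairs, so if the data could contain the same user/movie pair more than once, it is worth deduplicating first.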
Otherwise, you might like knowing about that cool feature of the [ function known as matrix indexing. You could have tried:
buildUserMovieMatrix <- function(trainingData) {
    UIMatrix <- Matrix(0, nrow = max(trainingData$UserID),
                       ncol = max(trainingData$MovieID), sparse = TRUE)
    UIMatrix[cbind(trainingData$UserID,
                   trainingData$MovieID)] <- trainingData$Rating
    return(UIMatrix)
}
(but I would definitely recommend the sparseMatrix approach over this.)
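For readers who have not seen matrix indexing before, here is a small illustration on an ordinary dense matrix. The names m and idx are made up for this example: when [ is given a two-column matrix, each row of it is treated as one (row, column) pair, so all cells are assigned in a single vectorised step instead of a loop.

```r
# A 3 x 3 matrix of zeros
m <- matrix(0, nrow = 3, ncol = 3)

# Each row of idx is one (row, column) position: (1,3), (2,1), (3,2)
idx <- cbind(c(1, 2, 3), c(3, 1, 2))

# Assign all three cells at once
m[idx] <- c(10, 20, 30)

print(m)
print(m[idx])  # reading works the same way and returns the three values
```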