How to create a vector matrix of movie ratings using R?

Question

Suppose I am using this data set of movie ratings: http://www.grouplens.org/node/73

It contains ratings in a file where each line is formatted as userID::movieID::rating::timestamp

Given this, I want to construct a feature matrix in R, where each row corresponds to a user and each column holds the rating that the user gave to one movie (if any).

For example, if the data file contains:


1::1::1::10
2::2::2::11
1::2::3::12
2::1::5::13
3::3::4::14

then the output matrix would be:


UserID, Movie1, Movie2, Movie3
1, 1, 3, NA
2, 5, 2, NA
3, NA, NA, 4

Is there a built-in way to achieve this in R? I wrote a simple Python script to do the same thing, but I suspect there are more efficient ways to accomplish it.

Recommended answer

You can use the dcast function from the reshape2 package, but the resulting data.frame may be huge (and sparse).

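# header = FALSE: the ratings file has no header row, so keep the first line as data
d <- read.delim(
  "u1.base",
  header = FALSE,
  col.names = c("user", "film", "rating", "timestamp")
)
library(reshape2)
# One row per user, one column per film; missing ratings become NA
d <- dcast( d, user ~ film, value.var = "rating" )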
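Since the wide data.frame can be mostly empty, a sparse matrix is one possible alternative. The following is only a sketch, not part of the original answer: it assumes the Matrix package is available, that user and film IDs are positive integers, and it uses the five sample ratings from the question in place of the real file. Note that a sparse matrix reports unrated pairs as 0 rather than NA, which is only unambiguous because no rating is 0.

```r
library(Matrix)  # assumption: the Matrix package is installed

# Sample ratings from the question, already parsed into columns
d <- data.frame(
  user   = c(1, 2, 1, 2, 3),
  film   = c(1, 2, 2, 1, 3),
  rating = c(1, 2, 3, 5, 4)
)

# Rows are users, columns are films; only observed ratings are stored
m <- sparseMatrix(
  i = d$user,
  j = d$film,
  x = d$rating,
  dimnames = list(
    paste0("User", seq_len(max(d$user))),
    paste0("Movie", seq_len(max(d$film)))
  )
)

print(m)
```

If the IDs in the real file are not contiguous, the matrix will simply contain empty rows or columns for the unused IDs.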

If your fields are separated by double colons, you cannot use the sep argument of read.delim, which must be a single character. If you already do some preprocessing outside R, it is easier to do the conversion there (e.g., in Perl it would just be s/::/\t/g), but you can also do it in R: read the file as a single column, split the strings, and bind the results together.

# header = FALSE so the first rating line is not consumed as column names
d <- read.delim("a", header = FALSE, stringsAsFactors = FALSE)
d <- as.character( d[,1] )   # Vector of strings, one per line
d <- strsplit( d, "::" )     # List of character vectors (one per line)
d <- lapply( d, as.numeric ) # List of vectors of numbers
d <- do.call( rbind, d )     # Matrix with one row per rating
d <- as.data.frame( d )
colnames( d ) <- c( "user", "movie", "rating", "timestamp" )
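The split-and-bind steps can be tried end to end on the sample data from the question. This is a minimal base R sketch under stated assumptions: the variable `lines` stands in for the contents of the real file, and `tapply` with `mean` is used here for the final pivot instead of dcast, so duplicate user/movie pairs (if any) would be averaged.

```r
# The five sample lines from the question, standing in for the real file
lines <- c("1::1::1::10", "2::2::2::11", "1::2::3::12",
           "2::1::5::13", "3::3::4::14")

# Split each line on "::" and bind the pieces into a numeric data frame
parts <- do.call(rbind, lapply(strsplit(lines, "::", fixed = TRUE), as.numeric))
d <- as.data.frame(parts)
colnames(d) <- c("user", "movie", "rating", "timestamp")

# Pivot: one row per user, one column per movie; unrated pairs become NA
ratings <- tapply(d$rating, list(user = d$user, movie = d$movie), mean)

print(ratings)
```

The result is an ordinary matrix whose row and column names are the user and movie IDs, so `ratings["3", "3"]` holds the rating user 3 gave to movie 3.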
