Error using select function in R
Question
I want to get the song that each user plays most frequently. The three fields I want in the CSV file are userId, songId and playCount, but the select function gives an error:
write.csv(group_by(mydata,userId) %.%
summarise(one=max(playCount)) %.%
select(userId,songId,playCount), file="FavouriteSongs.csv")
Error in eval(expr, envir, enclos) : object 'songId' not found
An example of the data looks like this:
userId songId playCount
A 568r 85
A 711g 18
C 34n 18
E 454j 65
D 663a 72
B 35d 84
A 34c 72
A 982s 65
E 433f 11
A 565t 7
Thanks in advance.
Recommended answer
In your chained sequence of dplyr operations, the summarise call produces only two columns: the grouping variable and the result of the summary function.
df %.%
group_by(userId) %.%
summarise(
one = max(playCount))
# Source: local data frame [5 x 2]
#
# userId one
# 1 A 85
# 2 B 84
# 3 C 18
# 4 D 72
# 5 E 65
When you then try to select the songId variable from the data frame generated by summarise, it is not found, because summarise has already dropped every column other than userId and one. The same applies to playCount.
df %.%
group_by(userId) %.%
summarise(
one = max(playCount)) %.%
select(userId, songId, playCount)
# Error in eval(expr, envir, enclos) : object 'songId' not found
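The following is a minimal sketch of why the column disappears, and of one way to keep songId inside summarise itself. It assumes a recent dplyr release, where the pipe is written `%>%` rather than the older `%.%` used in the question; the names `summarised` and `top_songs` are illustrative and not part of the original post.

```r
library(dplyr)

# Sample data copied from the question
mydata <- data.frame(
  userId    = c("A", "A", "C", "E", "D", "B", "A", "A", "E", "A"),
  songId    = c("568r", "711g", "34n", "454j", "663a", "35d", "34c", "982s", "433f", "565t"),
  playCount = c(85, 18, 18, 65, 72, 84, 72, 65, 11, 7),
  stringsAsFactors = FALSE
)

# summarise keeps only the grouping variable and the newly computed column
summarised <- mydata %>%
  group_by(userId) %>%
  summarise(one = max(playCount))

names(summarised)  # songId and playCount are no longer present

# One alternative: compute songId inside summarise.
# songId must come first, because once playCount is redefined as a
# single value, which.max(playCount) would always return 1.
top_songs <- mydata %>%
  group_by(userId) %>%
  summarise(
    songId    = songId[which.max(playCount)],
    playCount = max(playCount)
  )
```

Note that which.max returns only the first maximum, so this variant reports a single song per user even when several songs share the top play count.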
A more suitable dplyr function in this case is filter. Here we select the rows where the condition playCount == max(playCount) is TRUE within each group, so all three original columns are kept.
df %.%
group_by(userId) %.%
filter(
playCount == max(playCount))
# Source: local data frame [5 x 3]
# Groups: userId
#
# userId songId playCount
# 1 A 568r 85
# 2 C 34n 18
# 3 E 454j 65
# 4 D 663a 72
# 5 B 35d 84
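Putting this together with the CSV export from the question might look like the sketch below. It assumes a recent dplyr release (hence `%>%` instead of `%.%`); the `favourites` name and the `row.names = FALSE` argument are additions for illustration, not part of the original answer.

```r
library(dplyr)

# Sample data copied from the question
mydata <- data.frame(
  userId    = c("A", "A", "C", "E", "D", "B", "A", "A", "E", "A"),
  songId    = c("568r", "711g", "34n", "454j", "663a", "35d", "34c", "982s", "433f", "565t"),
  playCount = c(85, 18, 18, 65, 72, 84, 72, 65, 11, 7),
  stringsAsFactors = FALSE
)

# Keep, for each user, the row(s) with the highest play count
favourites <- mydata %>%
  group_by(userId) %>%
  filter(playCount == max(playCount)) %>%
  ungroup() %>%
  select(userId, songId, playCount)

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(favourites, file = "FavouriteSongs.csv", row.names = FALSE)
```

If a user has two songs tied for the highest play count, filter keeps both rows. In dplyr 1.0.0 and later, replacing the filter step with slice_max(playCount, n = 1, with_ties = FALSE) is one way to keep exactly one row per user.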
You can find several nice dplyr examples here.