Help me replace a for loop with an "apply" function


Problem description

...if that is possible

My task is to find the longest streak of continuous days a user participated in a game.

Instead of writing an SQL function, I chose to use R's rle function to get the longest streaks and then update my db table with the results.

The (attached) dataframe is something like this:

    day      user_id
2008/11/01    2001
2008/11/01    2002
2008/11/01    2003
2008/11/01    2004
2008/11/01    2005
2008/11/02    2001
2008/11/02    2005
2008/11/03    2001
2008/11/03    2003
2008/11/03    2004
2008/11/03    2005
2008/11/04    2001
2008/11/04    2003
2008/11/04    2004
2008/11/04    2005

I tried the following to get the longest streak per user:

# turn it to a contingency table
my_table <- table(user_id, day)

# get the streaks
rle_table <- apply(my_table,1,rle)

# verify the longest streak of "1"s for user 2001
# as.vector(tapply(rle_table$'2001'$lengths, rle_table$'2001'$values, max)["1"])

# loop to get the results
# initiate results matrix
res<-matrix(nrow=dim(my_table)[1], ncol=2)

for (i in 1:dim(my_table)[1]) {
string <- paste("as.vector(tapply(rle_table$'", rownames(my_table)[i], "'$lengths, rle_table$'", rownames(my_table)[i], "'$values, max)['1'])", sep="")
res[i,]<-c(as.integer(rownames(my_table)[i]) , eval(parse(text=string)))
}

Unfortunately, this for loop takes too long, and I'm wondering if there is a way to produce the res matrix using a function from the "apply" family.

Thank you in advance

Solution

Another option:

# convert to Date
day_table$day <- as.Date(day_table$day, format="%Y/%m/%d")
# split by user and then look for contiguous days
contig <- sapply(split(day_table$day, day_table$user_id), function(.days){
    .diff <- cumsum(c(TRUE, diff(.days) != 1))
    max(table(.diff))
})
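
To make the answer easier to try out, here is a minimal, self-contained sketch that rebuilds the sample data from the question and runs both approaches. The data frame name day_table follows the answer; the helper longest_streak and the variables rle_max and res_apply are names invented for this illustration. The second part is one possible way to keep the rle idea from the question while dropping the for loop and eval(parse()); it assumes every calendar day in the range appears at least once in the data, because the contingency table only has columns for days that occur.

```r
# Sample data copied from the question.
day_table <- data.frame(
  day = c(rep("2008/11/01", 5), rep("2008/11/02", 2),
          rep("2008/11/03", 4), rep("2008/11/04", 4)),
  user_id = c(2001, 2002, 2003, 2004, 2005,
              2001, 2005,
              2001, 2003, 2004, 2005,
              2001, 2003, 2004, 2005),
  stringsAsFactors = FALSE
)
day_table$day <- as.Date(day_table$day, format = "%Y/%m/%d")

# Approach 1 (the answer's idea): per user, start a new run whenever the
# gap to the previous day is not exactly one day, then take the largest run.
# longest_streak is a hypothetical helper name, not part of the original answer.
longest_streak <- function(days) {
  days <- sort(unique(days))                 # guard against unsorted or repeated days
  run_id <- cumsum(c(TRUE, as.numeric(diff(days)) != 1))
  max(table(run_id))
}
contig <- sapply(split(day_table$day, day_table$user_id), longest_streak)

# Approach 2 (the question's rle idea, without the loop or eval(parse())):
# one row per user, one column per day that occurs in the data.
my_table <- table(day_table$user_id, day_table$day)
rle_max <- apply(my_table, 1, function(played) {
  runs <- rle(as.vector(played) > 0)
  max(runs$lengths[runs$values], 0)          # longest run of TRUE, 0 if none
})

# Same two-column layout as the res matrix in the question.
res_apply <- cbind(user_id = as.integer(names(rle_max)), streak = rle_max)
print(contig)
print(res_apply)
```

On the sample data both approaches should agree, with users 2001 and 2005 having the longest streaks. Approach 1 works directly on the dates, so it does not depend on every day being present as a table column.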
