Counting occurrences in a column and creating a variable in R

Question

I am new to R and I have a data.frame called "CT", with a column called "ID" that contains several hundred different identification numbers (these are patients). Most numbers appear once, but some appear two or three times (in different rows).

In the CT data.frame, I would like to insert a new variable, called "countID", that gives the number of occurrences of each patient (patients with multiple records should still appear several times).

I tried two different strategies after reading this forum. 1st strategy:

CT <- cbind(CT, countID=sequence(rle(CT.long$ID)$lengths))

But this doesn't work; I get only one count. 2nd strategy: create a data frame with two columns (one is ID, one is count) and then match this data frame with CT:

tabs <- table(CT.long$ID)
out <- data.frame(item=names(unlist(tabs)),count=unlist(tabs)[],stringsAsFactors=FALSE)
rownames(out) = c()
head(out)

# item    count
# 1 1.312     1
# 2 1.313     2
# 3 1.316     1
# 4 1.317     1
# 5 1.321     1
# 6 1.322     1

So this works fine, but I can't merge the two data.frames: the number of rows doesn't match between "out" and "CT" (out has fewer rows, of course). Maybe someone has an elegant solution to add the number of occurrences directly to the data.frame CT, or to correctly match the two data.frames?

Recommended answer

You were almost there! rle will work very nicely, but it counts runs of consecutive equal values, so identical IDs have to be adjacent. You just need to sort on ID before computing rle:

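#  example data: 10 random values with ids drawn (with replacement) from 1 to 5
CT <- data.frame( value = runif(10) , id = sample(5, 10, replace = TRUE) )

#  sort on ID when calculating rle, so each id forms a single run
Count <- rle( sort( CT$id ) )

#  match each row's id to its run and look up that run's length
CT$Count <- Count$lengths[ match( CT$id , Count$values ) ]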
CT
#       value id Count
#1  0.94282600  1     4
#2  0.12170165  2     2
#3  0.04143461  1     4
#4  0.76334609  3     2
#5  0.87320740  4     1
#6  0.89766749  1     4
#7  0.16539820  1     4
#8  0.98521044  5     1
#9  0.70609853  3     2
#10 0.75134208  2     2
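
As an aside, the same per-row count can be obtained without sorting. The sketch below is not part of the original answer. It uses base R's ave() and table() on a small made-up data frame (the values are hypothetical; the names CT, ID and countID follow the question). It also shows why the first strategy in the question gives a running index rather than a total: sequence() over the rle lengths numbers the rows within each run.

```r
# Hypothetical example data; the names CT, ID and countID mirror the question.
CT <- data.frame(ID = c(1.312, 1.313, 1.316, 1.313, 1.317, 1.313),
                 value = c(10, 20, 30, 40, 50, 60))

# Option 1: ave() applies length() within each ID group and returns
# a vector aligned with the original row order.
CT$countID <- ave(seq_along(CT$ID), CT$ID, FUN = length)

# Option 2: build a frequency table and look each ID up by name.
tabs <- table(CT$ID)
countID_tab <- as.vector(tabs[as.character(CT$ID)])

# For comparison: the question's first strategy numbers rows within
# consecutive runs, so on unsorted IDs every run has length 1.
running_index <- sequence(rle(CT$ID)$lengths)

CT$countID      # 1 3 1 3 1 3
countID_tab     # 1 3 1 3 1 3
running_index   # 1 1 1 1 1 1
```

Both options keep every row of CT, so repeated patients still appear several times, each carrying its total count.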
