Counting occurrences in a column and creating a variable in R


Problem description


I am new to R and I have a data.frame called "CT" containing a column called "ID" with several hundred different identification numbers (these are patients). Most numbers appear once, but some appear two or three times (and therefore in different rows). In the CT data.frame, I would like to insert a new variable called "countID" that indicates the number of occurrences of each patient (multiple records should still appear several times).

I tried two different strategies after reading this forum.

1st strategy:

CT <- cbind(CT, countID = sequence(rle(CT.long$ID)$lengths))

But this doesn't work; I get only one count.

2nd strategy: create a data frame with two columns (one is ID, one is count) and then match this data frame with CT:

tabs <- table(CT.long$ID)
out <- data.frame(item=names(unlist(tabs)),count=unlist(tabs)[],stringsAsFactors=FALSE)
rownames(out) = c()
head(out)

# item    count
# 1 1.312     1
# 2 1.313     2
# 3 1.316     1
# 4 1.317     1
# 5 1.321     1
# 6 1.322     1

So this works fine, but I can't merge the two data.frames: the number of rows doesn't match between "out" and "CT" ("out" has fewer rows, of course).

Maybe someone has an elegant solution to add the number of occurrences directly to the data.frame CT, or to correctly match the two data.frames?

Thanks in advance,
Denis

Solution

You were almost there! rle will work very nicely; you just need to sort on ID before computing rle, so that all rows with the same ID form a single run:

CT <- data.frame(value = runif(10), id = sample(5, 10, replace = TRUE))

# sort on ID when calculating rle, so equal IDs form a single run
Count <- rle(sort(CT$id))

# match each row's id to its run value and look up the run length
CT$Count <- Count$lengths[match(CT$id, Count$values)]
CT
#       value id Count
#1  0.94282600  1     4
#2  0.12170165  2     2
#3  0.04143461  1     4
#4  0.76334609  3     2
#5  0.87320740  4     1
#6  0.89766749  1     4
#7  0.16539820  1     4
#8  0.98521044  5     1
#9  0.70609853  3     2
#10 0.75134208  2     2
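
For comparison, here is a minimal sketch of two other base R ways to get the same per-row count without sorting first. It is not part of the original answer, and the ID values are made up for illustration. ave() computes a function within each group and returns a vector aligned with the original rows; the table() lookup is essentially the asker's second strategy with the counts indexed by ID instead of merged.

```r
# Hypothetical example data: these ID values are invented for illustration.
CT <- data.frame(ID = c(1312, 1313, 1316, 1313, 1317, 1312, 1313))

# ave() applies FUN within each group of ID and returns a vector in the
# original row order, so no sorting or matching step is needed.
CT$countID <- ave(seq_along(CT$ID), CT$ID, FUN = length)

# Alternative: count with table(), then look up each row's ID by name.
tab <- table(CT$ID)
CT$countID2 <- as.integer(tab[as.character(CT$ID)])

CT
```

Both new columns should hold the same values. Rows that share an ID keep their separate rows, each carrying the total count for that ID.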
