Counting occurrences in a column and creating a variable in R
Problem description
I am new to R and have a data.frame called "CT" with a column called "ID" that contains several hundred different identification numbers (these are patients). Most numbers appear once, but some appear two or three times (in different rows).
In the CT data.frame, I would like to insert a new variable called "countID" that gives the number of occurrences of each patient's ID (patients with multiple records should still appear in multiple rows).
I tried two different strategies after reading this forum.
1st strategy:
CT <- cbind(CT, countID = sequence(rle(CT.long$ID)$lengths))
But this doesn't work; I get only one count.
2nd strategy: create a data frame with two columns (one for ID, one for the count) and then match this data frame with CT:
tabs <- table(CT.long$ID)
out <- data.frame(item = names(unlist(tabs)), count = unlist(tabs)[], stringsAsFactors = FALSE)
rownames(out) = c()
head(out)
#    item count
# 1 1.312     1
# 2 1.313     2
# 3 1.316     1
# 4 1.317     1
# 5 1.321     1
# 6 1.322     1
So this works fine, but I can't combine the two data.frames: the number of rows doesn't match between "out" and "CT" (out has fewer rows, of course).
Maybe someone has an elegant solution to add the number of occurrences directly in the data.frame CT, or to correctly match the two data.frames?
Thanks in advance,
Denis
Solution
You were almost there! rle will work very nicely; you just need to sort your table on ID before computing rle:
CT <- data.frame(value = runif(10), id = sample(5, 10, replace = TRUE))
# sort on ID when calculating rle, so that equal IDs form one run each
Count <- rle(sort(CT$id))
# match values: look up each row's id among the run values and take that run's length
CT$Count <- Count$lengths[match(CT$id, Count$values)]
CT
#         value id Count
# 1  0.94282600  1     4
# 2  0.12170165  2     2
# 3  0.04143461  1     4
# 4  0.76334609  3     2
# 5  0.87320740  4     1
# 6  0.89766749  1     4
# 7  0.16539820  1     4
# 8  0.98521044  5     1
# 9  0.70609853  3     2
# 10 0.75134208  2     2
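As a side note, the sketch below tries to show why the first strategy from the question probably returned only 1s, and how the second strategy could be finished without sorting. The data frame here is made up for illustration (six rows with IDs borrowed from the question's output), and the ave() line is only an assumed cross-check, not part of the original answer.

```r
# Hypothetical example data: six rows, three distinct patient IDs
CT <- data.frame(ID = c(1.313, 1.312, 1.313, 1.316, 1.312, 1.313))

# First strategy: rle() only sees runs of adjacent equal values, and
# sequence() numbers the rows inside each run, so unsorted IDs give all 1s
running <- sequence(rle(CT$ID)$lengths)

# Second strategy, completed: index the table by ID so that each row
# receives the total count for its own ID
tabs <- table(CT$ID)
CT$countID <- as.vector(tabs[as.character(CT$ID)])

# Cross-check with ave(), which applies length() within each ID group
check <- ave(seq_along(CT$ID), CT$ID, FUN = length)

CT
```

With this toy data, countID comes out as 3, 2, 3, 1, 2, 3, so every row of a repeated patient carries that patient's total. Indexing the table by name keeps the original row order, which sidesteps the row-count mismatch between "out" and "CT" described in the question.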