R: number rows that match >= other row within group
Problem description
Is it possible to match two tables against each other, where one variable is >= the other, but only within a group?
    df <- data.frame(ID    = c(rep(1, 6), rep(2, 4)),
                     IDSEQ = c(seq(1, 6), seq(1, 4)),
                     TAG   = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0))
Here's a brief example of what the table looks like:
    ID IDSEQ TAG
     1     1   0
     1     2   0
     1     3   1
     1     4   0
     1     5   1
     1     6   0
     2     1   0
     2     2   1
     2     3   0
     2     4   0
I've created a small lookup (it has to use min, because several TAGs might occur within each ID group):
    df2 <- df[which(df$TAG == 1), ]
    library(data.table)
    DT <- data.table(df2)
    DT <- DT[, list(IDSEQ = min(IDSEQ)), by = ID]

    ID IDSEQ
     1     3
     2     2
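The same lookup can also be built in a single data.table call. This is only a minimal sketch: it assumes the data.table package is installed and uses the example data from above; the filter-then-aggregate form is one possible way to write it, not necessarily the preferred one.

```r
library(data.table)

# Same example data as above.
df <- data.frame(ID    = c(rep(1, 6), rep(2, 4)),
                 IDSEQ = c(seq(1, 6), seq(1, 4)),
                 TAG   = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0))

# Keep only the tagged rows (i), then take the smallest IDSEQ (j) per ID (by).
DT <- as.data.table(df)[TAG == 1, .(IDSEQ = min(IDSEQ)), by = ID]
print(DT)
```

IDs without any tagged row do not appear in the lookup at all.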
I was thinking of numbering the rows by ID where df$ID == DT$ID and df$IDSEQ >= DT$IDSEQ, but there might be other ways to solve this.
The result should look like this:
    ID IDSEQ TAG CASES
     1     1   0     0
     1     2   0     0
     1     3   1     1
     1     4   0     2
     1     5   1     3
     1     6   0     4
     2     1   0     0
     2     2   1     1
     2     3   0     2
     2     4   0     3
I think this might be done with data.table, but I have only used simple statements so far.
Solution
I think merging data.tables might be useful here. Something like...
    DT0 <- data.table(df)
    setkey(DT0, ID)              # the first data.table in a merge must be keyed
    DT0[DT, IDSEQ >= i.IDSEQ]    # this labels whether each row satisfies the condition
I'm not sure where the prefix i. is documented, but it refers to the column from the second table in the X[Y, ...] join.
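To make the role of the i. prefix concrete, here is a small sketch that stores the comparison as a new column through an update join. It assumes a data.table version that supports the on= argument (so no key is needed); the column name AFTER_TAG is made up for illustration and is not part of the original question.

```r
library(data.table)

# Same example data as in the question.
df <- data.frame(ID    = c(rep(1, 6), rep(2, 4)),
                 IDSEQ = c(seq(1, 6), seq(1, 4)),
                 TAG   = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0))

DT0 <- as.data.table(df)
# Lookup: first tagged IDSEQ per ID.
DT <- DT0[TAG == 1, .(IDSEQ = min(IDSEQ)), by = ID]

# Update join on ID. Inside j, plain IDSEQ is the column of DT0 (the X table),
# while i.IDSEQ is the column of DT (the table passed as i).
# AFTER_TAG is a hypothetical column name chosen for this sketch.
DT0[DT, on = "ID", AFTER_TAG := IDSEQ >= i.IDSEQ]
print(DT0)
```

Rows of DT0 whose ID has no match in DT would be left as NA in the new column.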
I'm still trying to find an elegant way to number the rows within groups. This is one clumsy approach:
    DT0[, CASES := 0L]
    DT0[DT0[DT, .I[IDSEQ >= i.IDSEQ]]$V1, CASES := 1:.N, by = ID]
which gives
        ID IDSEQ TAG CASES
     1:  1     1   0     0
     2:  1     2   0     0
     3:  1     3   1     1
     4:  1     4   0     2
     5:  1     5   1     3
     6:  1     6   0     4
     7:  2     1   0     0
     8:  2     2   1     1
     9:  2     3   0     2
    10:  2     4   0     3
This uses the special variables .I and .N, which are documented in help('data.table').
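As an alternative to the join-based numbering above, the same CASES column can be sketched with a grouped cumulative sum and no lookup table at all. This is a different technique from the one in the answer, and it rests on two assumptions: rows are already ordered by IDSEQ within each ID, and TAG only takes the values 0 and 1. It may also be useful on recent data.table versions (1.9.4 and later), where a join no longer groups j by each row of i unless by = .EACHI is given, so the clumsy approach above may need adjusting.

```r
library(data.table)

# Same example data as in the question.
df <- data.frame(ID    = c(rep(1, 6), rep(2, 4)),
                 IDSEQ = c(seq(1, 6), seq(1, 4)),
                 TAG   = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0))

DT0 <- as.data.table(df)

# Within each ID, cumsum(TAG) > 0 becomes TRUE at the first tagged row and
# stays TRUE afterwards. The outer cumsum counts those rows 1, 2, 3, ...
# and leaves the rows before the first tag at 0.
DT0[, CASES := cumsum(cumsum(TAG) > 0), by = ID]
print(DT0)
```

An ID with no tagged rows simply keeps CASES at 0 for all of its rows.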