Comparing columns up to a certain index in R

Problem description

I want to compare two columns (dep and label) and set an entry in a third column (mark) only if dep has a match in label at a previous index. For example, in the data below, the row with id = 2 has label = 40 and dep = 45, but its mark stays 2 (its own id) because the matching label (45) only occurs later (id = 4 or 8). If there are multiple matches, keep the most recent one: the row with label 52 (id 9) depends on 45, so its mark is the id of the most recent match, which is id 8. Finally, no comparison should be made when dep < 1.

library(data.table)
trace <- data.table(id = 1:10, dep = c(-1, 45, 40, 47, 0, 45, 43, 42, 45, 45),
                    label = c(99, 40, 43, 45, 47, 42, 48, 45, 52, 67), mark = rep("", 10))

# desired result
    id dep label mark
 1:  1  -1    99    1
 2:  2  45    40    2
 3:  3  40    43    2
 4:  4  47    45    4
 5:  5   0    47    5
 6:  6  45    42    4
 7:  7  43    48    3
 8:  8  42    45    6
 9:  9  45    52    8
10: 10  45    67    8

A loop solution for this is:

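trace$mark <- trace$id                  # default: each row marks itself
for (i in seq_along(trace$id)) {
    val <- trace$dep[i]
    j <- 1
    # scan rows 1..i in order, so the last matching label wins
    while (j <= i && val > 1) {
        if (val == trace$label[j]) {
            trace$mark[i] <- trace$id[j]
        }
        j <- j + 1
    }
}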

The following solution, which was suggested earlier, sets mark for every match, regardless of whether the matching label occurs before or after the current index.

trace[trace[dep>1,.(id,dep=label)],mark:=i.id,on="dep"]
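A small sketch of why this join is not enough, using the data from the question. The name `unrestricted` is only an illustrative choice, and mark is initialised to id here (as in the loop version) rather than to empty strings. Row 2 has dep = 45, and label 45 only occurs at ids 4 and 8, yet the join still overwrites its mark:

```r
library(data.table)

trace <- data.table(
  id    = 1:10,
  dep   = c(-1, 45, 40, 47, 0, 45, 43, 42, 45, 45),
  label = c(99, 40, 43, 45, 47, 42, 48, 45, 52, 67)
)

# Work on a copy and start from mark = id
unrestricted <- copy(trace)[, mark := id]

# Update join from the question: matches on value only, not on row position
unrestricted[unrestricted[dep > 1, .(id, dep = label)], mark := i.id, on = "dep"]

# Row 2 now points at a later row instead of keeping its own id
unrestricted[id == 2, mark]
```
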

Any ideas on how to achieve this?

Recommended answer

This seems to work:

# clean up OP's example
trace[, mark := NULL ]

# lookup label
trace[, mark := 
  trace[.(dep = dep, id = id), on=.(label = dep, id < id), mult="last", x.id]
]

# if not found, use current id
trace[is.na(mark), mark := id ]

    id dep label mark
 1:  1  -1    99    1
 2:  2  45    40    2
 3:  3  40    43    2
 4:  4  47    45    4
 5:  5   0    47    5
 6:  6  45    42    4
 7:  7  43    48    3
 8:  8  42    45    6
 9:  9  45    52    8
10: 10  45    67    8
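As a sanity check, the join can be compared against a plain loop on the same data. This is only a sketch: `joined` and `loop_mark` are illustrative names, and the loop is a simplified restatement of the rule in the question (strictly earlier rows only, skipped when dep < 1), not the asker's exact code.

```r
library(data.table)

trace <- data.table(
  id    = 1:10,
  dep   = c(-1, 45, 40, 47, 0, 45, 43, 42, 45, 45),
  label = c(99, 40, 43, 45, 47, 42, 48, 45, 52, 67)
)

# Join-based version, applied to a copy
joined <- copy(trace)
joined[, mark := joined[.(dep = dep, id = id),
                        on = .(label = dep, id < id),
                        mult = "last", x.id]]
joined[is.na(mark), mark := id]

# Reference loop: most recent earlier row whose label equals dep
loop_mark <- trace$id
for (i in seq_len(nrow(trace))) {
  val <- trace$dep[i]
  if (val < 1) next                                  # no comparison for dep < 1
  hits <- which(trace$label[seq_len(i - 1)] == val)  # earlier rows only
  if (length(hits) > 0) loop_mark[i] <- trace$id[max(hits)]
}

all(joined$mark == loop_mark)
```

The non-equi condition `id < id` in `on=` is what restricts matches to earlier rows; `mult = "last"` then picks the most recent of them.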

How it works


  • x[i, on=, mult=, j] is a join.
  • Each row of i is looked up in x.
  • If multiple rows of x match a row of i, mult= determines what happens; mult="last" keeps only the last matching row.
  • The x.* prefix in x.id indicates which table the column is taken from (here x, the table being looked up in).
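The points above can be seen on a toy example. The tables `lookup` and `queries` and their columns are hypothetical, chosen only to show how mult= and the x.* prefix behave:

```r
library(data.table)

# Hypothetical toy tables: "a" occurs twice in lookup, "c" not at all
lookup  <- data.table(grp = c("a", "a", "b"), val = 1:3)
queries <- data.table(grp = c("a", "b", "c"))

# Default mult = "all": every matching row of lookup is returned
all_rows <- lookup[queries, on = "grp"]

# mult = "first" / "last": one result per row of queries
first_val <- lookup[queries, on = "grp", mult = "first", x.val]  # 1 3 NA
last_val  <- lookup[queries, on = "grp", mult = "last",  x.val]  # 2 3 NA
```

Rows of `queries` without a match ("c" here) yield NA, which is why the answer fills in the current id afterwards.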
