Join in data.table with multiple matches

Published: 2020/10/15 20:44:10 | Tags: r, data.table

Question

I posted a question earlier about joining columns in data.table, where one column (dep) holds the dependency information for an entry. For example, entry 3 depends on the record with label 40. The mark column is then assigned the id of the record whose label the entry depends on. The earlier question is posted here: "Comparing columns uptill certain index in R"

library(data.table)
trace <- data.table(id=1:10, dep=c(-1,45,40,47,0,45,43,42,45,45), 
label=c(99,40,43,45,47,42,48,45,52,67), mark=rep("",10))
    id dep label mark
 1:  1  -1    99
 2:  2  45    40
 3:  3  40    43
 4:  4  47    45
 5:  5   0    47
 6:  6  45    42
 7:  7  43    48
 8:  8  42    45
 9:  9  45    52
10: 10  45    67

This should result in:

    id dep label mark
 1:  1  -1    99    1
 2:  2  45    40    2
 3:  3  40    43    2
 4:  4  47    45    4
 5:  5   0    47    5
 6:  6  45    42    4
 7:  7  43    48    3
 8:  8  42    45    6
 9:  9  45    52    8
10: 10  45    67    8

The following solution worked for me:

trace[, mark := trace[.(dep = dep, id = id), on=.(label = dep, id < id), mult="last", x.id]]

# if not found, use current id
trace[is.na(mark), mark := id ]

In the case above, duplicates were resolved by using the most recent match. However, if I want to keep all matches instead of only the last one, is there a way to get output like this (where the last and second-to-last entries have multiple dependencies)?

    id dep label mark
 1:  1  -1    99    1
 2:  2  45    40    2
 3:  3  40    43    2
 4:  4  47    45    4
 5:  5   0    47    5
 6:  6  45    42    4
 7:  7  43    48    3
 8:  8  42    45    6
 9:  9  45    52  4,8
10: 10  45    67  4,8

I am not that concerned about the format in which these dependencies are recorded. A slight modification of the earlier solution, using mult="all",

trace[, mark := trace[.(dep = dep, id = id), on=.(label = dep, id < id), mult="all", toString(x.id)]]

results in:

    id dep label                                   mark
 1:  1  -1    99 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
 2:  2  45    40 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
 3:  3  40    43 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
 4:  4  47    45 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
 5:  5   0    47 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
 6:  6  45    42 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
 7:  7  43    48 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
 8:  8  42    45 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
 9:  9  45    52 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
10: 10  45    67 NA, NA, 2, NA, NA, 4, 3, 6, 4, 8, 4, 8
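
The repeated string above is what one would expect if j is evaluated once over the whole join result rather than once per entry, so toString() collapses every matched id into a single value that is then recycled down the column. A minimal sketch of that reading, assuming the same trace table (probe, pairs and one_string are names introduced here purely for illustration):

```r
library(data.table)

trace <- data.table(id = 1:10,
                    dep = c(-1, 45, 40, 47, 0, 45, 43, 42, 45, 45),
                    label = c(99, 40, 43, 45, 47, 42, 48, 45, 52, 67))

# Hypothetical helper table: one lookup row per entry
probe <- trace[, .(dep, id)]

# No by = .EACHI here: j runs a single time over every (entry, match) pair
pairs <- trace[probe, on = .(label = dep, id < id),
               .(entry = i.id, matched = x.id)]

nrow(pairs)                             # more rows than trace, since some entries match twice
one_string <- toString(pairs$matched)   # a single string covering all entries
```

Because one_string has length 1, assigning it with := would fill every row with the same text.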

Answer

Okay, with a slight modification:

trace[, mark := trace[.(dep = dep, id = id), on=.(label = dep, id < id), 
  if (all(is.na(x.id))) NA_character_ else toString(x.id), by=.EACHI]$V1 ]

# if not found, use current id
trace[is.na(mark), mark := as.character(id) ]

It uses as.character(id) because mark is now a character column.

To see how by=.EACHI works, try running this part on its own:

trace[.(dep = dep, id = id), on=.(label = dep, id < id), 
  if (all(is.na(x.id))) NA_character_ else toString(x.id), by=.EACHI]
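
As a smaller illustration of the same grouping, the sketch below counts the matches for each entry. It assumes the same trace table; probe, per_entry and n_matches are names introduced here for illustration and are not part of the answer's code.

```r
library(data.table)

trace <- data.table(id = 1:10,
                    dep = c(-1, 45, 40, 47, 0, 45, 43, 42, 45, 45),
                    label = c(99, 40, 43, 45, 47, 42, 48, 45, 52, 67))

# Hypothetical helper table: one lookup row per entry
probe <- trace[, .(dep, id)]

# by = .EACHI: j runs once for each row of probe and sees only that row's matches.
# An entry with no match gets a single NA in x.id, hence the !is.na() guard.
per_entry <- trace[probe, on = .(label = dep, id < id),
                   .(n_matches = sum(!is.na(x.id))),
                   by = .EACHI]

per_entry$n_matches   # one count per entry, in the order of probe
```

The result has one row per row of probe, which is why its V1 column in the answer can be assigned straight back to trace with :=.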

Comments. I expect this will not scale well to larger tables. Also, the column no longer matches id's type, so it cannot be used for merging, etc. A list-class column would have the same problem:

trace[, mark := trace[.(dep = dep, id = id), on=.(label = dep, id < id), 
  list(list(na.omit(x.id))), by=.EACHI]$V1 ]

# if not found, use current id
trace[lengths(mark) == 0L, mark := as.list(id)]
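
If mark needs to stay an integer so it can still be joined against id, one possible alternative (not part of the answer above, just a sketch under that assumption) is to keep the dependencies in a separate long table with one row per (id, mark) pair. The names probe and deps are introduced here for illustration.

```r
library(data.table)

trace <- data.table(id = 1:10,
                    dep = c(-1, 45, 40, 47, 0, 45, 43, 42, 45, 45),
                    label = c(99, 40, 43, 45, 47, 42, 48, 45, 52, 67))

# Hypothetical helper table: one lookup row per entry
probe <- trace[, .(dep, id)]

# One row per dependency; entries with two matches simply appear twice
deps <- trace[probe, on = .(label = dep, id < id),
              .(id = i.id, mark = x.id)]

# if not found, use current id (both columns are integer, so no type change)
deps[is.na(mark), mark := id]

class(deps$mark)   # same type as trace$id, so deps can be merged on mark
```

The trade-off is that trace itself no longer carries the dependencies; they live in deps and have to be joined back by id when needed.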
