Index unique values in data.table
Question
I'm not sure how to phrase the question, but how can I create an index column for a data.table that, within each group, increments whenever a different value appears?
Here is an MWE:
library(data.table)
in.data <- data.table(fruits=c(rep("banana", 4), rep("pear", 5)),vendor=c("a", "b", "b", "c", "d", "d", "e", "f", "f"))
The code should produce:
in.data[, wanted.column:=c(1,2,2,3,1,1,2,3,3)]
# fruits vendor wanted.column
# 1: banana a 1
# 2: banana b 2
# 3: banana b 2
# 4: banana c 3
# 5: pear d 1
# 6: pear d 1
# 7: pear e 2
# 8: pear f 3
# 9: pear f 3
So it labels each vendor 1, 2, 3, ... within each fruit. There is probably a very simple solution, but I'm stuck.
Recommended answer
I have a few ideas. You can use a nested group counter:
in.data[, w := setDT(list(v = vendor))[, g := .GRP, by=v]$g, by=fruits]
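As a sketch of what the inner call does, consider the vendors of a single fruit in a standalone table (the name `banana` is only for illustration and is not part of the question). `.GRP` is the group counter, and `by` keeps groups in order of first appearance, so each distinct vendor gets the next integer:

```r
library(data.table)

# Hypothetical standalone table holding the vendors of one fruit
banana <- data.table(v = c("a", "b", "b", "c"))

# .GRP is 1 for the first group, 2 for the second, and so on
banana[, g := .GRP, by = v]

banana$g
# 1 2 2 3
```

The one-liner above simply repeats this step for every fruit via the outer `by=fruits`.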
Alternatively, make a run ID, which depends on the data being sorted (thanks @eddi) and seems wasteful:
in.data[, w := rleid(vendor), by=fruits]
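To illustrate the dependence on sorted data, here is a small sketch on a made-up table (`unsorted`, `run.id` and `first.id` are hypothetical names, not from the question) in which vendor "a" reappears after "b". `rleid()` starts a new ID at every change of value, whereas matching against `unique()` reuses the ID of the first occurrence:

```r
library(data.table)

# Hypothetical data: vendor "a" shows up again after "b"
unsorted <- data.table(fruits = rep("banana", 4),
                       vendor = c("a", "b", "a", "c"))

# Run ID: increments on every change, so the second "a" gets a new ID
unsorted[, run.id := rleid(vendor), by = fruits]

# First-appearance index: the second "a" reuses the ID of the first "a"
unsorted[, first.id := match(vendor, unique(vendor)), by = fruits]

unsorted$run.id
# 1 2 3 4
unsorted$first.id
# 1 2 1 3
```

So the run ID only gives the wanted labels when equal vendors are adjacent within each fruit, as they are in the MWE.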
An approach built on base-R functions would probably be:
in.data[, w := match(vendor, unique(vendor)), by=fruits]
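# or entirely in base R; ave() returns the type of its first argument
# (character here), so convert the result back to integer
in.data$w = as.integer(with(in.data, ave(vendor, fruits, FUN = function(x) match(x, unique(x)))))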
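A quick sanity check on the MWE, as a sketch: the column names `w.rle`, `w.match` and `w.base` are only for illustration, and the `as.integer()` wrapper is an assumption that guards against `ave()` returning a character vector when given a character column.

```r
library(data.table)

in.data <- data.table(fruits = c(rep("banana", 4), rep("pear", 5)),
                      vendor = c("a", "b", "b", "c", "d", "d", "e", "f", "f"))

# Run ID (valid here because equal vendors are adjacent within each fruit)
in.data[, w.rle := rleid(vendor), by = fruits]

# Position of each vendor among the unique vendors of its fruit
in.data[, w.match := match(vendor, unique(vendor)), by = fruits]

# Base-R version, converted to integer for comparison
w.base <- as.integer(with(in.data,
  ave(vendor, fruits, FUN = function(x) match(x, unique(x)))))

wanted <- c(1L, 2L, 2L, 3L, 1L, 1L, 2L, 3L, 3L)

# All three agree with the wanted column from the question
stopifnot(identical(in.data$w.rle, wanted),
          identical(in.data$w.match, wanted),
          identical(w.base, wanted))
```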