Index unique values in data.table

Views: 14 | Published: 2022/1/13 18:59:04 | Tags: r, data.table


Question

I'm not sure how to put the question into words, but how can I create an index column for a data.table that, within each group, increments whenever a different value appears?

Here is an MWE:

library(data.table)
in.data <- data.table(fruits=c(rep("banana", 4), rep("pear", 5)),vendor=c("a", "b", "b", "c", "d", "d", "e", "f", "f"))

Here is the result the R code should generate:

in.data[, wanted.column:=c(1,2,2,3,1,1,2,3,3)]

#    fruits vendor wanted.column
# 1: banana      a             1
# 2: banana      b             2
# 3: banana      b             2
# 4: banana      c             3
# 5:   pear      d             1
# 6:   pear      d             1
# 7:   pear      e             2
# 8:   pear      f             3
# 9:   pear      f             3

So it labels each vendor 1, 2, 3, ... within each fruit. There is probably a very simple solution, but I'm stuck.

Recommended answer

I have a few ideas. You can use a nested group counter, which numbers the vendors by order of first appearance within each fruit:

in.data[, w := setDT(list(v = vendor))[, g := .GRP, by=v]$g, by=fruits]
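To see what the inner step does, here is a minimal sketch that runs it on the vendors of a single fruit. The table name `vendors` is only for illustration and is not part of the original answer; `.GRP` is data.table's built-in group counter.

```r
library(data.table)

# Vendors of one fruit ("pear" in the MWE); `vendors` is a hypothetical name
vendors <- data.table(v = c("d", "d", "e", "f", "f"))

# .GRP is the running number of the current `by` group; groups are
# numbered in order of first appearance
vendors[, g := .GRP, by = v]

print(vendors$g)
```

The outer `by=fruits` in the one-liner simply repeats this numbering separately for each fruit.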

Alternatively, make a run ID. This depends on the data being sorted, so that identical vendors sit next to each other within each fruit (thanks @eddi), and it seems wasteful:

in.data[, w := rleid(vendor), by=fruits]
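The dependence on sorted data can be illustrated with a small sketch. The table `unsorted` and its column names are made up for this example: when a vendor reappears after another one, `rleid()` starts a new run, while `match()` against `unique()` reuses the earlier index.

```r
library(data.table)

# Hypothetical data in which vendor "a" appears twice, but not contiguously
unsorted <- data.table(fruits = rep("banana", 4),
                       vendor = c("a", "b", "a", "c"))

# rleid() increments on every change of value, so the second "a" gets a new ID
unsorted[, run.id := rleid(vendor), by = fruits]

# match() looks each vendor up among the distinct vendors, so "a" keeps ID 1
unsorted[, first.seen := match(vendor, unique(vendor)), by = fruits]

print(unsorted)
```

On the MWE in the question both give the same result, because identical vendors are already adjacent there.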

An approach built on base R functions would probably be:

in.data[, w := match(vendor, unique(vendor)), by=fruits]

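# or entirely in base R; ave() returns a character vector here because
# vendor is character, so convert the result back to integer

in.data$w = as.integer(with(in.data, ave(vendor, fruits, FUN = function(x) match(x, unique(x)))))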
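As a quick check, the sketch below rebuilds the MWE and compares the three data.table variants against the wanted column. The column names `w.grp`, `w.rle`, `w.match` and the vector `expected` are introduced here only so the results can sit side by side.

```r
library(data.table)

in.data <- data.table(fruits = c(rep("banana", 4), rep("pear", 5)),
                      vendor = c("a", "b", "b", "c", "d", "d", "e", "f", "f"))

# The wanted.column values from the question
expected <- c(1, 2, 2, 3, 1, 1, 2, 3, 3)

# Nested group counter
in.data[, w.grp := setDT(list(v = vendor))[, g := .GRP, by = v]$g, by = fruits]
# Run ID (valid here because identical vendors are adjacent)
in.data[, w.rle := rleid(vendor), by = fruits]
# Position among the distinct vendors of each fruit
in.data[, w.match := match(vendor, unique(vendor)), by = fruits]

# Stops with an error if any variant disagrees with the wanted column
stopifnot(all(in.data$w.grp == expected),
          all(in.data$w.rle == expected),
          all(in.data$w.match == expected))
```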
