Unexpected result using data.table's shift() by group (bug?)

Views: 185 | Posted: 2017/3/12 12:59:13 | Tags: r, data.table


Question

Consider this dataset:

library(data.table)
dt <- data.table(ID = c(1,8,9,20,32,33), Char = c("A", "A", "B", "B", "C", "C"))
dt
   ID Char
1:  1    A
2:  8    A
3:  9    B
4: 20    B
5: 32    C
6: 33    C

I want to identify "runs" by ID, i.e. consecutive rows where the ID differs by 1, but I only want to consider runs within the same Char group. I can do this as follows:

dt[, InRun := FALSE]
dt[, DistToAbove := abs(ID - shift(ID, type="lag")), by=Char]
dt[, DistToBelow := abs(ID - shift(ID, type="lead")), by=Char]
dt[DistToAbove <= 1 | DistToBelow <= 1, InRun := TRUE, by=Char]
dt
   ID Char InRun DistToAbove DistToBelow
1:  1    A FALSE          NA           7
2:  8    A FALSE           7          NA
3:  9    B FALSE          NA          11
4: 20    B FALSE          11          NA
5: 32    C  TRUE          NA           1
6: 33    C  TRUE           1          NA

I tried simplifying the above code into the lines below, but the answer differs:

dt[, InRun := FALSE]
dt[abs(ID - shift(ID, type="lag")) <= 1 | abs(shift(ID, type="lead") - ID) <= 1, InRun := TRUE, by=Char]
dt
   ID Char InRun DistToAbove DistToBelow
1:  1    A FALSE          NA           7
2:  8    A  TRUE           7          NA
3:  9    B  TRUE          NA          11
4: 20    B FALSE          11          NA
5: 32    C  TRUE          NA           1
6: 33    C  TRUE           1          NA

What gives? (Note: I'm using data.table v1.9.7.)

Recommended answer

I want to identify "runs" by ID, i.e. consecutive rows where the ID differs by 1, but I only want to consider runs within the same Char group.

Here is how I would approach it:

dt[, run_id := cumsum(
  ( ID != shift(ID, fill = ID[1L]) + 1L )
  |
  ( Char != shift(Char, fill = Char[1L]) )
)]
dt[, in_run := .N > 1L, by=.(Char, run_id)]

   ID Char run_id in_run
1:  1    A      1  FALSE
2:  8    A      2  FALSE
3:  9    B      3  FALSE
4: 20    B      4  FALSE
5: 32    C      5   TRUE
6: 33    C      5   TRUE

This code identifies all runs (including those with length of one) and then tests for length greater than one (the OP's definition).
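The same run labelling can also be sketched with data.table's rleid(). This is an alternative formulation, not the answerer's code: it assumes that "ID minus row position" is a fair stand-in for the "ID equals previous ID plus 1" test, since that offset stays constant exactly while ID increases by 1 from row to row.

```r
library(data.table)

dt <- data.table(ID = c(1, 8, 9, 20, 32, 33),
                 Char = c("A", "A", "B", "B", "C", "C"))

# ID minus the row position stays constant while ID increases by exactly 1,
# so a change in either Char or that offset starts a new run.
dt[, run_id := rleid(Char, ID - seq_len(.N))]

# A row is "in a run" only if its run contains more than one row.
dt[, in_run := .N > 1L, by = run_id]

dt
```

On the example data this gives the same run_id and in_run columns as the cumsum() version above.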

Regarding the OP's approach:

dt[abs(ID - shift(ID, type="lag")) <= 1 | abs(shift(ID, type="lead") - ID) <= 1, # i
  InRun := TRUE # j
  , by=Char] # by

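In DT[i, j, by] the steps are: filter using i, then group with by, then calculate j. The i expression is evaluated once on the whole table, before any grouping, so shift(ID) there compares rows 2 and 3 (IDs 8 and 9) even though they belong to different Char groups. That is why both come out TRUE. You can't do by-group calculations in i in the way attempted here.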
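To make the evaluation order concrete, here is a minimal sketch on the OP's data. It first computes the condition without grouping, which is what i effectively sees, and then moves the shift() calls into j so that by = Char applies to them. The variable names ungrouped, near_above and near_below are illustrative and not from the original post.

```r
library(data.table)

dt <- data.table(ID = c(1, 8, 9, 20, 32, 33),
                 Char = c("A", "A", "B", "B", "C", "C"))

# What i sees: shift() runs over the whole ID column, ignoring Char,
# so rows 2 and 3 (IDs 8 and 9) look adjacent across the A/B boundary.
ungrouped <- dt[, abs(ID - shift(ID, type = "lag")) <= 1 |
                  abs(shift(ID, type = "lead") - ID) <= 1]

# Grouped version: do the shift() inside j, where by = Char applies.
dt[, InRun := {
  near_above <- abs(ID - shift(ID, type = "lag")) <= 1
  near_below <- abs(shift(ID, type = "lead") - ID) <= 1
  # NA | FALSE is NA at group edges; %in% TRUE maps NA to FALSE.
  (near_above | near_below) %in% TRUE
}, by = Char]

dt
```

Doing the comparison in j keeps the one-step style the OP was after, at the cost of handling the NA values that shift() produces at the first and last row of each group.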
