Count of values within specified range of value in each row using data.table
Problem description
A column of counts for each level (or combination of levels) of a categorical variable can be produced in data.table syntax with something like:
library(data.table)

# setting up the data so it's pasteable
df <- data.table(var1 = c('dog','cat','dog','cat','dog','dog','dog'),
                 var2 = c(1,5,90,95,91,110,8),
                 var3 = c('lamp','lamp','lamp','table','table','table','table'))
#adding a count column for var1
df[, var1count := .N, by = .(var1)]
#adding a count of each combo of var1 and var3
df[, var1and3comb := .N, by = .(var1,var3)]
I am curious how I could instead produce a count column that, for each row, counts the number of records whose var2 value is within ±5 of that row's var2 value.
In my non-functioning attempt at this,
df[, var2withinrange := .N, by = .(between((var2-5),(var2+5),var2))]
I get a column with the total number of records rather than the desired result. I'd expect the first row to hold a value of 2, since 1 and 5 fall within ±5 of 1. Row 2 should hold a value of 3, since 1, 5, and 8 all fall within ±5 of 5, and so on.
Any help on coming up with a solution is much appreciated. Ideally in data.table code!
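A note on why the attempt above yields the total row count: data.table's `between(x, lower, upper)` tests whether `x` lies in `[lower, upper]`, so the call as written asks whether `var2 - 5` lies between `var2 + 5` and `var2`, which is never true. Every row therefore gets the same grouping key, and `.N` is the size of that single group. The sketch below mirrors that comparison in base R (it does not call data.table itself; `key` and `group_sizes` are illustrative names):

```r
var2 <- c(1, 5, 90, 95, 91, 110, 8)

# between(x, lower, upper) with inclusive bounds amounts to
# x >= lower & x <= upper; here x = var2 - 5, lower = var2 + 5, upper = var2
key <- (var2 - 5) >= (var2 + 5) & (var2 - 5) <= var2

# every row lands in the same group, because the key is FALSE throughout
unique(key)

# group size per row, analogous to `.N` with `by = key`
group_sizes <- ave(seq_along(var2), key, FUN = length)
group_sizes
```

Since the grouping key never varies, grouping alone cannot express "rows near this row"; some form of row-against-table comparison is needed instead.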
Recommended answer
With data.table, using a non-equi join of df against its own ±5 bounds:
df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5)
                           , on = .(var2 >= var2min, var2 <= var2plus)
                           , .N
                           , by = .EACHI][, N]][]
which gives:
> df
   var1 var2  var3 var2withinrange
1:  dog    1  lamp               2
2:  cat    5  lamp               3
3:  dog   90  lamp               3
4:  cat   95 table               3
5:  dog   91 table               3
6:  dog  110 table               1
7:  dog    8 table               2
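In the join above, each row of the inner lookup table (`var2min`, `var2plus`) is matched against every row of df whose var2 falls between those bounds, and `by = .EACHI` evaluates `.N` once per lookup row. As a sanity check on the counts shown, the same numbers can be reproduced with a brute-force comparison in base R. This is only a verification sketch (quadratic in the number of rows, and `within5` is an illustrative name), not a replacement for the join:

```r
var2 <- c(1, 5, 90, 95, 91, 110, 8)

# for each value v, count the entries of var2 lying in [v - 5, v + 5]
within5 <- vapply(var2,
                  function(v) sum(var2 >= v - 5 & var2 <= v + 5),
                  integer(1))
within5
# 2 3 3 3 3 1 2
```

The bounds are inclusive here, matching the `>=` and `<=` conditions used in the `on` clause.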