Aggregate data.table to rows of intervals of original values
Problem description
I have a data.table with an amount column like:
library(data.table)  # needed for data.table()
n = 1e5
set.seed(1)
dt <- data.table(id = 1:n, amount = pmax(0, rnorm(n, mean = 5e3, sd = 1e4)))
And a vector of breaks given like:
breaks <- as.vector( c(0, t(sapply(c(1, 2.5, 5, 7.5), function(x) x * 10^(1:4))) ) )
I want to use data.table syntax to:
1. get counts of amount contained in each interval of breaks
2. get counts of amount equal to or greater than the left bound of each interval (basically n * (1 - cdf(amount)))
For 1, this mostly works, but doesn't return rows for the empty intervals:
dt[, .N, keyby = breaks[findInterval(amount,breaks)] ] #would prefer to get 0 for empty intvl
For 2, I tried:
dt[, sum(amount >= thresh[.GRP]), keyby = breaks[findInterval(amount,breaks)] ]
but it didn't work because sum is restricted to within the group, not beyond. So I came up with a workaround, which also returns the empty intervals:
dt[, cbind(breaks, sapply(breaks, function(x) sum(amount >= x)))] # desired result
So, what's the data.table way to fix my 2. and to get the empty intervals for both?
Recommended answer
I would consider doing this:
mybreaks = c(-Inf, breaks, Inf)
dt[, g := cut(amount, mybreaks)]
dt[.(g = levels(g)), .N, on="g", by=.EACHI]
g N
1: (-Inf,0] 30976
2: (0,10] 23
3: (10,25] 62
4: (25,50] 73
5: (50,75] 85
6: (75,100] 88
7: (100,250] 503
8: (250,500] 859
9: (500,750] 916
10: (750,1e+03] 912
11: (1e+03,2.5e+03] 5593
12: (2.5e+03,5e+03] 9884
13: (5e+03,7.5e+03] 9767
14: (7.5e+03,1e+04] 9474
15: (1e+04,2.5e+04] 28434
16: (2.5e+04,5e+04] 2351
17: (5e+04,7.5e+04] 0
18: (7.5e+04, Inf] 0
You can use cumsum if you want the CDF.
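The cumsum suggestion could be sketched roughly as follows for item 2 of the question (counts of amount at or above each left bound). This is only a sketch under a few assumptions that are not in the answer: the intervals are cut with right = FALSE so that they are left-closed and line up with "amount >= left bound", and the names res, left and n_ge are made up for illustration.

```r
library(data.table)

set.seed(1)
n <- 1e5
dt <- data.table(id = 1:n, amount = pmax(0, rnorm(n, mean = 5e3, sd = 1e4)))
breaks <- as.vector(c(0, t(sapply(c(1, 2.5, 5, 7.5), function(x) x * 10^(1:4)))))

# Assumption: left-closed intervals [a, b), so each group starts exactly at a break.
# amount is never negative here (pmax), so no -Inf boundary is needed.
dt[, g := cut(amount, c(breaks, Inf), right = FALSE)]

# Join on every level so that empty intervals are kept with N = 0.
res <- dt[.(g = levels(g)), .N, on = "g", by = .EACHI]

# Reverse cumulative sum: rows in this interval plus all higher intervals,
# i.e. the number of rows with amount >= the interval's left bound.
res[, left := breaks]               # hypothetical column: left bound of each interval
res[, n_ge := rev(cumsum(rev(N)))]  # hypothetical column: count of amount >= left

res
```

With the default right = TRUE, each interval is (a, b], so a value exactly equal to a break would be counted in the interval below it and the reverse cumulative sum would give counts of amount > left bound rather than >=.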