Aggregate data.table to rows of intervals of original values

Views: 133 Published: 2017/3/12 13:02:56 Tags: r, data.table


Question

I have a data.table with an amount column like:

library(data.table)

n = 1e5
set.seed(1)

dt <- data.table(id = 1:n, amount = pmax(0, rnorm(n, mean = 5e3, sd = 1e4)))

And a vector of breaks given like:

breaks <- as.vector( c(0, t(sapply(c(1, 2.5, 5, 7.5), function(x) x * 10^(1:4))) ) )
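The one-liner above is compact but hard to read. As a sketch only (the name `breaks_outer` is introduced here for illustration and is not from the original post), the same vector can be built with `outer()` and checked against the values it is expected to contain:

```r
# Original construction from the question.
breaks <- as.vector(c(0, t(sapply(c(1, 2.5, 5, 7.5), function(x) x * 10^(1:4)))))

# Hypothetical equivalent: rows are the multipliers, columns the powers of ten;
# c() flattens the matrix column by column, which keeps the values sorted.
breaks_outer <- c(0, outer(c(1, 2.5, 5, 7.5), 10^(1:4)))

print(breaks_outer)
```

Both constructions give 17 increasing break points, from 0 up to 75000.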

I would like to use data.table syntax to:

1. get counts of amount contained in each interval
2. get counts of amount equal to or greater than the left bound (basically n * (1 - cdf(amount)))

For 1, this mostly works, but doesn't return rows for the empty intervals:

dt[, .N, keyby = breaks[findInterval(amount,breaks)] ] #would prefer to get 0 for empty intvl

For 2, I tried:

dt[, sum(amount >= thresh[.GRP]), keyby = breaks[findInterval(amount,breaks)]  ]

but it didn't work because sum is restricted to within the group, not beyond. So I came up with a workaround, which also returns the empty intervals:

dt[, cbind(breaks, sapply(breaks, function(x) sum(amount >= x)))] # desired result
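The reason the grouped attempt cannot work is that, with by or keyby, j is evaluated once per group and amount holds only that group's rows. A minimal sketch with a hypothetical toy table (the names `toy`, `per_group` and `overall` are illustrative, not from the original post):

```r
library(data.table)

# Hypothetical toy table: two rows in group "a", one in group "b".
toy <- data.table(grp = c("a", "a", "b"), amount = c(1, 2, 3))

# Inside a group, amount is just that group's values, so a threshold
# comparison can never count rows that belong to other groups.
per_group <- toy[, .(rows_seen = length(amount), n_ge_1 = sum(amount >= 1)), keyby = grp]

# Without by, the same expression sees the whole column.
overall <- toy[, sum(amount >= 1)]

print(per_group)
print(overall)
```

The per-group counts never exceed the group sizes, while the ungrouped call counts across all rows, which is what the workaround relies on.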

So, what's the data.table way to fix my 2. and to get the empty intervals for both?

Recommended answer

I would consider doing this:

mybreaks = c(-Inf, breaks, Inf)
dt[, g := cut(amount, mybreaks)]
dt[.(g = levels(g)), .N, on="g", by=.EACHI]


                  g     N
 1:        (-Inf,0] 30976
 2:          (0,10]    23
 3:         (10,25]    62
 4:         (25,50]    73
 5:         (50,75]    85
 6:        (75,100]    88
 7:       (100,250]   503
 8:       (250,500]   859
 9:       (500,750]   916
10:     (750,1e+03]   912
11: (1e+03,2.5e+03]  5593
12: (2.5e+03,5e+03]  9884
13: (5e+03,7.5e+03]  9767
14: (7.5e+03,1e+04]  9474
15: (1e+04,2.5e+04] 28434
16: (2.5e+04,5e+04]  2351
17: (5e+04,7.5e+04]     0
18:  (7.5e+04, Inf]     0

You can use cumsum if you want the CDF.
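As a sketch of how the cumsum suggestion could cover the second request, the per-interval counts can be summed from the right to give the number of amounts at or above each left bound. The toy data, the column name `n_ge_left` and the choice of `right = FALSE` (left-closed intervals, so that "equal to or greater than the left bound" holds exactly) are assumptions made here, not part of the original answer:

```r
library(data.table)

# Small hypothetical data set so the numbers are easy to check by hand.
dt <- data.table(id = 1:6, amount = c(0, 5, 10, 30, 30, 80))
breaks <- c(0, 10, 25, 50, 75)

# right = FALSE gives left-closed intervals [a, b), matching findInterval().
mybreaks <- c(-Inf, breaks, Inf)
dt[, g := cut(amount, mybreaks, right = FALSE)]

# Count per interval; joining on every level keeps empty intervals as N = 0.
res <- dt[.(g = levels(dt$g)), .N, on = "g", by = .EACHI]

# Reverse cumulative sum: number of amounts >= the left bound of each interval.
res[, n_ge_left := rev(cumsum(rev(N)))]

# Cross-check against the sapply workaround from the question.
direct <- sapply(c(-Inf, breaks), function(x) sum(dt$amount >= x))

print(res)
```

With the default `right = TRUE` used in the answer, the same reverse cumsum would instead count amounts strictly greater than each left bound.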
