findInterval() with varying intervals in data.table R

Views: 359 | Published: 2017/3/12 12:06:26 | Tags: r, data.table, intervals


Question

I asked this question a long time ago but haven't found an answer yet. I don't know whether reposting is acceptable on Stack Overflow, but I am reposting it.

I have a data.table in R and I want to create a new column that gives, for every price, the interval it falls into among the break points of its year/month.

Reproducible example:

    library(data.table)

    set.seed(100)
    DT <- data.table(year = 2000:2009, month = 1:10, price = runif(5 * 26^2) * 100)
    # This list is overwritten on the next line; it is kept only because its
    # runif() call advances the random number stream used below.
    intervals <- list(year = 2000:2009, month = 1:10, interval = sort(round(runif(9) * 100)))
    intervals <- replicate(10, sample(10:100, 100, replace = TRUE))
    intervals <- t(apply(intervals, 1, sort))
    intervals.dt <- data.table(intervals)
    intervals.dt[, c("year", "month") := list(rep(2000:2009, each = 10), 1:10)]
    setkey(intervals.dt, year, month)
    setkey(DT, year, month)

What I have tried so far:

merging the DT and intervals.dt data.tables by month/year,

creating a new intervalsstring column that pastes all the V* columns into a single string (not very elegant, I admit), and finally

splitting that string back into a vector so that I can use it in findInterval(), but this solution does not work for every row (!)

So, after:

    DT <- merge(DT, intervals.dt)
    DT <- DT[, intervalsstring := paste(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10)]
    DT <- DT[, c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10") := NULL]
    DT[, interval := findInterval(price, strsplit(intervalsstring, " ")[[1]])]

I obtain

    > DT
          year month     price               intervalsstring interval
       1: 2000     1 30.776611 12 21 36 46 48 51 63 72 91 95        2
       2: 2000     1 62.499648 12 21 36 46 48 51 63 72 91 95        6
       3: 2000     1 53.581115 12 21 36 46 48 51 63 72 91 95        6
       4: 2000     1 48.830599 12 21 36 46 48 51 63 72 91 95        5
       5: 2000     1 33.066053 12 21 36 46 48 51 63 72 91 95        2
      ---
    3376: 2009    10 33.635924 12 40 45 48 50 65 75 90 96 97        2
    3377: 2009    10 38.993769 12 40 45 48 50 65 75 90 96 97        3
    3378: 2009    10 75.065820 12 40 45 48 50 65 75 90 96 97        8
    3379: 2009    10  6.277403 12 40 45 48 50 65 75 90 96 97        0
    3380: 2009    10 64.189162 12 40 45 48 50 65 75 90 96 97        7

which is correct for the first rows, but not for the last (or other) rows. For example, for row 3380, the price ~64.19 should be in the 5th interval and not the 7th. I guess my mistake is that, in my last command, findInterval() relies only on the first row of intervalsstring.
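The discrepancy for row 3380 can be checked by hand in base R. The sketch below copies the two sets of break points from the printed output above; the variable names (breaks_first, breaks_last, price_3380) are made up for illustration only.

```r
# Break points copied from the intervalsstring column in the output above
breaks_first <- c(12, 21, 36, 46, 48, 51, 63, 72, 91, 95)  # year 2000, month 1
breaks_last  <- c(12, 40, 45, 48, 50, 65, 75, 90, 96, 97)  # year 2009, month 10
price_3380   <- 64.189162

# strsplit(x, " ")[[1]] keeps only the split of the FIRST string in x,
# so every price ends up compared against the first row's break points
first_only <- strsplit(c("12 21 36", "12 40 45"), " ")[[1]]

wrong <- findInterval(price_3380, breaks_first)  # what the posted command computes
right <- findInterval(price_3380, breaks_last)   # what row 3380 should use
print(c(wrong = wrong, right = right))
```

With the first row's break points the price lands in interval 7, while its own break points put it in interval 5, which matches the guess that only the first row of intervalsstring is being used.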

Thanks!

Recommended answer

Your main problem is that you didn't run findInterval() separately for each group. I also don't see the point of building that large merged data.table, or of the paste/strsplit business. This is what I would do:

    DT[, interval := findInterval(price, intervals.dt[.BY][, V1:V10, with = FALSE]),
       by = .(year, month)][]
    #      year month     price interval
    #   1: 2000     1 30.776611        2
    #   2: 2000     1 62.499648        6
    #   3: 2000     1 53.581115        6
    #   4: 2000     1 48.830599        5
    #   5: 2000     1 33.066053        2
    #  ---
    #3376: 2009    10 33.635924        1
    #3377: 2009    10 38.993769        1
    #3378: 2009    10 75.065820        7
    #3379: 2009    10  6.277403        0
    #3380: 2009    10 64.189162        5

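Note that intervals.dt[.BY] is a keyed subset: .BY holds the current group's year and month, so the lookup on the key returns that group's row of break points.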
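For readers who want to see the per-group lookup in isolation, here is a minimal sketch on toy data. The tables prices and breaks and all their values are hypothetical, not the question's data, and the unlist() call is an assumption added here so that findInterval() receives a plain numeric vector rather than a one-row data.table.

```r
library(data.table)

# Toy data (hypothetical): two year/month groups with different break points
prices <- data.table(year  = c(2000L, 2000L, 2009L),
                     month = c(1L, 1L, 10L),
                     price = c(30.8, 62.5, 64.2))
breaks <- data.table(year  = c(2000L, 2009L),
                     month = c(1L, 10L),
                     V1 = c(12, 12), V2 = c(48, 50), V3 = c(63, 65))
setkey(breaks, year, month)

# For each group, .BY is list(year, month); breaks[.BY, ...] is a keyed
# lookup that returns the single row of break points for that group.
prices[, interval := findInterval(price, unlist(breaks[.BY, .(V1, V2, V3)])),
       by = .(year, month)]

print(prices)
```

The third price, 64.2, is compared against 12, 50, 65 (its own group) rather than 12, 48, 63, so it falls in interval 2 instead of 3.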
