How to get a quick summary in data.table with a look-back window?

Views: 94 | Published: 2020/10/15 21:18:21 | Tags: r, data.table


Problem description

Similarly, this is part of a feature-engineering step that summarizes each ID by the values of a column called Col, looking back over a fixed time window. The same preprocessing will be applied to the testing set. Since the data set is large, a data.table-based solution is preferred.

Training input:

    ID  Time        Col  Count
    A   2017-06-05  M    1
    A   2017-06-02  M    1
    A   2017-06-03  M    1
    B   2017-06-02  K    1
    B   2017-06-01  M    4

Applying a two-day look-back window gives:

    ID  Time        Time-2D     Col  Count
    A   2017-06-05  2017-06-03  M    1      # Time-2D is Time moved two days back
    A   2017-06-02  2017-05-31  M    1
    A   2017-06-03  2017-06-01  M    1
    B   2017-06-02  2017-05-31  K    1
    B   2017-06-01  2017-05-30  M    4

Expected output (count):

    ID  Time        Time-2D     Col_M  Col_K
    A   2017-06-05  2017-06-03  1      0
    A   2017-06-02  2017-05-31  1      0
    A   2017-06-03  2017-06-01  2      0
    B   2017-06-02  2017-05-31  0      1
    B   2017-06-01  2017-05-30  4      0

In the first row, from 2017-06-03 to 2017-06-05, user A has 0 (sum(Count)) of K and 1 (sum(Count)) of M. In the third row, the 2 arises because, from 06-01 to 06-03, two rows of the first table (A 2017-06-02 M 1; A 2017-06-03 M 1) fall in the window, so the Count summed over M is 2.

2. Calculate ratio

Based on the table above, the expected output (ratio) is:

    ID  Time        Time-2D     Col_M  Col_K
    A   2017-06-05  2017-06-03  1      0      # 1/sum(1+0)
    A   2017-06-02  2017-05-31  1      0
    A   2017-06-03  2017-06-01  1      0      # 2/sum(2+0)
    B   2017-06-02  2017-05-31  0      1
    B   2017-06-01  2017-05-30  1      0      # 4/sum(4+0)

The above is for processing the training data. The testing dataset must be mapped onto the same columns, Col_M and Col_K; that is, if another value such as S appears in Col, it is ignored.
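One way to read this requirement is sketched below. It covers only the column-mapping step, not the look-back window. The test rows, the `train_levels` vector, and the variable names are made up for illustration and do not come from the question.

```r
library(data.table)

# Hypothetical test rows: "S" never appeared in training, and "K" is absent here
test <- fread("ID Time Col Count
C 2017-06-07 M 2
C 2017-06-08 S 5
D 2017-06-07 M 3")

train_levels <- c("M", "K")   # values of Col seen in the training data (assumed)

# Ignore values of Col that were not seen in training
test <- test[Col %in% train_levels]

# Long to wide, summing Count per ID/Time/Col
wide <- dcast(test, ID + Time ~ Col, value.var = "Count",
              fun.aggregate = sum, fill = 0L)

# Rename the columns that are present, then add any training column
# that has no rows in the test set as an all-zero column
present <- intersect(train_levels, names(wide))
setnames(wide, present, paste0("Col_", present))
absent <- setdiff(paste0("Col_", train_levels), names(wide))
if (length(absent)) wide[, (absent) := 0L]

# Same column layout as the training output
setcolorder(wide, c("ID", "Time", paste0("Col_", train_levels)))
print(wide)
```

Filtering before the cast means an unseen value such as S never becomes a column, and the explicit zero-fill keeps the test table's columns aligned with the training table's even when a training value does not occur in the test set.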

Recommended answer

I think I understand your request. You seem to care about the order of the observations, regardless of whether, for instance, the second observation's Time is earlier than the first observation's Time. That doesn't make much sense, but here is a quite efficient data.table solution that achieves it. It does a non-equi join on ID, Col, both Time columns, and the row index (which is simply the appearance order). Afterwards, it uses dcast to convert from long to wide (as in your previous question). Note that the result is ordered by date, but I've kept the rowindx variable, so you can restore the original order with setorder. I'll leave the ratio calculation to you, as it is very basic (hint: don't use loops, it is a fully vectorized one-liner).

    library(data.table) #v1.10.4+

    ## Read the data
    DT <- fread("ID Time Col Count
    A 2017-06-05 M 1
    A 2017-06-02 M 1
    A 2017-06-03 M 1
    B 2017-06-02 K 1
    B 2017-06-01 M 4")

    ## Prepare the variables we need for the join
    DT[, Time := as.IDate(Time)]
    DT[, Time_2D := Time - 2L]
    DT[, rowindx := .I]

    ## Non-equi join, sum `Count` by each join
    DT2 <- DT[DT, sum(Count),
              on = .(ID, Col, rowindx <= rowindx, Time <= Time, Time >= Time_2D),
              by = .EACHI]

    ## Fix column names (a known issue)
    setnames(DT2, make.unique(names(DT2)))

    ## Long to wide (You can reorder back using `rowindx` and `setorder` function)
    dcast(DT2, ID + Time + Time.1 + rowindx ~ Col, value.var = "V1", fill = 0)
    #    ID       Time     Time.1 rowindx K M
    # 1:  A 2017-06-02 2017-05-31       2 0 1
    # 2:  A 2017-06-03 2017-06-01       3 0 2
    # 3:  A 2017-06-05 2017-06-03       1 0 1
    # 4:  B 2017-06-01 2017-05-30       5 0 4
    # 5:  B 2017-06-02 2017-05-31       4 1 0
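The answer leaves the ratio step as an exercise. One possible vectorized version is sketched below. It is not the answerer's own code: the `wide` table is typed in by hand from the count output shown in the answer, and the names `cols` and `total` are illustrative.

```r
library(data.table)

# Wide count table, copied by hand from the dcast output in the answer
wide <- data.table(
  ID      = c("A", "A", "A", "B", "B"),
  Time    = as.IDate(c("2017-06-02", "2017-06-03", "2017-06-05",
                       "2017-06-01", "2017-06-02")),
  rowindx = c(2L, 3L, 1L, 5L, 4L),
  K       = c(0, 0, 0, 0, 1),
  M       = c(1, 2, 1, 4, 0)
)

cols  <- c("K", "M")                 # count columns to turn into ratios
total <- rowSums(wide[, ..cols])     # row totals, taken before any column is overwritten

# Divide every count column by the row total; rows with a zero total stay at 0
wide[, (cols) := lapply(.SD, function(x) ifelse(total > 0, x / total, 0)),
     .SDcols = cols]

setorder(wide, rowindx)              # restore the original row order
setnames(wide, cols, paste0("Col_", cols))
print(wide)
```

With the five rows above, every row has counts in only one of the two columns, so each ratio comes out as 0 or 1, as in the expected ratio table in the question. The zero-total guard is an added assumption for rows whose counts sum to zero, which does not occur in this data.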
