My objective is to predict the next 3 events of each id_num based on their previous events
Problem description
I am new to data science and I am working on a model that looks roughly like the sample data shown below. However, in the original data there are many id_num and Events values. My objective is to predict the next 3 events of each id_num based on its previous Events.
Please help me solve this, or suggest a method for solving it, using R programming.
Recommended answer
The simplest "prediction" is to assume that the sequence of letters will repeat for each id_num. I hope this is in line with what the OP understands by "prediction".
Code
library(data.table)
DT[, .(Events = append(Events, head(rep(Events, 3L), 3L))), by = id_num]
creates
id_num Events
1: 1 A
2: 1 B
3: 1 C
4: 1 D
5: 1 E
6: 1 A
7: 1 B
8: 1 C
9: 2 B
10: 2 E
11: 2 B
12: 2 E
13: 2 B
14: 3 E
15: 3 A
16: 3 E
17: 3 A
18: 3 E
19: 3 A
20: 3 E
21: 4 C
22: 4 C
23: 4 C
24: 4 C
25: 5 F
26: 5 G
27: 5 F
28: 5 G
29: 5 F
id_num Events
data.table is used here because of its easy-to-use grouping function and because I'm acquainted with it.
For each id_num, the existing sequence of letters is replicated 3 times using rep() to ensure there are enough values to fill at least the next 3 positions. However, only the first 3 of these values are taken, using head(). These 3 values are then appended to the existing sequence for each id_num.
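The three steps can be traced on a single group in base R. This is only an illustrative sketch: the vector `events` stands in for the Events of one hypothetical id_num, and the intermediate variable names are not part of the original answer.

```r
# Existing events of one hypothetical id_num
events <- c("B", "E")

# Step 1: replicate the whole sequence 3 times so that at least
# 3 values are available, even for a group with a single event
repeated <- rep(events, 3L)

# Step 2: keep only the first 3 replicated values as the "prediction"
predicted <- head(repeated, 3L)

# Step 3: append the predicted values to the existing sequence
result <- append(events, predicted)
print(result)
```

Inside the data.table call, the same three steps run once per id_num because of by = id_num.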
There are two possible optimisations:
- If the sequence of values is much longer than the number of values to predict, n_pred, simply repeating the long sequence n_pred times is a waste.
- The call to append() can be avoided if the existing sequence is repeated one more time.
So, the optimised code looks like:
n_pred <- 3L
DT[, .(Events = head(rep(Events, 1L + ceiling(n_pred / .N)), .N + n_pred)), by = id_num]
Note that .N is a special symbol in data.table syntax containing the number of rows in a group. head() now returns the original sequence plus the predicted values.
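To see why 1L + ceiling(n_pred / .N) repetitions are enough, the arithmetic can be checked outside data.table. The sketch below is an assumption-laden illustration: `extend_events()` is a hypothetical helper, and `n` plays the role of .N for a single group.

```r
n_pred <- 3L

# Hypothetical helper mirroring the optimised expression for one group
extend_events <- function(events, n_pred) {
  n <- length(events)                # stands in for .N
  # one copy for the original sequence, plus enough extra copies
  # to supply n_pred predicted values
  n_rep <- 1L + ceiling(n_pred / n)
  head(rep(events, n_rep), n + n_pred)
}

# Number of repetitions needed for groups of size 1, 2 and 5
group_sizes <- c(1L, 2L, 5L)
n_reps <- 1L + ceiling(n_pred / group_sizes)
print(n_reps)

# A long group is repeated only twice instead of three times
extended <- extend_events(LETTERS[1:5], n_pred)
print(extended)
```

The helper assumes every group has at least one event, which always holds for groups formed by by = id_num.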
DT <- data.table(
id_num = c(rep(1L, 5L), 2L, 2L, rep(3L, 4L), 4L, 5L, 5L),
Events = c(LETTERS[1:5], "B", "E", rep(c("E", "A"), 2L), "C", "F", "G")
)
DT
id_num Events
1: 1 A
2: 1 B
3: 1 C
4: 1 D
5: 1 E
6: 2 B
7: 2 E
8: 3 E
9: 3 A
10: 3 E
11: 3 A
12: 4 C
13: 5 F
14: 5 G
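For readers without data.table, the same "repeat the sequence" prediction can be sketched in base R on this sample data. This is an alternative, not the answer's method: it swaps the head(rep(...)) construction for base R's rep_len(), which recycles a vector to a requested length, and uses split() for the grouping. The names `by_id`, `extended` and `result` are illustrative.

```r
# Sample data as plain vectors (same values as DT above)
id_num <- c(rep(1L, 5L), 2L, 2L, rep(3L, 4L), 4L, 5L, 5L)
Events <- c(LETTERS[1:5], "B", "E", rep(c("E", "A"), 2L), "C", "F", "G")
n_pred <- 3L

# Group the events by id_num
by_id <- split(Events, id_num)

# Recycle each group's sequence to its original length plus n_pred
extended <- lapply(by_id, function(ev) rep_len(ev, length(ev) + n_pred))

# Reassemble into a long data frame, one row per event
result <- data.frame(
  id_num = rep(as.integer(names(extended)), lengths(extended)),
  Events = unlist(extended, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(result)
```

In a data.table call, rep_len(Events, .N + n_pred) could presumably be used in the same way inside j, but that variant is not part of the original answer.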