My objective is to predict the next 3 events of each id_num based on their previous events


Problem description

I am new to data science and am working on a model whose data looks like the sample data shown below. However, the original data contains many id_num values and Events. My objective is to predict the next 3 events of each id_num based on its previous Events.

Please help me solve this, or suggest a method for solving it, using R.

Recommended answer

The simplest "prediction" is to assume that the sequence of letters will repeat for each id_num. I hope this is in line with what the OP understands by "prediction".

The code

library(data.table)
DT[, .(Events = append(Events, head(rep(Events, 3L), 3L))), by = id_num]

creates

    id_num Events
 1:      1      A
 2:      1      B
 3:      1      C
 4:      1      D
 5:      1      E
 6:      1      A
 7:      1      B
 8:      1      C
 9:      2      B
10:      2      E
11:      2      B
12:      2      E
13:      2      B
14:      3      E
15:      3      A
16:      3      E
17:      3      A
18:      3      E
19:      3      A
20:      3      E
21:      4      C
22:      4      C
23:      4      C
24:      4      C
25:      5      F
26:      5      G
27:      5      F
28:      5      G
29:      5      F
    id_num Events

data.table is used here because its grouping functionality is easy to use and because I am familiar with it.

For each id_num, the existing sequence of letters is replicated 3 times using rep() to ensure there are enough values to fill at least the next 3 positions. However, only the first 3 values are taken, using head(). These 3 values are then appended to the existing sequence for each id_num.
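The three steps can be traced on a single group in base R. This is only an illustrative sketch: the vector `events` stands in for the Events of one id_num (here the values of id_num 2 from the sample data), and the intermediate variable names are made up for this example.

```r
# Events observed so far for one id_num (values of id_num 2 in the sample data)
events <- c("B", "E")

# Step 1: replicate the sequence 3 times so that at least 3 values are available
replicated <- rep(events, 3L)

# Step 2: keep only the first 3 values as the "prediction"
predicted <- head(replicated, 3L)

# Step 3: append the prediction to the existing sequence
extended <- append(events, predicted)

print(predicted)  # "B" "E" "B"
print(extended)   # "B" "E" "B" "E" "B"
```

The extended vector matches the rows shown for id_num 2 in the result above.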

There are two possible optimisations:

  1. If the sequence of values is much longer than the number of values to predict, n_pred, repeating the whole sequence n_pred times is wasteful.
  2. The call to append() can be avoided if the existing sequence is simply repeated one more time.
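To make both points concrete, here is a small base R sketch on plain vectors. The helper name `extend_by_repetition` is hypothetical and exists only for this illustration; it assumes the input has at least one element, which always holds for a group in a grouped data.table call.

```r
n_pred <- 3L

# Hypothetical helper: extend a vector by cycling through it.
# Assumes length(x) >= 1.
extend_by_repetition <- function(x, n_pred) {
  n <- length(x)
  # One copy for the original values plus just enough extra copies
  # to cover n_pred additional values
  times <- 1L + ceiling(n_pred / n)
  # Original sequence plus n_pred predicted values, without append()
  head(rep(x, times), n + n_pred)
}

long_seq <- LETTERS[1:10]

# Naive approach: 30 values are built just to keep 3 of them
naive_len <- length(rep(long_seq, n_pred))

# Optimised approach: 2 repetitions (20 values) are enough
extended_long <- extend_by_repetition(long_seq, n_pred)

# A one-element sequence still works: 1 + ceiling(3 / 1) = 4 repetitions
extended_short <- extend_by_repetition("C", n_pred)
```

For the ten-letter sequence the result is the original letters followed by "A", "B", "C". For the single "C" it is four copies of "C".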

So, the optimised code looks like:

n_pred <- 3L
DT[, .(Events = head(rep(Events, 1L + ceiling(n_pred / .N)), .N + n_pred)), by = id_num]

Note that .N is a special symbol in data.table syntax that contains the number of rows in a group. head() now returns the original sequence plus the predicted values.

Data

DT <- data.table(
  id_num = c(rep(1L, 5L), 2L, 2L, rep(3L, 4L), 4L, 5L, 5L),
  Events = c(LETTERS[1:5], "B", "E", rep(c("E", "A"), 2L), "C", "F", "G")
)
DT

    id_num Events
 1:      1      A
 2:      1      B
 3:      1      C
 4:      1      D
 5:      1      E
 6:      2      B
 7:      2      E
 8:      3      E
 9:      3      A
10:      3      E
11:      3      A
12:      4      C
13:      5      F
14:      5      G
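As a cross-check on this sample data, the optimised call can be compared with a variant based on base R's rep_len(), which recycles a vector to a requested length. The rep_len() variant is not part of the original answer; it is a substitute technique shown here only as a sketch. The second query, which returns just the predicted values, and its column name `step` are likewise assumptions made for illustration.

```r
library(data.table)

# Same sample data as above
DT <- data.table(
  id_num = c(rep(1L, 5L), 2L, 2L, rep(3L, 4L), 4L, 5L, 5L),
  Events = c(LETTERS[1:5], "B", "E", rep(c("E", "A"), 2L), "C", "F", "G")
)
n_pred <- 3L

# Optimised version from the answer
optimised <- DT[, .(Events = head(rep(Events, 1L + ceiling(n_pred / .N)), .N + n_pred)),
                by = id_num]

# Alternative sketch: recycle each group's Events to length .N + n_pred
alternative <- DT[, .(Events = rep_len(Events, .N + n_pred)), by = id_num]

# Both approaches should give the same extended sequences
stopifnot(identical(optimised$Events, alternative$Events))

# Only the n_pred predicted events per id_num, numbered 1..n_pred
predicted_only <- DT[, .(step = seq_len(n_pred), Events = rep_len(Events, n_pred)),
                     by = id_num]
print(predicted_only)
```

Under the same "the sequence repeats" assumption, `predicted_only` holds 3 rows per id_num, for example "B", "E", "B" for id_num 2.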
