Fitting a VLMC to very long sequences


Problem description

I am trying to fit a VLMC to a dataset in which the longest sequence has 296 states. My code is shown below:

# Load libraries
library(PST)
library(RCurl)
library(TraMineR)

# Load and transform data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/241ef39125ecb55a85b43d7f4cd3d58f617b2ecf/challenge_level.csv")
data <- read.csv(text = x)

data.seq <- seqdef(data[,2:ncol(data)], missing = NA, right = NA, nr = "*")
S1 <- pstree(data.seq, ymin = 0.01, lik = TRUE, with.missing = TRUE, nmin = 2)

This, however, yields the following error:

Error in res[i, , drop = FALSE] : subscript out of bounds

How can I fit the model to data with sequences this long? Are there good justifications for limiting the length within the model?

Recommended answer

The problem comes from your data. Because you do not set L in the pstree function, you are asking for a model of maximum order. The fitting process produces an error at L=8: you have nmin=2, but at this order only one context reaches n=2:

> cprob(data.seq, L=8, nmin=2)
 [>] 21 sequences, min/max length: 19/296
 [>] computing prob., L=8, 2043 distinct context(s)
 [>] removing 1894 context(s) where n<2
 [>] total time: 0.156 secs
                        EX  FA I1  I2 I3 N1 N2 N3 NR QU TR [n]
I2-I3-FA-I3-EX-I3-EX-I2  0 0.5  0 0.5  0  0  0  0  0  0  0   2
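
To illustrate why few contexts survive at higher orders, here is a minimal base R sketch. It is not how PST counts contexts internally: the helper count_contexts and the toy sequences are made up for illustration, and it simply counts every window of length L, which only approximates what cprob() reports.

```r
# Hypothetical helper (not part of PST): count the distinct windows of
# length L across a list of sequences, and how many occur >= nmin times.
count_contexts <- function(seqs, L, nmin = 2) {
  windows <- unlist(lapply(seqs, function(s) {
    n <- length(s)
    if (n < L) return(character(0))  # sequence too short for this order
    vapply(seq_len(n - L + 1),
           function(i) paste(s[i:(i + L - 1)], collapse = "-"),
           character(1))
  }))
  counts <- table(windows)
  c(distinct = length(counts), kept = sum(counts >= nmin))
}

# Toy data, invented for this example (not the dataset from the question)
toy <- list(
  c("A", "B", "A", "B", "A", "C"),
  c("B", "A", "B", "C")
)

# Print, for each order, how many windows exist and how many pass nmin = 2
for (L in 1:4) {
  res <- count_contexts(toy, L)
  cat(sprintf("L=%d: %d distinct, %d with n >= 2\n",
              L, res["distinct"], res["kept"]))
}
```

On this toy data no window of length 4 repeats, so nothing would pass nmin = 2 at that order. The same effect, on a larger scale, is what the cprob() output above shows for the real data at L=8.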

Fitting a model with L=8 works fine:

S1 <- pstree(data.seq, ymin = 0.01, lik = TRUE, nmin = 2, L=8) 

 [>] 21 sequence(s) - min/max length: 19/296
 [>] max. depth L=8, nmin=2, ymin=0.01
     [L]  [nodes]
       0        1
       1       11
       2       99
       3      368
       4      340
       5      126
       6       34
       7        4
       8        1
 [>] computing sequence(s) likelihood ... (0.804 secs)
 [>] total time: 2.968 secs

Also, you don't need any of the 'missing', 'right' or 'nr' options in seqdef(), nor 'with.missing' in pstree().

Best, Alexis
