Error in levels for seqdef in R
Question
I see this error every time I try to run seqdef on data that has already been converted to STS format using seqformat. A sample of my data frame looks like this:
head(df.new, 10)
   user_id orderdate         cart to
1        8         1      produce 30
2        8        31      produce 60
3        8        61      produce 70
4        8        71      produce 92
5       10         1      produce 30
6       10        31      produce 42
7       10        43 meat seafood 56
8       10        57         deli 77
9       17         1    beverages  3
10      17         4    beverages  8
It has a total of 14000 rows of orders, and for each user some orders start and end on the same day (i.e. orderdate == to). Below is the code I used to create the STS data that is passed to seqdef.
df.form <- seqformat(df.new, id='user_id', begin='orderdate', end='to', status='cart', from='SPELL', to='STS', process=FALSE)
df.seq <- seqdef(df.form, left='DEL', right = 'unknown', xtstep=10, void = 'unknown')
The error message I get when running seqdef is:
[>] found missing values ('NA') in sequence data
[>] preparing 35000 sequences
[>] coding void elements with 'unknown' and missing values with '*'
[>] 21 distinct states appear in the data:
1 = alcohol
2 = babies
3 = bakery
4 = beverages
5 = breakfast
6 = bulk
7 = canned goods
8 = dairy eggs
9 = deli
10 = dry goods pasta
11 = frozen
12 = household
...
[>] adding special state(s) to the alphabet: unknown
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
factor level [24] is duplicated
I tried removing the orders where orderdate == to, and the same error still occurs. I would appreciate any help I can get to fix this problem. Thanks.
Recommended answer
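The error occurs because you are using the same code ('unknown') for right missings and voids. This is presumably why level [24] is reported as duplicated: the 21 observed states plus the added 'unknown' state give 22 levels, the missing code '*' would be the 23rd, and the void code 'unknown' would be the 24th, repeating a level that already exists.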
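A minimal sketch of the fix, assuming it is enough to give voids a different symbol from the one used for right missings. The toy data frame below is hypothetical and only stands in for df.form; '%' is the default void symbol of seqdef.

```r
library(TraMineR)

# Hypothetical toy data in STS format (one column per time point); it stands
# in for the df.form object from the question and is not the real data.
toy <- data.frame(
  t1 = c("produce", "produce", "beverages"),
  t2 = c("produce", "meat seafood", "beverages"),
  t3 = c("deli", "deli", NA),
  t4 = c("deli", NA, NA),
  stringsAsFactors = FALSE
)

# Trailing NAs are recoded as the state 'unknown'; voids keep a different
# symbol ('%'), so no code appears twice among the factor levels.
toy.seq <- seqdef(toy, left = "DEL", right = "unknown", void = "%")
print(toy.seq)
```

Applied to the call in the question, this would mean keeping right = 'unknown' and either dropping the void argument or setting it to a symbol that is not used anywhere else.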
When the sequences contain 'missings', these missings are treated as a separate state when you set with.missing = TRUE in functions such as seqdist or seqdplot, whereas voids only pad the rows to a common length and are simply ignored when plotting the sequences (seqplot) or computing dissimilarities (seqdist).
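As an illustration of that difference, here is a small sketch on hypothetical toy data (not the data from the question). The NA in the middle of the first row is a gap and is kept as a missing, while the trailing NAs are removed as voids under the seqdef defaults. The choice of the LCS method is only an assumption for the example.

```r
library(TraMineR)

# Hypothetical toy STS data: the NA inside row 1 is a gap (a 'missing'),
# while the trailing NAs in rows 2 and 3 are turned into voids.
toy <- data.frame(
  t1 = c("produce", "produce", "beverages"),
  t2 = c(NA, "meat seafood", "beverages"),
  t3 = c("deli", "deli", NA),
  t4 = c("deli", NA, NA),
  stringsAsFactors = FALSE
)

# seqdef defaults: gaps stay missing (code '*'), right-hand NAs are
# deleted, i.e. replaced by the void symbol '%'.
toy.seq <- seqdef(toy)

# LCS dissimilarities, with the missing state counted as an extra state.
d <- seqdist(toy.seq, method = "LCS", with.missing = TRUE)
print(d)
```

The same with.missing = TRUE argument can be passed to seqdplot to include the missing state in the state distribution plot.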