TraMineR: Can I get the complete sequence if I give an event subsequence?
Problem description
I have a sequence dataset like below:
customerid flag 0 1 2 3 4 5 6 7 8 9 10 11
abc234 1 3 4 3 4 5 8 4 3 3 2 14 14
abc233 0 4 4 4 4 4 4 4 4 4 4 4 4
qpr81 0 9 8 7 8 8 7 8 8 7 8 8 7
qnr94 0 14 14 14 2 14 14 14 14 14 14 14 14
Values in columns 0 to 11 are the sequences. There are two sets of customers, with flag=1 and flag=0, and I have differentiating event sequences for both sets (only the frequencies and residuals for the two groups are shown here):
Subsequence Freq.0 Freq.1 Resid.0 Resid.1
(3>4) 0.19208177 0.0753386 5.540793 -21.43304
(4>5) 0.15752553 0.059960497 5.115241 -19.78691
(5>4) 0.15950556 0.062782167 5.037413 -19.48586
I want to find the customer ids and flags of the sequences that contain these event subsequences.
Should I write a Python script to traverse the transactions, or is there some direct method in R to do this?
CODE
--------------
```r
library(TraMineR)

custid <- c("a1", "a2", "a3", "b4", "b5", "c6", "c7", "d8", "d9")  # sample customer ids (as strings)
flag <- c(0, 0, 0, 1, 0, 1, 1, 0, 1)  # group flag
col1 <- c(14, 14, 14, 14, 14, 5, 14, 14, 2)
col2 <- c(14, 14, 3, 14, 3, 14, 6, 3, 3)
col3 <- c(14, 2, 2, 14, 2, 14, 2, 2, 2)
col4 <- c(14, 2, 2, 14, 2, 14, 2, 2, 14)
df <- data.frame(custid, flag, col1, col2, col3, col4)  # data frame generation
print(df)

# Define the state sequences from col1 to col4 (columns 3 to 6)
df.s <- seqdef(df, 3:6)
print(df.s)

# Define the transition events (one event per state change)
transition <- seqetm(df.s, method = "transition")
print(transition)

# Convert the state sequences (one column per time point, i.e. STS) to TSE format
df.tse <- seqformat(df.s, from = "STS", to = "TSE", tevent = transition)
print(df.tse)

# Event sequence generation
df.seqe <- seqecreate(id = df.tse$id, timestamp = df.tse$time, event = df.tse$event)
print(df.seqe)

# Frequent event subsequences
fsubseq <- seqefsub(df.seqe, pMinSupport = 0.01)
print(fsubseq)

# Group factor built from the flag (levels "0" and "1")
groups <- factor(df$flag)

# Differentiating event subsequences between the flag groups (chi-square test)
diff <- seqecmpgroup(fsubseq, group = groups, method = "chisq")

# Using seqeapplysub to find the presence of subsequences?
presence <- seqeapplysub(fsubseq, method = "presence")
print(presence[1:3, 3:1])
```
Thanks
Recommended answer
From what I understand, you have state sequences and have transformed them into event sequences using the seqecreate function of TraMineR. The events you are considering are the state changes. Thus (3>4) stands for a subsequence with only one event, namely the event 3>4 (switching from 3 to 4). Then, you identify the event subsequences that best discriminate your two flags using the seqefsub and seqecmpgroup functions.
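As a rough, TraMineR-free illustration of the pipeline just described, the sketch below derives state-change events from the question's sample sequences and computes, for each single-event subsequence, the share of sequences in each flag group that contain it. It assumes this share is what the Freq.0 and Freq.1 columns of seqecmpgroup report; the chi-square residuals are not reproduced. The names transitions_of and freq_by_group are hypothetical, and TraMineR's own TSE conversion may also record an event for the initial state, which this sketch ignores.

```r
# Hypothetical helper: the state changes of one state sequence, written as
# "from>to" strings (only changes count as events; the initial state is ignored).
transitions_of <- function(states) {
  states <- as.character(states)
  n <- length(states)
  if (n < 2) return(character(0))
  from <- states[-n]
  to <- states[-1]
  changed <- from != to
  # Guard needed: paste0() would turn zero-length inputs into a lone ">"
  if (!any(changed)) return(character(0))
  paste0(from[changed], ">", to[changed])
}

# Sample data from the question: one row per customer, columns col1..col4
seqs <- rbind(
  c(14, 14, 14, 14), c(14, 14, 2, 2), c(14, 3, 2, 2),
  c(14, 14, 14, 14), c(14, 3, 2, 2), c(5, 14, 14, 14),
  c(14, 6, 2, 2), c(14, 3, 2, 2), c(2, 3, 2, 14)
)
flag <- c(0, 0, 0, 1, 0, 1, 1, 0, 1)

# Event sequence (list of state changes) of each customer
events <- lapply(seq_len(nrow(seqs)), function(i) transitions_of(seqs[i, ]))

# For every distinct event, the share of sequences in each flag group that
# contain it (rows = events, columns = flag values "0" and "1")
all_events <- sort(unique(unlist(events)))
freq_by_group <- t(sapply(all_events, function(e) {
  has <- vapply(events, function(ev) e %in% ev, logical(1))
  tapply(has, flag, mean)
}))
print(freq_by_group)
```

On this toy data the change 3>2 occurs in 3 of the 5 flag-0 sequences and in 1 of the 4 flag-1 sequences, which is the kind of imbalance seqecmpgroup then tests.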
If this is correct, then you can identify the sequences containing each subsequence with the seqeapplysub function. I cannot illustrate here because you do not provide any code in your question. Look at the online help of the seqeapplysub function.
======= update referring to your added code =======
Here is how you get the ids of the sequences that contain the most discriminating subsequence.
First, we extract the three most discriminating subsequences from your diff object. Second, we compute the presence matrix, which has one column per extracted subsequence, with a 1 for each sequence that contains the subsequence and 0 otherwise.
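To see what such a presence matrix encodes, the base-R sketch below builds an equivalent 0/1 matrix by hand for the question's sample data. It is an illustration under simplifying assumptions, not the seqeapplysub implementation: events are the state changes of each sequence, a subsequence is an ordered vector of events that must occur in that order (other events may lie in between), and TraMineR's time constraints and simultaneous events are ignored. The helpers transitions_of, contains_subseq and presence_matrix are hypothetical names.

```r
# Hypothetical helper: state changes of one sequence as "from>to" strings
transitions_of <- function(states) {
  states <- as.character(states)
  n <- length(states)
  if (n < 2) return(character(0))
  changed <- states[-n] != states[-1]
  if (!any(changed)) return(character(0))
  paste0(states[-n][changed], ">", states[-1][changed])
}

# Hypothetical helper: TRUE if the events in `sub` occur in `ev` in this order
# (gaps allowed); greedy earliest matching is sufficient for this test.
contains_subseq <- function(ev, sub) {
  pos <- 0
  for (s in sub) {
    hit <- which(ev == s)
    hit <- hit[hit > pos]
    if (length(hit) == 0) return(FALSE)
    pos <- hit[1]
  }
  TRUE
}

# Hypothetical helper: 0/1 matrix with one row per sequence and one column per
# subsequence, in the spirit of seqeapplysub(..., method = "presence")
presence_matrix <- function(event_list, subseqs) {
  m <- matrix(0L, nrow = length(event_list), ncol = length(subseqs),
              dimnames = list(names(event_list), names(subseqs)))
  for (j in seq_along(subseqs)) {
    m[, j] <- vapply(event_list, contains_subseq, logical(1), sub = subseqs[[j]])
  }
  m
}

# Sample data from the question (col1..col4 of each customer)
custid <- c("a1", "a2", "a3", "b4", "b5", "c6", "c7", "d8", "d9")
seqs <- rbind(
  c(14, 14, 14, 14), c(14, 14, 2, 2), c(14, 3, 2, 2),
  c(14, 14, 14, 14), c(14, 3, 2, 2), c(5, 14, 14, 14),
  c(14, 6, 2, 2), c(14, 3, 2, 2), c(2, 3, 2, 14)
)
events <- lapply(seq_len(nrow(seqs)), function(i) transitions_of(seqs[i, ]))
names(events) <- custid

# Subsequences to look up, labelled in the "(a)-(b)" style used by TraMineR
subseqs <- list("(14>3)" = "14>3", "(3>2)" = "3>2",
                "(14>3)-(3>2)" = c("14>3", "3>2"))
presence <- presence_matrix(events, subseqs)
print(presence)

# Ids of the sequences containing the two-event subsequence
custid[presence[, "(14>3)-(3>2)"] == 1]
```

Indexing custid with a column of this matrix works exactly like the custid[presence[,1]==1] lookup in the answer.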
diffseq <- seqefsub(df.seqe, strsubseq = paste(diff$subseq[1:3]))
(presence=seqeapplysub(diffseq, method="presence"))
Now you get the ids for the first subsequence with
custid[presence[,1]==1]
for the second one with custid[presence[,2]==1], and so on.
Likewise, you get the flags with
flag[presence[,1]==1]
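To collect the ids and flags for every subsequence at once, and to see how the matches split across the two flag groups, a small base-R sketch can loop over the columns of the presence matrix. The helper matches_by_subseq is hypothetical, and the presence matrix below is filled in by hand from the question's sample sequences, treating each state change as one event; in practice you would pass the matrix returned by seqeapplysub.

```r
# Hypothetical helper: for every subsequence (column of a 0/1 presence matrix),
# return the ids and flags of the sequences that contain it.
matches_by_subseq <- function(custid, flag, presence) {
  out <- lapply(seq_len(ncol(presence)), function(j) {
    hit <- presence[, j] == 1
    data.frame(custid = custid[hit], flag = flag[hit], stringsAsFactors = FALSE)
  })
  names(out) <- colnames(presence)
  out
}

custid <- c("a1", "a2", "a3", "b4", "b5", "c6", "c7", "d8", "d9")
flag   <- c(0, 0, 0, 1, 0, 1, 1, 0, 1)

# Presence of two single-transition subsequences, filled in by hand from the
# question's sample sequences (1 = the sequence contains the state change)
presence <- cbind(
  "(14>3)" = c(0, 0, 1, 0, 1, 0, 0, 1, 0),
  "(3>2)"  = c(0, 0, 1, 0, 1, 0, 0, 1, 1)
)

res <- matches_by_subseq(custid, flag, presence)
print(res[["(3>2)"]])

# Cross-tabulate flag against presence of one subsequence
table(flag = flag, present = presence[, "(3>2)"])
```

The final table gives a quick check of how the sequences containing a subsequence are distributed over the two flag groups.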
Hope this helps.