TraMineR: Can I get the complete sequence if I give an event subsequence?

Problem description

I have a sequence dataset like below:

customerid  flag   0   1   2   3   4   5   6   7   8   9  10  11
abc234         1   3   4   3   4   5   8   4   3   3   2  14  14
abc233         0   4   4   4   4   4   4   4   4   4   4   4   4
qpr81          0   9   8   7   8   8   7   8   8   7   8   8   7
qnr94          0  14  14  14   2  14  14  14  14  14  14  14  14

Values in columns 0 to 11 are the sequences. There are two sets of customers, with flag=1 and flag=0, and I have found differentiating event subsequences for both sets. (Only the frequencies and residuals for the two groups are shown here.)

Subsequence  Freq.0      Freq.1       Resid.0   Resid.1
(3>4)        0.19208177  0.0753386    5.540793  -21.43304
(4>5)        0.15752553  0.059960497  5.115241  -19.78691
(5>4)        0.15950556  0.062782167  5.037413  -19.48586

I want to find the customer ids and flags of the sequences that contain these event subsequences.

Should I write a Python script to traverse the transactions, or is there a direct method in R to do this?

My code:

```r
library(TraMineR)

# Sample data (customer ids must be quoted strings)
custid <- c("a1", "a2", "a3", "b4", "b5", "c6", "c7", "d8", "d9")
flag <- c(0, 0, 0, 1, 0, 1, 1, 0, 1)
col1 <- c(14, 14, 14, 14, 14, 5, 14, 14, 2)
col2 <- c(14, 14, 3, 14, 3, 14, 6, 3, 3)
col3 <- c(14, 2, 2, 14, 2, 14, 2, 2, 2)
col4 <- c(14, 2, 2, 14, 2, 14, 2, 2, 14)
df <- data.frame(custid, flag, col1, col2, col3, col4)
print(df)

# State sequences defined by col1 to col4 (columns 3 to 6)
df.s <- seqdef(df, 3:6)
print(df.s)

# Event definition: each state change a>b is an event
transition <- seqetm(df.s, method = "transition")
print(transition)

# Convert from state sequences (STS format) to time-stamped events (TSE format)
df.tse <- seqformat(df.s, from = "STS", to = "TSE", tevent = transition)
print(df.tse)

# Event sequence object
df.seqe <- seqecreate(id = df.tse$id, timestamp = df.tse$time, event = df.tse$event)
print(df.seqe)

# Frequent event subsequences (present in at least 1% of the sequences)
fsubseq <- seqefsub(df.seqe, pMinSupport = 0.01)
print(fsubseq)

# Group factor: flag > 0 is FALSE -> "0", TRUE -> "1"
groups <- factor(df$flag > 0, labels = c(0, 1))

# Differentiating event subsequences between the two flags (chi-square test)
diff <- seqecmpgroup(fsubseq, group = df$flag, method = "chisq")

# Using seqeapplysub to find the presence of the subsequences?
presence <- seqeapplysub(fsubseq, method = "presence")
print(presence[1:3, 3:1])
```

Thanks.

Recommended answer

From what I understand, you have state sequences and have transformed them into event sequences using the seqecreate function of TraMineR. The events you are considering are the state changes. Thus (3>4) stands for a subsequence with only one event, namely the event 3>4 (switching from 3 to 4). Then, you identify the event subsequences that best discriminate your two flags using the seqefsub and seqecmpgroup functions.
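
To make this event definition concrete, here is a minimal base-R sketch (it does not use TraMineR and is not how TraMineR implements it) that lists the state changes of each customer in the question's first table and picks out the customers whose sequence contains the single event 3>4. The helper transitions is hypothetical; unlike the seqetm/seqecreate pipeline it ignores time stamps and only handles one-event subsequences such as (3>4).

```r
# States at times 0..11 for the four customers in the question's first table
seqs <- rbind(
  abc234 = c(3, 4, 3, 4, 5, 8, 4, 3, 3, 2, 14, 14),
  abc233 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4),
  qpr81  = c(9, 8, 7, 8, 8, 7, 8, 8, 7, 8, 8, 7),
  qnr94  = c(14, 14, 14, 2, 14, 14, 14, 14, 14, 14, 14, 14)
)
flag <- c(abc234 = 1, abc233 = 0, qpr81 = 0, qnr94 = 0)

# Hypothetical helper (not part of TraMineR): the state changes "a>b" of one
# sequence, in order; sprintf() returns character(0) when the state never changes
transitions <- function(states) {
  from <- head(states, -1)
  to <- tail(states, -1)
  changed <- from != to
  sprintf("%s>%s", from[changed], to[changed])
}

# One vector of transition events per customer
events <- lapply(rownames(seqs), function(id) transitions(seqs[id, ]))
names(events) <- rownames(seqs)

# Customers whose sequence contains the one-event subsequence (3>4)
has_3_4 <- vapply(events, function(e) "3>4" %in% e, logical(1))
print(names(has_3_4)[has_3_4])
print(flag[has_3_4])
```

For subsequences with several events, where order and simultaneity matter, seqeapplysub remains the more reliable tool.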

If this is correct, you can identify the sequences that contain each subsequence with the seqeapplysub function. I cannot illustrate it here because you do not provide any code in your question; see the online help of the seqeapplysub function.

======= update referring to your added code =======

Here is how you get the ids of the sequences that contain the most discriminating subsequences.

First, we extract the three most discriminating subsequences from your diff object. Second, we compute the presence matrix, which has one column per extracted subsequence and one row per sequence, with a 1 for each sequence that contains the subsequence and 0 otherwise.

```r
diffseq <- seqefsub(df.seqe, strsubseq = paste(diff$subseq[1:3]))
(presence <- seqeapplysub(diffseq, method = "presence"))
```

You then get the ids of the customers whose sequences contain the first subsequence with

```r
custid[presence[, 1] == 1]
```

For the second subsequence it is custid[presence[,2]==1], and so on.
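
Rather than indexing one column at a time, you can collect the ids for every extracted subsequence in a single named list. This is a sketch on a small hypothetical presence matrix standing in for the one computed above; it assumes, as the indexing above already does, that rows follow the order of custid and columns follow the extracted subsequences.

```r
# Hypothetical stand-ins for custid and the presence matrix computed above
custid <- c("a1", "a2", "a3", "b4", "b5")
presence <- matrix(c(1, 0, 1, 0, 1,
                     0, 0, 1, 1, 0,
                     1, 1, 0, 0, 0),
                   ncol = 3,
                   dimnames = list(NULL, c("(3>4)", "(4>5)", "(5>4)")))

# One vector of customer ids per subsequence (column of presence);
# lapply() always returns a list, even when the vectors differ in length
ids_by_subseq <- lapply(seq_len(ncol(presence)),
                        function(j) custid[presence[, j] == 1])
names(ids_by_subseq) <- colnames(presence)
print(ids_by_subseq)
```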

Likewise, you get the corresponding flags with

```r
flag[presence[, 1] == 1]
```
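
To see ids and flags side by side, and how the subsequence is spread over the two flag groups, you can combine them in a data frame and cross-tabulate. The sketch below uses hypothetical stand-in values for custid, flag and one column of presence. The chi-square method of seqecmpgroup tests the association between presence of a subsequence and group, so a table like this offers a rough manual check of a single subsequence.

```r
# Hypothetical stand-ins for custid, flag and one column of presence
custid <- c("a1", "a2", "a3", "b4", "b5", "c6")
flag <- c(0, 0, 0, 1, 0, 1)
present <- c(1, 0, 1, 1, 0, 1)  # plays the role of presence[, 1]

# Customers whose sequence contains the subsequence, with their flags
matches <- data.frame(custid, flag)[present == 1, ]
print(matches)

# Number of sequences with (1) and without (0) the subsequence in each flag group
tab <- table(flag = flag, present = present)
print(tab)
```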

Hope this helps.
