R arulesSequences: find which patterns are supported by a sequence
Question
I'm having trouble with the arulesSequences library in R.
I have a transactional dataset with temporal information (here, let's use the default zaki dataset). I use SPADE (the cspade function) to find the frequent subsequences in the dataset.
library(arulesSequences)
data(zaki)
frequent_sequences <- cspade(zaki, parameter=list(support=0.5))
Now I want to find, for each sequence (i.e. for each customer), which frequent subsequences it supports. I tried various combinations of %in% and subset without much success.
For example, for the second customer, the initial transactions, shown by inspect(zaki[zaki@itemsetInfo$sequenceID==2]), are:
items sequenceID eventID SIZE
5 {A,B,F} 2 15 3
6 {E} 2 20 1
The frequent sequences in the whole dataset, shown by inspect(frequent_sequences), are:
items support
1 <{A}> 1.00
2 <{B}> 1.00
3 <{D}> 0.50
4 <{F}> 1.00
5 <{A, F}> 0.75
6 <{B, F}> 1.00
7 <{D}, {F}> 0.50
8 <{D}, {B, F}> 0.50
9 <{A, B, F}> 0.75
10 <{A, B}> 0.75
11 <{D}, {B}> 0.50
12 <{B}, {A}> 0.50
13 <{D}, {A}> 0.50
14 <{F}, {A}> 0.50
15 <{D}, {F}, {A}> 0.50
16 <{B, F}, {A}> 0.50
17 <{D}, {B, F}, {A}> 0.50
18 <{D}, {B}, {A}> 0.50
What I'd like to see is that customer 2 supports the frequent sequences 1, 2, 4, 5, 6, 9 and 10, but does not support the others.
I could also settle for the reverse information: which base sequences support a given frequent subsequence? R must have this information at some point, since it needs it to compute the support of the frequent sequences.
It seems to me that this should be easy (and it probably is!), but I can't figure it out.
Any ideas?
Recommended answer
After some cool-headed digging, I found a way to do it, and it was indeed easy, since the support function does the job: called with a single customer's transactions, it returns the absolute support (here 0 or 1) of every frequent sequence for that customer.
ids <- unique(zaki@itemsetInfo$sequenceID)
encoding <- data.frame()
# Prepare the data.frame: one empty logical column per frequent sequence
for (seq_id in seq_along(frequent_sequences)) {
  encoding[, labels(frequent_sequences[seq_id])] <- logical(0)
}
# Fill the rows: one row per customer (sequenceID)
for (id in ids) {
  # Keep only the transactions that belong to this customer
  transaction_subset <- zaki[zaki@itemsetInfo$sequenceID == id]
  # Absolute support within a single customer is 0 or 1, so it maps to FALSE/TRUE
  encoding[id, ] <- as.logical(
    support(frequent_sequences, transaction_subset, type = "absolute")
  )
}
There may be more elegant ways to get there, but this yields the expected result:
> encoding
<{A}> <{B}> <{D}> <{F}> <{A,F}> <{B,F}> <{D},{F}> <{D},{B,F}> <{A,B,F}>
1 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
2 TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
3 TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
4 TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
<{A,B}> <{D},{B}> <{B},{A}> <{D},{A}> <{F},{A}> <{D},{F},{A}> <{B,F},{A}>
1 TRUE TRUE TRUE TRUE TRUE TRUE TRUE
2 TRUE FALSE FALSE FALSE FALSE FALSE FALSE
3 TRUE FALSE FALSE FALSE FALSE FALSE FALSE
4 FALSE TRUE TRUE TRUE TRUE TRUE TRUE
<{D},{B,F},{A}> <{D},{B},{A}>
1 TRUE TRUE
2 FALSE FALSE
3 FALSE FALSE
4 TRUE TRUE
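As a possible alternative to the two loops, the same table can be sketched as a logical matrix built with sapply. This is only a sketch that relies on the same support call as above; the names encoding_matrix and supporters are introduced here for illustration and are not part of arulesSequences.

```r
library(arulesSequences)
data(zaki)
frequent_sequences <- cspade(zaki, parameter = list(support = 0.5))

ids <- unique(zaki@itemsetInfo$sequenceID)

# One column per customer from sapply, then transpose to one row per customer.
# encoding_matrix is a hypothetical name used only in this sketch.
encoding_matrix <- t(sapply(ids, function(id) {
  transaction_subset <- zaki[zaki@itemsetInfo$sequenceID == id]
  # Absolute support within a single customer is 0 or 1
  support(frequent_sequences, transaction_subset, type = "absolute") > 0
}))
dimnames(encoding_matrix) <- list(as.character(ids), labels(frequent_sequences))

# Reverse lookup: the customers that support the third frequent sequence
supporters <- rownames(encoding_matrix)[encoding_matrix[, 3]]
```

Reading a row answers the original question (which patterns a customer supports), and reading a column answers the reverse one (which customers support a pattern).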