R arulesSequences: find which patterns are supported by a sequence


Question

I'm having trouble with the arulesSequences library in R.

I have a transactional dataset with temporal information (here, let's use the default zaki dataset). I use SPADE (the cspade function) to find the frequent subsequences in the dataset.

library(arulesSequences)
data(zaki)
frequent_sequences <- cspade(zaki, parameter=list(support=0.5))

Now, for each sequence (i.e. for each customer), I want to find which frequent subsequences it supports. I tried various combinations of %in% and subset without much success.

For example, for the second customer, the initial transactions, shown by inspect(zaki[zaki@itemsetInfo$sequenceID==2]), are:

  items   sequenceID eventID SIZE
5 {A,B,F} 2          15      3
6 {E}     2          20      1

The frequent sequences in the whole dataset, shown by inspect(frequent_sequences), are:

   items              support
1  <{A}>                 1.00
2  <{B}>                 1.00
3  <{D}>                 0.50
4  <{F}>                 1.00
5  <{A, F}>              0.75
6  <{B, F}>              1.00
7  <{D}, {F}>            0.50
8  <{D}, {B, F}>         0.50
9  <{A, B, F}>           0.75
10 <{A, B}>              0.75
11 <{D}, {B}>            0.50
12 <{B}, {A}>            0.50
13 <{D}, {A}>            0.50
14 <{F}, {A}>            0.50
15 <{D}, {F}, {A}>       0.50
16 <{B, F}, {A}>         0.50
17 <{D}, {B, F}, {A}>    0.50
18 <{D}, {B}, {A}>       0.50

What I'd like to see is that customer 2 supports the frequent sequences 1, 2, 4, 5, 6, 9 and 10, but does not support the others.
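
To make "supports" concrete, here is a minimal base R sketch of the containment test, assuming no gap or window constraints: a sequence supports a pattern when each pattern element is a subset of some event, with the matching events in strictly increasing time order. The helper supports_pattern and the list-based encoding of events are hypothetical illustrations, not part of arulesSequences.

```r
# Hypothetical helper, not part of arulesSequences.
# events  : list of character vectors, one per event, ordered by time
# pattern : list of character vectors, one per pattern element
supports_pattern <- function(events, pattern) {
  next_element <- 1
  for (event in events) {
    if (next_element > length(pattern)) break
    # Greedy match: take the earliest event that contains the current element,
    # then look for the following element in later events only
    if (all(pattern[[next_element]] %in% event)) {
      next_element <- next_element + 1
    }
  }
  next_element > length(pattern)
}

# Customer 2 from the question: {A,B,F} at event 15, then {E} at event 20
customer_2 <- list(c("A", "B", "F"), "E")

supports_pattern(customer_2, list(c("A", "B")))  # pattern 10, <{A, B}>
supports_pattern(customer_2, list("D", "F"))     # pattern 7, <{D}, {F}>
```

Matching greedily at the earliest possible event is enough here because, without gap constraints, matching an element earlier never rules out a later match.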

I could also settle for the reverse information: which base sequences support a given frequent subsequence? R must have this information somewhere, since it uses it to compute the support of the frequent sequences.

It seems to me that this should be easy (and it probably is!), but I can't seem to figure it out...

Any ideas?

Recommended answer

After some cool-headed digging, I found a way to do it, and indeed it was easy, since the support function does the job: called on the transactions of a single customer, it returns a nonzero count for exactly the patterns that customer supports.

ids <- unique(zaki@itemsetInfo$sequenceID)
encoding <- data.frame()

# Prepare the data.frame: as many columns as there are frequent sequences
for (seq_id in 1:length(frequent_sequences)){
    encoding[,labels(frequent_sequences[seq_id])] <- logical(0)
}

# Fill the rows
for (id in ids){
    transaction_subset <- zaki[zaki@itemsetInfo$sequenceID==id]
    encoding[id, ] <- as.logical(
        support(frequent_sequences, transaction_subset, type="absolute")
        )
}

There might be more elegant ways to get there, but this yields the expected result:

> encoding
  <{A}> <{B}> <{D}> <{F}> <{A,F}> <{B,F}> <{D},{F}> <{D},{B,F}> <{A,B,F}>
1  TRUE  TRUE  TRUE  TRUE    TRUE    TRUE      TRUE        TRUE      TRUE
2  TRUE  TRUE FALSE  TRUE    TRUE    TRUE     FALSE       FALSE      TRUE
3  TRUE  TRUE FALSE  TRUE    TRUE    TRUE     FALSE       FALSE      TRUE
4  TRUE  TRUE  TRUE  TRUE   FALSE    TRUE      TRUE        TRUE     FALSE
  <{A,B}> <{D},{B}> <{B},{A}> <{D},{A}> <{F},{A}> <{D},{F},{A}> <{B,F},{A}>
1    TRUE      TRUE      TRUE      TRUE      TRUE          TRUE        TRUE
2    TRUE     FALSE     FALSE     FALSE     FALSE         FALSE       FALSE
3    TRUE     FALSE     FALSE     FALSE     FALSE         FALSE       FALSE
4   FALSE      TRUE      TRUE      TRUE      TRUE          TRUE        TRUE
  <{D},{B,F},{A}> <{D},{B},{A}>
1            TRUE          TRUE
2           FALSE         FALSE
3           FALSE         FALSE
4            TRUE          TRUE
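
As one possibly tidier variant, the same support() call can be wrapped in sapply to build a logical matrix directly, which also answers the reverse question by reading a column instead of a row. This is a sketch under assumptions: it reads the sequence IDs through the arules accessor transactionInfo() rather than the @itemsetInfo slot used above, and the names counts and supported are illustrative.

```r
library(arulesSequences)
data(zaki)
frequent_sequences <- cspade(zaki, parameter = list(support = 0.5))

# Assumption: transactionInfo() exposes the same sequenceID column that the
# answer reads through zaki@itemsetInfo
sequence_ids <- transactionInfo(zaki)$sequenceID
ids <- unique(sequence_ids)

# One column per customer: absolute support of every frequent sequence,
# computed on that customer's transactions only
counts <- sapply(ids, function(id) {
  support(frequent_sequences, zaki[sequence_ids == id], type = "absolute")
})

# Rows = customers, columns = frequent sequences
supported <- t(counts) > 0
dimnames(supported) <- list(as.character(ids), labels(frequent_sequences))

# Frequent sequences supported by customer 2
which(supported["2", ])

# Reverse question: customers supporting the 7th frequent sequence
rownames(supported)[supported[, 7]]
```

With the ordering shown in the question, the 7th frequent sequence is <{D}, {F}>, which the table above marks as supported by customers 1 and 4.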
