Clustering transactional data using PAM in R?
I need to group a set of transactions into different groups. My data is stored in a text file in this format:
T1 17 20 22 35 37 60 62
T2 39 51 53 54 57 65 73
T3 17 20 21 22 34 37 62
T4 20 22 54 57 65 73 45
T5 20 54 57 65 73 75 80
T6 2 20 54 57 59 63 71
T7 2 20 22 57 59 71 66
T8 17 20 28 29 30 34 35
T9 16 20 28 32 54 57 65
T10 16 20 22 28 57 59 71
-
-
and so on, over 5000 lines. Each line represents one transaction.
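What I have done so far:
library("arules")   # provides read.transactions() and dissimilarity()
library("cluster")  # provides pam()
# Read one transaction per line, with items separated by spaces
txIn <- read.transactions("data2.txt", format = "basket", sep = " ")
# Pairwise Jaccard dissimilarity between transactions
d <- dissimilarity(txIn, method = "Jaccard")
# Partition the transactions around 100 medoids
clustersA <- pam(d, k = 100)
# paste0() avoids the space that paste() would insert before ".txt"
txOut <- paste0("txOu", ".txt")
write.table(clustersA$clustering, file = txOut, sep = " ")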
but the file stores each transaction number with its cluster, like this:
"x"
"1" 1
"2" 1
"3" 1
"4" 1
"5" 1
"6" 2
"7" 2
"8" 2
"9" 1
"10" 2
-
-
and I need to save it as, for example:
cluster 1:
T1 17 20 22 35 37 60 62
T2 39 51 53 54 57 65 73
T3 17 20 21 22 34 37 62
T4 20 22 54 57 65 73 45
T5 20 54 57 65 73 75 80
T9 16 20 28 32 54 57 65
cluster 2:
T6 2 20 54 57 59 63 71
T7 2 20 22 57 59 71 66
T8 17 20 28 29 30 34 35
T10 16 20 22 28 57 59 71
-
-
and so on, because I want to deal with each cluster individually.
I have searched a lot; any information, example, documentation, or other help would be appreciated.
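The grouped layout requested above could be produced roughly as follows. This is a minimal base-R sketch, not a confirmed solution from the thread: it assumes the cluster labels are in the same order as the lines of the input file, it uses only the ten sample rows and the labels shown in the question, and the names `tx_lines`, `by_cluster`, and `out_file` are hypothetical.

```r
# Raw transaction lines, as they would come from readLines("data2.txt").
# Only the ten sample rows from the question are used here.
tx_lines <- c(
  "T1 17 20 22 35 37 60 62",
  "T2 39 51 53 54 57 65 73",
  "T3 17 20 21 22 34 37 62",
  "T4 20 22 54 57 65 73 45",
  "T5 20 54 57 65 73 75 80",
  "T6 2 20 54 57 59 63 71",
  "T7 2 20 22 57 59 71 66",
  "T8 17 20 28 29 30 34 35",
  "T9 16 20 28 32 54 57 65",
  "T10 16 20 22 28 57 59 71"
)

# Cluster labels in transaction order. In the real workflow this would be
# clustersA$clustering from pam(); these values are the ones in the question.
clustering <- c(1, 1, 1, 1, 1, 2, 2, 2, 1, 2)

# split() groups the lines by label and keeps file order within each group.
by_cluster <- split(tx_lines, clustering)

# Build the output: a "cluster k:" header followed by that cluster's lines.
out <- unlist(lapply(names(by_cluster), function(k) {
  c(paste0("cluster ", k, ":"), by_cluster[[k]])
}))

# Hypothetical output location; replace with the desired file name.
out_file <- file.path(tempdir(), "clusters.txt")
writeLines(out, out_file)
```

Since `by_cluster` is a named list, each cluster can also be processed individually, for example with `by_cluster[["2"]]`, without going through the file.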
Are you sure you want to do clustering?
To me, it sounds like you might be more interested in frequent itemset mining.
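To illustrate what frequent itemset mining could look like on this data, here is a small sketch using `eclat()` from the arules package, which the question already relies on. It assumes arules is installed, uses only the ten sample rows from the question, and the support threshold of 0.3 is an arbitrary value chosen for illustration.

```r
library(arules)  # assumed installed; provides eclat(), inspect()

# The ten sample rows from the question; the real data would come from the file.
tx_lines <- c(
  "T1 17 20 22 35 37 60 62",
  "T2 39 51 53 54 57 65 73",
  "T3 17 20 21 22 34 37 62",
  "T4 20 22 54 57 65 73 45",
  "T5 20 54 57 65 73 75 80",
  "T6 2 20 54 57 59 63 71",
  "T7 2 20 22 57 59 71 66",
  "T8 17 20 28 29 30 34 35",
  "T9 16 20 28 32 54 57 65",
  "T10 16 20 22 28 57 59 71"
)

# The first field of each line is the transaction ID, the rest are item codes.
fields <- strsplit(tx_lines, " ", fixed = TRUE)
tx_list <- lapply(fields, function(f) unique(f[-1]))
names(tx_list) <- vapply(fields, function(f) f[1], character(1))

# Convert the named list into an arules transactions object.
trans <- as(tx_list, "transactions")

# Itemsets of at least two items that occur in at least 30% of transactions.
# The 0.3 threshold is illustrative only and needs tuning on the full data.
itemsets <- eclat(trans, parameter = list(support = 0.3, minlen = 2))

# List the itemsets, most frequent first.
inspect(sort(itemsets, by = "support"))
```

Unlike `pam()`, this does not assign each transaction to a single group; it reports combinations of items that occur together often, and a transaction may contain several of them.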