Simulating Co-occurrence data in R
Problem description
I am trying to create a data set of co-occurrence data in which the variable of interest is a software application. I want to simulate an n-by-n matrix where each cell gives the number of times application A was used with application B. How can I create a data set in R that I can use to test a set of clustering and partitioning algorithms? What model would I use, and how would I generate the data in R?
set.seed(42)
# software names:
software <- c("a","b","c","d")
# times each software used:
times.each.sw <- c(5,10,12,3)
# co-occurrence data.frame
swdf <- setNames(data.frame(t(combn(software,2))),c("sw1","sw2"))
swdf$freq.cooc <- apply(combn(times.each.sw,2),2,function(x) sample(1:min(x),1) )
# sw1 sw2 freq.cooc
#1 a b 5
#2 a c 5
#3 a d 1
#4 b c 9
#5 b d 2
#6 c d 2
If you prefer a co-occurrence matrix, then maybe something like this:
mat <- diag(times.each.sw)
dimnames(mat) <- list(software,software)
mat[lower.tri(mat)] <- swdf$freq.cooc
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
# a b c d
#a 5 5 5 1
#b 5 10 9 2
#c 5 9 12 2
#d 1 2 2 3
The diagonal contains the number of times each software was used (i.e. used with itself). The lower and upper triangles contain the number of times each combination was used, which must always be less than or equal to the number of times the less frequently used software of the pair was used.
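To try this with a different number of applications, the same steps can be wrapped in a function and the constraint described above checked directly. This is only a sketch: `simulate_cooc` is a hypothetical helper name, not something from the answer, and it assumes every usage count is at least 1 and that there are at least two applications.

```r
# Hypothetical helper (not part of the original answer): build a symmetric
# co-occurrence matrix from per-application usage counts.
simulate_cooc <- function(times, labels) {
  # Assumptions: one label per count, at least two applications, counts >= 1.
  stopifnot(length(times) == length(labels),
            length(times) >= 2,
            all(times >= 1))
  n <- length(times)
  mat <- diag(times, nrow = n)
  dimnames(mat) <- list(labels, labels)
  # Upper bound for each unordered pair: the count of the less-used application.
  pair_caps <- combn(times, 2, FUN = min)
  # One uniform draw between 1 and the cap for each pair.
  draws <- vapply(pair_caps, function(cap) sample.int(cap, 1), integer(1))
  # combn() and lower.tri() both enumerate pairs in column-major order.
  mat[lower.tri(mat)] <- draws
  mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
  mat
}

set.seed(42)
mat <- simulate_cooc(c(5, 10, 12, 3), c("a", "b", "c", "d"))

# Check the constraint: each cell is at most the smaller of the two diagonals.
caps <- outer(diag(mat), diag(mat), pmin)
stopifnot(isSymmetric(mat), all(mat <= caps))
```

The uniform draw is the simplest choice and nothing more. If the clustering tests need some pairs to co-occur more strongly than others, a different distribution could be substituted at the `sample.int` step.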