Generate matrix of unique user-item cross-product combinations
Problem description
I am trying to create a cross-product matrix of unique users in R. I searched Stack Overflow but could not find what I was looking for. Any help is appreciated. I have a large data frame (over a million rows); a sample is shown below:
df <- data.frame(Products=c('Product a', 'Product b', 'Product a',
'Product c', 'Product b', 'Product c'),
Users=c('user1', 'user1', 'user2', 'user1',
'user2','user3'))
Printing df gives:
Products Users
1 Product a user1
2 Product b user1
3 Product a user2
4 Product c user1
5 Product b user2
6 Product c user3
I would like to see two matrices. The first shows the number of unique users who had either product (OR), so the output would look something like this:
          Product a Product b Product c
Product a         -         2         3
Product b         2         -         3
Product c         3         3         -
The second matrix shows the number of unique users who had both products (AND):
          Product a Product b Product c
Product a         -         2         1
Product b         2         -         1
Product c         1         1         -
Thanks for your help.
Update:
Here is more detail: Product a is used by user1 and user2, Product b is used by user1 and user2, and Product c is used by user1 and user3. So in the first matrix, the cell for Product a and Product b is 2, since together they have 2 unique users (user1 and user2). Similarly, the cell for Product a and Product c is 3 (user1, user2 and user3). In the second matrix, the same cells would be 2 and 1, because there I want the intersection (users who used both products).
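For a single pair of products, the two numbers described in the update are simply the size of the union (OR) and the size of the intersection (AND) of the two user sets. A minimal base-R check on the sample data, using the same split() call that appears in the answer below; the variable name users is only illustrative:

```r
df <- data.frame(Products = c('Product a', 'Product b', 'Product a',
                              'Product c', 'Product b', 'Product c'),
                 Users = c('user1', 'user1', 'user2', 'user1',
                           'user2', 'user3'))

# Users of each product (a list with one element per product)
users <- split(df$Users, df$Products)

# OR cell: distinct users who used Product a or Product c
length(union(users[["Product a"]], users[["Product c"]]))      # 3

# AND cell: distinct users who used both Product a and Product c
length(intersect(users[["Product a"]], users[["Product c"]]))  # 1
```

Each cell of the two requested matrices is this computation repeated for one pair of products.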
Recommended answer
Try:
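lst <- split(df$Users, df$Products)   # users of each product
ln <- length(lst)
m1 <- matrix(0, ln, ln, dimnames = list(names(lst), names(lst)))
# size of the union of user sets for every pair of products (lower triangle)
m1[lower.tri(m1, diag = FALSE)] <- combn(seq_along(lst), 2,
              FUN = function(x) length(unique(unlist(lst[x]))))
# mirror into the upper triangle; t() keeps pairs aligned for any number of products
m1[upper.tri(m1)] <- t(m1)[upper.tri(m1)]
m1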
# Product a Product b Product c
#Product a 0 2 3
#Product b 2 0 3
#Product c 3 3 0
Or use outer:
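# number of distinct users across products u and v
f1 <- function(u, v) length(unique(unlist(c(lst[[u]], lst[[v]]))))
# !diag(...) zeroes the diagonal; size it from lst instead of hard-coding 3
res <- outer(seq_along(lst), seq_along(lst), FUN = Vectorize(f1)) * !diag(length(lst))
dimnames(res) <- rep(list(names(lst)), 2)
res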
# Product a Product b Product c
#Product a 0 2 3
#Product b 2 0 3
#Product c 3 3 0
For the second case (AND):
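# 0/1 product-by-user matrix; repeated user-product rows count only once
inc <- 1 * (table(df) > 0)
m2 <- tcrossprod(inc)   # m2[i, j] = number of users who used both i and j
diag(m2) <- 0
m2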
# Products
#Products Product a Product b Product c
# Product a 0 2 1
# Product b 2 0 1
# Product c 1 1 0
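A further option, not part of the original answer: both matrices can be derived from a single 0/1 product-by-user incidence matrix. The AND counts come from tcrossprod(), and the OR counts then follow by inclusion-exclusion, |A or B| = |A| + |B| - |A and B|, so no explicit loop over product pairs with combn() is needed. This is a minimal base-R sketch assuming the same two-column df; the names inc, and_m and or_m are illustrative. Note that table() builds a dense products x users matrix, which may be too large for very big data; a sparse matrix would be needed there and is not shown.

```r
df <- data.frame(Products = c('Product a', 'Product b', 'Product a',
                              'Product c', 'Product b', 'Product c'),
                 Users = c('user1', 'user1', 'user2', 'user1',
                           'user2', 'user3'))

# 0/1 incidence matrix: rows = products, columns = users
# (> 0 makes duplicate user-product rows count once)
inc <- 1 * (unclass(table(df)) > 0)

# AND: number of users shared by products i and j
and_m <- tcrossprod(inc)

# OR by inclusion-exclusion: |A| + |B| - |A and B|
n_users <- rowSums(inc)                  # users per product
or_m <- outer(n_users, n_users, "+") - and_m

# zero the diagonal, as in the answer's output
diag(and_m) <- 0
diag(or_m) <- 0

and_m
or_m
```

The design choice here is that the AND matrix is computed once with a single matrix product, and the OR matrix reuses it instead of recomputing unions pair by pair.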