Generate all possible pairs and count frequency in R

Published: 2020/10/26 5:01:44 · Tags: r, dplyr

Question

I have a data frame of products (apple, pear, banana) sold across different locations (cities) within different categories (food and edibles).

I would like to count how many times any given pair of products appeared together in any category.

Here is an example dataset I'm trying to make this work on:

category <- c('food','food','food','food','food','food','edibles','edibles','edibles','edibles', 'edibles')
location <- c('houston, TX', 'houston, TX', 'las vegas, NV', 'las vegas, NV', 'philadelphia, PA', 'philadelphia, PA', 'austin, TX', 'austin, TX', 'charlotte, NC', 'charlotte, NC', 'charlotte, NC')
item <- c('apple', 'banana', 'apple', 'pear', 'apple', 'pear', 'pear', 'apple', 'apple', 'pear', 'banana')

food_data <- data.frame(category, location, item, stringsAsFactors = FALSE)

For example, the pair "apple & banana" appears together in the "food" category in "houston, TX", and also in the "edibles" category in "charlotte, NC". Therefore, the count for the "apple & banana" pair would be 2.
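To make the counting rule concrete, here is a minimal base-R sketch that checks a single pair: it splits the items into one group per category-location combination and counts the groups that contain both items. The names `groups` and `has_pair` are illustrative, not from the question.

```r
# Rebuild the example data from the question
category <- c('food','food','food','food','food','food',
              'edibles','edibles','edibles','edibles','edibles')
location <- c('houston, TX', 'houston, TX', 'las vegas, NV', 'las vegas, NV',
              'philadelphia, PA', 'philadelphia, PA', 'austin, TX', 'austin, TX',
              'charlotte, NC', 'charlotte, NC', 'charlotte, NC')
item <- c('apple', 'banana', 'apple', 'pear', 'apple', 'pear',
          'pear', 'apple', 'apple', 'pear', 'banana')
food_data <- data.frame(category, location, item, stringsAsFactors = FALSE)

# One vector of items per category-location group (empty combinations dropped)
groups <- split(food_data$item,
                list(food_data$category, food_data$location), drop = TRUE)

# A group counts toward the pair if it contains both items
has_pair <- vapply(groups, function(g) all(c("apple", "banana") %in% g),
                   logical(1))
sum(has_pair)  # 2 (houston, TX and charlotte, NC)
```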

My desired output is the count for each pair, like this:

(unordered) count of apple & banana

(unordered) count of apple & pear

Does anyone have an idea how to accomplish this? I'm relatively new to R and have been stuck on this for a while.

I'm trying to use this to calculate affinities between different items.

Additional clarification on output: my full dataset contains hundreds of different items. I would like to get a data frame where the first column is the pair and the second column is the count for that pair.
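A base-R alternative that produces that two-column shape directly is sketched below. Note this is a different technique from the crossprod answer: it enumerates every unordered pair inside each category-location group with `combn()` and tallies the pairs with `table()`. Sorting and de-duplicating each group first makes "banana & apple" and "apple & banana" the same key and ignores repeated items. The names `pairs` and `pair_counts` are illustrative.

```r
category <- c('food','food','food','food','food','food',
              'edibles','edibles','edibles','edibles','edibles')
location <- c('houston, TX', 'houston, TX', 'las vegas, NV', 'las vegas, NV',
              'philadelphia, PA', 'philadelphia, PA', 'austin, TX', 'austin, TX',
              'charlotte, NC', 'charlotte, NC', 'charlotte, NC')
item <- c('apple', 'banana', 'apple', 'pear', 'apple', 'pear',
          'pear', 'apple', 'apple', 'pear', 'banana')
food_data <- data.frame(category, location, item, stringsAsFactors = FALSE)

# One vector of items per category-location group
groups <- split(food_data$item,
                list(food_data$category, food_data$location), drop = TRUE)

# Every unordered pair of distinct items in a group, as "a & b" (a < b)
pairs <- unlist(lapply(groups, function(g) {
  g <- sort(unique(g))
  if (length(g) < 2) return(character(0))  # combn() errors when n < m
  combn(g, 2, FUN = paste, collapse = " & ")
}), use.names = FALSE)

# Number of groups in which each pair occurs
pair_counts <- as.data.frame(table(Pair = pairs),
                             responseName = "count", stringsAsFactors = FALSE)
pair_counts
```

Each group with k distinct items contributes k(k-1)/2 pairs, so only pairs that actually co-occur are ever generated; with hundreds of items this avoids building a full items-by-items matrix.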

Recommended answer

Here is one way using the tidyverse and crossprod. spread turns all items from the same category-location combination into a single row, with the items as column headers and values indicating presence. (This requires that no item appears more than once within the same category-location combination; otherwise you need a pre-aggregation step first.) crossprod then computes the inner product of every pair of item columns, which is exactly the number of co-occurrences.
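The same idea can be reproduced in base R, which makes the inner-product step easier to see. This sketch builds the 0/1 incidence matrix with `table()` (rows are category-location groups, columns are items), clamps counts to 1 with `pmin()` as one way to do the pre-aggregation for duplicated items, and then uses `crossprod(X)`, which equals `t(X) %*% X`. The `" | "` separator in the group key is an arbitrary choice.

```r
category <- c('food','food','food','food','food','food',
              'edibles','edibles','edibles','edibles','edibles')
location <- c('houston, TX', 'houston, TX', 'las vegas, NV', 'las vegas, NV',
              'philadelphia, PA', 'philadelphia, PA', 'austin, TX', 'austin, TX',
              'charlotte, NC', 'charlotte, NC', 'charlotte, NC')
item <- c('apple', 'banana', 'apple', 'pear', 'apple', 'pear',
          'pear', 'apple', 'apple', 'pear', 'banana')
food_data <- data.frame(category, location, item, stringsAsFactors = FALSE)

# Incidence matrix: one row per category-location group, one column per item
key <- paste(food_data$category, food_data$location, sep = " | ")
incidence <- unclass(table(key, food_data$item))

# Pre-aggregation: a duplicated item in a group still counts as present once
incidence <- pmin(incidence, 1L)

# Entry [i, j] = sum over groups of X[g, i] * X[g, j] = groups holding both items
co <- crossprod(incidence)
diag(co) <- 0   # ignore an item paired with itself
co["apple", "pear"]  # 4
```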

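library(tidyverse)
food_data %>% 
    # flag every item as present in its category-location group
    mutate(n = 1) %>% 
    # one row per category-location, one 0/1 column per item
    spread(item, n, fill = 0) %>% 
    # keep only the item columns
    select(-category, -location) %>% 
    # t(X) %*% X: entry [i, j] = number of groups containing both items
    {crossprod(as.matrix(.))} %>% 
    # zero the diagonal (an item paired with itself)
    `diag<-`(0)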

#       apple banana pear
#apple      0      2    4
#banana     2      0    1
#pear       4      1    0
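In current tidyr, spread() is superseded in favor of pivot_wider(). A sketch of the same pipeline with pivot_wider() is below, assuming tidyr 1.1.0 or later (for a scalar `values_fill`). It also adds `distinct()` as one possible pre-aggregation step for duplicated items. Note that pivot_wider() orders the new columns by first appearance rather than alphabetically, so index the result by name.

```r
library(dplyr)
library(tidyr)

category <- c('food','food','food','food','food','food',
              'edibles','edibles','edibles','edibles','edibles')
location <- c('houston, TX', 'houston, TX', 'las vegas, NV', 'las vegas, NV',
              'philadelphia, PA', 'philadelphia, PA', 'austin, TX', 'austin, TX',
              'charlotte, NC', 'charlotte, NC', 'charlotte, NC')
item <- c('apple', 'banana', 'apple', 'pear', 'apple', 'pear',
          'pear', 'apple', 'apple', 'pear', 'banana')
food_data <- data.frame(category, location, item, stringsAsFactors = FALSE)

co <- food_data %>%
  # drop duplicated items within a category-location group
  distinct(category, location, item) %>%
  mutate(n = 1) %>%
  # one row per category-location, one 0/1 column per item
  pivot_wider(names_from = item, values_from = n, values_fill = 0) %>%
  select(-category, -location) %>%
  as.matrix() %>%
  crossprod()
diag(co) <- 0
co
```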

To convert this to a data frame:

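food_data %>% 
    mutate(n = 1) %>% 
    spread(item, n, fill = 0) %>% 
    select(-category, -location) %>% 
    {crossprod(as.matrix(.))} %>% 
    # blank out the diagonal and lower triangle so each unordered pair appears once
    replace(lower.tri(., diag = TRUE), NA) %>%
    # long format (Var1, Var2, value); the NA cells are dropped
    reshape2::melt(na.rm = TRUE) %>%
    # combine the two item columns into a single "Pair" column
    unite('Pair', c('Var1', 'Var2'), sep = ", ")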

#           Pair value
#4 apple, banana     2
#7   apple, pear     4
#8  banana, pear     1
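If you would rather not depend on reshape2, the upper triangle can be read out directly with `which(upper.tri(...), arr.ind = TRUE)`. The sketch below hard-codes the co-occurrence matrix printed earlier so that it runs on its own; `pair_df` is an illustrative name.

```r
# Co-occurrence matrix from the first snippet, hard-coded for a standalone example
items <- c("apple", "banana", "pear")
co <- matrix(c(0, 2, 4,
               2, 0, 1,
               4, 1, 0),
             nrow = 3, byrow = TRUE, dimnames = list(items, items))

# (row, col) indices of the strict upper triangle: each unordered pair once
idx <- which(upper.tri(co), arr.ind = TRUE)

pair_df <- data.frame(
  Pair  = paste(rownames(co)[idx[, "row"]], colnames(co)[idx[, "col"]], sep = ", "),
  value = co[idx],
  stringsAsFactors = FALSE
)
pair_df
```

Like the reshape2 version, this keeps pairs that never co-occur (value 0). With hundreds of items most pairs will be zero, so `pair_df[pair_df$value > 0, ]` keeps only the observed pairs.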
