Minimal number of coverage of lists


Problem description

I have the following:

dist <- c('att1', 'att2', 'att3', 'att4', 'att5', 'att6')
p1 <- c('att1', 'att5', 'att2')
p2 <- c('att5', 'att1', 'att4')
p3 <- c('att3', 'att4', 'att2')
p4 <- c('att1', 'att2', 'att3')
p5 <- c('att6')

I would like to find the p vectors whose union covers as many elements of dist as possible. In this case a solution would be p1, p3, p5. I want to choose the minimal number of p. In addition, if there is no way to cover all of the elements of dist, I want to choose the maximal cover.

Recommended answer

Here is my attempted solution. I've tried to vectorize/matricize it as much as I can, so I hope it is fast enough. Each step is explained in the comments.

library(qdapTools)
library(dplyr)
library(data.table)
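## generate matrix of attributes: one row per p, one column per attribute
attribute_matrix <- mtabulate(list(p1, p2, p3, p4, p5))
attribute_matrix
##   att1 att2 att3 att4 att5 att6
## 1    1    1    0    0    1    0
## 2    1    0    0    1    1    0
## 3    0    1    1    1    0    0
## 4    1    1    1    0    0    0
## 5    0    0    0    0    0    1
## convert the data frame to a numeric matrix for the matrix product below
attribute_matrix <- as.matrix(attribute_matrix)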

## enumerate every combination of p1..p5 as a row of a 0/1 matrix (2^5 = 32 rows)
grid_matrix <- do.call(CJ, rep(list(1:0), 5)) %>% as.matrix
colnames(grid_matrix) <- paste0("p", 1:5)

## check whether each combination covers every attribute
combin_all_element_present <- rowSums(grid_matrix %*% attribute_matrix > 0) %>% 
  `==`(., ncol(attribute_matrix))

combin_all_element_present
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12]  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
## [23] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE

## keep only the combinations that satisfy the condition
grid_matrix_sub <- grid_matrix[combin_all_element_present, ]
## find the combinations with the minimum number of p
grid_matrix_sub[rowSums(grid_matrix_sub) == min(rowSums(grid_matrix_sub)), ]
##      p1 p2 p3 p4 p5
## [1,]  0  1  0  1  1
## [2,]  0  1  1  0  1
## [3,]  1  0  1  0  1
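
The question also asks what to do when no combination covers all of dist, a case the code above does not handle. One possible sketch, assuming the same 0/1 matrix layout, ranks combinations first by the number of attributes covered and then by size. To make the fallback visible, the p5 row is dropped here (an assumption for illustration only, so att6 cannot be covered), and base R's expand.grid stands in for CJ. The names n_covered, best and best_min are not from the original answer.

```r
# Attribute matrix from the answer with the p5 row removed (assumption made
# for illustration), so att6 can no longer be covered by any combination
attribute_matrix <- rbind(
  p1 = c(1, 1, 0, 0, 1, 0),
  p2 = c(1, 0, 0, 1, 1, 0),
  p3 = c(0, 1, 1, 1, 0, 0),
  p4 = c(1, 1, 1, 0, 0, 0)
)
colnames(attribute_matrix) <- paste0("att", 1:6)

# All 2^4 subsets of p1..p4 as rows of a 0/1 matrix (base R instead of CJ)
n_p <- nrow(attribute_matrix)
grid_matrix <- as.matrix(expand.grid(rep(list(0:1), n_p)))
colnames(grid_matrix) <- rownames(attribute_matrix)

# Number of distinct attributes covered by each subset
n_covered <- rowSums(grid_matrix %*% attribute_matrix > 0)

# Keep the subsets with the best coverage, then the smallest among them;
# drop = FALSE keeps the result a matrix even if only one row survives
best <- grid_matrix[n_covered == max(n_covered), , drop = FALSE]
best_min <- best[rowSums(best) == min(rowSums(best)), , drop = FALSE]
best_min
```

With this reduced input the best achievable coverage is five of the six attributes, reached by three two-element combinations. Because all 2^n combinations are enumerated, this only suits a small number of p.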

Note

In case you want to use quanteda, you can generate attribute_matrix with

library(quanteda)
attribute_matrix <- lapply(list(p1, p2, p3, p4, p5), function(x) paste(x, collapse = ' ')) %>% 
  unlist %>% tokens %>% dfm %>% as.matrix
attribute_matrix
##        features
## docs    att1 att5 att2 att4 att3 att6
##   text1    1    1    1    0    0    0
##   text2    1    1    0    1    0    0
##   text3    0    0    1    1    1    0
##   text4    1    0    1    0    1    0
##   text5    0    0    0    0    0    1
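
If neither package is available, the same 0/1 matrix can probably be built in base R as well. The sketch below is not part of the original answer; it assumes the p vectors are collected in a named list (p_list is a name introduced here) and uses %in% against dist, so the columns follow the order of dist rather than the order quanteda reports.

```r
# Data from the question
dist <- c('att1', 'att2', 'att3', 'att4', 'att5', 'att6')
p_list <- list(
  p1 = c('att1', 'att5', 'att2'),
  p2 = c('att5', 'att1', 'att4'),
  p3 = c('att3', 'att4', 'att2'),
  p4 = c('att1', 'att2', 'att3'),
  p5 = c('att6')
)

# vapply returns one column per p; transpose so that each p becomes a row
# and each element of dist becomes a column, with 1 marking membership
attribute_matrix <- t(vapply(
  p_list,
  function(p) as.integer(dist %in% p),
  integer(length(dist))
))
colnames(attribute_matrix) <- dist
attribute_matrix
```

The result is an integer matrix with row names p1 to p5, so it can be passed to the matrix product in the answer without a further as.matrix call.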
