Union of intersecting vectors in a list in R

Question

I have a list of vectors as follows:

data <- list(v1=c("a", "b", "c"), v2=c("g", "h", "k"), 
             v3=c("c", "d"), v4=c("n", "a"), v5=c("h", "i"))

I am trying to achieve the following:

  1. Check whether any of the vectors intersect one another
  2. If intersecting vectors are found, take their union

So the desired output is:

out <- list(v1=c("a", "b", "c", "d", "n"), v2=c("g", "h", "k", "i"))

I can get the union of a group of intersecting sets as follows:

Reduce(union, list(data[[1]], data[[3]], data[[4]]))
Reduce(union, list(data[[2]], data[[5]]))

But how do I identify the intersecting vectors in the first place? Is there a way to divide the list into groups of intersecting vectors?

Update

Here is an attempt using data.table. It gets the desired result, but it is still slow for large lists such as this example dataset.

library(data.table)

# collapse each vector into one comma-separated string per row
data <- sapply(data, function(x) paste(x, collapse=", "))
data <- as.data.frame(data, stringsAsFactors = FALSE)

repeat {
  M <- nrow(data)
  data <- data.table( data , key = "data" )
  data <- data[ , list(dataelement = unique(unlist(strsplit(data , ", " )))), by = list(data)]
  data <- data.table(data , key = "dataelement" )
  data <- data[, list(data = paste0(sort(unique(unlist(strsplit(data, split=", ")))), collapse=", ")), by = "dataelement"]
  data$dataelement <- NULL
  data <- unique(data)
  N <- nrow(data)
  if (M == N)
    break
}

# split on ", " (the separator used above) so elements carry no leading space
data <- strsplit(as.character(data$data), ", ")

Recommended answer

This is essentially a graph problem, so I like to use the igraph library for it. Using your sample data, you can do:

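library(igraph)
# build the edge list: link consecutive elements within each vector
# (embed(x, 2) needs every vector to have at least two elements)
el <- do.call("rbind", lapply(data, embed, 2))
# make an undirected graph (graph_from_edgelist() is the current name of graph.edgelist())
gg <- graph_from_edgelist(el, directed = FALSE)
# partition the graph into disjoint sets (components() is the current name of clusters())
split(V(gg)$name, components(gg)$membership)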

# $`1`
# [1] "b" "a" "c" "d" "n"
# 
# $`2`
# [1] "h" "g" "k" "i"
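
The result above is keyed by component number rather than by the original vector names. As a rough sketch (not part of the original answer), the component membership can be mapped back onto the input list so that each group is named after its first vector, as in the desired `out`. The helper names `memb` and `grp` are made up for illustration, and the sketch assumes every vector has at least two elements, since `embed(x, 2)` fails on shorter vectors.

```r
library(igraph)

data <- list(v1 = c("a", "b", "c"), v2 = c("g", "h", "k"),
             v3 = c("c", "d"), v4 = c("n", "a"), v5 = c("h", "i"))

# same graph construction as in the answer
el <- do.call("rbind", lapply(data, embed, 2))
gg <- graph_from_edgelist(el, directed = FALSE)

# component id of every element, named by the element itself
memb <- setNames(components(gg)$membership, V(gg)$name)

# component id of each input vector, looked up via its first element
grp <- vapply(data, function(x) memb[[x[1]]], numeric(1))

# union the vectors within each component, keeping first-seen order
out <- lapply(split(data, grp), function(g) unique(unlist(g, use.names = FALSE)))

# name each group after the first original vector that belongs to it
names(out) <- vapply(split(names(data), grp), `[`, character(1), 1)
```

With the sample data this should give a list with entries `v1` and `v2`; on other inputs the names depend on the order of the vectors in the list.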

We can view the results with:

# color each vertex by the component it belongs to
V(gg)$color <- c("green", "purple")[components(gg)$membership]
plot(gg)
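
If installing igraph is not an option, the same grouping can be sketched in base R by repeatedly merging any two groups that share an element until nothing changes. This is an illustrative alternative, not the approach from the answer; `merge_intersecting` is a hypothetical helper name. It compares groups pairwise, so it is likely to be slow on large lists, where the graph-based approach should scale better.

```r
data <- list(v1 = c("a", "b", "c"), v2 = c("g", "h", "k"),
             v3 = c("c", "d"), v4 = c("n", "a"), v5 = c("h", "i"))

# Hypothetical helper: merge groups that share at least one element,
# repeating full passes until a pass performs no merge.
merge_intersecting <- function(sets) {
  groups <- unname(lapply(sets, unique))
  merged <- TRUE
  while (merged) {
    merged <- FALSE
    i <- 1
    while (i < length(groups)) {
      j <- i + 1
      while (j <= length(groups)) {
        if (length(intersect(groups[[i]], groups[[j]])) > 0) {
          # absorb group j into group i and drop group j from the list
          groups[[i]] <- union(groups[[i]], groups[[j]])
          groups[[j]] <- NULL
          merged <- TRUE
        } else {
          j <- j + 1
        }
      }
      i <- i + 1
    }
  }
  groups
}

out <- merge_intersecting(data)
```

The outer loop is needed because a group that grows during a pass may start to overlap with a group it was already compared against. Unlike the `embed`-based edge list, this sketch also accepts vectors with a single element.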
