R: find groups of tuples ignoring NAs


Question

Based on an almost identical question, I am trying to create a unique ID based on several columns, where rows should be grouped into the same ID if "there exists a path through any combination of the columns". The difference is that I have NAs that should not be used to link rows.

The goal is for R to create id3 based on id1 and id2. A minimal example follows.

For example, id1=1 is related to a and b of id2. But id1=2 is also related to a, so both belong to one group (id3=group1). Since id1=2 and id1=3 share id2=c, id1=3 also belongs to that group (id3=group1). The values of the tuple ((1,2),('a','b','c')) appear nowhere else, so no other row belongs to that group (which is labeled group1 generically).

library(igraph)
df = data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5,6,6,NA,NA),
                id2 = c('a',NA,'a','c','c','d','x',NA,'y','z','x','z',NA,NA),
                id3 = c(rep('group1',6), rep('group2',6),NA,NA))

My solution fails because of the NA values:

g <- graph_from_data_frame(df, FALSE)
cg <- clusters(g)$membership
df$id4 <- cg[df$id1]
df

Observations (rows) 2 and 8 are linked because both have NA for id2, but this should be ignored. Is there a way to do that?

Recommended answer

You can try the code below, using either of these approaches:

  • components + membership + merge
g <- graph_from_data_frame(na.omit(df))
merge(
  df,
  transform(
    rev(stack(membership(components(g))[V(g)[names(V(g)) %in% df$id1]])),
    values = paste0("group", values)
  ),
  by.x = "id1",
  by.y = "ind",
  all = TRUE
)

  • decompose + merge
subg <- decompose(graph_from_data_frame(na.omit(df)))
merge(df,
  do.call(
    rbind,
    Map(
      function(x, y) cbind(setNames(unique(as_data_frame(x)[1]), "id1"), id3 = y),
      subg,
      paste0("group", seq_along(subg))
    )
  ),
  by = "id1",
  all = TRUE
)

This gives you

   id1  id2    id3
1    1    a group1
2    1 <NA> group1
3    2    a group1
4    2    c group1
5    3    c group1
6    3    d group1
7    4    x group2
8    4 <NA> group2
9    5    y group2
10   5    z group2
11   6    x group2
12   6    z group2
13  NA <NA>   <NA>
14  NA <NA>   <NA>
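
As a cross-check that needs no packages, the same grouping can be sketched in base R with a small union-find. This is not the method from the answer above: the helper name group_ignoring_na and the "id1:" / "id2:" key prefixes are hypothetical, introduced only for illustration. Rows with an NA in either column never create a link, and each row is labeled by the group of its id1 value.

```r
# Hypothetical helper (not part of the original answer): label groups of rows
# that are connected through shared id1 or id2 values, never linking via NA.
group_ignoring_na <- function(id1, id2) {
  # Prefix the keys so an id1 value can never collide with an id2 value.
  k1 <- ifelse(is.na(id1), NA_character_, paste0("id1:", id1))
  k2 <- ifelse(is.na(id2), NA_character_, paste0("id2:", id2))
  all_keys <- c(k1, k2)
  keys <- unique(all_keys[!is.na(all_keys)])
  parent <- seq_along(keys)  # each key starts as its own root

  # Follow parent pointers until a root is reached.
  find <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  # Union the two endpoints of every complete (non-NA) row.
  for (r in which(!is.na(k1) & !is.na(k2))) {
    a <- find(match(k1[r], keys))
    b <- find(match(k2[r], keys))
    if (a != b) parent[max(a, b)] <- min(a, b)
  }

  roots <- vapply(seq_along(keys), find, integer(1))
  row_root <- roots[match(k1, keys)]  # NA wherever id1 is NA
  # Number the groups in order of first appearance.
  lab <- match(row_root, unique(row_root[!is.na(row_root)]))
  ifelse(is.na(lab), NA_character_, paste0("group", lab))
}

df <- data.frame(
  id1 = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, NA, NA),
  id2 = c("a", NA, "a", "c", "c", "d", "x", NA, "y", "z", "x", "z", NA, NA)
)
df$id3 <- group_ignoring_na(df$id1, df$id2)
df
```

One design difference to be aware of: an id1 value that only ever appears next to an NA id2 gets its own group in this sketch, whereas the na.omit-based code above drops such rows from the graph, so they end up with NA.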
