Create superID column from two ID columns

Problem description

I have created a data frame:

data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))

    a   b
    1   1
    1   2
    2   2
    2   3
    3   3
    3   4
    4   5
    5   6

Now I want to generate a master column c as follows:

    a   b  c
    1   1  1
    1   2  1
    2   2  1
    2   3  1
    3   3  1
    3   4  1
    4   5  2
    5   6  3

In general, this means assigning a master ID to rows that are linked through intermediate IDs in column a and column b. For example, the value 1 in column a has the corresponding value 1 in column b, so find all rows with 1 in column b and assign them master ID 1. Similarly, another row with ID 1 in column a has b = 2, so find all rows with 2 in column b and assign them the same master ID. The same applies in the other direction, from column b to column a.

I have written the following code, but it only goes through one rotation: column a to column b, and b back to a.

  masterCombine <- function(data, col1="a", col2="b", masterName="c"){

  skipList <- NULL

  masterId <- 1

  for( p in 1: nrow(data)){
    ind <- ind1 <- ind2 <- ind3 <- ind4 <- NULL
    if(!p %in% skipList){

      ind1 <- which(data[, col1] == data[, col1][p])
      for( ij in ind1){
        ind2 <-  which(data[ ,col2] == data[ ,col2][ij])
        for(j in ind2){
          ind3<- which(data[ , col1] == data[ ,col1][j])
          ind4 <- append(ind4, ind3)
        }

      }

      ind <- unique(append(ind1,ind4))
      skipList <- append(skipList, ind)
      data[ind, masterName] <- masterId

      masterId <-  masterId + 1
    }
  }

  return(data)
}

How can I implement this recursive matching?

Recommended answer

You can do something like this with the igraph package and its clusters() function. You just need to make sure first that the values in column a are recoded so that they are distinct from the values in column b.

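library(igraph)
data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))
# Prefix every value with its column name ("a1", "b1", ...) so IDs from a and b never collide
newdata <- mapply(paste0, names(data), data)
# Each row becomes an edge between its a-vertex and its b-vertex
# (current igraph releases call this graph_from_edgelist())
g <- graph.edgelist(newdata)
# Connected-component label of every vertex (components() in current igraph releases)
clusters(g)$membership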
#a1 b1 b2 a2 b3 a3 b4 a4 b5 a5 b6 
# 1  1  1  1  1  1  1  2  2  3  3 
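
To see why the recoding step matters, here is a small base-R sketch (no igraph needed) that compares the raw IDs with the prefixed ones. The names shared_raw and shared_recoded are introduced only for this illustration and are not part of the original answer.

```r
data <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 5), b = c(1, 2, 2, 3, 3, 4, 5, 6))

# Prefix each value with its column name, exactly as in the answer above.
newdata <- mapply(paste0, names(data), data)

# IDs that appear in both columns: used raw, each would become a single shared vertex.
shared_raw <- intersect(data$a, data$b)

# After prefixing, no label is shared between the two columns.
shared_recoded <- intersect(newdata[, "a"], newdata[, "b"])

print(shared_raw)
print(shared_recoded)
```

With the raw values, the rows (a = 4, b = 5) and (a = 5, b = 6) would meet at one vertex 5, and every row would end up in a single component, which is not the grouping asked for in the question.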

cg <- clusters(g)$membership
# Look up each row's a-vertex and use its component label as the master ID
data$c <- cg[match(newdata[,"a"],names(V(g)))]

#  a b c
#1 1 1 1
#2 1 2 1
#3 2 2 1
#4 2 3 1
#5 3 3 1
#6 3 4 1
#7 4 5 2
#8 5 6 3
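
If adding a dependency on igraph is not an option, the same grouping can be approximated in base R by repeatedly propagating the smallest row label within each a group and each b group until nothing changes. This is a minimal sketch and not part of the original answer; the function name master_id is hypothetical, and it may need many passes on long chains of IDs.

```r
# Hypothetical helper: rows linked through a shared value in a or in b get the same ID.
master_id <- function(a, b) {
  label <- seq_along(a)                  # start with every row in its own group
  repeat {
    old <- label
    label <- ave(label, a, FUN = min)    # rows sharing a value in a take the smallest label
    label <- ave(label, b, FUN = min)    # rows sharing a value in b take the smallest label
    if (identical(label, old)) break     # fixed point reached: groups are stable
  }
  match(label, unique(label))            # renumber groups 1, 2, 3, ... in order of appearance
}

data <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 5), b = c(1, 2, 2, 3, 3, 4, 5, 6))
data$c <- master_id(data$a, data$b)
print(data)
```

Because labels can only decrease and are bounded below, the loop terminates. For large data the graph-based approach above is likely the better choice.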

For the visually minded, this is plot(g).
