Create superID column from two Id columns

Views: 69 | Published: 2020/10/17 2:13:04 | Tags: r, dataframe


Problem description

I have created a data frame:

data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))

    a   b
    1   1
    1   2
    2   2
    2   3
    3   3
    3   4
    4   5
    5   6

Now I want to generate a master column c from these two ID columns.

The idea is to update the values (IDs) of column a and column b through their intermediate IDs. For example, a = 1 has the corresponding value b = 1, so search for all rows that have 1 in column b and assign them master ID 1. Similarly, another row with a = 1 has b = 2, so search for all rows with 2 in column b and assign them the same master ID, and vice versa (from column b back to column a).
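The matching described above can be traced for a single starting ID. The following base R sketch (the variable names `rows`, `viaB`, and `viaA` are illustrative, not from the original post) starts from the rows with a = 1 and keeps expanding through shared b values and shared a values until no new rows are added:

```r
data <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 5), b = c(1, 2, 2, 3, 3, 4, 5, 6))

# Start from the rows whose a value is 1
rows <- which(data$a == 1)

repeat {
  # Rows that share a b value with any row collected so far
  viaB <- which(data$b %in% data$b[rows])
  # Rows that share an a value with any of those rows
  viaA <- which(data$a %in% data$a[viaB])
  grown <- union(rows, viaA)
  # Stop once a full b-then-a pass adds nothing new
  if (length(grown) == length(rows)) break
  rows <- grown
}

rows
```

On the example data the loop only stops after several passes and ends with rows 1 through 6, which is why a single a-to-b-to-a pass is not enough.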

I have written the following code, but it only goes one rotation: column a to column b and b back to a.

masterCombine <- function(data, col1="a", col2="b", masterName="c"){
  skipList <- NULL   # rows that already have a master ID
  masterId <- 1

  for(p in 1:nrow(data)){
    ind <- ind1 <- ind2 <- ind3 <- ind4 <- NULL
    if(!p %in% skipList){
      # rows sharing the col1 value of row p
      ind1 <- which(data[, col1] == data[, col1][p])
      for(ij in ind1){
        # rows sharing the col2 value of each of those rows
        ind2 <- which(data[, col2] == data[, col2][ij])
        for(j in ind2){
          # one hop back: rows sharing the col1 value of those rows
          ind3 <- which(data[, col1] == data[, col1][j])
          ind4 <- append(ind4, ind3)
        }
      }

      ind <- unique(append(ind1, ind4))
      skipList <- append(skipList, ind)
      data[ind, masterName] <- masterId

      masterId <- masterId + 1
    }
  }

  return(data)
}

How can I achieve this recursive matching?

Recommended answer

You can do something like this with the igraph package and its clusters() function. You just need to make sure first that the values in column a are recoded so that they are distinct from the column b values.

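library(igraph)
data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))
# prefix each value with its column name so a-IDs and b-IDs become distinct vertices
newdata <- mapply(paste0, names(data), data)
# each row is an edge between an a-vertex and a b-vertex
# (newer igraph releases name these functions graph_from_edgelist() and components())
g <- graph.edgelist(newdata)
clusters(g)$membership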
#a1 b1 b2 a2 b3 a3 b4 a4 b5 a5 b6 
# 1  1  1  1  1  1  1  2  2  3  3 

cg <- clusters(g)$membership
data$c <- cg[match(newdata[,"a"],names(V(g)))]

#  a b c
#1 1 1 1
#2 1 2 1
#3 2 2 1
#4 2 3 1
#5 3 3 1
#6 3 4 1
#7 4 5 2
#8 5 6 3

For the visual people, plot(g) draws the graph.
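If igraph is not available, the same grouping can be approximated in base R. This is only a sketch and not part of the original answer: the helper name `assignMasterId` is hypothetical, and it swaps the graph clustering for a simple label propagation that repeats until the labels stop changing.

```r
# Hypothetical helper (not from the original answer): label rows that are
# linked through shared values in either ID column.
assignMasterId <- function(data, col1 = "a", col2 = "b", masterName = "c") {
  # Start with every row in its own group
  id <- seq_len(nrow(data))
  repeat {
    old <- id
    # Give each row the smallest label among rows sharing its col1 value,
    # then the smallest label among rows sharing its col2 value
    id <- ave(id, data[[col1]], FUN = min)
    id <- ave(id, data[[col2]], FUN = min)
    # Stop when a full pass changes nothing
    if (identical(id, old)) break
  }
  # Renumber the surviving labels as 1, 2, 3, ... in order of appearance
  data[[masterName]] <- match(id, unique(id))
  data
}

data <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 5), b = c(1, 2, 2, 3, 3, 4, 5, 6))
result <- assignMasterId(data)
result$c
```

On the example data this gives the same c column as the igraph version. The loop may need many passes on long chains of linked IDs, so the igraph approach is likely the better choice for large data.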
  
