Create superID column from two Id columns
Question
I have created a data frame:
data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))
a b
1 1
1 2
2 2
2 3
3 3
3 4
4 5
5 6
Now I want to generate a master column c as follows:
a b c
1 1 1
1 2 1
2 2 1
2 3 1
3 3 1
3 4 1
4 5 2
5 6 3
The idea is to propagate ids between column a and column b through their shared intermediate ids. For example, the value 1 in column a corresponds to the value 1 in column b, so every row with 1 in column b gets master id 1. Similarly, another row with 1 in column a has b = 2, so every row with 2 in column b gets the same master id, and vice versa (from column b back to column a).
I have written the following code, but it only performs one rotation: column a to column b, then column b back to column a.
masterCombine <- function(data, col1="a", col2="b", masterName="c"){
  skipList <- NULL
  masterId <- 1
  for(p in 1:nrow(data)){
    ind <- ind1 <- ind2 <- ind3 <- ind4 <- NULL
    if(!p %in% skipList){
      ind1 <- which(data[, col1] == data[, col1][p])
      for(ij in ind1){
        ind2 <- which(data[, col2] == data[, col2][ij])
        for(j in ind2){
          ind3 <- which(data[, col1] == data[, col1][j])
          ind4 <- append(ind4, ind3)
        }
      }
      ind <- unique(append(ind1, ind4))
      skipList <- append(skipList, ind)
      data[ind, masterName] <- masterId
      masterId <- masterId + 1
    }
  }
  return(data)
}
How can I make this matching recursive?
Recommended answer
You can do something like this with the igraph package and its clusters() function. You just need to make sure first that the values in column a are recoded so that they are distinct from the values in column b.
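To see why the recoding matters, here is a small base R sketch (an illustration added here, not part of the original answer). Without it, a 4 in column a and a 4 in column b would be treated as the same node, which would wrongly pull the row a=4, b=5 into the first group. The variable name overlap is introduced only for this example.

```r
data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))

# Raw values shared by both columns: these would collide as graph nodes
overlap <- intersect(data$a, data$b)

# Prefix every value with its column name so the two id spaces stay separate
newdata <- mapply(paste0, names(data), data)

# After recoding, no label appears in both columns
length(intersect(newdata[, "a"], newdata[, "b"]))
```

The result of mapply() here is a character matrix with columns named a and b, which is the two-column edge list shape that the graph construction expects.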
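library(igraph)
data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))
# Prefix each value with its column name ("a1", "b1", ...) so ids in a and b cannot collide
newdata <- mapply(paste0, names(data), data)
# Each row becomes an edge between its a node and its b node
g <- graph.edgelist(newdata)
# Connected components of the graph; recent igraph versions name these
# functions graph_from_edgelist() and components()
clusters(g)$membership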
#a1 b1 b2 a2 b3 a3 b4 a4 b5 a5 b6
# 1 1 1 1 1 1 1 2 2 3 3
cg <- clusters(g)$membership
# Look up the component of each row's a node and use it as the master id
data$c <- cg[match(newdata[,"a"],names(V(g)))]
# a b c
#1 1 1 1
#2 1 2 1
#3 2 2 1
#4 2 3 1
#5 3 3 1
#6 3 4 1
#7 4 5 2
#8 5 6 3
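If installing igraph is not an option, the same grouping can be sketched in base R with a simple union-find (disjoint-set) structure. This is a different technique from the one used in the answer, shown only as a minimal sketch: the function name master_id and the helper find_root are hypothetical, and no path compression or other optimisation is attempted.

```r
# Hypothetical helper, not from the original answer: label connected
# components with a plain union-find over the recoded a/b values.
master_id <- function(a, b) {
  # Recode so a value in `a` never collides with the same value in `b`
  ka <- paste0("a", a)
  kb <- paste0("b", b)
  nodes <- unique(c(ka, kb))
  parent <- seq_along(nodes)  # every node starts as its own root

  # Follow parent pointers until reaching a root
  find_root <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  # Merge the two endpoints of every row into one set
  for (k in seq_along(ka)) {
    ra <- find_root(match(ka[k], nodes))
    rb <- find_root(match(kb[k], nodes))
    if (ra != rb) parent[[max(ra, rb)]] <- min(ra, rb)
  }

  # Root of each row's `a` node, renumbered 1, 2, 3, ... in order of appearance
  roots <- vapply(match(ka, nodes), find_root, integer(1))
  match(roots, unique(roots))
}

data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))
data$c <- master_id(data$a, data$b)
```

On the example data this should assign the first six rows to one group and the last two rows to their own groups, matching the igraph result above.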
For the visually minded, plot(g) draws the graph.