Arrange data frame in a specific way
Question
Sorry in advance for the bad title, but I really didn't know how to word it succinctly.
I have a dataframe I'm playing around with where an item can be in any of 4 categories, not limited to 1. Here's an example of the dummy matrix I'm working with:
ID <- 1:7
A <- c(1,0,0,1,1,0,0)
B <- c(0,1,0,0,1,0,1)
C <- c(0,0,0,0,0,1,1)
D <- c(1,0,1,1,0,0,0)
A_B <- (A+B > 0)*1
C_D <- (C+D > 0)*1
Cost <- c(25, 52, 11, 75, 45, 5, 34)
df <- data.frame(ID, A, B, C, D, A_B, C_D, A_B_C_D = 1, Cost)
df
ID A B C D A_B C_D A_B_C_D Cost
1 1 0 0 1 1 1 1 25
2 0 1 0 0 1 0 1 52
3 0 0 0 1 0 1 1 11
4 1 0 0 1 1 1 1 75
5 1 1 0 0 1 0 1 45
6 0 0 1 0 0 1 1 5
7 0 1 1 0 1 1 1 34
I need this data frame to be ordered so that row 1 contains an A, row 2 a B, row 3 a C, row 4 a D, row 5 an A or B, row 6 a C or D, and row 7 whatever is left over. I can't use arrange, since starting with desc(A) would automatically put rows 1, 4 and 5 at the top. An acceptable solution to this problem would be:
Order <- c(4, 2, 7, 1, 5, 3, 6)
df <- df[Order, ]
df
ID A B C D A_B C_D A_B_C_D Cost
4 1 0 0 1 1 1 1 75
2 0 1 0 0 1 0 1 52
7 0 1 1 0 1 1 1 34
1 1 0 0 1 1 1 1 25
5 1 1 0 0 1 0 1 45
3 0 0 0 1 0 1 1 11
6 0 0 1 0 0 1 1 5
Essentially, the diagonal of the indicator columns (A through A_B_C_D) needs to be seven ones in a row, but I can't think of how to program it to sort correctly for any data set. I feel like this should be really easy, but I'm just not seeing it. Would transposing make it easier?
Thanks in advance.
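The requirement can be stated as a check: reorder the rows, take the indicator columns A through A_B_C_D as a matrix, and test whether every diagonal entry is 1. A minimal base-R sketch; diag_ok is a hypothetical helper name, and the default column range 2:8 assumes the layout of the example df.

```r
# Rebuild the example data from the question
df <- data.frame(ID = 1:7,
                 A = c(1,0,0,1,1,0,0), B = c(0,1,0,0,1,0,1),
                 C = c(0,0,0,0,0,1,1), D = c(1,0,1,1,0,0,0))
df$A_B <- (df$A + df$B > 0) * 1
df$C_D <- (df$C + df$D > 0) * 1
df$A_B_C_D <- 1
df$Cost <- c(25, 52, 11, 75, 45, 5, 34)

# Hypothetical helper: TRUE when row ord[j] has a 1 in indicator column j, for every j
diag_ok <- function(df, ord, cols = 2:8) {
  mat <- as.matrix(df[ord, cols])
  all(diag(mat) == 1)
}

diag_ok(df, c(4, 2, 7, 1, 5, 3, 6))  # TRUE: the ordering given above
diag_ok(df, 1:7)                     # FALSE: row 3 has C = 0
```

Any search strategy, brute force or otherwise, only has to produce an ordering for which this check returns TRUE.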
Recommended answer
One approach would be to use brute force, by getting all the permutations of row arrangements and checking which satisfy the diagonal expectation:
z <- apply(permute::allPerms(1:7), 1, function(x){
mat <- as.matrix(df[,2:8])
if(all(diag(mat[x,]) == rep(1,7))){
return(df[x,])
}
})
Then you can just remove the NULL values:
z <- Filter(Negate(is.null), z)
and get all 88 solutions:
length(z) # 88
z[[5]]    # an arbitrary solution (the fifth one found)
#output
ID A B C D A_B C_D A_B_C_D Cost
1 1 1 0 0 1 1 1 1 25
2 2 0 1 0 0 1 0 1 52
6 6 0 0 1 0 0 1 1 5
4 4 1 0 0 1 1 1 1 75
5 5 1 1 0 0 1 0 1 45
3 3 0 0 0 1 0 1 1 11
7 7 0 1 1 0 1 1 1 34
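The same brute-force idea can also be sketched without the permute package and without testing permutations one at a time: build the full permutation matrix once, then check each position for all permutations in a single vectorized step. This is a sketch under two assumptions: mat is the 7 x 7 indicator matrix (the same as as.matrix(df[, 2:8]) above), and all_perms is a hypothetical recursive helper, not a base R function.

```r
# Indicator columns A through A_B_C_D from the question, as a matrix
A <- c(1,0,0,1,1,0,0); B <- c(0,1,0,0,1,0,1)
C <- c(0,0,0,0,0,1,1); D <- c(1,0,1,1,0,0,0)
mat <- cbind(A, B, C, D, A_B = (A + B > 0) * 1, C_D = (C + D > 0) * 1, A_B_C_D = 1)

# Hypothetical helper: all permutations of v, one per row, in lexicographic order
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], all_perms(v[-i]))))
}

P <- all_perms(seq_len(nrow(mat)))  # 5040 x 7 for seven rows

# For position j, look up row P[, j] in column j for every permutation at once,
# then keep only permutations where all seven lookups are 1
ok <- Reduce(`&`, lapply(seq_len(ncol(mat)), function(j) mat[P[, j], j] == 1))

sum(ok)            # 88
P[which(ok)[1], ]  # 1 2 6 3 4 7 5
```

Because the permutations are generated in lexicographic order, the first valid one matches the ordering found by the while loop below.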
To get only the first matching permutation, a while loop is enough:
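perms <- permute::allPerms(1:7)   # one candidate ordering per row
mat <- as.matrix(df[,2:8])        # indicator columns A through A_B_C_D
i <- 1
# advance until the reordered diagonal is all ones (or the permutations run out)
while (i <= nrow(perms) && !all(diag(mat[perms[i,],]) == 1)) {
  i <- i + 1
}
if (i > nrow(perms)) stop("no ordering puts a 1 on every diagonal position")
df[perms[i,],]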
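Both versions build the full permutation matrix up front, and n! grows too fast for much larger data. A different technique, depth-first backtracking, fills one position at a time and abandons a partial ordering as soon as some position has no usable row left. This is an alternative sketch, not the answer's method; find_order is a hypothetical helper, and it assumes a square indicator matrix (as many rows as indicator columns). It tries rows in increasing order, so on this data it returns the same ordering as the while loop.

```r
# Indicator columns A through A_B_C_D from the question, as a matrix
A <- c(1,0,0,1,1,0,0); B <- c(0,1,0,0,1,0,1)
C <- c(0,0,0,0,0,1,1); D <- c(1,0,1,1,0,0,0)
mat <- cbind(A, B, C, D, A_B = (A + B > 0) * 1, C_D = (C + D > 0) * 1, A_B_C_D = 1)

# Hypothetical helper: returns a row ordering with an all-ones diagonal, or NULL
find_order <- function(mat) {
  n <- ncol(mat)
  used <- rep(FALSE, nrow(mat))  # rows already placed
  ord <- integer(n)              # ord[j] = row placed at position j
  place <- function(j) {
    if (j > n) return(TRUE)      # every position filled
    for (r in which(mat[, j] == 1 & !used)) {
      used[r] <<- TRUE
      ord[j] <<- r
      if (place(j + 1)) return(TRUE)
      used[r] <<- FALSE          # undo and try the next candidate row
    }
    FALSE                        # no usable row for position j
  }
  if (place(1)) ord else NULL
}

find_order(mat)  # 1 2 6 3 4 7 5
```

Backtracking is still exponential in the worst case. Viewed as a whole, the task is a bipartite perfect matching between rows and positions, which an assignment solver such as clue::solve_LSAP can handle in polynomial time.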
# ID A B C D A_B C_D A_B_C_D Cost
1 1 1 0 0 1 1 1 1 25
2 2 0 1 0 0 1 0 1 52
6 6 0 0 1 0 0 1 1 5
3 3 0 0 0 1 0 1 1 11
4 4 1 0 0 1 1 1 1 75
7 7 0 1 1 0 1 1 1 34
5 5 1 1 0 0 1 0 1 45
Let's check the speed:
test <- function(df){
z <- apply(permute::allPerms(1:7), 1, function(x){
mat <- as.matrix(df[,2:8])
if(all(diag(mat[x,]) == rep(1,7))){
return(df[x,])
}
})
z <- Filter(Negate(is.null), z)
return(z)
}
test2 <- function(df){
perms <- permute::allPerms(1:7)
mat <- as.matrix(df[,2:8])
i <- 1
while (!all(diag(mat[perms[i,],]) == rep(1,7))) {
i = i+1
}
df[perms[i,],]
}
microbenchmark::microbenchmark(b <- test(df),
c <- test2(df), times = 10L)
Unit: milliseconds
expr min lq mean median uq max neval cld
b <- test(df) 392.68257 396.81450 412.41600 401.0613 408.15582 509.77693 10 b
c <- test2(df) 46.11754 46.92276 47.80778 47.3977 48.82543 50.05795 10 a
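Not that bad: stopping at the first match (test2) is roughly eight times faster than collecting every solution (test).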
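Brute force stays cheap here only because seven rows give 7! = 5,040 orderings. The count grows factorially with the number of rows, which a one-line check makes concrete:

```r
# Orderings a brute-force search has to consider for n rows (n!)
n <- 7:12
counts <- setNames(factorial(n), n)
counts
# 7! = 5040, 8! = 40320, ..., 12! = 479001600
```

At 12 rows there are already about 479 million candidate orderings, so for larger data a search that prunes or solves the matching directly is preferable.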