Arrange data frame in a specific way


Question

Sorry in advance for the bad title, but I really didn't know how to word it succinctly.

I have a data frame I'm playing around with where an item can belong to any of 4 categories, and to more than one at once. Here's an example of the dummy matrix I'm working with:

ID <- 1:7
A <- c(1,0,0,1,1,0,0)
B <- c(0,1,0,0,1,0,1)
C <- c(0,0,0,0,0,1,1)
D <- c(1,0,1,1,0,0,0)
A_B <- (A+B > 0)*1
C_D <- (C+D > 0)*1
Cost <- c(25, 52, 11, 75, 45, 5, 34)

df <- data.frame(ID, A, B, C, D, A_B, C_D, A_B_C_D = 1, Cost)
df

ID A B C D A_B C_D A_B_C_D Cost
1  1 0 0 1  1   1     1     25
2  0 1 0 0  1   0     1     52
3  0 0 0 1  0   1     1     11
4  1 0 0 1  1   1     1     75
5  1 1 0 0  1   0     1     45
6  0 0 1 0  0   1     1     5
7  0 1 1 0  1   1     1     34

I need this data frame to be organized such that row 1 contains an A, row 2 a B, row 3 a C, row 4 a D, row 5 an A or B, row 6 a C or D, and row 7 whatever is left over. I can't use arrange, since sorting by desc(A) first would put rows 1, 4, and 5 at the top. An acceptable solution to this problem would be:

Order <- c(4, 2, 7, 1, 5, 3, 6)
df[Order, ]

ID A B C D A_B C_D A_B_C_D Cost
4  1 0 0 1   1   1       1   75
2  0 1 0 0   1   0       1   52
7  0 1 1 0   1   1       1   34
1  1 0 0 1   1   1       1   25
5  1 1 0 0   1   0       1   45
3  0 0 0 1   0   1       1   11
6  0 0 1 0   0   1       1    5

Essentially, the diagonal of the seven indicator columns needs to be all ones, but I can't think of how to program it to sort correctly no matter the data set. I feel like this should be really easy, but I'm just not seeing it. Would transposing make it easier?
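
The diagonal requirement can be written as a small check, which makes it easy to test any candidate ordering. This is only a minimal sketch: `is_valid_order` is a hypothetical helper name, and it assumes the indicator columns are columns 2 through 8 of `df`, as in the example data.

```r
# Rebuild the example data so the snippet runs on its own.
ID <- 1:7
A <- c(1, 0, 0, 1, 1, 0, 0)
B <- c(0, 1, 0, 0, 1, 0, 1)
C <- c(0, 0, 0, 0, 0, 1, 1)
D <- c(1, 0, 1, 1, 0, 0, 0)
df <- data.frame(ID, A, B, C, D,
                 A_B = (A + B > 0) * 1,
                 C_D = (C + D > 0) * 1,
                 A_B_C_D = 1,
                 Cost = c(25, 52, 11, 75, 45, 5, 34))

# Hypothetical helper: TRUE when, after reordering the rows by `ord`,
# row i has a 1 in indicator column i for every i.
is_valid_order <- function(df, ord, cols = 2:8) {
  mat <- as.matrix(df[, cols])
  all(diag(mat[ord, , drop = FALSE]) == 1)
}

is_valid_order(df, c(4, 2, 7, 1, 5, 3, 6))  # TRUE: the ordering given above
is_valid_order(df, 1:7)                     # FALSE: row 3 has no C
```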

Thanks in advance.

Recommended answer

One approach is brute force: generate all permutations of the row order and keep the ones that satisfy the diagonal requirement:

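# Indicator columns only (A through A_B_C_D).
mat <- as.matrix(df[, 2:8])

# Try every row permutation; keep those whose diagonal is all ones.
# The function returns NULL for permutations that do not match.
z <- apply(permute::allPerms(1:7), 1, function(x) {
  if (all(diag(mat[x, ]) == 1)) {
    return(df[x, ])
  }
})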

Then you can remove the NULL values:

z <- Filter(Negate(is.null), z)

and get all 88 solutions:

length(z) #88

z[[5]] # an arbitrary solution (the fifth one found)
#output

  ID A B C D A_B C_D A_B_C_D Cost
1  1 1 0 0 1   1   1       1   25
2  2 0 1 0 0   1   0       1   52
6  6 0 0 1 0   0   1       1    5
4  4 1 0 0 1   1   1       1   75
5  5 1 1 0 0   1   0       1   45
3  3 0 0 0 1   0   1       1   11
7  7 0 1 1 0   1   1       1   34

To get just the first matching permutation, you can use a while loop:

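perms <- permute::allPerms(1:7)
mat <- as.matrix(df[, 2:8])
i <- 1
# Advance until a permutation puts a 1 in every diagonal position.
# The bound on i keeps the loop from running past the last permutation.
while (i <= nrow(perms) && !all(diag(mat[perms[i, ], ]) == 1)) {
  i <- i + 1
}

# If no permutation matched, i is nrow(perms) + 1 and there is nothing to return.
if (i <= nrow(perms)) df[perms[i, ], ] else NULL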

#  ID A B C D A_B C_D A_B_C_D Cost
1  1 1 0 0 1   1   1       1   25
2  2 0 1 0 0   1   0       1   52
6  6 0 0 1 0   0   1       1    5
3  3 0 0 0 1   0   1       1   11
4  4 1 0 0 1   1   1       1   75
7  7 0 1 1 0   1   1       1   34
5  5 1 1 0 0   1   0       1   45

Let's check the speed:

test <- function(df){
  z <- apply(permute::allPerms(1:7), 1, function(x){
    mat <- as.matrix(df[,2:8])
    if(all(diag(mat[x,]) == rep(1,7))){
      return(df[x,])
    }
  })
  z <- Filter(Negate(is.null), z)
  return(z)
}

test2 <- function(df){
  perms <- permute::allPerms(1:7)
  mat <- as.matrix(df[,2:8])
  i <- 1
  while (!all(diag(mat[perms[i,],])  == rep(1,7))) {
    i = i+1
  }
  df[perms[i,],]
}
microbenchmark::microbenchmark(b <- test(df), 
                           c <- test2(df), times = 10L)

    Unit: milliseconds
           expr       min        lq      mean   median        uq       max neval cld
  b <- test(df) 392.68257 396.81450 412.41600 401.0613 408.15582 509.77693    10   b
 c <- test2(df)  46.11754  46.92276  47.80778  47.3977  48.82543  50.05795    10  a 

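Not that bad: stopping at the first match has a median of about 47 ms, versus about 400 ms for collecting all solutions.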
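
Enumerating every permutation grows factorially with the number of rows, so the brute-force search above will not scale far beyond this example. One possible alternative, which is not the method used in the answer, is a backtracking search that fills one position at a time and only tries rows that have a 1 in that position's column. The sketch below is written in base R; `find_order` and `place` are hypothetical names, and the indicator columns are again assumed to be columns 2 through 8.

```r
# Rebuild the example data so the snippet runs on its own.
ID <- 1:7
A <- c(1, 0, 0, 1, 1, 0, 0)
B <- c(0, 1, 0, 0, 1, 0, 1)
C <- c(0, 0, 0, 0, 0, 1, 1)
D <- c(1, 0, 1, 1, 0, 0, 0)
df <- data.frame(ID, A, B, C, D,
                 A_B = (A + B > 0) * 1,
                 C_D = (C + D > 0) * 1,
                 A_B_C_D = 1,
                 Cost = c(25, 52, 11, 75, 45, 5, 34))

# Hypothetical helper: returns a row order whose diagonal is all ones,
# or NULL if no such order exists. `mat` must be a square 0/1 matrix.
find_order <- function(mat) {
  n <- ncol(mat)
  stopifnot(nrow(mat) == n)
  ord <- integer(n)   # ord[pos] is the row placed at position pos
  used <- logical(n)  # used[r] is TRUE once row r has been placed

  # Try to fill positions pos..n; undo a choice when it leads to a dead end.
  place <- function(pos) {
    if (pos > n) return(TRUE)
    for (r in which(mat[, pos] == 1 & !used)) {
      ord[pos] <<- r
      used[r] <<- TRUE
      if (place(pos + 1)) return(TRUE)
      used[r] <<- FALSE
    }
    FALSE
  }

  if (place(1)) ord else NULL
}

ord <- find_order(as.matrix(df[, 2:8]))
df[ord, ]
```

The search stops at the first valid ordering, so it returns one solution rather than all 88. Its worst case is still exponential. The problem is equivalent to finding a perfect matching between rows and positions, so for large inputs a dedicated bipartite matching routine would likely be the better tool.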
