R: Comparing fields in matrix

Views: 170 | Published: 2016/12/21 15:16:08 | Tags: r, matrix, compare

Problem description

I've got two data frames I want to compare: if a specific location in both data frames meets a requirement, assign "X" to that location in a separate data frame.

How can I get the expected output in an efficient way? The real data frame contains 1000 columns with thousands to millions of rows. I think data.table would be the quickest option, but I don't have a grasp of how data.table works yet.

Expected output:

> print(result)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] "A"  "A"  "O"  "X"  "X"  "X"  "X"  "O"  "O" 
# [2,] "A"  "A"  "O"  "X"  "X"  "X"  "X"  "O"  "O" 
# [3,] "A"  "A"  "O"  "X"  "X"  "X"  "X"  "O"  "X"

My code:

df1 <- structure(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 
            2, 2, 2, 2, 3, 3, 3, 2, 0, 1), .Dim = c(3L, 9L), .Dimnames = list(
              c("A", "B", "C"), NULL))
df2 <- structure(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 
            2, 2, 2, 2, 1, 3, 3, 4, 4, 2), .Dim = c(3L, 9L), .Dimnames = list(
              c("A", "B", "C"), NULL))

result <- matrix("O", nrow(df1), ncol(df1))


for (i in 1:nrow(df1)) 
{
  for (j in 3:ncol(df1)) 
  {
    result[i,1] = c("A")
    result[i,2] = c("A")
    if (is.na(df1[i,j]) || is.na(df2[i,j])){
      result[i,j] <- c("N")
    }
    if (!is.na(df1[i,j]) && !is.na(df2[i,j]))
    {

      if (df1[i,j] %in% c("0","1","2") & df2[i,j] %in% c("0","1","2")) {
        result[i,j] <- c("X") 
      }
    }
  }
}   


print(result)

Edit

I like both @David's and @Heroka's solutions. On a small dataset, Heroka's solution is about 125 times as fast as the original, and David's is about 29 times as fast. Here's the benchmark:

> mbm
Unit: milliseconds
             expr        min          lq       mean      median          uq        max neval
         original 1058.81826 1110.481659 1131.81711 1112.848211 1124.775989 1428.18079   100
           Heroka    8.46317    8.711986    9.03517    8.914616    9.067793   18.06716   100
 DavidAarenburg()   35.58350   36.660565   39.85823   37.061160   38.175700   53.83976   100

Thanks a lot, guys!
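
For readers who want to run a comparison of this kind themselves, here is a minimal sketch using base R's system.time() on random data. The matrix sizes, the seed, and the function names loop_version and indexed_version are assumptions made for illustration. This is not the setup that produced the numbers above, and timings will differ by machine.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n_row <- 200; n_col <- 50          # hypothetical sizes, not the asker's data
m1 <- matrix(sample(0:4, n_row * n_col, replace = TRUE), n_row, n_col)
m2 <- matrix(sample(0:4, n_row * n_col, replace = TRUE), n_row, n_col)

# Loop-based approach, condensed from the code in the question
loop_version <- function(a, b) {
  res <- matrix("O", nrow(a), ncol(a))
  res[, 1:2] <- "A"
  for (i in seq_len(nrow(a))) {
    for (j in 3:ncol(a)) {
      if (is.na(a[i, j]) || is.na(b[i, j])) {
        res[i, j] <- "N"
      } else if (a[i, j] %in% 0:2 && b[i, j] %in% 0:2) {
        res[i, j] <- "X"
      }
    }
  }
  res
}

# Logical-indexing approach, as suggested in the answers
indexed_version <- function(a, b) {
  res <- matrix("O", nrow(a), ncol(a))
  res[is.na(a) | is.na(b)] <- "N"
  res[a < 3 & b < 3] <- "X"
  res[, 1:2] <- "A"
  res
}

# Check that both give the same answer before comparing speed
same <- identical(loop_version(m1, m2), indexed_version(m1, m2))

t_loop <- system.time(loop_version(m1, m2))
t_idx  <- system.time(indexed_version(m1, m2))
print(rbind(loop = t_loop, indexed = t_idx)[, "elapsed", drop = FALSE])
```

A single system.time() call is noisy. Repeating each expression many times, as the 100 evaluations in the table above do, gives steadier numbers.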

Solution

You have matrices, not data frames.

One approach might be to use ifelse (with %in% against a numeric vector rather than a character one; avoiding the type conversion saves about 50% of the time):

  result <- ifelse(is.na(df1)|is.na(df2),"N",
                   ifelse(df1 %in% 0:2 & df2 %in% 0:2,"X","O"))
  result[,1:2] <- "A"
  result
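
A detail worth knowing about the snippet above: %in% returns a plain vector without dimensions, so the result is a matrix only because the outer ifelse() takes its shape from its test argument. The small matrices a and b below are made up for illustration and are not the asker's data.

```r
# Small made-up matrices containing one NA
a <- matrix(c(0, 1, 5, NA), nrow = 2)
b <- matrix(c(2, 2, 2, 2), nrow = 2)

inner <- a %in% 0:2 & b %in% 0:2   # %in% returns a plain logical vector
is.null(dim(inner))                # the dim attribute is gone

# The outer ifelse() inherits its shape from the test, which is a matrix
res <- ifelse(is.na(a) | is.na(b), "N", ifelse(inner, "X", "O"))
dim(res)
res
```

Note also that NA %in% 0:2 is FALSE rather than NA, which is why the NA check has to come first in the outer ifelse().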

Thanks to @DavidArenburg, a further speed improvement:

result <- matrix("O",nrow=nrow(df1),ncol=ncol(df1))
result[is.na(df1) | is.na(df2)] <- "N"
result[df1 < 3 & df2 < 3] <- "X"
result[, 1:2] <- "A"
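
As a quick check, the logical-indexing approach can be wrapped in a function and compared with the expected output from the question. The name compare_matrices is hypothetical. The sketch also shows one caveat: < 3 and %in% 0:2 agree for the values 0 to 4 used here, but not for negative or non-integer values.

```r
# Data from the question, written as plain matrices (dimnames omitted)
df1 <- matrix(c(1,1,1, 2,2,2, 3,3,3, 1,1,1, 1,1,1, 2,2,2, 2,2,2, 3,3,3, 2,0,1),
              nrow = 3)
df2 <- matrix(c(1,1,1, 2,2,2, 3,3,3, 1,1,1, 1,1,1, 2,2,2, 2,2,2, 1,3,3, 4,4,2),
              nrow = 3)

# Hypothetical wrapper around the logical-indexing approach
compare_matrices <- function(a, b) {
  res <- matrix("O", nrow = nrow(a), ncol = ncol(a))
  res[is.na(a) | is.na(b)] <- "N"   # missing in either input
  res[a < 3 & b < 3] <- "X"         # both below 3; NA positions are skipped
  res[, 1:2] <- "A"                 # first two columns are always "A"
  res
}

result <- compare_matrices(df1, df2)

# Expected output from the question, entered row by row
expected <- matrix(c("A","A","O","X","X","X","X","O","O",
                     "A","A","O","X","X","X","X","O","O",
                     "A","A","O","X","X","X","X","O","X"),
                   nrow = 3, byrow = TRUE)
identical(result, expected)

# Caveat: the two conditions differ outside the integers 0, 1, 2, ...
c(-1, 1.5) < 3        # both TRUE
c(-1, 1.5) %in% 0:2   # both FALSE
```

If the real data can contain negative or fractional values, the %in% 0:2 form matches the original requirement more closely, at some cost in speed.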
