How do I return multiple columns without considering NA values and group by other columns' names in R?


Problem description

mexico <- c(1,2,5,1,NA,1)
argentina <- c(2,2,2,2,NA,2)
italy<- c(NA,10,10,10,NA,10)
spain <- c(NA,NA,11,11,11,11)
england <- c(5,NA,10,NA,NA,12)
germany <- c(1,NA,NA,NA,NA,10)

# R is case-sensitive: use the lowercase vector names defined above
Data_Risk <- data.frame(mexico, argentina, italy, spain, england, germany)

Data_Risk 

gives

  mexico argentina italy spain england germany
1      1         2    NA    NA       5       1
2      2         2    10    NA      NA      NA
3      5         2    10    11      10      NA
4      1         2    10    11      NA      NA
5     NA        NA    NA    11      NA      NA
6      1         2    10    11      12      10

In this case I do not want to consider the NA values, so I tried this:

library(data.table)

Data_Risk <- as.data.table(Data_Risk)
# TRUE for every column that is not NA in the first row
my_c <- !apply(Data_Risk, 1, is.na)[, 1]
# first row only
my_L <- Data_Risk[1]
as.data.frame(my_L)[my_c]

Result:

  mexico argentina england germany
1      1         2       5       1

However, I need this for every row, not just one. For each row, the names of the non-NA columns should be placed in new columns, ignoring the values themselves, so the final table has to look like this:

var1      var2        var3      var4      var5      var6
mexico    argentina   england   germany   null      null
mexico    argentina   italy     null      null      null
mexico    argentina   italy     spain     england   null
mexico    argentina   italy     spain     null      null
spain     null        null      null      null      null
mexico    argentina   italy     spain     england   germany

Recommended answer

One option is to take which(!is.na(Data_Risk), arr.ind = TRUE), which lists the (row, col) position of every non-NA cell, and reshape that from long to wide form with dcast. Within each row, the col variable is replaced by 'var' plus order(col) to give the labels var1, var2, and so on, and a colnm column holding the column name is added to serve as value.var. Because which() reports positions column by column, the col values within a row are already ascending, so order(col) is simply 1, 2, ... here.
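As a small illustration of that intermediate step, the sketch below runs which(..., arr.ind = TRUE) on a toy matrix. The matrix m and its column names a, b, c are hypothetical and not part of the question; the point is only to show the row/col index pairs that the answer builds on.

```r
# Toy matrix (hypothetical data): column a = (1, NA), b = (NA, 4), c = (5, 6)
m <- matrix(c(1, NA, NA, 4, 5, 6), nrow = 2,
            dimnames = list(NULL, c("a", "b", "c")))

# One (row, col) pair per non-NA cell, reported column by column
idx <- which(!is.na(m), arr.ind = TRUE)
idx

# Map the column indices back to names and collect them per row
per_row <- split(colnames(m)[idx[, "col"]], idx[, "row"])
per_row
```

Each element of per_row holds the names of the non-NA columns for one row, which is the information the reshape step then spreads across var1, var2, and so on.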

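library(data.table)
library(magrittr)

# (row, col) index of every non-NA cell
nms <- as.data.table(which(!is.na(Data_Risk), arr.ind = TRUE))

# per row: look up the column name and label its position var1, var2, ...
nms[, .(colnm = names(Data_Risk)[col], col = paste0('var', order(col)))
    , by = row] %>% 
  dcast(row ~ col, value.var = 'colnm')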

#    row   var1      var2    var3    var4    var5    var6
# 1:   1 mexico argentina england germany    <NA>    <NA>
# 2:   2 mexico argentina   italy    <NA>    <NA>    <NA>
# 3:   3 mexico argentina   italy   spain england    <NA>
# 4:   4 mexico argentina   italy   spain    <NA>    <NA>
# 5:   5  spain      <NA>    <NA>    <NA>    <NA>    <NA>
# 6:   6 mexico argentina   italy   spain england germany
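
For comparison, the same table can probably be built with base R alone. This is a minimal sketch, not the answer's method: it assumes the lowercase column names from the question, and the helper names present, padded and result are introduced here purely for illustration.

```r
# Rebuild the example data from the question (lowercase column names)
Data_Risk <- data.frame(
  mexico    = c(1, 2, 5, 1, NA, 1),
  argentina = c(2, 2, 2, 2, NA, 2),
  italy     = c(NA, 10, 10, 10, NA, 10),
  spain     = c(NA, NA, 11, 11, 11, 11),
  england   = c(5, NA, 10, NA, NA, 12),
  germany   = c(1, NA, NA, NA, NA, 10)
)

# Logical matrix: TRUE where a value is present
present <- !is.na(Data_Risk)

# For each row, keep the names of the non-NA columns and pad with NA
# so that every row ends up with the same length
padded <- apply(present, 1, function(keep) {
  nm <- colnames(present)[keep]
  c(nm, rep(NA_character_, ncol(present) - length(nm)))
})

# apply() returns one column per input row, so transpose back
result <- as.data.frame(t(padded), stringsAsFactors = FALSE)
names(result) <- paste0("var", seq_len(ncol(result)))
result
```

Padding every row to the full column count keeps the result rectangular, so no reshaping package is needed; the trade-off is that the number of var columns always equals the number of input columns, even if no row uses them all.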

Equivalent dplyr code:

library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr

nms <- as.data.frame(which(!is.na(Data_Risk), arr.ind = TRUE))

nms %>% 
  group_by(row) %>% 
  mutate(colnm = names(Data_Risk)[col],
         col = paste0('var', order(col))) %>% 
  spread(col, value = colnm) %>% 
  ungroup()
