R Concatenate column names into new column while sorting by their value


Question

I'm trying to build a string for each row that lists the column names in the order of their values.

library(tibble)
library(dplyr)

set.seed(100)

df <- tibble(id = 1:5,
             col1 = sample(1:50, 5),
             col2 = sample(1:50, 5),
             col3 = sample(1:50, 5)) %>% 
  mutate_at(vars(-id), ~if_else(. <= 20, NA_integer_, .))

# A tibble: 5 x 4
     id  col1  col2  col3
  <int> <int> <int> <int>
1     1    NA    44    NA
2     2    38    23    34
3     3    48    22    NA
4     4    25    NA    48
5     5    NA    NA    43
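
The default algorithm behind `sample()` changed in R 3.6.0, so `set.seed(100)` may not reproduce the table above on every R version. As a minimal sketch, the same data can be typed in by hand from the printed output, using base R only. The name `df_fixed` is made up here for illustration and is not part of the original question.

```r
# Rebuild the printed table literally so the example does not depend on
# the random number generator. `df_fixed` is a hypothetical name.
df_fixed <- data.frame(
  id   = 1:5,
  col1 = c(NA, 38L, 48L, 25L, NA),
  col2 = c(44L, 23L, 22L, NA, NA),
  col3 = c(NA, 34L, NA, 48L, 43L)
)

# Every remaining value is above 20, consistent with the if_else() filter.
all(df_fixed[-1] > 20, na.rm = TRUE)
```

A plain data frame like this should work with the base R approach as-is; for the tidyverse code it could be converted with `tibble::as_tibble()`.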

res <- df %>% 
  add_column(order = c('col2',
                       'col2_col3_col1',
                       'col2_col1',
                       'col1_col3',
                       'col3'))

# A tibble: 5 x 5
     id  col1  col2  col3 order         
  <int> <int> <int> <int> <chr>         
1     1    NA    44    NA col2          
2     2    38    23    34 col2_col3_col1
3     3    48    22    NA col2_col1     
4     4    25    NA    48 col1_col3     
5     5    NA    NA    43 col3 

My current data looks like df, and the column I'm trying to add is the order column in res. The order of the elements in each string is determined by the value in each column, and NAs need to be skipped. The values are times in days, so I'm trying to identify the sequence in which each ID gets a value in each column. Not every ID has a value in every column, so there are missing values throughout. I usually work within the tidyverse, but any solution or thoughts would be much appreciated.

Recommended answer

An easier option is apply: loop over the rows (MARGIN = 1), remove the NA elements, order the remaining non-NA values, use that index to get the column names, and paste them together.

df$order <- apply(df[-1], 1, function(x) {
  x1 <- x[!is.na(x)]                           # drop the NA elements
  paste(names(x1)[order(x1)], collapse = "_")  # sort names by value, then join
})
df$order
#[1] "col2"           "col2_col3_col1" "col2_col1"      "col1_col3"      "col3" 
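
The steps described in the answer can be traced on a single row. The sketch below types in the third row of the example table (col1 = 48, col2 = 22, col3 = NA) by hand as a named vector and wraps the same logic in a helper called `order_string`, a name invented here for illustration. It also shows what the approach returns for a row with no values at all, a case the question does not cover.

```r
# Row 3 of the example table, typed in by hand as a named vector.
x <- c(col1 = 48L, col2 = 22L, col3 = NA)

x1  <- x[!is.na(x)]   # drop NAs, leaving col1 = 48 and col2 = 22
idx <- order(x1)      # positions of x1 sorted by value
names(x1)[idx]        # column names in ascending order of value

# Hypothetical helper wrapping the same three steps.
order_string <- function(x, sep = "_") {
  x1 <- x[!is.na(x)]
  paste(names(x1)[order(x1)], collapse = sep)
}

order_string(x)                                   # "col2_col1"
order_string(c(col1 = NA, col2 = NA, col3 = NA))  # "" (empty string)
```

Because `order()` leaves ties in their original position, columns with equal values should appear in column order. An all-NA row gives an empty string rather than NA, which may or may not be what is wanted downstream.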

Or using the tidyverse:

library(dplyr)
library(tidyr)
library(stringr)
# Start from the original df, without the order column added above
df %>%
   pivot_longer(cols = -id, values_drop_na = TRUE) %>%
   arrange(id,  value) %>%
   group_by(id) %>%
   summarise(order = str_c(name, collapse="_")) %>% 
   right_join(df, by = "id") %>%
   select(all_of(names(df)), order)
# A tibble: 5 x 5
#     id  col1  col2  col3 order         
#  <int> <int> <int> <int> <chr>         
#1     1    NA    44    NA col2          
#2     2    38    23    34 col2_col3_col1
#3     3    48    22    NA col2_col1     
#4     4    25    NA    48 col1_col3     
#5     5    NA    NA    43 col3       

Or using purrr:

library(purrr)
# dplyr and stringr are loaded above
df %>% 
   mutate(order = pmap_chr(select(., starts_with('col')), ~ {
         x <- c(...)            # current row as a named vector
         x1 <- x[!is.na(x)]     # drop the NA elements
         str_c(names(x1)[order(x1)], collapse="_")}))
