R Concatenate column names into new column while sorting by their value
Question
I'm trying to concatenate a string that identifies the order of the columns by their value.
library(dplyr)
library(tibble)
set.seed(100)
df <- tibble(id = 1:5,
             col1 = sample(1:50, 5),
             col2 = sample(1:50, 5),
             col3 = sample(1:50, 5)) %>%
  # treat values of 20 or less as missing
  mutate_at(vars(-id), ~if_else(. <= 20, NA_integer_, .))
# A tibble: 5 x 4
id col1 col2 col3
<int> <int> <int> <int>
1 1 NA 44 NA
2 2 38 23 34
3 3 48 22 NA
4 4 25 NA 48
5 5 NA NA 43
res <- df %>%
add_column(order = c('col2',
'col2_col3_col1',
'col2_col1',
'col1_col3',
'col3'))
# A tibble: 5 x 5
id col1 col2 col3 order
<int> <int> <int> <int> <chr>
1 1 NA 44 NA col2
2 2 38 23 34 col2_col3_col1
3 3 48 22 NA col2_col1
4 4 25 NA 48 col1_col3
5 5 NA NA 43 col3
My current data is in the form of df, while the column I'm trying to add is the order column in res. The ordering of the elements in the string is determined by the value of each column, and it also needs to skip over NAs. I'm trying to identify the sequence in which each ID gets a value in each column, as the values are time in days. However, not all IDs will have a value in all columns, so there are missing values throughout. I usually work within tidyverse, but any solution or thoughts would be much appreciated.
Recommended answer
An easier option is apply: loop over the rows (MARGIN = 1), remove the NA elements, order the remaining non-NA values, use the resulting index to get the column names, and paste them together.
df$order <- apply(df[-1], 1, function(x) {
  x1 <- x[!is.na(x)]                           # drop the NA elements
  paste(names(x1)[order(x1)], collapse = "_")  # names sorted by value
})
df$order
#[1] "col2" "col2_col3_col1" "col2_col1" "col1_col3" "col3"
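To see what the function inside apply does on a single row, here is a minimal base R sketch. The helper name order_label is hypothetical (it is not part of the answer above), and the input vectors are typed in by hand to mirror rows 2 and 3 of df, plus an all-NA row that does not occur in the example data.

```r
# Hypothetical helper: same logic as the anonymous function passed to apply()
order_label <- function(x) {
  x1 <- x[!is.na(x)]                           # keep only the non-NA values
  paste(names(x1)[order(x1)], collapse = "_")  # names in increasing order of value
}

row2 <- c(col1 = 38L, col2 = 23L, col3 = 34L)
order(row2)        # 2 3 1: positions of the values from smallest to largest
order_label(row2)  # "col2_col3_col1"

row3 <- c(col1 = 48L, col2 = 22L, col3 = NA)
order_label(row3)  # "col2_col1"

# A row with no values at all yields an empty string
all_na <- c(col1 = NA, col2 = NA, col3 = NA)
order_label(all_na)  # ""
```

If an empty string is not wanted for rows without any values, it could be turned into NA afterwards, for example with dplyr::na_if(df$order, "").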
Or using tidyverse
library(dplyr)
library(tidyr)
library(stringr)
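df %>%
  # long format: one row per id/column pair, NA values dropped
  pivot_longer(cols = -id, values_drop_na = TRUE) %>%
  arrange(id, value) %>%
  group_by(id) %>%
  summarise(order = str_c(name, collapse = "_")) %>%
  # join back to recover the original columns
  right_join(df, by = "id") %>%
  select(all_of(names(df)), order)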
# A tibble: 5 x 5
# id col1 col2 col3 order
# <int> <int> <int> <int> <chr>
#1 1 NA 44 NA col2
#2 2 38 23 34 col2_col3_col1
#3 3 48 22 NA col2_col1
#4 4 25 NA 48 col1_col3
#5 5 NA NA 43 col3
Or using purrr
library(purrr)
df %>%
  mutate(order = pmap_chr(select(., starts_with('col')), ~ {
    x <- c(...)          # named vector holding this row's values
    x1 <- x[!is.na(x)]   # drop the NA elements
    str_c(names(x1)[order(x1)], collapse = "_")
  }))