Coalesce columns and create another column to specify source


Question

I'm using dplyr::coalesce() to combine several columns into one. Originally, each row has an actual value in only one column, while the other columns are NA. Based on the coalescing, I want to create an additional column that specifies the source column from which the coalesced value was taken.

My attempt is inspired by existing functionality in other dplyr functions. For example, dplyr::bind_rows() has an .id argument that records the source data frame of each row in the new data frame.

From the bind_rows() documentation:

When .id is supplied, a new column of identifiers is created to link each row to its original data frame. The labels are taken from the named arguments to bind_rows(). When a list of data frames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead.
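As a small illustration of the .id behaviour quoted above, here is a minimal sketch. The data frames `first_df` and `second_df`, the labels `a` and `b`, and the column name `source` are made up for this example and are not part of the question.

```r
library(dplyr)

# Two toy data frames (hypothetical, for illustration only)
first_df  <- data.frame(x = 1:2)
second_df <- data.frame(x = 3:4)

# Named arguments supply the labels; .id names the new identifier column
combined <- bind_rows(a = first_df, b = second_df, .id = "source")
combined
```

Each row of `combined` carries the label of the data frame it came from, which is the kind of bookkeeping the question asks for in the context of coalesce().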

Again, my current question is about coalesce(), not bind_rows(); I just wanted to put it in context.

df <-
  data.frame(
  group_1 = c(NA, NA, NA, NA, 2),
  group_2 = c(NA, 4, NA, NA, NA),
  group_3 = c(NA, NA, 5, NA, NA),
  group_4 = c(1, NA, NA, 2, NA),
  group_5 = c(NA, NA, NA, NA, NA)
)

df

##   group_1 group_2 group_3 group_4 group_5         ## each row
## 1      NA      NA      NA       1      NA         ## has one value
## 2      NA       4      NA      NA      NA         ## and the rest
## 3      NA      NA       5      NA      NA         ## are NAs
## 4      NA      NA      NA       2      NA
## 5       2      NA      NA      NA      NA

Coalescing the columns into one (additional) column

library(dplyr)

df %>%
  mutate(one_col = coalesce(group_1, group_2, group_3, group_4, group_5))

##   group_1 group_2 group_3 group_4 group_5 one_col
## 1      NA      NA      NA       1      NA       1
## 2      NA       4      NA      NA      NA       4
## 3      NA      NA       5      NA      NA       5
## 4      NA      NA      NA       2      NA       2
## 5       2      NA      NA      NA      NA       2

How can I add yet another column that specifies the "source", i.e., the column from which the value in one_col was taken?

  group_1 group_2 group_3 group_4 group_5 one_col source_col
1      NA      NA      NA       1      NA       1    group_4
2      NA       4      NA      NA      NA       4    group_2
3      NA      NA       5      NA      NA       5    group_3
4      NA      NA      NA       2      NA       2    group_4
5       2      NA      NA      NA      NA       2    group_1



EDIT

@Karthik's answer below led me to think that the example data I used above demonstrates a situation that is too narrow and specific. The solution Karthik offers is independent of the coalescing operation: the code would still work if we swapped the order, creating source_col first and only then coalescing.

However, if the data had more than one non-NA value per row, coalesce() would still do its thing, but we could no longer base source_col on finding the single non-missing value. Therefore, I'm revising the question and the data.

df_2 <-
  data.frame(
  group_1 = c(NA, NA, NA, NA, 2),
  group_2 = c(NA, 4, NA, NA, 1),
  group_3 = c(NA, NA, 5, NA, NA),
  group_4 = c(1, NA, NA, 2, NA),
  group_5 = c(NA, 3, NA, NA, NA)
)

df_2

##   group_1 group_2 group_3 group_4 group_5
## 1      NA      NA      NA       1      NA   ## <--- one non-NA
## 2      NA       4      NA      NA       3   ## <--- *two* non-NA
## 3      NA      NA       5      NA      NA   ## <--- one non-NA
## 4      NA      NA      NA       2      NA   ## <--- one non-NA
## 5       2       1      NA      NA      NA   ## <--- *two* non-NA
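To make the difficulty concrete, the sketch below looks at row 2 of df_2 in base R. The variable names `row_2` and `candidates` are illustrative only. Looking up "the" non-missing column now yields two names, so a rule is needed to pick one; taking the first candidate mirrors the left-to-right order in which coalesce() inspects its arguments.

```r
df_2 <- data.frame(
  group_1 = c(NA, NA, NA, NA, 2),
  group_2 = c(NA, 4, NA, NA, 1),
  group_3 = c(NA, NA, 5, NA, NA),
  group_4 = c(1, NA, NA, 2, NA),
  group_5 = c(NA, 3, NA, NA, NA)
)

# Row 2 as a named numeric vector
row_2 <- unlist(df_2[2, ])

# Names of every non-missing column in that row: more than one
candidates <- names(df_2)[!is.na(row_2)]
candidates

# The first candidate is the column whose value coalesce() would return
candidates[1]
```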

Coalescing

df_2 %>%
  mutate(one_col = coalesce(group_1, group_2, group_3, group_4, group_5))

##   group_1 group_2 group_3 group_4 group_5 one_col
## 1      NA      NA      NA       1      NA       1
## 2      NA       4      NA      NA       3       4
## 3      NA      NA       5      NA      NA       5
## 4      NA      NA      NA       2      NA       2
## 5       2       1      NA      NA      NA       2

How can I add a source column that matches the value chosen by coalesce() with the original column it came from?

Desired output

  group_1 group_2 group_3 group_4 group_5 one_col source_col
1      NA      NA      NA       1      NA       1    group_4
2      NA       4      NA      NA       3       4    group_2
3      NA      NA       5      NA      NA       5    group_3
4      NA      NA      NA       2      NA       2    group_4
5       2       1      NA      NA      NA       2    group_1

Recommended answer

Does this work?

df %>%
  mutate(one_col = coalesce(group_1, group_2, group_3, group_4, group_5)) %>%
  rowwise() %>%
  # keep the name of the single non-NA column in each row
  mutate(group_col = names(df)[!is.na(c_across(group_1:group_5))])
# A tibble: 5 x 7
# Rowwise: 
  group_1 group_2 group_3 group_4 group_5 one_col group_col
    <dbl>   <dbl>   <dbl>   <dbl> <lgl>     <dbl> <chr>    
1      NA      NA      NA       1 NA            1 group_4  
2      NA       4      NA      NA NA            4 group_2  
3      NA      NA       5      NA NA            5 group_3  
4      NA      NA      NA       2 NA            2 group_4  
5       2      NA      NA      NA NA            2 group_1  

Updated answer:

df_2 %>%
  mutate(one_col = coalesce(group_1, group_2, group_3, group_4, group_5)) %>%
  rowwise() %>%  # [1] below keeps the first non-NA column, as coalesce() does
  mutate(group_col = names(df_2)[!is.na(c_across(group_1:group_5))][1])
# A tibble: 5 x 7
# Rowwise: 
  group_1 group_2 group_3 group_4 group_5 one_col group_col
    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    
1      NA      NA      NA       1      NA       1 group_4  
2      NA       4      NA      NA       3       4 group_2  
3      NA      NA       5      NA      NA       5 group_3  
4      NA      NA      NA       2      NA       2 group_4  
5       2       1      NA      NA      NA       2 group_1  
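For larger data, a vectorised variant that avoids rowwise() may be worth considering. The following is only a sketch under the same assumptions as the question's data, not part of the original answer: it swaps in base R's max.col() to find the first non-missing column per row. The names `group_cols`, `not_na`, `first_idx`, and `result` are illustrative, and rows with no value at all are given NA as their source.

```r
library(dplyr)

df_2 <- data.frame(
  group_1 = c(NA, NA, NA, NA, 2),
  group_2 = c(NA, 4, NA, NA, 1),
  group_3 = c(NA, NA, 5, NA, NA),
  group_4 = c(1, NA, NA, 2, NA),
  group_5 = c(NA, 3, NA, NA, NA)
)

# Columns to coalesce, in the order coalesce() will inspect them
group_cols <- paste0("group_", 1:5)

# Logical matrix: TRUE where a value is present
not_na <- !is.na(as.matrix(df_2[group_cols]))

# Position of the first TRUE in each row
first_idx <- max.col(not_na, ties.method = "first")

# max.col() would report column 1 for an all-NA row, so blank those out
first_idx[rowSums(not_na) == 0] <- NA

result <- df_2 %>%
  mutate(
    one_col    = coalesce(group_1, group_2, group_3, group_4, group_5),
    source_col = group_cols[first_idx]
  )
result
```

Because the column order in `group_cols` matches the argument order passed to coalesce(), the first non-missing position should identify the same column whose value ends up in one_col.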
