Using case_when, how to mutate a new list-column that nests a vector within?

Views: 45 · Posted: 2021/5/2 20:47:16 · Tags: r dplyr tibble

Question

I'm trying to use dplyr's case_when() to mutate a new column based on conditions in other columns. However, I want the new column to be a list-column in which each cell nests a character vector.

Consider the following toy data. Based on it, I want to summarize the geographical territory of the UK.

library(tibble)

set.seed(1)
my_mat <- matrix(sample(c(TRUE, FALSE), size = 40, replace = TRUE), nrow = 10, ncol = 4) 
colnames(my_mat) <- c("England", "Wales", "Scotland", "Northern_Ireland")
my_df <- as_tibble(my_mat)

> my_df

## # A tibble: 10 x 4
##    England Wales Scotland Northern_Ireland
##    <lgl>   <lgl> <lgl>    <lgl>           
##  1 TRUE    TRUE  TRUE     FALSE           
##  2 FALSE   TRUE  TRUE     FALSE           
##  3 TRUE    TRUE  TRUE     TRUE            
##  4 TRUE    TRUE  TRUE     FALSE           
##  5 FALSE   TRUE  TRUE     TRUE            
##  6 TRUE    FALSE TRUE     TRUE            
##  7 TRUE    FALSE FALSE    FALSE           
##  8 TRUE    FALSE TRUE     TRUE            
##  9 FALSE   FALSE TRUE     FALSE           
## 10 FALSE   TRUE  FALSE    FALSE

I want to add a new collective_geo_territory column with mutate(), following these rules:

1. If England, Wales, Scotland, and Northern_Ireland are all TRUE, then we say this is United_Kingdom.
2. Otherwise, if only England, Wales, and Scotland are TRUE, then we say this is Great_Britain.
3. Any other combination simply returns a character vector with the names of the countries that are TRUE.
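
The three rules can be written as a plain function of one row. This is a minimal base-R sketch, assuming each row is a named logical vector without NAs; the name `territory_for_row` is hypothetical and not part of dplyr or purrr.

```r
# Hypothetical helper (not part of dplyr or purrr): applies the three rules to
# a single row, given as a named logical vector without NAs.
territory_for_row <- function(row) {
  if (all(row[c("England", "Wales", "Scotland", "Northern_Ireland")])) {
    "United_Kingdom"   # rule 1: all four countries are TRUE
  } else if (all(row[c("England", "Wales", "Scotland")])) {
    "Great_Britain"    # rule 2: rule 1 failed, so only the three GB countries are TRUE
  } else {
    names(row)[row]    # rule 3: names of whichever countries are TRUE
  }
}

territory_for_row(c(England = TRUE, Wales = TRUE, Scotland = TRUE, Northern_Ireland = FALSE))
# returns "Great_Britain"
territory_for_row(c(England = FALSE, Wales = TRUE, Scotland = TRUE, Northern_Ireland = TRUE))
# returns c("Wales", "Scotland", "Northern_Ireland")
```

Rule 3 can return vectors of different lengths, which is why the result has to live in a list-column rather than a character column.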

My attempt

So far, I know how to handle conditions (1) and (2) above with the following code:

library(dplyr)

my_df %>%
  mutate(collective_geo_territory = case_when(England == TRUE & Wales == TRUE & Scotland == TRUE & Northern_Ireland == TRUE ~ "United_Kingdom",
                                              England == TRUE & Wales == TRUE & Scotland == TRUE ~ "Great_Britain"))

Desired Output

However, I want the collective_geo_territory column in the output to look like the following, with each cell holding a character vector:

## # A tibble: 10 x 5
##    England Wales Scotland Northern_Ireland collective_geo_territory
##    <lgl>   <lgl> <lgl>    <lgl>            <list>
##  1 TRUE    TRUE  TRUE     FALSE            <chr [1]>  # c("Great_Britain")
##  2 FALSE   TRUE  TRUE     FALSE            <chr [2]>  # c("Wales", "Scotland")
##  3 TRUE    TRUE  TRUE     TRUE             <chr [1]>  # c("United_Kingdom")
##  4 TRUE    TRUE  TRUE     FALSE            <chr [1]>  # c("Great_Britain")
##  5 FALSE   TRUE  TRUE     TRUE             <chr [3]>  # c("Wales", "Scotland", "Northern_Ireland")
##  6 TRUE    FALSE TRUE     TRUE             <chr [3]>  # c("England", "Scotland", "Northern_Ireland")
##  7 TRUE    FALSE FALSE    FALSE            <chr [1]>  # c("England")
##  8 TRUE    FALSE TRUE     TRUE             <chr [3]>  # c("England", "Scotland", "Northern_Ireland")
##  9 FALSE   FALSE TRUE     FALSE            <chr [1]>  # c("Scotland")
## 10 FALSE   TRUE  FALSE    FALSE            <chr [1]>  # c("Wales")

Answer

Here is one approach:

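library(dplyr)
library(purrr) # used for pmap()

my_df %>%
  mutate(collective_geo_territory = case_when(
    England & Wales & Scotland & Northern_Ireland ~ list("United_Kingdom"),
    England & Wales & Scotland ~ list("Great_Britain"),
    # Fallback: for each row, the names of the columns that are TRUE.
    # pmap() iterates over all of my_df, so this assumes my_df holds only the
    # four logical columns and mutate() sees its rows unchanged.
    TRUE ~ pmap(my_df, ~ names(my_df)[c(...)])
  ))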

Essentially, the final condition (TRUE ~ pmap(...)) works as follows:

The left-hand side can simply be TRUE because, for each row, case_when() uses the first condition that evaluates to TRUE. So a row only reaches this line if conditions 1 and 2 have both failed.
The right-hand side iterates over the rows of my dataset (pmap) and applies the following function to each row: c(...) collects that row's values into a logical vector, which is then used to subset ([]) the column names (names) down to those whose value is TRUE.
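
To see what the fallback's right-hand side produces on its own, here is a base-R sketch that builds the same per-row list of TRUE column names. It swaps purrr::pmap() for lapply() over row indices and uses a small hand-made data frame `df` (hypothetical data, not the question's my_df).

```r
# Small hand-made data with the same logical columns as my_df.
df <- data.frame(
  England          = c(TRUE,  FALSE, TRUE),
  Wales            = c(TRUE,  TRUE,  FALSE),
  Scotland         = c(TRUE,  TRUE,  FALSE),
  Northern_Ireland = c(FALSE, TRUE,  FALSE)
)

# For each row i: flatten the one-row data frame into a logical vector and use
# it to keep only the column names whose value is TRUE (like names(my_df)[c(...)]).
true_names <- lapply(seq_len(nrow(df)), function(i) names(df)[unlist(df[i, ])])

str(true_names)
# true_names[[2]] is c("Wales", "Scotland", "Northern_Ireland")
```

The result is a plain list with one character vector per row, which is exactly the shape a list-column needs.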

Some additional notes:

Note that I also had to wrap the right-hand side of the first two conditions (e.g. "United_Kingdom") in list(), because case_when() requires all right-hand sides to have a consistent type, and the pmap() fallback returns a list.
I changed the redundant England == TRUE (and likewise for the other countries) to simply England. Since these columns already contain logical values, there is no need to compare them with TRUE, and this makes the code a bit more readable.
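
Why list() helps can be illustrated with plain list assignment in base R: each slot of a list can hold a character vector of any length, and a length-1 list is recycled into every selected slot, much as a single string would be in a character vector. This is only an analogy for why list("United_Kingdom") fits alongside the pmap() result, not a description of how case_when() is implemented.

```r
# A list column with one slot per row, initially all NULL.
out <- vector("list", 4)
is_uk <- c(TRUE, FALSE, FALSE, TRUE)

# A length-1 list is recycled: each selected slot receives "United_Kingdom".
out[is_uk] <- list("United_Kingdom")

# The remaining slots can hold character vectors of different lengths.
out[!is_uk] <- list(c("Wales", "Scotland"), "England")

str(out)
```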
