Using case_when, how to mutate a new list-column that nests a vector within?
Problem description
I'm trying to use dplyr's case_when() to mutate a new column based on conditions in other columns. However, I want the new column to nest a vector.
Consider the following toy data. Based on it, I want to summarize the geographical territory of the UK.
library(tibble)
set.seed(1)
my_mat <- matrix(sample(c(TRUE, FALSE), size = 40, replace = TRUE), nrow = 10, ncol = 4)
colnames(my_mat) <- c("England", "Wales", "Scotland", "Northern_Ireland")
my_df <- as_tibble(my_mat)
> my_df
## # A tibble: 10 x 4
## England Wales Scotland Northern_Ireland
## <lgl> <lgl> <lgl> <lgl>
## 1 TRUE TRUE TRUE FALSE
## 2 FALSE TRUE TRUE FALSE
## 3 TRUE TRUE TRUE TRUE
## 4 TRUE TRUE TRUE FALSE
## 5 FALSE TRUE TRUE TRUE
## 6 TRUE FALSE TRUE TRUE
## 7 TRUE FALSE FALSE FALSE
## 8 TRUE FALSE TRUE TRUE
## 9 FALSE FALSE TRUE FALSE
## 10 FALSE TRUE FALSE FALSE
I want to mutate a new collective_geo_territory column.
- If England, Scotland, Wales, and Northern_Ireland are all TRUE, then we say this is United_Kingdom.
- Otherwise, if only England, Scotland, and Wales are TRUE, then we say this is Great_Britain.
- Any other combination simply returns a vector with the names of the countries that are TRUE.
My attempt
So far, I know how to handle the first two conditions detailed above, using the following code:
library(dplyr)
my_df %>%
  mutate(collective_geo_territory = case_when(
    England == TRUE & Wales == TRUE & Scotland == TRUE & Northern_Ireland == TRUE ~ "United_Kingdom",
    England == TRUE & Wales == TRUE & Scotland == TRUE ~ "Great_Britain"
  ))
Desired output
However, I want the collective_geo_territory column to be a list-column that looks like the following:
## # A tibble: 10 x 5
## England Wales Scotland Northern_Ireland collective_geo_territory
## <lgl> <lgl> <lgl> <lgl> <list>
## 1 TRUE TRUE TRUE FALSE <chr [1]> # c("Great_Britain")
## 2 FALSE TRUE TRUE FALSE <chr [2]> # c("Wales", "Scotland")
## 3 TRUE TRUE TRUE TRUE <chr [1]> # c("United_Kingdom")
## 4 TRUE TRUE TRUE FALSE <chr [1]> # c("Great_Britain")
## 5 FALSE TRUE TRUE TRUE <chr [3]> # c("Wales", "Scotland", "Northern_Ireland")
## 6 TRUE FALSE TRUE TRUE <chr [3]> # c("England", "Scotland", "Northern_Ireland")
## 7 TRUE FALSE FALSE FALSE <chr [1]> # c("England")
## 8 TRUE FALSE TRUE TRUE <chr [3]> # c("England", "Scotland", "Northern_Ireland")
## 9 FALSE FALSE TRUE FALSE <chr [1]> # c("Scotland")
## 10 FALSE TRUE FALSE FALSE <chr [1]> # c("Wales")
Recommended answer
Here is one approach:
library(purrr) # used for pmap
my_df %>%
  mutate(collective_geo_territory = case_when(
    England & Wales & Scotland & Northern_Ireland ~ list("United_Kingdom"),
    England & Wales & Scotland ~ list("Great_Britain"),
    TRUE ~ pmap(my_df, ~names(my_df)[c(...)])
  ))
Essentially, the last line works as follows:
- The left-hand side can simply be TRUE because, for each row, case_when() uses the first condition that evaluates to TRUE. So the value from this line is used only for rows where conditions 1 and 2 have failed.
- The right-hand side essentially says: iterate over the rows of my dataset (pmap) and apply the following function: get the names of the columns in my dataset (names) and subset them ([]) to only those whose values in that row are TRUE (the row's values are collected by c(...)).
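To see the pmap() step in isolation, here is a minimal sketch on a hypothetical two-row tibble (`toy` and `true_names` are names made up for illustration, not part of the question's data). It assumes the tibble and purrr packages are installed.

```r
library(tibble)
library(purrr)

# Hypothetical data for illustration only
toy <- tibble(
  England  = c(TRUE, FALSE),
  Wales    = c(FALSE, TRUE),
  Scotland = c(TRUE, TRUE)
)

# pmap() calls the function once per row; c(...) gathers that row's
# values into a logical vector, which then subsets the column names
true_names <- pmap(toy, ~ names(toy)[c(...)])

true_names[[1]]  # "England" "Scotland"
true_names[[2]]  # "Wales" "Scotland"
```

The result is a plain list with one character vector per row, which is the shape a list-column needs.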
Some additional notes:
- Note that I also had to wrap the right-hand side of the first two conditions (e.g. "United_Kingdom") in a list() because case_when() requires consistent types for the resulting vector, and pmap() returns a list.
- I changed the redundant England == TRUE (and the same for the other countries) simply to England. Since these columns already contain logical values, there's no need to compare them with TRUE again, and this makes the code a bit more readable.
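As a quick sanity check of the points above, the following sketch runs the same pattern on a hypothetical three-row tibble (`small_df` and `result` are illustrative names, not from the question) and inspects the resulting column. It assumes a reasonably recent dplyr, where a length-one list on the right-hand side is recycled to the number of rows.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical data: one row for each of the three cases
small_df <- tibble(
  England          = c(TRUE, TRUE, FALSE),
  Wales            = c(TRUE, TRUE, TRUE),
  Scotland         = c(TRUE, TRUE, TRUE),
  Northern_Ireland = c(TRUE, FALSE, FALSE)
)

result <- small_df %>%
  mutate(collective_geo_territory = case_when(
    England & Wales & Scotland & Northern_Ireland ~ list("United_Kingdom"),
    England & Wales & Scotland ~ list("Great_Britain"),
    TRUE ~ pmap(small_df, ~ names(small_df)[c(...)])
  ))

# Every branch yields a list, so the new column is a list-column
is.list(result$collective_geo_territory)

# Number of names stored in each row's vector
lengths(result$collective_geo_territory)
```

Here lengths() gives a compact view of how many territory names each row holds, matching the <chr [n]> entries shown when the tibble is printed.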