tidyr::unite across column patterns
Problem description
I have a dataset that looks something like this:
site <- c("A", "B", "C", "D", "E")
D01_1 <- c(1, 0, 0, 0, 1)
D01_2 <- c(1, 1, 0, 1, 1)
D02_1 <- c(1, 0, 1, 0, 1)
D02_2 <- c(0, 1, 0, 0, 1)
D03_1 <- c(1, 1, 0, 0, 0)
D03_2 <- c(0, 1, 0, 0, 1)
df <- data.frame(site, D01_1, D01_2, D02_1, D02_2, D03_1, D03_2)
I am trying to unite the D0x_1 and D0x_2 columns so that the values in each pair are separated by a slash. I can do this with the following code, and it works just fine:
library(dplyr)
library(tidyr)
df.unite <- df %>%
unite(D01, D01_1, D01_2, sep = "/", remove = TRUE) %>%
unite(D02, D02_1, D02_2, sep = "/", remove = TRUE) %>%
unite(D03, D03_1, D03_2, sep = "/", remove = TRUE)
...but the problem is that it requires me to type out a unite call for every pair, which is unwieldy across the large number of columns in my dataset. Is there a way in dplyr to unite across similarly patterned column names and then loop across the columns? unite_each doesn't seem to exist.
Recommended answer
There are two options.
Option 1: Base
First, you can use lapply to apply unite_ (the standard-evaluation version, to which you can pass strings) programmatically across columns. To do so, build a vector of the new column names (the prefixes shared by each pair), wrap the lapply in do.call(cbind, ...) to bind the resulting columns together, and then cbind site back onto the result. Altogether:
cols <- unique(substr(names(df)[-1], 1, 3))
cbind(site = df$site, do.call(cbind,
lapply(cols, function(x){unite_(df, x, grep(x, names(df), value = TRUE),
sep = '/', remove = TRUE) %>% select_(x)})
))
# site D01 D02 D03
# 1 A 1/1 1/0 1/0
# 2 B 0/1 0/1 1/1
# 3 C 0/0 1/0 0/0
# 4 D 0/1 0/0 0/0
# 5 E 1/1 1/1 0/1
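In current releases of tidyr and dplyr the underscore-suffixed verbs (unite_, select_) are deprecated, so the code above may emit warnings. The following is a minimal sketch of the same idea using the non-deprecated unite. It assumes every pair shares a prefix followed by "_" and a number, as in the question; the names prefixes and df_united are illustrative and do not come from the original answer.

```r
library(tidyr)

# Rebuild the question's data so the example is self-contained.
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Derive the shared prefixes ("D01", "D02", ...) by dropping the "_<n>" suffix.
prefixes <- unique(sub("_[0-9]+$", "", names(df)[-1]))

# Apply unite() once per prefix, feeding each result into the next call.
# `!!p` unquotes the string so it becomes the name of the new column;
# starts_with() selects the columns belonging to that prefix.
df_united <- Reduce(
  function(d, p) unite(d, !!p, starts_with(paste0(p, "_")), sep = "/"),
  prefixes,
  df
)

df_united
```

Because each unite call works on the running result rather than on the original df, site stays in place and no cbind or select step is needed afterwards.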
Option 2: Chained
Alternatively, if you really like pipes, you can hack the whole thing into a single chain (lapply included!), swapping out a few of the base functions for their dplyr equivalents:
df %>% select(-site) %>% names() %>% substr(1,3) %>% unique() %>%
lapply(function(x){unite_(df, x, grep(x, names(df), value = TRUE),
sep = '/', remove = TRUE) %>% select_(x)}) %>%
bind_cols() %>% mutate(site = as.character(df$site)) %>% select(site, starts_with('D'))
# Source: local data frame [5 x 4]
#
# site D01 D02 D03
# (chr) (chr) (chr) (chr)
# 1 A 1/1 1/0 1/0
# 2 B 0/1 0/1 1/1
# 3 C 0/0 1/0 0/0
# 4 D 0/1 0/0 0/0
# 5 E 1/1 1/1 0/1
Check out the intermediate products to see how it fits together; the logic is pretty much the same as in the base approach.
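To make those intermediate products concrete, here is a small sketch that builds them step by step in base R. Steps 1 and 2 reuse the expressions from the answer; step 3 swaps in paste as a stand-in for unite_ so the block runs without tidyr. The names members, united and result are illustrative only.

```r
# Rebuild the question's data so the example is self-contained.
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Step 1: the new column names are the unique three-character prefixes.
cols <- unique(substr(names(df)[-1], 1, 3))
cols
# "D01" "D02" "D03"

# Step 2: for each prefix, grep() finds the original columns that contain it.
members <- lapply(cols, function(x) grep(x, names(df), value = TRUE))
names(members) <- cols
members$D01
# "D01_1" "D01_2"

# Step 3 (stand-in for unite_): paste each group of columns with "/".
united <- lapply(members, function(m) do.call(paste, c(df[m], sep = "/")))

# Step 4: bind site back on, as the answer does with cbind.
result <- data.frame(site = df$site, united, stringsAsFactors = FALSE)
result
```

Note that grep matches the prefix anywhere in a column name, so this relies on no other column (such as site) happening to contain "D01", "D02" or "D03".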