tidyr::unite across column patterns


Question

I have a dataset that looks something like this:

site <- c("A", "B", "C", "D", "E")
D01_1 <- c(1, 0, 0, 0, 1)
D01_2 <- c(1, 1, 0, 1, 1)
D02_1 <- c(1, 0, 1, 0, 1)
D02_2 <- c(0, 1, 0, 0, 1)
D03_1 <- c(1, 1, 0, 0, 0)
D03_2 <- c(0, 1, 0, 0, 1)
df <- data.frame(site, D01_1, D01_2, D02_1, D02_2, D03_1, D03_2)

I am trying to unite each pair of D0x_1 and D0x_2 columns so that the two values are separated by a slash. I can do this with the following code, and it works just fine:

library(dplyr)
library(tidyr)

df.unite <- df %>%
  unite(D01, D01_1, D01_2, sep = "/", remove = TRUE) %>%
  unite(D02, D02_1, D02_2, sep = "/", remove = TRUE) %>%
  unite(D03, D03_1, D03_2, sep = "/", remove = TRUE)

...but the problem is that I have to type out a separate unite call for every pair, which is unwieldy across the large number of columns in my dataset. Is there a way in dplyr to unite across similarly patterned column names and then loop across the columns? unite_each doesn't seem to exist.

Recommended answer

Two options.

First, you can use lapply to apply unite_ (the standard-evaluation version of unite, which accepts column names as strings) programmatically across columns. To do so, build a vector of the new column names, wrap the lapply in do.call(cbind, ...) to collect the resulting columns, and then cbind site back on. Note that unite_ and select_ have since been deprecated in tidyr and dplyr. Altogether:

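# Prefixes shared by each pair of columns: "D01", "D02", "D03"
cols <- unique(substr(names(df)[-1], 1, 3))
# Unite each pair, keep only the new column, then bind the results back to site
cbind(site = df$site, do.call(cbind,
        lapply(cols, function(x){unite_(df, x, grep(x, names(df), value = TRUE), 
                                        sep = '/', remove = TRUE) %>% select_(x)})
        ))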

#   site D01 D02 D03
# 1    A 1/1 1/0 1/0
# 2    B 0/1 0/1 1/1
# 3    C 0/0 1/0 0/0
# 4    D 0/1 0/0 0/0
# 5    E 1/1 1/1 0/1
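
For readers on a current tidyr release, the same loop can probably be written without the deprecated unite_. The following is only a sketch, not part of the original answer: it assumes tidyr 1.0 or later, assumes every column to combine is named prefix_number, and swaps lapply plus cbind for Reduce so that each unite call works on the result of the previous one. The names prefixes and united are illustrative.

```r
library(tidyr)

# Same example data as in the question
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Assumption: the part before the trailing "_<digits>" identifies a group
prefixes <- unique(sub("_[0-9]+$", "", names(df)[-1]))

# Start from df and apply one unite() per prefix.
# !!p injects the string as the new column name; starts_with() picks
# the columns belonging to that prefix.
united <- Reduce(
  function(d, p) unite(d, !!p, starts_with(paste0(p, "_")), sep = "/"),
  prefixes,
  df
)

united
```

Because unite removes its input columns by default and places the new column where the first of them was, site stays in front and no separate cbind step is needed.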

Option 2: Chained

Alternatively, if you really like pipes, you can hack the whole thing into a single chain (lapply included!), swapping out a few of the base functions for their dplyr equivalents:

df %>% select(-site) %>% names() %>% substr(1,3) %>% unique() %>%
  lapply(function(x){unite_(df, x, grep(x, names(df), value = TRUE), 
                            sep = '/', remove = TRUE) %>% select_(x)}) %>%
  bind_cols() %>% mutate(site = as.character(df$site)) %>% select(site, starts_with('D'))

# Source: local data frame [5 x 4]
# 
#    site   D01   D02   D03
#   (chr) (chr) (chr) (chr)
# 1     A   1/1   1/0   1/0
# 2     B   0/1   0/1   1/1
# 3     C   0/0   1/0   0/0
# 4     D   0/1   0/0   0/0
# 5     E   1/1   1/1   0/1

Check out the intermediate products to see how it fits together, but it's pretty much the same logic as the base approach.
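
As a rough illustration of what those intermediate products look like, the sketch below rebuilds the first few steps in base R only. The variable names nms, prefixes, and matches are made up for this example. It relies on the same assumption as the answer's substr(..., 1, 3) call, namely that every group prefix is exactly three characters long.

```r
# Same example data as in the question
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Step 1: every column name except site
nms <- names(df)[-1]
print(nms)

# Step 2: first three characters of each name, deduplicated
prefixes <- unique(substr(nms, 1, 3))
print(prefixes)

# Step 3: the columns that grep() finds for each prefix;
# these are the columns each unite call would combine
matches <- lapply(prefixes, function(x) grep(x, names(df), value = TRUE))
names(matches) <- prefixes
str(matches)
```

If the real column names have prefixes of varying length, step 2 would need a different rule, for example stripping the suffix with sub instead of taking a fixed-width substring.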
