Using gather() to gather two (or more) groups of columns into two (or more) key-value pairs
Question
I want to gather two separate groups of columns into two key-value pairs. Here's some example data:
library(dplyr)
library(tidyr)
ID = c(1:5)
measure1 = c(1:5)
measure2 = c(6:10)
letter1 = c("a", "b", "c", "d", "e")
letter2 = c("f", "g", "h", "i", "j")
df = data.frame(ID, measure1, measure2, letter1, letter2)
df = as_tibble(df)
df$letter1 <- as.character(df$letter1)
df$letter2 <- as.character(df$letter2)
I want the values of the two measure columns (measure1 and measure2) to end up in a single column, with a key column next to it (the key-value pair). I want the same for letter1 and letter2. I figured that I could use select() to create two different datasets, use gather() separately on each, and then join them (this worked):
df_measure = df %>%
  select(ID, measure1, measure2) %>%
  gather(measure_time, measure, -ID) %>%
  mutate(id.extra = c(1:10))
df_letter = df %>%
  select(ID, letter1, letter2) %>%
  gather(letter_time, letter, -ID) %>%
  mutate(id.extra = c(1:10))
df_long = df_measure %>%
  left_join(df_letter, by = "id.extra")
So this works (in this case), but I guess it could be done more elegantly, without splitting the data or creating helper columns like 'id.extra'. Please shed some light on it!
Recommended answer
You can use something like the following. Judging from your current approach, I'm not sure whether this is exactly your desired output, since your result seems to contain a lot of redundant information.
df %>%
  gather(val, var, -ID) %>%
  extract(val, c("value", "time"), regex = "([a-z]+)([0-9]+)") %>%
  spread(value, var)
# # A tibble: 10 × 4
#       ID  time letter measure
# *  <int> <chr>  <chr>   <chr>
# 1      1     1      a       1
# 2      1     2      f       6
# 3      2     1      b       2
# 4      2     2      g       7
# 5      3     1      c       3
# 6      3     2      h       8
# 7      4     1      d       4
# 8      4     2      i       9
# 9      5     1      e       5
# 10     5     2      j      10
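As a side note that is not part of the original answer: in tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Assuming such a version is installed, the same reshape can be sketched in a single call. The special ".value" entry in names_to tells pivot_longer() to use the first regex group ("measure" or "letter") as the output column name, while the second group goes into a "time" column. Because the measure and letter values are never pooled into one column, measure should stay integer here rather than being coerced to character.

```r
library(tidyr)

# Same example data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  ID = 1:5,
  measure1 = 1:5,
  measure2 = 6:10,
  letter1 = c("a", "b", "c", "d", "e"),
  letter2 = c("f", "g", "h", "i", "j"),
  stringsAsFactors = FALSE
)

# ".value" = first regex group becomes the output column name;
# "time"   = second regex group (the numeric suffix) becomes a key column
df_long <- pivot_longer(
  df,
  cols = -ID,
  names_to = c(".value", "time"),
  names_pattern = "([a-z]+)([0-9]+)"
)

df_long
```

The "time" column comes back as character; as.integer(df_long$time) would convert it if a numeric key is preferred.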
This is much more easily done with melt + patterns from "data.table":
library(data.table)
melt(as.data.table(df), measure.vars = patterns("measure", "letter"))
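By default the melt() call above labels its output columns "variable", "value1" and "value2". A small variation, sketched here with column names chosen to match the tidyr output (that naming is an assumption, not something the answer specifies), uses the variable.name and value.name arguments of melt() to get readable names directly:

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
dt <- data.table(
  ID = 1:5,
  measure1 = 1:5,
  measure2 = 6:10,
  letter1 = c("a", "b", "c", "d", "e"),
  letter2 = c("f", "g", "h", "i", "j")
)

# One regex per column group; value.name supplies one output name per group
dt_long <- melt(
  dt,
  id.vars = "ID",
  measure.vars = patterns("^measure", "^letter"),
  variable.name = "time",
  value.name = c("measure", "letter")
)

dt_long
```

Note that the rows come out grouped by "time" (all first measurements, then all second ones) rather than by ID; setorder(dt_long, ID, time) would reorder them if needed.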
Or you can be old-school and just use reshape from base R. (Note, however, that base R's reshape does not like "tibbles", so you have to convert it with as.data.frame.)
reshape(as.data.frame(df), direction = "long", idvar = "ID",
varying = 2:ncol(df), sep = "")
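The reshape() result keeps compound row names and is ordered by time rather than by ID. A minimal sketch of a tidied-up version follows; naming the varying columns explicitly instead of using 2:ncol(df), and the final sorting step, are choices made here for illustration rather than part of the original answer.

```r
# Same example data as in the question, as a plain data.frame
df <- data.frame(
  ID = 1:5,
  measure1 = 1:5,
  measure2 = 6:10,
  letter1 = c("a", "b", "c", "d", "e"),
  letter2 = c("f", "g", "h", "i", "j"),
  stringsAsFactors = FALSE
)

# sep = "" lets reshape() split "measure1" into the stub "measure" and time 1
df_long <- reshape(
  df,
  direction = "long",
  idvar = "ID",
  varying = c("measure1", "measure2", "letter1", "letter2"),
  sep = ""
)

# Sort by ID then time, and drop the compound row names reshape() generates
df_long <- df_long[order(df_long$ID, df_long$time), ]
rownames(df_long) <- NULL

df_long
```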