Reshaping multiple sets of measurement columns (wide format) into single columns (long format)


Problem description

I have a data frame in wide format, with repeated measurements taken within different date ranges. In my example there are three periods, each with its corresponding value. For example, the first measurement (Value1) was taken in the period from DateRange1Start to DateRange1End:

ID DateRange1Start DateRange1End Value1 DateRange2Start DateRange2End Value2 DateRange3Start DateRange3End Value3
1 1/1/90 3/1/90 4.4 4/5/91 6/7/91 6.2 5/5/95 6/6/96 3.3 

I would like to reshape the data to long format so that the DateRangeXStart and DateRangeXEnd columns are grouped. Thus, what was one row in the original table becomes three rows in the new table:

ID DateRangeStart DateRangeEnd Value
1 1/1/90 3/1/90 4.4
1 4/5/91 6/7/91 6.2
1 5/5/95 6/6/96 3.3

I know there must be a way to do this with reshape2/melt/recast/tidyr, but I can't figure out how to map the multiple sets of measure variables into single sets of value columns in this particular way.

Recommended answer

Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package, available since version 1.0.0.

This is superior to the previous tidyr strategy of gather() followed by spread() (see the answer by @AndrewMacDonald), because attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).
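For contrast, the older gather()/spread() route can be sketched as follows. This is only an illustration on made-up data (the names wide, stacked, parts and old_style are assumptions, not from the question); it shows where the Date class gets lost when dates and numerics are stacked into a single column.

```r
library(tidyr)

# Hypothetical toy data: one Date column and one numeric column per period.
wide <- data.frame(
  ID = 1L,
  Start_1 = as.Date("1990-01-01"), Value_1 = 4.4,
  Start_2 = as.Date("1991-05-04"), Value_2 = 6.2
)

# gather() stacks Dates and numerics into one column, so the Date class is
# dropped (tidyr warns about non-identical attributes; suppressed here).
stacked <- suppressWarnings(gather(wide, key = "key", value = "val", -ID))

# Split names such as "Start_1" into variable and period, then spread back out.
parts <- separate(stacked, key, into = c("var", "group"), sep = "_")
old_style <- spread(parts, var, val)

# The Start column is no longer of class "Date".
class(old_style$Start)
```

Recovering the dates afterwards would need an explicit conversion step, which pivot_longer() makes unnecessary.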

library("tidyr")
library("magrittr")

a <- structure(list(ID = 1L, 
                    DateRange1Start = structure(7305, class = "Date"), 
                    DateRange1End = structure(7307, class = "Date"), 
                    Value1 = 4.4, 
                    DateRange2Start = structure(7793, class = "Date"),
                    DateRange2End = structure(7856, class = "Date"), 
                    Value2 = 6.2, 
                    DateRange3Start = structure(9255, class = "Date"), 
                    DateRange3End = structure(9653, class = "Date"), 
                    Value3 = 3.3),
               row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_wider()) works similarly to gather(). However, it offers additional functionality such as multiple value columns. With only one value column, all column names of the wide data set go into one long column with the name given in names_to. For multiple value columns, names_to may receive multiple new names; the special name ".value" marks the part of each column name that becomes the name of an output value column.
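To make the one-value-column case concrete, here is a minimal sketch on made-up data (wide, period and the Value_ columns are illustrative names, not taken from the question). All pivoted column names end up in the single column named by names_to.

```r
library(tidyr)

# Hypothetical toy data: an ID column and a single set of value columns.
wide <- data.frame(ID = 1:2, Value_1 = c(4.4, 1.0), Value_2 = c(6.2, 2.0))

# One value column: the old column names go into "period",
# the cell values go into "Value". names_prefix strips the leading "Value_".
long <- pivot_longer(wide,
                     cols = -ID,
                     names_to = "period",
                     names_prefix = "Value_",
                     values_to = "Value")
long
```

With several sets of value columns, names_to takes a vector instead, as in the answer's own call.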

This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc. Therefore, I renamed the columns in the first step.

# Move the period digit to the end of each name, e.g. DateRange1Start -> DateRangeStart_1
(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#>  [1] "ID"               "DateRangeStart_1" "DateRangeEnd_1"  
#>  [4] "Value_1"          "DateRangeStart_2" "DateRangeEnd_2"  
#>  [7] "Value_2"          "DateRangeStart_3" "DateRangeEnd_3"  
#> [10] "Value_3"

pivot_longer(a, 
             cols = -ID, 
             names_to = c(".value", "group"),
             # names_prefix = "DateRange",
             names_sep = "_")
#> # A tibble: 3 x 5
#>      ID group DateRangeEnd DateRangeStart Value
#>   <int> <chr> <date>       <date>         <dbl>
#> 1     1 1     1990-01-03   1990-01-01       4.4
#> 2     1 2     1991-07-06   1991-05-04       6.2
#> 3     1 3     1996-06-06   1995-05-05       3.3
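In the output above, group is a character column. If an integer is preferred, one option is the names_transform argument, assuming tidyr 1.1.0 or later (it is not available in 1.0.0). The sketch below uses a shortened, made-up copy of the renamed data (wide is an illustrative name).

```r
library(tidyr)

# Hypothetical two-period copy of the renamed data.
wide <- data.frame(
  ID = 1L,
  DateRangeStart_1 = as.Date("1990-01-01"),
  DateRangeEnd_1   = as.Date("1990-01-03"),
  Value_1          = 4.4,
  DateRangeStart_2 = as.Date("1991-05-04"),
  DateRangeEnd_2   = as.Date("1991-07-06"),
  Value_2          = 6.2
)

# Same call as in the answer, plus a conversion of the "group" key to integer.
long <- pivot_longer(wide,
                     cols = -ID,
                     names_to = c(".value", "group"),
                     names_sep = "_",
                     names_transform = list(group = as.integer))
long
```

The date and numeric columns keep their classes; only the key column derived from the column names is converted.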

Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

spec <- a %>%
    build_longer_spec(cols = -ID) %>%
    dplyr::transmute(.name = .name,
                     group = readr::parse_number(name),
                     .value = stringr::str_extract(name, "Start|End|Value"))

# In released tidyr (>= 1.0.0), a spec is applied with pivot_longer_spec()
pivot_longer_spec(a, spec)
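A pivot spec is an ordinary data frame with a .name column (the existing column names), a .value column (the output column each one feeds), and any further key columns. As a rough sketch, it can also be built by hand with base R and applied with pivot_longer_spec(), assuming a released tidyr (1.0.0 or later). The data and the names wide, measure_cols and long are made up for illustration.

```r
library(tidyr)

# Hypothetical two-period data with names following the <variable>_<period> pattern.
wide <- data.frame(
  ID = 1L,
  Start_1 = as.Date("1990-01-01"), End_1 = as.Date("1990-01-03"), Value_1 = 4.4,
  Start_2 = as.Date("1991-05-04"), End_2 = as.Date("1991-07-06"), Value_2 = 6.2
)

measure_cols <- setdiff(names(wide), "ID")

# Hand-built spec: one row per wide column.
spec <- data.frame(
  .name  = measure_cols,                               # existing column name
  .value = sub("_\\d+$", "", measure_cols),            # target value column
  group  = as.integer(sub("^.*_", "", measure_cols)),  # period number as key
  stringsAsFactors = FALSE
)

long <- pivot_longer_spec(wide, spec)
long
```

Building the spec explicitly is mainly useful when the column names are too irregular for names_sep or names_pattern.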

Created on 2019-03-26 by the reprex package (v0.2.1)

See also: https://tidyr.tidyverse.org/articles/pivot.html
