How to control new variables' names after tidyr's spread?

Problem description

I have a dataframe with panel structure: 2 observations for each unit from two years:

library(tidyr)
mydf <- data.frame(
    id = rep(1:3, rep(2,3)), 
    year = rep(c(2012, 2013), 3), 
    value = runif(6)  # random values, so the numbers printed below differ from run to run
)
mydf
#  id year      value
#1  1 2012 0.09668064
#2  1 2013 0.62739399
#3  2 2012 0.45618433
#4  2 2013 0.60347152
#5  3 2012 0.84537624
#6  3 2013 0.33466030

I would like to reshape this data to wide format, which can be done easily with tidyr::spread. However, because the values of the year variable are numbers, the names of my new variables become numbers as well, which makes them harder to use afterwards.

spread(mydf, year, value)
#  id       2012      2013
#1  1 0.09668064 0.6273940
#2  2 0.45618433 0.6034715
#3  3 0.84537624 0.3346603

I know I can easily rename the columns. However, if I want to reshape within a chain of other operations, this becomes inconvenient. For example, the following line does not do what I want: 2012 is read as the number 2012, not as the name of a column, so the condition is always TRUE.

library(dplyr)
mydf %>% spread(year, value) %>% filter(2012 > 0.5)
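A minimal sketch of what happens in that line, assuming a seeded copy of the example data (the seed is arbitrary and only makes the sketch reproducible; rep(1:3, each = 2) is the same as the question's rep(1:3, rep(2,3))). The spread columns get the non-syntactic names "2012" and "2013", which can only be referred to with backticks:

```r
library(tidyr)
library(dplyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
mydf <- data.frame(
  id = rep(1:3, each = 2),
  year = rep(c(2012, 2013), 3),
  value = runif(6)
)

wide <- mydf %>% spread(year, value)
# column names: id, 2012, 2013

# 2012 is a numeric constant here, so the condition is a single TRUE and every row is kept
all_rows <- wide %>% filter(2012 > 0.5)

# Backticks refer to the column named "2012" instead
some_rows <- wide %>% filter(`2012` > 0.5)
```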

The following works, but it is not very concise:

tmp <- spread(mydf, year, value)
names(tmp) <- c("id", "y2012", "y2013")
filter(tmp, y2012 > 0.5)
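One way to keep the renaming inside the chain is dplyr's rename_with(), which applies a function to the names of the selected columns. A minimal sketch, assuming dplyr >= 1.0.0 (where rename_with() was added) and a seeded copy of the example data; the "y" prefix matches the names used above:

```r
library(tidyr)
library(dplyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
mydf <- data.frame(
  id = rep(1:3, each = 2),
  year = rep(c(2012, 2013), 3),
  value = runif(6)
)

result <- mydf %>%
  spread(year, value) %>%
  # prefix every column except id with "y": "2012" becomes "y2012"
  rename_with(~ paste0("y", .x), -id) %>%
  filter(y2012 > 0.5)
```

Unlike names(tmp) <- ..., this does not hard-code the full list of column names, so it keeps working if more years are added.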

Any idea how I can change the new variable names within spread?

Recommended answer

I know some years have passed since this question was originally asked, but for posterity I want to highlight the sep argument of spread. When sep is not NULL, each new column name is built from the key column's name, followed by sep, followed by the key value:

mydf %>% 
  spread(key = year, value = value, sep = "")
#  id   year2012  year2013
#1  1 0.15608322 0.6886531
#2  2 0.04598124 0.0792947
#3  3 0.16835445 0.1744542

This is not exactly what the question asked for (the prefix is the key name "year" rather than "y"), but it was sufficient for my purposes. See ?spread.
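Because sep always prefixes the key column's name, one workaround for getting exactly "y" with spread (a sketch, not part of the answer above) is to rename the key column to y before spreading. This assumes a seeded copy of the example data:

```r
library(tidyr)
library(dplyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
mydf <- data.frame(
  id = rep(1:3, each = 2),
  year = rep(c(2012, 2013), 3),
  value = runif(6)
)

result <- mydf %>%
  rename(y = year) %>%              # the key column is now called "y"
  spread(y, value, sep = "") %>%    # names become "y" + "" + key value, e.g. "y2012"
  filter(y2012 > 0.5)
```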

Update for tidyr 1.0.0: tidyr 1.0.0 introduced pivot_wider (and pivot_longer), which gives more control over the new names through the arguments names_sep and names_prefix. The call now becomes:

mydf %>% 
  pivot_wider(names_from = year, values_from = value,
              names_prefix = "year")
# # A tibble: 3 x 3
#        id year2012 year2013
#     <int>    <dbl>    <dbl>
#   1     1    0.347    0.388
#   2     2    0.565    0.924
#   3     3    0.406    0.296

To get exactly what was originally asked for (a plain "y" prefix), simply set names_prefix = "y".
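Applied to the pipeline from the question, a minimal sketch (assuming tidyr >= 1.0.0 and a seeded copy of the example data, where the seed is arbitrary) could look like this:

```r
library(tidyr)
library(dplyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
mydf <- data.frame(
  id = rep(1:3, each = 2),
  year = rep(c(2012, 2013), 3),
  value = runif(6)
)

result <- mydf %>%
  # new columns are named "y" followed by the year, e.g. "y2012"
  pivot_wider(names_from = year, values_from = value, names_prefix = "y") %>%
  filter(y2012 > 0.5)
```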

names_sep is used when names_from takes more than one column: it is the separator placed between the values of those columns in each new name. The example below demonstrates this after adding quarters to the data:

# Add quarters to data
mydf2 <- data.frame(
  id = rep(1:3, each = 8), 
  year = rep(rep(c(2012, 2013), each = 4), 3), 
  quarter = rep(c("Q1", "Q2", "Q3", "Q4"), 6),  # 4 quarters x 2 years x 3 ids = 24 rows
  value = runif(24)
)
head(mydf2)
# id year quarter     value
# 1  1 2012      Q1 0.8651470
# 2  1 2012      Q2 0.3944423
# 3  1 2012      Q3 0.4580580
# 4  1 2012      Q4 0.2902604
# 5  1 2013      Q1 0.4751588
# 6  1 2013      Q2 0.6851755

mydf2 %>% 
  pivot_wider(names_from = c(year, quarter), values_from = value,
              names_sep = "_", names_prefix = "y")
# # A tibble: 3 x 9
#      id  y2012_Q1  y2012_Q2  y2012_Q3  y2012_Q4  y2013_Q1  y2013_Q2  y2013_Q3  y2013_Q4 
#   <int>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
# 1     1     0.865     0.394     0.458    0.290      0.475     0.685     0.213     0.920
# 2     2     0.566     0.614     0.509    0.0515     0.974     0.916     0.681     0.509
# 3     3     0.968     0.615     0.670    0.748      0.723     0.996     0.247     0.449
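Later tidyr versions also offer names_glue, a glue template that builds each name from the names_from columns and can stand in for the names_sep/names_prefix combination. A minimal sketch reproducing the names above, assuming tidyr >= 1.1.0 (the release that, to my knowledge, added names_glue) and a seeded copy of the quarterly data:

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
mydf2 <- data.frame(
  id = rep(1:3, each = 8),
  year = rep(rep(c(2012, 2013), each = 4), 3),
  quarter = rep(c("Q1", "Q2", "Q3", "Q4"), 6),
  value = runif(24)
)

# {year} and {quarter} are filled in from the names_from columns, e.g. "y2012_Q1"
wide <- pivot_wider(mydf2, names_from = c(year, quarter), values_from = value,
                    names_glue = "y{year}_{quarter}")
```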
