Reshape data from long to wide format - more than one variable

Question

I'm trying to reshape my data from long to wide format using the dcast function from reshape2.

The objective is to use several variables in the value.var parameter, but R doesn't let me use more than one value in it.

Is there any other way I could fix it? I've looked at other similar questions, but I haven't been able to find a similar example.

This is my current dataset:

+---------+------+--------+--------------+------------+
| Country | Year | Growth | Unemployment | Population |
+---------+------+--------+--------------+------------+
| A       | 2015 |      2 |          8.3 |         40 |
| B       | 2015 |      3 |          9.2 |         32 |
| C       | 2015 |    2.5 |          9.1 |         30 |
| D       | 2015 |    1.5 |          6.1 |         27 |
| A       | 2016 |      4 |          8.1 |         42 |
| B       | 2016 |    3.5 |            9 |       32.5 |
| C       | 2016 |    3.7 |            9 |         31 |
| D       | 2016 |    3.1 |          5.3 |         29 |
| A       | 2017 |    4.5 |          8.1 |       42.5 |
| B       | 2017 |    4.4 |          8.4 |         33 |
| C       | 2017 |    4.3 |          8.5 |         30 |
| D       | 2017 |    4.2 |          5.2 |         30 |
+---------+------+--------+--------------+------------+
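
For anyone who wants to reproduce the problem, the table above can be rebuilt as a data frame. This is only a sketch: the question does not show how world was created, so the column types (character for Country, numeric for everything else) are an assumption.

```r
# Rebuild the example data; values are copied row by row from the table above.
# Assumption: Country is character and all other columns are numeric.
world <- data.frame(
  Country      = rep(c("A", "B", "C", "D"), times = 3),
  Year         = rep(c(2015, 2016, 2017), each = 4),
  Growth       = c(2, 3, 2.5, 1.5, 4, 3.5, 3.7, 3.1, 4.5, 4.4, 4.3, 4.2),
  Unemployment = c(8.3, 9.2, 9.1, 6.1, 8.1, 9, 9, 5.3, 8.1, 8.4, 8.5, 5.2),
  Population   = c(40, 32, 30, 27, 42, 32.5, 31, 29, 42.5, 33, 30, 30),
  stringsAsFactors = FALSE
)

# Quick structural check: 12 rows (4 countries x 3 years), 5 columns.
str(world)
```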

My objective is to spread the Year column across the remaining columns (Growth, Unemployment and Population). I'm using the dcast call below.

data_wide <- dcast(world, country  ~ year,
     value.var=c("Growth","Unemployment","Population"))

Desired result

+---------+-------------+-------------------+-----------------+-------------+-------------------+-----------------+
| Country | Growth_2015 | Unemployment_2015 | Population_2015 | Growth_2016 | Unemployment_2016 | Population_2016 |
+---------+-------------+-------------------+-----------------+-------------+-------------------+-----------------+
| A       |           2 |               8.3 |              40 |           4 |               8.1 |              42 |
| B       |           3 |               9.2 |              32 |         3.5 |                 9 |            32.5 |
| C       |         2.5 |               9.1 |              30 |         3.7 |                 9 |              31 |
| D       |         1.5 |               6.1 |              27 |         3.1 |               5.3 |              29 |
+---------+-------------+-------------------+-----------------+-------------+-------------------+-----------------+

Recommended answer

The dcast() statement given by the OP works almost perfectly with recent versions of the data.table package, as these allow multiple measure variables to be used with dcast() and melt():

library(data.table)   # CRAN version 1.10.4
setDT(world)   # coerce to data.table
data_wide <- dcast(world, Country ~ Year, 
                   value.var = c("Growth", "Unemployment", "Population"))

data_wide
#   Country Growth_2015 Growth_2016 Growth_2017 Unemployment_2015 Unemployment_2016 Unemployment_2017 Population_2015
#1:       A         2.0         4.0         4.5               8.3               8.1               8.1              40
#2:       B         3.0         3.5         4.4               9.2               9.0               8.4              32
#3:       C         2.5         3.7         4.3               9.1               9.0               8.5              30
#4:       D         1.5         3.1         4.2               6.1               5.3               5.2              27
#   Population_2016 Population_2017
#1:            42.0            42.5
#2:            32.5            33.0
#3:            31.0            30.0
#4:            29.0            30.0

This is the same result as the tidyr solution.
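
The tidyr answer referred to is not reproduced here. As a rough point of comparison only, the sketch below uses pivot_wider() from current tidyr (version 1.0.0 or later), which may well differ from what that answer actually used. The data frame is rebuilt from the question's table so that the snippet runs on its own.

```r
library(tidyr)  # pivot_wider() is available from tidyr 1.0.0 onwards

# Example data copied from the question's table (column types are assumed).
world <- data.frame(
  Country      = rep(c("A", "B", "C", "D"), times = 3),
  Year         = rep(c(2015, 2016, 2017), each = 4),
  Growth       = c(2, 3, 2.5, 1.5, 4, 3.5, 3.7, 3.1, 4.5, 4.4, 4.3, 4.2),
  Unemployment = c(8.3, 9.2, 9.1, 6.1, 8.1, 9, 9, 5.3, 8.1, 8.4, 8.5, 5.2),
  Population   = c(40, 32, 30, 27, 42, 32.5, 31, 29, 42.5, 33, 30, 30),
  stringsAsFactors = FALSE
)

# One row per Country; each measure is spread over the years and the new
# columns are named <measure>_<year>.
wide_tidyr <- pivot_wider(
  world,
  names_from  = Year,
  values_from = c(Growth, Unemployment, Population)
)

names(wide_tidyr)
```

With the default settings the columns come out grouped by measure (all Growth columns first, then Unemployment, then Population), which matches the column order of the dcast() result.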

However, the OP has requested a specific column order for his ideal solution, where the different measure variables of each year are grouped together.

If the proper order of columns is important, there are two ways to achieve this. The first approach is to reorder the columns appropriately using setcolorder():

new_ord <- CJ(world$Year, c("Growth","Unemployment","Population"), 
              sorted = FALSE, unique = TRUE)[, paste(V2, V1, sep = "_")]
setcolorder(data_wide, c("Country", new_ord))

data_wide
#   Country Growth_2015 Unemployment_2015 Population_2015 Growth_2016 Unemployment_2016 Population_2016 Growth_2017
#1:       A         2.0               8.3              40         4.0               8.1            42.0         4.5
#2:       B         3.0               9.2              32         3.5               9.0            32.5         4.4
#3:       C         2.5               9.1              30         3.7               9.0            31.0         4.3
#4:       D         1.5               6.1              27         3.1               5.3            29.0         4.2
#   Unemployment_2017 Population_2017
#1:               8.1            42.5
#2:               8.4            33.0
#3:               8.5            30.0
#4:               5.2            30.0

Note that the cross join function CJ() is used to create the cross product of the vectors.
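
To see what the cross join contributes, here is a small standalone sketch. The vectors years and measures are hypothetical stand-ins for the unique years and the measure names in world, and the CJ() arguments are named here (year, measure) purely for readability, whereas the code above relies on the default names V1 and V2.

```r
library(data.table)

# Hypothetical stand-ins for unique(world$Year) and the measure columns.
years    <- c(2015, 2016, 2017)
measures <- c("Growth", "Unemployment", "Population")

# CJ() returns every combination of its arguments. With sorted = FALSE the
# input order is kept, and the last argument varies fastest.
grid <- CJ(year = years, measure = measures, sorted = FALSE)

# Build column names in the form <measure>_<year>, grouped by year.
new_ord <- grid[, paste(measure, year, sep = "_")]
new_ord
```

Because the measure varies fastest, the resulting names run through all three measures for 2015 before moving on to 2016, which is exactly the grouping the OP asked for.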

The other approach to achieve the desired column order is to melt and recast:

molten <- melt(world, id.vars = c("Country", "Year"))
dcast(molten, Country ~ Year + variable)
#   Country 2015_Growth 2015_Unemployment 2015_Population 2016_Growth 2016_Unemployment 2016_Population 2017_Growth
#1:       A         2.0               8.3              40         4.0               8.1            42.0         4.5
#2:       B         3.0               9.2              32         3.5               9.0            32.5         4.4
#3:       C         2.5               9.1              30         3.7               9.0            31.0         4.3
#4:       D         1.5               6.1              27         3.1               5.3            29.0         4.2
#   2017_Unemployment 2017_Population
#1:               8.1            42.5
#2:               8.4            33.0
#3:               8.5            30.0
#4:               5.2            30.0
