Stacking/melting multiple columns into multiple columns in R

Question

I am trying to melt/stack/gather several specific columns of a data frame into 2 columns while retaining all the others. I have tried many answers on Stack Overflow without success (some are shown below). My situation is similar to the post "Reshaping multiple sets of measurement columns (wide format) into single columns (long format)", only with many more columns to retain and combine. Note that my year columns are factors, and I have many more columns than the sample below, so I want to refer to columns by name rather than by position.

> df
ID Code Country   year.x value.x  year.y value.y  year.x.x value.x.x
1  A    USA       2000   34.33422 2001   35.35241 2002     42.30042
1  A    Spain     2000   34.71842 2001   39.82727 2002     43.22209
3  B    USA       2000   35.98180 2001   37.70768 2002     44.40232
3  B    Peru      2000   33.00000 2001   37.66468 2002     41.30232
4  C    Argentina 2000   37.78005 2001   39.25627 2002     45.72927
4  C    Peru      2000   40.52575 2001   40.55918 2002     46.62914

I tried using the pivot_longer in tidyr based on the post above which seemed very similar, which resulted in various errors depending on what I did:

pivot_longer(df, 
             cols = -c(ID, Code, Country), 
             names_to = c(".value", "group"),
             names_sep = ".")

I also played with melt in reshape2 in various ways which either melted only the values columns or only the years columns. Such as:

new.df <- reshape2:::melt(df, id.var = c("ID", "Code", "Country"), measure.vars=c("value.x", "value.y", "value.x.x", "value.y.y", "value.x.x.x", "value.y.y.y"), value.name = "value", variable.vars=c('year.x','year.y', "year.x.x", "year.y.y", "year.x.x.x", "year.y.y.y", "value.x", variable.name = "year")

I also tried dplyr gather based on other posts but I find it extremely difficult to understand the help page and posts. To be clear what I am looking to achieve:

ID Code Country   year value
1  A    USA       2000 34.33422
1  A    Spain     2000 34.71842
3  B    USA       2000 35.98180
3  B    Peru      2000 33.00000
4  C    Argentina 2000 37.78005
4  C    Peru      2000 40.52575
1  A    USA       2001 35.35241
1  A    Spain     2001 39.82727
3  B    USA       2001 37.70768
3  B    Peru      2001 37.66468
4  C    Argentina 2001 39.25627
4  C    Peru      2001 40.55918
1  A    USA       2002 42.30042
etc.

I really appreciate the help here.

Recommended answer

We can specify names_pattern:

library(tidyr)
library(dplyr)
df %>%  
   pivot_longer(cols = -c(ID, Code, Country),
       names_to = c(".value", "group"),names_pattern = "(.*)\\.(.*)")
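To see what that pattern does to each column name, here is a minimal base-R sketch. It only mimics the capture-group logic with regexec() and is not how tidyr is implemented internally; the vector `nms` and the variable names are illustrative assumptions.

```r
# Illustrative column names in the same style as the answer's sample data.
nms <- c("year.x", "value.y", "year.z")
pattern <- "(.*)\\.(.*)"

# regexec() returns the full match followed by each capture group.
parts <- regmatches(nms, regexec(pattern, nms))

# Keep only the two groups: the first corresponds to ".value" (the output
# column name), the second to "group" (the suffix that becomes a value).
split_names <- t(sapply(parts, function(p) p[2:3]))
colnames(split_names) <- c(".value", "group")
print(split_names)
```

Because the first group is greedy, a name with two dots such as year.x.x would be split at its last dot, giving "year.x" and "x" rather than "year" and "x.x".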

Or, according to ?pivot_longer:

names_sep - names_sep takes the same specification as separate(), and can either be a numeric vector (specifying positions to break on), or a single string (specifying a regular expression to split on).

This implies that a string passed to names_sep is treated as a regular expression, in which . matches any character rather than a literal dot. To match a literal dot, either escape it or place it inside square brackets:

pivot_longer(df, 
         cols = -c(ID, Code, Country), 
          names_to = c(".value", "group"),
          names_sep = "\\.")
# A tibble: 18 x 6
#      ID Code  Country   group  year value
#   <int> <chr> <chr>     <chr> <int> <dbl>
# 1     1 A     USA       x      2000  34.3
# 2     1 A     USA       y      2001  35.4
# 3     1 A     USA       z      2002  42.3
# 4     1 A     Spain     x      2000  34.7
# 5     1 A     Spain     y      2001  39.8
# 6     1 A     Spain     z      2002  43.2
# 7     3 B     USA       x      2000  36.0
# 8     3 B     USA       y      2001  37.7
# 9     3 B     USA       z      2002  44.4
#10     3 B     Peru      x      2000  33  
#11     3 B     Peru      y      2001  37.7
#12     3 B     Peru      z      2002  41.3
#13     4 C     Argentina x      2000  37.8
#14     4 C     Argentina y      2001  39.3
#15     4 C     Argentina z      2002  45.7
#16     4 C     Peru      x      2000  40.5
#17     4 C     Peru      y      2001  40.6
#18     4 C     Peru      z      2002  46.6
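The difference between a regex dot and a literal dot can be checked in isolation with base R's strsplit(). This is only an analogy for how a regular-expression separator behaves, under the assumption that the same regex rules apply; it does not go through tidyr, and the variable names are illustrative.

```r
nm <- "year.x"

# An unescaped "." is a regex wildcard: every character is a separator,
# so nothing but empty strings is left over.
as_regex <- strsplit(nm, ".")[[1]]

# Two ways of writing a literal dot as a regular expression.
escaped   <- strsplit(nm, "\\.")[[1]]
bracketed <- strsplit(nm, "[.]")[[1]]

# strsplit() can also skip regex interpretation entirely.
literal <- strsplit(nm, ".", fixed = TRUE)[[1]]

print(as_regex)
print(escaped)
```

Both "\\." and "[.]" split the name into "year" and "x", which is the split that the names_to = c(".value", "group") specification needs.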

Update

For the updated dataset

library(stringr)
df2 %>% 
   rename_at(vars(matches("year|value")), ~ 
     str_replace(., "^([^.]+\\.[^.]+)\\.([^.]+)$", "\\1\\2")) %>% 
     pivot_longer(cols = -c(ID, Code, Country),
        names_to = c(".value", "group"),names_pattern = "(.*)\\.(.*)")
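The effect of the rename step on the column names can be previewed without any data. The sketch below uses base R's sub() as a stand-in for stringr::str_replace() (both replace the first match); the vector `nms` is an illustrative assumption.

```r
# Illustrative names: two with one dot, two with two dots.
nms <- c("year.x", "value.y", "year.x.x", "value.x.x")

# The pattern only matches names with exactly two dots; it keeps
# everything up to the second dot (\\1), drops that dot, and appends
# the last piece (\\2). Names with a single dot are left unchanged.
renamed <- sub("^([^.]+\\.[^.]+)\\.([^.]+)$", "\\1\\2", nms)
print(renamed)
```

After the rename, every measurement column contains a single dot, so the "(.*)\\.(.*)" pattern splits year.xx into "year" and "xx" instead of "year.x" and "x".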

Or without the rename, use regex lookaround

df2 %>%
   pivot_longer(cols = -c(ID, Code, Country), 
       names_to = c(".value", "group"),
           names_sep = "(?<=year|value)\\.")
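The lookbehind restricts the split to a dot that directly follows "year" or "value", so later dots stay inside the suffix. A minimal base-R sketch of that behaviour, using strsplit() with perl = TRUE as an assumed stand-in for the regex engine tidyr uses (the vector `nms` is illustrative):

```r
nms <- c("year.x", "value.y", "year.x.x", "value.x.x")

# Plain literal dot: a two-dot name is cut into three pieces.
plain <- strsplit(nms, "\\.")

# Lookbehind: only the dot right after "year" or "value" is a separator.
lookbehind <- strsplit(nms, "(?<=year|value)\\.", perl = TRUE)

print(plain[[3]])
print(lookbehind[[3]])
```

With the lookbehind, year.x.x yields the two pieces "year" and "x.x", which matches the two entries in names_to.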

Data

df <- structure(list(ID = c(1L, 1L, 3L, 3L, 4L, 4L), Code = c("A", 
"A", "B", "B", "C", "C"), Country = c("USA", "Spain", "USA", 
"Peru", "Argentina", "Peru"), year.x = c(2000L, 2000L, 2000L, 
2000L, 2000L, 2000L), value.x = c(34.33422, 34.71842, 35.9818, 
33, 37.78005, 40.52575), year.y = c(2001L, 2001L, 2001L, 2001L, 
2001L, 2001L), value.y = c(35.35241, 39.82727, 37.70768, 37.66468, 
39.25627, 40.55918), year.z = c(2002L, 2002L, 2002L, 2002L, 2002L, 
2002L), value.z = c(42.30042, 43.22209, 44.40232, 41.30232, 45.72927, 
46.62914)), class = "data.frame", row.names = c(NA, -6L))



df2 <- structure(list(ID = c(1L, 1L, 3L, 3L, 4L, 4L), Code = c("A", 
"A", "B", "B", "C", "C"), Country = c("USA", "Spain", "USA", 
"Peru", "Argentina", "Peru"), year.x = c(2000L, 2000L, 2000L, 
2000L, 2000L, 2000L), value.x = c(34.33422, 34.71842, 35.9818, 
33, 37.78005, 40.52575), year.y = c(2001L, 2001L, 2001L, 2001L, 
2001L, 2001L), value.y = c(35.35241, 39.82727, 37.70768, 37.66468, 
39.25627, 40.55918), year.x.x = c(2002L, 2002L, 2002L, 2002L, 
2002L, 2002L), value.x.x = c(42.30042, 43.22209, 44.40232, 41.30232, 
45.72927, 46.62914)), class = "data.frame", row.names = c(NA, 
-6L))
