将数据框中的多列列从宽变长 [英] Reshaping multiple groups of columns in a data frame from wide to long

查看:97
本文介绍了将数据框中的多列列从宽变长的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理空气质量数据。我试图使用 melt 函数将数据框从宽到长整形。这里是数据: Elev 代表海拔 Obs 用于观测 US3,DK1,DE1 的模型,其中 lm ul 代表第一和第三分位数。

I am working with air-quality data. I tried to reshape the data frame from wide to long using melt function. Here is the data: Elev stands for Elevation, Obs for observation and US3, DK1, DE1 are models, where lm and ul represents first and third quantiles.

Elev      Obs       lm       ul      US3       lm       ul      DK1       lm       ul
1    0 37.74289 34.33422 41.27840 38.82037 35.35241 42.30042 49.31111 45.00134 53.90968
2  100 38.14076 34.71842 41.36560 39.82727 36.49086 43.22209 50.46545 45.79068 55.44664
3  250 39.31056 35.98180 42.50011 40.94909 37.70768 44.40232 50.79818 45.76405 55.54795
4  500 41.03098 37.78005 44.02544 42.54909 39.25627 45.72927 51.24182 46.76091 55.88568
5  750 43.57307 40.52575 46.92804 43.48000 40.55918 46.62914 51.90364 47.40586 56.37514
       DE1       lm       ul
1 41.15185 37.81824 44.62509
2 40.89455 37.38491 44.34759
3 40.93455 37.33400 44.32573
4 41.26727 37.90150 44.68568
5 43.04545 40.04541 46.12386

我用过

 melt(f,id.vars=c("Elev", "lm","um"),measure.vars=c("US3", "DK1", "DE1","Obs" ))

我得到了

Elev       lm       ul      variable    value
   0 34.33422 41.27840           US3 38.82037
 100 34.71842 41.36560           US3 39.82727
 250 35.98180 42.50011           US3 40.94909
 500 37.78005 44.02544           US3 42.54909
 750 40.52575 46.92804           US3 43.48000
   0 34.33422 41.27840           DK1 49.31111
 100 34.71842 41.36560           DK1 50.46545
 250 35.98180 42.50011           DK1 50.79818
 500 37.78005 44.02544           DK1 51.24182
 750 40.52575 46.92804           DK1 51.90364
   0 34.33422 41.27840           DE1 41.15185
 100 34.71842 41.36560           DE1 40.89455
 250 35.98180 42.50011           DE1 40.93455
 500 37.78005 44.02544           DE1 41.26727
 750 40.52575 46.92804           DE1 43.04545
   0 34.33422 41.27840           Obs 37.74289
 100 34.71842 41.36560           Obs 38.14076
 250 35.98180 42.50011           Obs 39.31056
 500 37.78005 44.02544           Obs 41.03098
 750 40.52575 46.92804           Obs 43.57307

可以清楚地看到 lm的值对于每个海拔高度,都会重复 ul 。如何在不重复这些值的情况下使用长格式?
我的预期结果是:

As it can clearly be seen the values of lm and ul are repeated for every elevation. How can I have a long format without the repetition of those values? My expected result is:

Elev    lm      ul      variable  value
  0 35.35241 42.30042      US3 38.82037
100 36.49086 43.22209      US3 39.82727
250 37.70768 44.40232      US3 40.94909
500 39.25627 45.72927      US3 42.54909
750 40.55918 46.62914      US3 43.48000
  0 45.00134 53.90968      DK1 49.31111
100 45.79068 55.44664      DK1 50.46545
250 45.76405 55.54795      DK1 50.79818
500 46.76091 55.88568      DK1 51.24182
750 47.40586 56.37514      DK1 51.90364
  0 37.81824 44.62509      DE1 41.15185
100 37.38491 44.34759      DE1 40.89455
250 37.33400 44.32573      DE1 40.93455
500 37.90150 44.68568      DE1 41.26727
750 40.04541 46.12386      DE1 43.04545
  0 34.33422 41.27840      Obs 37.74289
100 34.71842 41.36560      Obs 38.14076
250 35.98180 42.50011      Obs 39.31056
500 37.78005 44.02544      Obs 41.03098
750 40.52575 46.92804      Obs 43.57307


推荐答案

data.table的最新版本允许 同时熔化多列

另一个困难是数据框包含具有相同名称的。借助 patterns()函数,不需要 预先重命名列。

An additional difficulty is that the data frame contains columns with the same name. Thanks to the patterns() function, it is not required to rename the columns beforehand.

library(data.table) # version 1.10.4 used here

# create vector of the names of data groups - in the order they appear in the DF !
dg_names <- c("Obs", "US3", "DK1", "DE1")

# coerce DF to data.table and melt using the patterns() function to identify columns
molten <- melt(setDT(DF), 
               measure.vars = patterns(paste(dg_names, collapse = "|"), "lm", "ul"), 
               value.name = c("value", "lm", "ul"))

# rename variable column to something meaningful
molten[, variable := factor(variable, labels = dg_names)]

尽管列和行的顺序不同,但结果与OP预期的一样:

Despite the different order of columns and rows, the result is as expected by the OP:

molten
#    Elev variable    value       lm       ul
# 1:    0      Obs 37.74289 34.33422 41.27840
# 2:  100      Obs 38.14076 34.71842 41.36560
# 3:  250      Obs 39.31056 35.98180 42.50011
# 4:  500      Obs 41.03098 37.78005 44.02544
# 5:  750      Obs 43.57307 40.52575 46.92804
# 6:    0      US3 38.82037 35.35241 42.30042
# 7:  100      US3 39.82727 36.49086 43.22209
# 8:  250      US3 40.94909 37.70768 44.40232
# 9:  500      US3 42.54909 39.25627 45.72927
#10:  750      US3 43.48000 40.55918 46.62914
#11:    0      DK1 49.31111 45.00134 53.90968
#12:  100      DK1 50.46545 45.79068 55.44664
#13:  250      DK1 50.79818 45.76405 55.54795
#14:  500      DK1 51.24182 46.76091 55.88568
#15:  750      DK1 51.90364 47.40586 56.37514
#16:    0      DE1 41.15185 37.81824 44.62509
#17:  100      DE1 40.89455 37.38491 44.34759
#18:  250      DE1 40.93455 37.33400 44.32573
#19:  500      DE1 41.26727 37.90150 44.68568
#20:  750      DE1 43.04545 40.04541 46.12386



数据



Data

DF <- structure(list(Elev = c(0L, 100L, 250L, 500L, 750L), Obs = c(37.74289, 
38.14076, 39.31056, 41.03098, 43.57307), lm = c(34.33422, 34.71842, 
35.9818, 37.78005, 40.52575), ul = c(41.2784, 41.3656, 42.50011, 
44.02544, 46.92804), US3 = c(38.82037, 39.82727, 40.94909, 42.54909, 
43.48), lm = c(35.35241, 36.49086, 37.70768, 39.25627, 40.55918
), ul = c(42.30042, 43.22209, 44.40232, 45.72927, 46.62914), 
    DK1 = c(49.31111, 50.46545, 50.79818, 51.24182, 51.90364), 
    lm = c(45.00134, 45.79068, 45.76405, 46.76091, 47.40586), 
    ul = c(53.90968, 55.44664, 55.54795, 55.88568, 56.37514), 
    DE1 = c(41.15185, 40.89455, 40.93455, 41.26727, 43.04545), 
    lm = c(37.81824, 37.38491, 37.334, 37.9015, 40.04541), ul = c(44.62509, 
    44.34759, 44.32573, 44.68568, 46.12386)), .Names = c("Elev", 
"Obs", "lm", "ul", "US3", "lm", "ul", "DK1", "lm", "ul", "DE1", 
"lm", "ul"), row.names = c(NA, -5L), class = "data.frame")

这篇关于将数据框中的多列列从宽变长的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆