将数据框中的多列列从宽变长 [英] Reshaping multiple groups of columns in a data frame from wide to long
问题描述
我正在处理空气质量数据。我试图使用 melt
函数将数据框从宽到长整形。这里是数据: Elev
代表海拔
, Obs
用于观测
和 US3,DK1,DE1
的模型,其中 lm
和 ul
代表第一和第三分位数。
I am working with air-quality data. I tried to reshape the data frame from wide to long using melt
function. Here is the data: Elev
stands for Elevation
, Obs
for observation
and US3, DK1, DE1
are models, where lm
and ul
represents first and third quantiles.
Elev Obs lm ul US3 lm ul DK1 lm ul
1 0 37.74289 34.33422 41.27840 38.82037 35.35241 42.30042 49.31111 45.00134 53.90968
2 100 38.14076 34.71842 41.36560 39.82727 36.49086 43.22209 50.46545 45.79068 55.44664
3 250 39.31056 35.98180 42.50011 40.94909 37.70768 44.40232 50.79818 45.76405 55.54795
4 500 41.03098 37.78005 44.02544 42.54909 39.25627 45.72927 51.24182 46.76091 55.88568
5 750 43.57307 40.52575 46.92804 43.48000 40.55918 46.62914 51.90364 47.40586 56.37514
DE1 lm ul
1 41.15185 37.81824 44.62509
2 40.89455 37.38491 44.34759
3 40.93455 37.33400 44.32573
4 41.26727 37.90150 44.68568
5 43.04545 40.04541 46.12386
我用过
melt(f,id.vars=c("Elev", "lm","um"),measure.vars=c("US3", "DK1", "DE1","Obs" ))
我得到了
Elev lm ul variable value
0 34.33422 41.27840 US3 38.82037
100 34.71842 41.36560 US3 39.82727
250 35.98180 42.50011 US3 40.94909
500 37.78005 44.02544 US3 42.54909
750 40.52575 46.92804 US3 43.48000
0 34.33422 41.27840 DK1 49.31111
100 34.71842 41.36560 DK1 50.46545
250 35.98180 42.50011 DK1 50.79818
500 37.78005 44.02544 DK1 51.24182
750 40.52575 46.92804 DK1 51.90364
0 34.33422 41.27840 DE1 41.15185
100 34.71842 41.36560 DE1 40.89455
250 35.98180 42.50011 DE1 40.93455
500 37.78005 44.02544 DE1 41.26727
750 40.52575 46.92804 DE1 43.04545
0 34.33422 41.27840 Obs 37.74289
100 34.71842 41.36560 Obs 38.14076
250 35.98180 42.50011 Obs 39.31056
500 37.78005 44.02544 Obs 41.03098
750 40.52575 46.92804 Obs 43.57307
可以清楚地看到 lm的值对于每个海拔高度,都会重复
和 ul
。如何在不重复这些值的情况下使用长格式?
我的预期结果是:
As it can clearly be seen the values of lm
and ul
are repeated for every elevation. How can I have a long format without the repetition of those values?
My expected result is:
Elev lm ul variable value
0 35.35241 42.30042 US3 38.82037
100 36.49086 43.22209 US3 39.82727
250 37.70768 44.40232 US3 40.94909
500 39.25627 45.72927 US3 42.54909
750 40.55918 46.62914 US3 43.48000
0 45.00134 53.90968 DK1 49.31111
100 45.79068 55.44664 DK1 50.46545
250 45.76405 55.54795 DK1 50.79818
500 46.76091 55.88568 DK1 51.24182
750 47.40586 56.37514 DK1 51.90364
0 37.81824 44.62509 DE1 41.15185
100 37.38491 44.34759 DE1 40.89455
250 37.33400 44.32573 DE1 40.93455
500 37.90150 44.68568 DE1 41.26727
750 40.04541 46.12386 DE1 43.04545
0 34.33422 41.27840 Obs 37.74289
100 34.71842 41.36560 Obs 38.14076
250 35.98180 42.50011 Obs 39.31056
500 37.78005 44.02544 Obs 41.03098
750 40.52575 46.92804 Obs 43.57307
推荐答案
data.table的最新版本
允许 同时熔化
多列
另一个困难是数据框包含具有相同名称的列。借助 patterns()
函数,不需要 预先重命名列。
An additional difficulty is that the data frame contains columns with the same name. Thanks to the patterns()
function, it is not required to rename the columns beforehand.
library(data.table) # version 1.10.4 used here
# create vector of the names of data groups - in the order they appear in the DF !
dg_names <- c("Obs", "US3", "DK1", "DE1")
# coerce DF to data.table and melt using the patterns() function to identify columns
molten <- melt(setDT(DF),
measure.vars = patterns(paste(dg_names, collapse = "|"), "lm", "ul"),
value.name = c("value", "lm", "ul"))
# rename variable column to something meaningful
molten[, variable := factor(variable, labels = dg_names)]
尽管列和行的顺序不同,但结果与OP预期的一样:
Despite the different order of columns and rows, the result is as expected by the OP:
molten
# Elev variable value lm ul
# 1: 0 Obs 37.74289 34.33422 41.27840
# 2: 100 Obs 38.14076 34.71842 41.36560
# 3: 250 Obs 39.31056 35.98180 42.50011
# 4: 500 Obs 41.03098 37.78005 44.02544
# 5: 750 Obs 43.57307 40.52575 46.92804
# 6: 0 US3 38.82037 35.35241 42.30042
# 7: 100 US3 39.82727 36.49086 43.22209
# 8: 250 US3 40.94909 37.70768 44.40232
# 9: 500 US3 42.54909 39.25627 45.72927
#10: 750 US3 43.48000 40.55918 46.62914
#11: 0 DK1 49.31111 45.00134 53.90968
#12: 100 DK1 50.46545 45.79068 55.44664
#13: 250 DK1 50.79818 45.76405 55.54795
#14: 500 DK1 51.24182 46.76091 55.88568
#15: 750 DK1 51.90364 47.40586 56.37514
#16: 0 DE1 41.15185 37.81824 44.62509
#17: 100 DE1 40.89455 37.38491 44.34759
#18: 250 DE1 40.93455 37.33400 44.32573
#19: 500 DE1 41.26727 37.90150 44.68568
#20: 750 DE1 43.04545 40.04541 46.12386
数据
Data
DF <- structure(list(Elev = c(0L, 100L, 250L, 500L, 750L), Obs = c(37.74289,
38.14076, 39.31056, 41.03098, 43.57307), lm = c(34.33422, 34.71842,
35.9818, 37.78005, 40.52575), ul = c(41.2784, 41.3656, 42.50011,
44.02544, 46.92804), US3 = c(38.82037, 39.82727, 40.94909, 42.54909,
43.48), lm = c(35.35241, 36.49086, 37.70768, 39.25627, 40.55918
), ul = c(42.30042, 43.22209, 44.40232, 45.72927, 46.62914),
DK1 = c(49.31111, 50.46545, 50.79818, 51.24182, 51.90364),
lm = c(45.00134, 45.79068, 45.76405, 46.76091, 47.40586),
ul = c(53.90968, 55.44664, 55.54795, 55.88568, 56.37514),
DE1 = c(41.15185, 40.89455, 40.93455, 41.26727, 43.04545),
lm = c(37.81824, 37.38491, 37.334, 37.9015, 40.04541), ul = c(44.62509,
44.34759, 44.32573, 44.68568, 46.12386)), .Names = c("Elev",
"Obs", "lm", "ul", "US3", "lm", "ul", "DK1", "lm", "ul", "DE1",
"lm", "ul"), row.names = c(NA, -5L), class = "data.frame")
这篇关于将数据框中的多列列从宽变长的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!