多列传播 [英] Multiple column spread

查看:83
本文介绍了多列传播的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要做的实际上是tidyr::spread()的工作,但是对于多个值列.

I have a need to do what is really what tidyr::spread() does, but for multiple value columns.

如果我有这样的数据集:

If I have a data set like this:

te <- structure(list(Syllable = c("[pa]", "[ta]", "[ka]", "[pa]", "[ta]", 
"[ka]", "[pa]", "[ta]", "[ka]", "[pa]"), PA = c(15.9252335141423, 
2.17504491982172, 5.26727958979289, 4.48590068583509, 2.1316282072803e-13, 
14.1415335887116, 3.51720477328246, 0.839953301362556, 5.74712643678048, 
7.01396701583887), transient_mean = c(4.43699436235785, 4.8733556527069, 
5.52844792982797, 3.63255704032305, 4.99835680315547, 5.5387775503751, 
3.19517346916471, 4.40360523945946, 4.14203491258186, 3.51900453101706
), transient_sd = c(0.871280094068596, 1.51392328075964, 2.65764846931951, 
1.25416942799974, 1.13391173514884, 1.75904804912773, 1.54594113209317, 
1.69526308849507, 1.73693971862859, 1.31626295142865)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

看起来像这样(对于那些刚读过这篇文章的人来说):

which looks like this (for those of you just reading this):

> te
# A tibble: 10 x 4
   Syllable       PA transient_mean transient_sd
   <chr>       <dbl>          <dbl>        <dbl>
 1 [pa]     1.59e+ 1           4.44        0.871
 2 [ta]     2.18e+ 0           4.87        1.51 
 3 [ka]     5.27e+ 0           5.53        2.66 
 4 [pa]     4.49e+ 0           3.63        1.25 
 5 [ta]     2.13e-13           5.00        1.13 
 6 [ka]     1.41e+ 1           5.54        1.76 
 7 [pa]     3.52e+ 0           3.20        1.55 
 8 [ta]     8.40e- 1           4.40        1.70 
 9 [ka]     5.75e+ 0           4.14        1.74 
10 [pa]     7.01e+ 0           3.52        1.32

我想从Syllable列值的值中创建新列,以便使用列名"[pa] PA","[pa] transient_mean"获得更宽泛的标题,"[pa__transient_sd",[ta_PA," [ta_transient_mean,....等.

I would like to make new columns from the value of the Syllable column values so that I get a wider tibble with column names "[pa]PA","[pa] transient_mean","[pa]_ transient_sd",[ta]_PA","[ta]_transient_mean", .... and so on.

我当然尝试过:

> te %>%
+   spread(Syllable,PA:transient_sd)
Error: `var` must evaluate to a single number or a column name, not an integer vector
Call `rlang::last_error()` to see a backtrace

但是后来我投诉了,大概是因为我选择了多列.

but I get a complaint then, presumably due to me selecting multiple columns.

关于如何处理数据的任何构想?

Any ideas on how this data wrangling can be achieved?

推荐答案

可能您的数据缺少时间变量,该变量对"[pa]", "[ta]", "[ka]"的不同观察值进行计数.您可以使用ave修复此问题.

Probably your data is lacking a time variable that counts different observations of "[pa]", "[ta]", "[ka]". You could fix this with ave.

te$time <- with(te, ave(as.character(Syllable), Syllable, FUN=seq_along))
te
# # A tibble: 10 x 5
#    Syllable       PA transient_mean transient_sd time 
#    <chr>       <dbl>          <dbl>        <dbl> <chr>
#  1 [pa]     1.59e+ 1           4.44        0.871 1    
#  2 [ta]     2.18e+ 0           4.87        1.51  1    
#  3 [ka]     5.27e+ 0           5.53        2.66  1    
#  4 [pa]     4.49e+ 0           3.63        1.25  2    
#  5 [ta]     2.13e-13           5.00        1.13  2    
#  6 [ka]     1.41e+ 1           5.54        1.76  2    
#  7 [pa]     3.52e+ 0           3.20        1.55  3    
#  8 [ta]     8.40e- 1           4.40        1.70  3    
#  9 [ka]     5.75e+ 0           4.14        1.74  3    
# 10 [pa]     7.01e+ 0           3.52        1.32  4 

之后,您可以使用基数R的reshape.

After that you could use reshape of base R.

reshape(as.data.frame(te), timevar="Syllable", idvar="time", direction="wide")
#    time   PA.[pa] transient_mean.[pa] transient_sd.[pa]      PA.[ta]
# 1     1 15.925234            4.436994         0.8712801 2.175045e+00
# 4     2  4.485901            3.632557         1.2541694 2.131628e-13
# 7     3  3.517205            3.195173         1.5459411 8.399533e-01
# 10    4  7.013967            3.519005         1.3162630           NA
#    transient_mean.[ta] transient_sd.[ta]   PA.[ka] transient_mean.[ka]
# 1             4.873356          1.513923  5.267280            5.528448
# 4             4.998357          1.133912 14.141534            5.538778
# 7             4.403605          1.695263  5.747126            4.142035
# 10                  NA                NA        NA                  NA
#    transient_sd.[ka]
# 1           2.657648
# 4           1.759048
# 7           1.736940
# 10                NA

这篇关于多列传播的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆