Converting data from wide to long (using multiple columns)

Question

I currently have wide data that looks similar to this:

cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765

I would like to reshape it into a long data frame that looks similar to this:

cid dyad f op ed junk  id
1   2    0 2  5  0.876 1
1   2    0 4  7  0.876 2
1   5    0 2  4  0.765 1
1   5    1 4  3  0.765 2 

I have tried using the gather() function as well as the reshape() function, but I cannot figure out how to create multiple value columns (f, op, ed) instead of collapsing all of the columns into a single long column.

Thanks for any help.

Recommended answer

You can use the base reshape() function to melt over multiple sets of variables (roughly) simultaneously, by using the varying argument and setting direction to "long".

For example, here you supply a list of three "sets" (vectors) of variable names to the varying argument, and the corresponding long-format column names to v.names:

dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"))

You will get a result like this:

    cid dyad  junk time f op ed id
1.1   1    2 0.876    1 0  2  5  1
2.1   1    5 0.765    1 0  2  4  2
1.2   1    2 0.876    2 0  4  7  1
2.2   1    5 0.765    2 1  4  3  2

Notice that, in addition to the three sets being collapsed, two variables are created: an $id variable, which tracks the row number in the original table (dat), and a $time variable, which corresponds to the order of the original variables that were collapsed. The row names are now compound (1.1, 2.1, 1.2, 2.2); here they are simply the values of $id and $time for that row, joined by a dot.

Without knowing exactly what you are trying to track, it is hard to say whether $id or $time is what you want to use as the row identifier, but they are both there.
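Judging from the desired output in the question, the column called id there appears to be the 1/2 index, which is what reshape() calls the time variable. A minimal sketch of getting that exact layout, assuming cid and dyad together uniquely identify each wide row (true for the sample data, but an assumption for the real data), and with the name long chosen here only for illustration:

```r
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

long <- reshape(dat, direction="long",
                varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
                v.names=c("f","op","ed"),
                timevar="id",             # call the 1/2 index "id", as in the question
                idvar=c("cid","dyad"))    # assumption: cid + dyad identify a wide row

# Sort so the two rows of each dyad sit together, and put columns in the asked-for order
long <- long[order(long$cid, long$dyad, long$id),
             c("cid","dyad","f","op","ed","junk","id")]
rownames(long) <- NULL                    # drop the compound row names
long
#   cid dyad f op ed  junk id
# 1   1    2 0  2  5 0.876  1
# 2   1    2 0  4  7 0.876  2
# 3   1    5 0  2  4 0.765  1
# 4   1    5 1  4  3 0.765  2
```

Because idvar points at columns that already exist, no extra row-number column is added; if cid and dyad are not unique in the real data, this idvar choice would need to change.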

It might also be useful to experiment with the timevar and idvar arguments, which control the names of those two generated columns (you can set timevar to NULL, for example).

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"), 
        timevar="id1", idvar="id2")
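As a variant of the calls above: because the wide column names follow a stem-plus-number pattern (f1, f2, op1, ...), reshape() can also work out the groups itself through its sep argument. This is a sketch rather than part of the original answer; it relies on the documented behaviour that sep="" splits each name just before the first numeral that follows a letter, and the name long2 is only illustrative.

```r
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

# varying is a plain character vector here; with sep="" reshape() guesses
# the value columns (f, op, ed) and the times (1, 2) from the names.
long2 <- reshape(dat, direction="long",
                 varying=c("f1","f2","op1","op2","ed1","ed2"),
                 sep="")
names(long2)
# "cid" "dyad" "junk" "time" "f" "op" "ed" "id"
```

This only works when every varying column follows the same naming pattern; for irregular names, the explicit list form shown earlier is the safer choice.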
