Data frame from wide to long with multiple variables and ids R
Question
I have a dataframe with participants' judgments for two texts. Suppose each text has a correct answer and an identifier, and each text is judged multiple times.
set.seed(123)
wide_df = data.frame('participant_id' = LETTERS[1:12]
, 'judgment_1' = round(rnorm(12)*100)
, 'correct_1' = round(rnorm(12)*100)
, 'text_id_1' = sample(1:12, 12, replace = F)
, 'judgment_2' = round(rnorm(12)*100)
, 'correct_2' = round(rnorm(12)*100)
, 'text_id_2' = sample(13:24, 12, replace = F)
)
So:
participant_id judgment_1 correct_1 text_id_1 judgment_2 correct_2 text_id_2
1 A -56 40 4 43 -127 17
2 B -23 11 10 -30 217 14
3 C 156 -56 1 90 121 22
4 D 7 179 12 88 -112 15
5 E 13 50 7 82 -40 13
...
I would like to convert this to the long format with the columns:
participant_id text_id judgment correct
A 4 -56 40
A 17 43 -127
...
I found and followed the SO advice here:
wide_df %>%
gather(v, value, judgment_1:text_id_2) %>%
separate(v, c("var", "col")) %>%
arrange(participant_id) %>%
spread(col, value)
But this way of reshaping returns the error: Error: Duplicate identifiers for rows (3, 6), (9, 12)
I think I am doing something conceptually wrong but can't quite find it. Where is my mistake? Thanks!
Recommended answer
There is already an answer to this here: https://stackoverflow.com/a/12466668/2371031
For example,
set.seed(123)
wide_df = data.frame('participant_id' = LETTERS[1:12]
, 'judgment_1' = round(rnorm(12)*100)
, 'correct_1' = round(rnorm(12)*100)
, 'text_id_1' = sample(1:12, 12, replace = F)
, 'judgment_2' = round(rnorm(12)*100)
, 'correct_2' = round(rnorm(12)*100)
, 'text_id_2' = sample(13:24, 12, replace = F)
)
dl <- reshape(data = wide_df,
idvar = "participant_id",
varying = list(judgment=c(2,5),correct=c(3,6),text_id=c(4,7)),
direction="long",
v.names = c("judgment","correct","text_id"),
sep="_")
Result:
participant_id time judgment correct text_id
A.1 A 1 -56 40 4
B.1 B 1 -23 11 10
C.1 C 1 156 -56 1
D.1 D 1 7 179 12
E.1 E 1 13 50 7
F.1 F 1 172 -197 11
G.1 G 1 46 70 9
H.1 H 1 -127 -47 2
I.1 I 1 -69 -107 8
J.1 J 1 -45 -22 3
K.1 K 1 122 -103 5
L.1 L 1 36 -73 6
A.2 A 2 43 -127 17
B.2 B 2 -30 217 14
C.2 C 2 90 121 22
D.2 D 2 88 -112 15
E.2 E 2 82 -40 13
F.2 F 2 69 -47 19
G.2 G 2 55 78 24
H.2 H 2 -6 -8 20
I.2 I 2 -31 25 21
J.2 J 2 -38 -3 16
K.2 K 2 -69 -4 23
L.2 L 2 -21 137 18
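The reshaped output still carries the time column and the A.1-style row names, while the question asked for participant_id, text_id, judgment, correct with each participant's rows kept together. Below is a minimal base R sketch of that last step. It assumes wide_df is built as in the question. The names varying_cols and long_df are made up for illustration, and varying is given by column name rather than by position, which is a substitution for readability, not what the answer itself does.

```r
set.seed(123)
wide_df <- data.frame(
  participant_id = LETTERS[1:12],
  judgment_1 = round(rnorm(12) * 100),
  correct_1  = round(rnorm(12) * 100),
  text_id_1  = sample(1:12, 12, replace = FALSE),
  judgment_2 = round(rnorm(12) * 100),
  correct_2  = round(rnorm(12) * 100),
  text_id_2  = sample(13:24, 12, replace = FALSE)
)

# One list entry per long-format variable, naming its wide columns in order
# (hypothetical helper name; same grouping as the index-based call)
varying_cols <- list(
  c("judgment_1", "judgment_2"),
  c("correct_1", "correct_2"),
  c("text_id_1", "text_id_2")
)

long_df <- reshape(
  data      = wide_df,
  idvar     = "participant_id",
  varying   = varying_cols,
  v.names   = c("judgment", "correct", "text_id"),
  direction = "long"
)

# Sort by participant, keep only the requested columns in the requested
# order, and drop the "A.1"-style row names
long_df <- long_df[order(long_df$participant_id, long_df$time),
                   c("participant_id", "text_id", "judgment", "correct")]
rownames(long_df) <- NULL

head(long_df, 4)
```

As for the error in the question, a likely cause is that separate() splits on every non-alphanumeric character by default, so text_id_1 breaks into three pieces instead of two. Both text_id columns then end up with the same key for each participant, which spread() reports as duplicate identifiers.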