整齐的数据框架,重复列名 [英] Tidy data.frame with repeated column names
问题描述
我有一个程序给我这种格式的数据
I have a program that gives me data in this format
toy
file_path Condition Trial.Num A B C ID A B C ID A B C ID
1 root/some.extension Baseline 1 2 3 5 car 2 1 7 bike 4 9 0 plane
2 root/thing.extension Baseline 2 3 6 45 car 5 4 4 bike 9 5 4 plane
3 root/else.extension Baseline 3 4 4 6 car 7 5 4 bike 68 7 56 plane
4 root/uniquely.extension Treatment 1 5 3 7 car 1 7 37 bike 9 8 7 plane
5 root/defined.extension Treatment 2 6 7 3 car 4 6 8 bike 9 0 8 plane
我的目标是将格式整理成至少可以更轻松地整理具有独特列名称的重新格式的东西
My goal is to tidy the format into something that at least can be easier to finally tidy with reshape having unique column names
tidy_toy
file_path Condition Trial.Num A B C ID
1 root/some.extension Baseline 1 2 3 5 car
2 root/thing.extension Baseline 2 3 6 45 car
3 root/else.extension Baseline 3 4 4 6 car
4 root/uniquely.extension Treatment 1 5 3 7 car
5 root/defined.extension Treatment 2 6 7 3 car
6 root/some.extension Baseline 1 2 1 7 bike
7 root/thing.extension Baseline 2 5 4 4 bike
8 root/else.extension Baseline 3 7 5 4 bike
9 root/uniquely.extension Treatment 1 1 7 37 bike
10 root/defined.extension Treatment 2 4 6 8 bike
11 root/some.extension Baseline 1 4 9 0 plane
12 root/thing.extension Baseline 2 9 5 4 plane
13 root/else.extension Baseline 3 68 7 56 plane
14 root/uniquely.extension Treatment 1 9 8 7 plane
15 root/defined.extension Treatment 2 9 0 8 plane
如果我尝试 melt
from toy
它不工作,因为onl您的第一个ID列将用于 id.vars
(因此所有内容都将被标记为汽车)。相同的变量将被删除。
If I try to melt
from toy
it doesn't work because only the first ID column will get used for id.vars
(hence everything will get tagged as cars). Identical variables will get dropped.
这是两个表的dput
Here's the dput of both tables
structure(list(file_path = structure(c(3L, 4L, 2L, 5L, 1L), .Label = c("root/defined.extension",
"root/else.extension", "root/some.extension", "root/thing.extension",
"root/uniquely.extension"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L), .Label = c("Baseline", "Treatment"), class = "factor"),
Trial.Num = c(1L, 2L, 3L, 1L, 2L), A = 2:6, B = c(3L, 6L,
4L, 3L, 7L), C = c(5L, 45L, 6L, 7L, 3L), ID = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "car", class = "factor"), A = c(2L,
5L, 7L, 1L, 4L), B = c(1L, 4L, 5L, 7L, 6L), C = c(7L, 4L,
4L, 37L, 8L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "bike", class = "factor"),
A = c(4L, 9L, 68L, 9L, 9L), B = c(9L, 5L, 7L, 8L, 0L), C = c(0L,
4L, 56L, 7L, 8L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "plane", class = "factor")), .Names = c("file_path",
"Condition", "Trial.Num", "A", "B", "C", "ID", "A", "B", "C",
"ID", "A", "B", "C", "ID"), class = "data.frame", row.names = c(NA,
-5L))
structure(list(file_path = structure(c(3L, 4L, 2L, 5L, 1L, 3L,
4L, 2L, 5L, 1L, 3L, 4L, 2L, 5L, 1L), .Label = c("root/defined.extension",
"root/else.extension", "root/some.extension", "root/thing.extension",
"root/uniquely.extension"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("Baseline",
"Treatment"), class = "factor"), Trial.Num = c(1L, 2L, 3L, 1L,
2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L), A = c(2L, 3L, 4L,
5L, 6L, 2L, 5L, 7L, 1L, 4L, 4L, 9L, 68L, 9L, 9L), B = c(3L, 6L,
4L, 3L, 7L, 1L, 4L, 5L, 7L, 6L, 9L, 5L, 7L, 8L, 0L), C = c(5L,
45L, 6L, 7L, 3L, 7L, 4L, 4L, 37L, 8L, 0L, 4L, 56L, 7L, 8L), ID = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L), .Label = c("bike",
"car", "plane"), class = "factor")), .Names = c("file_path",
"Condition", "Trial.Num", "A", "B", "C", "ID"), class = "data.frame", row.names = c(NA,
-15L))
推荐答案
您可以使用 make.unique
功能来创建唯一的列名。之后,您可以使用 data.table
-package中的 fusion
,这可以基于模式
在列名中:
You can use the make.unique
-function to create unique column names. After that you can use melt
from the data.table
-package which is able to create multiple value-columns based on patterns
in the columnnames:
# make the column names unique
names(toy) <- make.unique(names(toy))
# let the 'Condition' column start with a small letter 'c'
# so it won't be detected by the patterns argument from melt
names(toy)[2] <- tolower(names(toy)[2])
# load the 'data.table' package
library(data.table)
# tidy the data into long format
tidy_toy <- melt(setDT(toy),
measure.vars = patterns('^A','^B','^C','^ID'),
value.name = c('A','B','C','ID'))
其中:
> tidy_toy
file_path condition Trial.Num variable A B C ID
1: root/some.extension Baseline 1 1 2 3 5 car
2: root/thing.extension Baseline 2 1 3 6 45 car
3: root/else.extension Baseline 3 1 4 4 6 car
4: root/uniquely.extension Treatment 1 1 5 3 7 car
5: root/defined.extension Treatment 2 1 6 7 3 car
6: root/some.extension Baseline 1 2 2 1 7 bike
7: root/thing.extension Baseline 2 2 5 4 4 bike
8: root/else.extension Baseline 3 2 7 5 4 bike
9: root/uniquely.extension Treatment 1 2 1 7 37 bike
10: root/defined.extension Treatment 2 2 4 6 8 bike
11: root/some.extension Baseline 1 3 4 9 0 plane
12: root/thing.extension Baseline 2 3 9 5 4 plane
13: root/else.extension Baseline 3 3 68 7 56 plane
14: root/uniquely.extension Treatment 1 3 9 8 7 plane
15: root/defined.extension Treatment 2 3 9 0 8 plane
一种更复杂的方法,更好地区分为模式
参数:
# select the names that are not unique
idx <- which(names(toy) %in% names(tt)[tt > 1])
nms <- names(toy)[idx]
# make them unique
names(toy)[idx] <- paste(nms,
rep(seq(length(nms) / length(names(tt)[tt > 1])),
each = length(names(tt)[tt > 1])),
sep = '.')
# your columnnames are now unique:
> names(toy)
[1] "file_path" "Condition" "Trial.Num" "A.1" "B.1" "C.1" "ID.1" "A.2"
[9] "B.2" "C.2" "ID.2" "A.3" "B.3" "C.3" "ID.3"
# tidy the data into long format
tidy_toy <- melt(setDT(toy),
measure.vars = patterns('^A.\\d','^B.\\d','^C.\\d','^ID.\\d'),
value.name = c('A','B','C','ID'))
这将给出相同的最终结果。
which will give the same end-result.
如评论中所述, janitor
-package也可以帮助您解决此问题。 clean_names()
与 make.unique
函数类似。 看到这里的解释。
As mentioned in the comments, the janitor
-package can be helpful for this problem as well. The clean_names()
works similar as the make.unique
function. See here for an explanation.
这篇关于整齐的数据框架,重复列名的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!