How to get this data structure in R?
Question
I am trying to get a wanted data structure from my current data structure. I only partially know the schematics of the expected structure: compared with the current one, it includes one more list(...) and the factor class.
Current data structure
> print(dat.m)
[,1] [,2]
ave_max 150 61
ave 60 0
lepo 41 0
dat.m <- structure(c(150L, 60L, 41L, 61L, 0L, 0L), .Dim = c(3L, 2L), .Dimnames = list(
c("ave_max", "ave", "lepo"), NULL))
Wanted data structure
> print(dat.m)
Vars M1 M2
1 ave_max 150 61
2 ave 60 0
3 lepo 41 0
I know it is schematically something close to the following, where structure(c(...) and row.names = c(...) are unknown:
structure(list(Vars = structure(c(...), .Label = c("ave_max",
"ave", "lepo"), class = "factor"), M1 = c(150, 60,
41), M2 = c(61, 0, 0)), .Names = c("Vars", "ave_max", "ave",
"lepo"), class = "data.frame", row.names = c(...))
R: 3.4.0 (backports)
OS: Debian 8.7
Recommended answer
If you don't insist on M1, M2, etc. as column names, there is an even shorter data.table solution:
library(data.table) # CRAN version 1.10.4 used
as.data.table(dat.m, keep.rownames = "Vars")
# Vars V1 V2
#1: ave_max 150 61
#2: ave 60 0
#3: lepo 41 0
If you do insist on M1, M2, etc. as column names and your matrix dat.m has many columns, the columns can be renamed:
DT <- as.data.table(dat.m, keep.rownames = "Vars")
setnames(DT, stringr::str_replace(names(DT), "^V(?=\\d+$)", "M"))
DT
# Vars M1 M2
#1: ave_max 150 61
#2: ave 60 0
#3: lepo 41 0
The regular expression uses a look-ahead assertion to ensure that only column names that start with V and are followed by nothing but one or more digits up to the end of the name are changed. Others like Vars, V, V17b, VV3 aren't touched.
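The behaviour of the pattern can be checked on a few names. The following is a minimal sketch that swaps in base R's sub() with perl = TRUE for stringr::str_replace, so no extra package is needed; the vector nms is a hypothetical set of column names made up for illustration, not taken from dat.m.

```r
# Hypothetical column names (illustration only): just V1 and V22 consist of
# a leading V followed by digits up to the end of the name
nms <- c("Vars", "V", "V17b", "VV3", "V1", "V22")

# perl = TRUE is required for look-ahead assertions in base R regular expressions
renamed <- sub("^V(?=\\d+$)", "M", nms, perl = TRUE)
print(renamed)
```

Because the look-ahead does not consume the digits, only the leading V is replaced and the digits are kept as they are.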
If your matrix has many columns and the purpose of your operation is not just to have nice column headers for printing, you may consider reshaping your data from wide to long form. The long form is preferred by ggplot2, for instance.
DT_long <- melt(as.data.table(dat.m, keep.rownames = "Vars"), id.vars = "Vars")
DT_long
# Vars variable value
#1: ave_max V1 150
#2: ave V1 60
#3: lepo V1 41
#4: ave_max V2 61
#5: ave V2 0
#6: lepo V2 0
In long form, it is often easier to manipulate your data, for instance, to rename the columns (which are now values in the variable column):
DT_long[, variable := stringr::str_replace(variable, "^V", "M")]
DT_long
# Vars variable value
#1: ave_max M1 150
#2: ave M1 60
#3: lepo M1 41
#4: ave_max M2 61
#5: ave M2 0
#6: lepo M2 0
Finally, you can reshape from long to wide form again:
dcast(DT_long, Vars ~ ...)
# Vars M1 M2
#1: ave 60 0
#2: ave_max 150 61
#3: lepo 41 0
Note that the cast formula recognizes two special variables: . and ... The . represents no variable; ... represents all variables not otherwise mentioned in formula. (See ?data.table::dcast for details.)
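The question sketches a plain data.frame whose Vars column is a factor, whereas as.data.table() with keep.rownames stores the row names as a character column. If the factor is really needed, a base R version might look like the following. This is only a sketch under the assumption that the factor levels should follow the row order of the matrix; the name df is made up for illustration.

```r
dat.m <- structure(c(150L, 60L, 41L, 61L, 0L, 0L), .Dim = c(3L, 2L),
                   .Dimnames = list(c("ave_max", "ave", "lepo"), NULL))

# A matrix without column names becomes a data.frame with columns V1, V2, ...
df <- as.data.frame(dat.m)
names(df) <- paste0("M", seq_len(ncol(dat.m)))

# Put the row names in front as a factor; levels are kept in matrix row order
# (an assumption, the default would sort them alphabetically)
df <- cbind(Vars = factor(rownames(dat.m), levels = rownames(dat.m)), df)

# Replace the character row names by the default integer sequence 1, 2, 3
rownames(df) <- NULL

str(df)
```

Calling dput(df) afterwards shows the complete structure(list(...)) form, including the parts left as c(...) in the question.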