How to make the levels of a factor in a data frame consistent across all columns?
Question
I have a data frame with 5 different columns:
        Test1 Test2 Test3 Test4 Test5
Sample1 PASS  PASS  FAIL  WARN  WARN
Sample2 PASS  PASS  FAIL  PASS  WARN
Sample3 PASS  FAIL  FAIL  PASS  WARN
Sample4 PASS  FAIL  FAIL  PASS  WARN
Sample5 PASS  WARN  FAIL  WARN  WARN
Each column codes its levels with its own integers. In column 1, "PASS" is 1. In column 2, "FAIL" is 1, "PASS" is 2 and "WARN" is 3. In column 3, "FAIL" is 1. In column 4, "PASS" is 1 and "WARN" is 2. In column 5, "WARN" is 1.
The codes follow alphabetical order within each column. I need "PASS" to be 1, "WARN" to be 2 and "FAIL" to be 3 in every column, so that I can then convert the data frame into a matrix and turn it into a heatmap.
Currently, the integer code each value gets depends on which values happen to appear in that particular column, sorted alphabetically.
How can I keep the coding consistent across the entire data frame?
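The behaviour described above comes from the defaults of factor(): when levels is not supplied, the levels are the sorted unique values of that column only, so the same label can receive a different integer code in each column. A minimal sketch using the Test2 values from the table (the variable names are only for illustration):

```r
x <- c("PASS", "PASS", "FAIL", "FAIL", "WARN")  # Test2 column from the table

# Default: levels are the sorted unique values, so FAIL=1, PASS=2, WARN=3
default_codes <- as.integer(factor(x))
print(default_codes)  # [1] 2 2 1 1 3

# Explicit levels: PASS=1, WARN=2, FAIL=3 regardless of which values occur
fixed_codes <- as.integer(factor(x, levels = c("PASS", "WARN", "FAIL")))
print(fixed_codes)    # [1] 1 1 3 3 2

# A column containing only "WARN" now gets code 2 instead of 1
warn_codes <- as.integer(factor(rep("WARN", 5), levels = c("PASS", "WARN", "FAIL")))
print(warn_codes)     # [1] 2 2 2 2 2
```

Supplying the same levels vector to every column is therefore all that is needed to make the codes agree.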
Answer
You can loop over the columns of the dataset "df" with lapply, convert each column to a factor again with the specified levels, and assign the result back to the corresponding columns. Assigning to df[] rather than df keeps the result a data.frame instead of turning it into a plain list.
lvls <- c('PASS', 'WARN', 'FAIL')          # desired order: PASS=1, WARN=2, FAIL=3
df[] <- lapply(df, factor, levels=lvls)   # re-factor every column with the same levels
str(df)
# 'data.frame': 5 obs. of 5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
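Since the stated goal is a heatmap, here is a sketch of the next step, assuming base R's data.matrix() and heatmap() are acceptable (the question does not name a plotting function, and the colours are arbitrary). Once every column shares the levels PASS, WARN, FAIL, data.matrix() replaces each factor by its integer code, so PASS, WARN and FAIL become 1, 2 and 3 in every column, and the sample row names are kept. The data frame is rebuilt from the table in the question so the block runs on its own.

```r
# Rebuild the example data from the question as character columns
df <- data.frame(
  Test1 = c("PASS", "PASS", "PASS", "PASS", "PASS"),
  Test2 = c("PASS", "PASS", "FAIL", "FAIL", "WARN"),
  Test3 = c("FAIL", "FAIL", "FAIL", "FAIL", "FAIL"),
  Test4 = c("WARN", "PASS", "PASS", "PASS", "WARN"),
  Test5 = c("WARN", "WARN", "WARN", "WARN", "WARN"),
  row.names = paste0("Sample", 1:5),
  stringsAsFactors = FALSE
)

lvls <- c("PASS", "WARN", "FAIL")
df[] <- lapply(df, factor, levels = lvls)  # same levels (and codes) in every column

# data.matrix() turns each factor into its integer code and keeps the row names
m <- data.matrix(df)
print(m)

# Draw the heatmap only in an interactive session: Rowv/Colv = NA keep the
# original row and column order, scale = "none" plots the raw codes 1..3
if (interactive()) {
  heatmap(m, Rowv = NA, Colv = NA, scale = "none",
          col = c("forestgreen", "gold", "firebrick"))
}
```

Because the levels vector fixes the codes, the colour for each status is the same in every column of the heatmap.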
If you prefer data.table:
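library(data.table)
# setDT() converts df to a data.table in place; (names(df)) tells := to use that vector as the target column names
setDT(df)[, (names(df)) := lapply(.SD, factor, levels=lvls)]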
setDT converts the data.frame to a data.table by reference, and := assigns the reconverted factor columns (lapply(..)) back to the columns named in names(df). .SD stands for "Subset of Data.table"; without by or .SDcols it contains all of the columns.
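One thing to watch with this route: setDT() drops row names by default (keep.rownames = FALSE), so the Sample1 to Sample5 labels would be lost before the heatmap step. A sketch that keeps them, assuming the sample IDs should survive: keep.rownames = TRUE stores them in a column named rn, and .SDcols restricts the conversion to the test columns so rn is not turned into an all-NA factor. The names cols and m are only illustrative.

```r
library(data.table)

# Example data from the question, with sample IDs as row names
df <- data.frame(
  Test1 = c("PASS", "PASS", "PASS", "PASS", "PASS"),
  Test2 = c("PASS", "PASS", "FAIL", "FAIL", "WARN"),
  Test3 = c("FAIL", "FAIL", "FAIL", "FAIL", "FAIL"),
  Test4 = c("WARN", "PASS", "PASS", "PASS", "WARN"),
  Test5 = c("WARN", "WARN", "WARN", "WARN", "WARN"),
  row.names = paste0("Sample", 1:5),
  stringsAsFactors = FALSE
)
lvls <- c("PASS", "WARN", "FAIL")

# Convert in place, moving the row names into a column called "rn"
setDT(df, keep.rownames = TRUE)

# Re-factor only the test columns; "rn" is left untouched
cols <- setdiff(names(df), "rn")
df[, (cols) := lapply(.SD, factor, levels = lvls), .SDcols = cols]

# Integer matrix for the heatmap, with the sample IDs restored as row names
m <- as.matrix(df[, lapply(.SD, as.integer), .SDcols = cols])
rownames(m) <- df$rn
print(m)
```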
# Example data used above (dput output)
df <- structure(list(
    Test1 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "PASS", class = "factor"),
    Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL", "PASS", "WARN"), class = "factor"),
    Test3 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"),
    Test4 = structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN", "FAIL"), class = "factor"),
    Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "WARN", class = "factor")),
  .Names = c("Test1", "Test2", "Test3", "Test4", "Test5"),
  row.names = c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"),
  class = "data.frame")