How to make the levels of a factor in a data frame consistent across all columns?


Problem description

I have a data frame with 5 different columns:

         Test1   Test2   Test3  Test4  Test5 
Sample1  PASS    PASS    FAIL    WARN   WARN
Sample2  PASS    PASS    FAIL    PASS   WARN
Sample3  PASS    FAIL    FAIL    PASS   WARN
Sample4  PASS    FAIL    FAIL    PASS   WARN
Sample5  PASS    WARN    FAIL    WARN   WARN

In each column, each level is assigned a different integer code. In column 1, "PASS" is 1. In column 2, "PASS" is 2 and "FAIL" is 1. In column 3, "FAIL" is 1. In column 4, "PASS" is 1 and "WARN" is 2. In column 5, "WARN" is 1.

The codes are assigned in alphabetical order. I need "PASS" to be 1 in all columns, "WARN" to be 2 in all columns, and "FAIL" to be 3 in all columns, so that I can then convert the data frame into a matrix and turn it into a heatmap.

Currently the codes are assigned to the levels depending on which values show up in a specific column, in alphabetical order.
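
The behaviour described above can be reproduced with a minimal sketch. The vectors `col2` and `col4` are made up for illustration and simply mirror the values of Test2 and Test4 in the table; they are not part of the original data.

```r
# Hypothetical vectors mirroring Test2 and Test4 from the table above.
col2 <- factor(c("PASS", "PASS", "FAIL", "FAIL", "WARN"))
col4 <- factor(c("WARN", "PASS", "PASS", "PASS", "WARN"))

# By default factor() sorts the values that are present in the vector,
# so the integer code of a level depends on the other values in that column.
levels(col2)      # "FAIL" "PASS" "WARN"
as.integer(col2)  # 2 2 1 1 3
levels(col4)      # "PASS" "WARN"
as.integer(col4)  # 2 1 1 1 2
```

Here "WARN" is coded as 3 in one vector and as 2 in the other, which is the inconsistency in question.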

How can I keep it constant throughout the entire data frame?

Recommended answer

You could give every column of the dataset "df" the same level order by looping over the columns (lapply), converting each one to a factor again with the specified levels, and assigning the result back to the corresponding columns.

lvls <- c('PASS', 'WARN', 'FAIL')
df[] <-  lapply(df, factor, levels=lvls)
str(df)
# 'data.frame': 5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
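
Since the stated goal is a matrix for a heatmap, the following sketch shows one possible next step, assuming base R's `data.matrix()` is an acceptable way to get the integer codes. The data frame `toy` is a hypothetical stand-in holding only Sample1 and Sample3 for three of the tests; it is not the full "df".

```r
lvls <- c("PASS", "WARN", "FAIL")

# Hypothetical two-row stand-in for the question's data (Sample1 and Sample3).
toy <- data.frame(
  Test1 = c("PASS", "PASS"),
  Test2 = c("PASS", "FAIL"),
  Test4 = c("WARN", "PASS"),
  row.names = c("Sample1", "Sample3"),
  stringsAsFactors = TRUE
)

# Same idea as above: re-create every column as a factor with shared levels.
toy[] <- lapply(toy, factor, levels = lvls)

# data.matrix() replaces factors by their internal integer codes,
# so PASS = 1, WARN = 2, FAIL = 3 in every column.
m <- data.matrix(toy)
m
```

The row and column names are kept, so a matrix like `m` could then be handed to a plotting function such as `heatmap()`.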

If you prefer to use data.table:

library(data.table)
setDT(df)[, names(df):= lapply(.SD, factor, levels=lvls)]

setDT converts the "data.frame" to a "data.table" by reference. The assignment operator (:=) then assigns the reconverted factor columns (lapply(..)) to the columns named in names(df). .SD denotes "Subset of Data.table".

Data used above (dput output):

df <- structure(list(Test1 = structure(c(1L, 1L, 1L, 1L, 1L), 
.Label = "PASS", class = "factor"), 
  Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL", 
 "PASS", "WARN"), class = "factor"), Test3 = structure(c(1L, 
 1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"), Test4 = 
 structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN", "FAIL"), 
 class = "factor"), Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = 
"WARN", class = "factor")), .Names = c("Test1", 
"Test2", "Test3", "Test4", "Test5"), row.names = c("Sample1", 
"Sample2", "Sample3", "Sample4", "Sample5"), class = "data.frame")
