Add extra level to factors in dataframe
Question
I have a data frame with numeric and ordered factor columns. It has a lot of NA values, which have no level assigned to them. I tried changing NA to "No Answer", but the levels of the factor columns don't contain that level. Here is how I started, but I don't know how to finish it in an elegant way:
addNoAnswer = function(df) {
  factorOrNot = sapply(df, is.factor)
  levelsList = lapply(df[, factorOrNot], levels)
  levelsList = lapply(levelsList, function(x) c(x, "No Answer"))
  ...
Is there a way to directly apply new levels to factor columns, for example, something like this:
df[, factorOrNot] = lapply(df[, factorOrNot], factor, levelsList)
Of course, this doesn't work correctly.
I want the order of the levels preserved and the "No Answer" level added in last place.
Recommended answer
You could define a function that adds the extra level when its argument is a factor and returns anything else unchanged:
addNoAnswer <- function(x) {
  if (is.factor(x)) return(factor(x, levels = c(levels(x), "No Answer")))
  return(x)
}
Then you just lapply this function over your columns:
df <- as.data.frame(lapply(df, addNoAnswer))
That should return what you want.
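To see the whole thing end to end, here is a minimal sketch on a made-up data frame. The column names `score` and `rating` and their values are purely illustrative assumptions, not data from the question. It also uses `df[] <- lapply(...)` as an alternative to `as.data.frame()`, which assigns back into the existing data frame, and then recodes the remaining NA entries once the new level exists.

```r
# Helper from the answer: add a "No Answer" level to factors, leave the rest alone.
addNoAnswer <- function(x) {
  if (is.factor(x)) return(factor(x, levels = c(levels(x), "No Answer")))
  return(x)
}

# Hypothetical survey data: one numeric column and one ordered factor with an NA.
df <- data.frame(
  score = c(1.5, 2.0, NA, 4.0),
  rating = factor(c("low", NA, "high", "mid"),
                  levels = c("low", "mid", "high"), ordered = TRUE)
)

# Assigning into df[] replaces the columns but keeps the data frame itself.
df[] <- lapply(df, addNoAnswer)

# Now that the level exists, NA entries in the factor columns can be recoded.
isFactor <- vapply(df, is.factor, logical(1))
df[isFactor] <- lapply(df[isFactor], function(x) {
  x[is.na(x)] <- "No Answer"
  x
})

levels(df$rating)      # original levels in order, with "No Answer" last
is.ordered(df$rating)  # factor() defaults to ordered = is.ordered(x)
```

The numeric column is untouched, so its NA stays NA. Only the factor columns get the recoding.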