Change level of multiple factor variables

Question

All -

I want to preface this by saying that I already looked at this link to try to solve my problem:

Applying the same factor levels to multiple variables in an R data frame

The difference is that in that problem, the OP wanted to change the levels of factors that all had the same levels. In my instance, I'm looking to change just the first level, which is set to ' ', to something like 'Unknown' and leave the rest of the levels alone. I know I could do this in a "non-R" way with something like this:

for (i in 64:88) {
  var.name <- colnames(df[i])
  levels(eval(parse(text=paste('df$', var.name, sep=''))))[levels(eval(parse(text=paste('df$', var.name, sep='')))) == ' '] <- 'Unknown'
}

But that's an inefficient way to do it. Trying to use the method proposed in the question linked above gave me this code:

df[64:88] <- lapply(df[64:88], factor, levels=c('Unknown', ??))

I don't know what to put in place of the question marks. I tried using just "levels[-1]" but it's obvious why that didn't work. I also tried "levels(df[64:88])[-1]" but again no good. So I tried to revamp the code with the following:

df[64:88] <- lapply(df[64:88], function(x) levels(x)[levels(x) == ' '] <- 'Unknown')

but I get NULL whenever I call levels$transaction_type1 (where transaction_type1 is the column name of df[64]).

What am I missing here?

Thanks in advance for your help!

Per a couple of requests, here is an example of my data:

df$transaction_type1[1:100]
  [1]                                                                                                                                                
 [13] HOME RENEW                                                                                                                                     
 [25]                                                                                                                                                
 [37]                                                                                                                                                
 [49]                                                                                                                                                
 [61] AUTO MANAGE                                                                                     AUTO RENEW                                     
 [73]             AUTO MANAGE                                                                                     AUTO RENEW                         
 [85]                                                                                                                                                
 [97]                                                
Levels:   AUTO CLAIM AUTO MANAGE AUTO PURCHASE AUTO RENEW HOME CLAIM HOME RENEW

As you can see, there are a lot of values equal to ' ', and all 25 variables look just like this, but with different levels. My data consists of 222 variables and 24,850 rows, so I don't know what the standard is on SO for giving example data. Also, this snippet of code might help as well:

> levels(df$transaction_type1)
#[1] " "             "AUTO CLAIM"    "AUTO MANAGE"   "AUTO PURCHASE" "AUTO RENEW"    "HOME CLAIM"    "HOME RENEW"

> levels(df$transaction_type1)[levels(df$transaction_type1) == ' '] <- 'Unknown'
> levels(df$transaction_type1)
#[1] "Unknown"       "AUTO CLAIM"    "AUTO MANAGE"   "AUTO PURCHASE" "AUTO RENEW"    "HOME CLAIM"    "HOME RENEW"   

If more information is needed, please let me know so I can provide it and also learn the SO standards of asking for help. Thanks!

Recommended answer

Something like this?

# it seems like your original data has a structure like this
df <- data.frame(x = factor(c("a", "", "b"), levels = c("", "a", "b")),
                 y = factor(c("c", "", "d"), levels = c("", "c", "d")))

lapply(df, levels)
# $x
# [1] ""  "a" "b"
# 
# $y
# [1] ""  "c" "d"    

# change the "" level to "unknown", and return the updated vector
df[] <- lapply(df, function(x) {
  levels(x)[levels(x) == ""] <- "unknown"
  x
})

lapply(df, levels)
# $x
# [1] "unknown" "a"       "b"      
# 
# $y
# [1] "unknown" "c"       "d"
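
To connect this back to the question, here is a minimal sketch of the same idea restricted to a range of columns and using the ' ' level from the question rather than "". The toy data frame, the column names `type1` and `type2`, the index vector `cols` (standing in for 64:88) and the helper `relabel_blank` are all hypothetical names chosen for illustration. It also shows the likely reason the attempt in the question did not work: in R an assignment evaluates to its right-hand side, so an anonymous function that ends with the `levels(x)[...] <- 'Unknown'` line returns the string 'Unknown' instead of the modified factor.

```r
# Toy data frame standing in for the real one (hypothetical column names).
df <- data.frame(
  id    = 1:3,
  type1 = factor(c(" ", "AUTO RENEW", " ")),
  type2 = factor(c("HOME CLAIM", " ", " "))
)

cols <- 2:3  # stands in for 64:88 in the question

# The function body ends with an assignment, and an assignment evaluates
# to its right-hand side, so each element of `broken` is the string
# "Unknown" rather than a factor.
broken <- lapply(df[cols], function(x) levels(x)[levels(x) == " "] <- "Unknown")

# Hypothetical helper: rename one level, then return the factor itself.
relabel_blank <- function(x, from = " ", to = "Unknown") {
  levels(x)[levels(x) == from] <- to
  x
}

# Only the selected columns are replaced; the other columns are untouched.
df[cols] <- lapply(df[cols], relabel_blank)
```

Afterwards, `lapply(df[cols], levels)` can be used to check that ' ' has been replaced in each selected column. If some columns in the range might not be factors, it may be safer to build `cols` from `sapply(df, is.factor)` first.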
