Why does as.factor return a character when used inside apply?

Question

I want to convert variables into factors using apply():

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))

a2 <- apply(a, 2, as.factor)
apply(a2, 2, class)

This results in:

         x1          x2          x3 
"character" "character" "character" 

I don't understand why this results in character vectors instead of factor vectors.

Recommended answer

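apply converts your data.frame to a character matrix. Use lapply instead, which works on the columns directly. (The outputs below show x2 as a factor because data.frame() converted strings to factors by default before R 4.0.0; in R 4.0.0 and later, x2 is "character" unless you pass stringsAsFactors = TRUE.)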

lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
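If you just want the class of each column, vapply() returns a compact named character vector instead of a list. This is a minimal sketch: it rebuilds the question's data with set.seed() and an explicit stringsAsFactors = TRUE (both added here, not part of the question) so the result does not depend on the R version, and it assumes each column has exactly one class string (vapply() with character(1) would fail for a class such as POSIXct, whose class() has two elements).

```r
set.seed(1)  # make the rnorm()/sample() draws reproducible
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))),
                stringsAsFactors = TRUE)  # pre-R-4.0.0 default, made explicit

# vapply() walks the columns like lapply(), but insists that each call
# returns exactly one character string and simplifies to a named vector
col_classes <- vapply(a, class, character(1))
col_classes  # x1 = "numeric", x2 = "factor", x3 = "factor"
```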

The same thing happens in your code: apply(a, 2, as.factor) returns a character matrix, so apply(a2, 2, class) can only report "character". Using lapply instead:

a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
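Keep in mind that lapply() itself returns a plain list, not a data.frame: every element is a factor, but the table structure is gone. A quick check on a small made-up data frame (the values are illustrative only):

```r
a <- data.frame(x1 = c(1.5, -0.3, 2.1),
                x2 = c("a", "b", "a"),
                x3 = factor(c("a", "a", "b")))

a2 <- lapply(a, as.factor)

class(a2)                          # "list", not "data.frame"
vapply(a2, is.factor, logical(1))  # TRUE for every element
names(a2)                          # column names are kept: "x1" "x2" "x3"
```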

For a quick look at the column types, though, you could use str:

str(a)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: num  -1.79 -1.091 1.307 1.142 -0.972 ...
#  $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

Additional explanation, based on the comments:

The first thing apply does is convert its argument to a matrix, so apply(a, ...) is equivalent to apply(as.matrix(a), ...). As you can see, str(as.matrix(a)) gives you:

chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "x1" "x2" "x3"
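The conversion is easy to confirm directly. A minimal check, using a small made-up data frame with one numeric, one character and one factor column:

```r
a <- data.frame(x1 = c(1.5, -0.3, 2.1),
                x2 = c("a", "b", "a"),
                x3 = factor(c("a", "a", "b")))

m <- as.matrix(a)
typeof(m)  # "character": the numeric column was formatted as text

# apply() performs this conversion itself, so both calls agree
same <- identical(apply(a, 2, class), apply(m, 2, class))
same       # TRUE
```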

There are no factors left, so class returns "character" for every column.
lapply works on the columns themselves, so it gives you what you want (it does something like class(a$column_name) for each column).
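Since a data.frame is a list of columns, lapply(a, class) amounts to the loop below, which hands each column to class() with its type intact. A sketch for illustration only, with made-up data:

```r
a <- data.frame(x1 = c(1.5, -0.3, 2.1),
                x2 = c("a", "b", "a"),
                x3 = factor(c("a", "a", "b")))

# Spelled-out version of lapply(a, class): one class() call per column,
# each column extracted with [[ so it keeps its own type
by_hand <- list()
for (nm in names(a)) {
  by_hand[[nm]] <- class(a[[nm]])
}

identical(by_hand, lapply(a, class))  # TRUE
```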

The help page for apply (?apply) explains why apply and as.factor don't work together:

In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
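The quoted coercion happens after as.factor() has run, which a small experiment shows: inside apply(), as.factor() really does return a factor, and the class only disappears when apply() assembles the results. Made-up data again:

```r
a <- data.frame(x1 = c(1.5, -0.3, 2.1),
                x2 = c("a", "b", "a"),
                x3 = factor(c("a", "a", "b")))

# Inside apply(), as.factor() does return a factor for each column ...
inner <- apply(a, 2, function(col) class(as.factor(col)))
inner           # "factor" for x1, x2 and x3

# ... but apply() drops the class when it builds the result array
out <- apply(a, 2, as.factor)
is.matrix(out)  # TRUE
typeof(out)     # "character"
```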

Similarly, the help page for sapply (?sapply) explains why sapply and as.factor don't work together:

Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.
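Because each as.factor() call returns a vector of the same length, sapply() simplifies the results into a matrix and the factor class is lost. Passing simplify = FALSE skips that step, which makes sapply() behave like lapply(). A small sketch with made-up data:

```r
a <- data.frame(x1 = c(1.5, -0.3, 2.1),
                x2 = c("a", "b", "a"),
                x3 = factor(c("a", "a", "b")))

# Three length-3 results, so sapply() builds a 3 x 3 matrix
s <- sapply(a, as.factor)
is.matrix(s)  # TRUE
typeof(s)     # "character"

# Without simplification the factors survive (same result as lapply)
s_list <- sapply(a, as.factor, simplify = FALSE)
vapply(s_list, is.factor, logical(1))  # TRUE TRUE TRUE
```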

So with apply or sapply you never get a matrix of factors or a data.frame.

The simple fix, suggested in the comments, is to wrap the lapply result in as.data.frame:

a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
'data.frame':   100 obs. of  3 variables:
 $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
 $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
 $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
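A quick way to confirm that the result is a real data.frame of factors with the same shape as the input. As the x1 line above shows, numeric columns are converted too, with one level per distinct value. Small made-up data:

```r
a <- data.frame(x1 = c(1.5, -0.3, 2.1, 1.5),
                x2 = c("a", "b", "a", "b"),
                x3 = factor(c("a", "a", "b", "b")))

a2 <- as.data.frame(lapply(a, as.factor))

class(a2)                          # "data.frame"
identical(dim(a2), dim(a))         # TRUE: same rows and columns
vapply(a2, is.factor, logical(1))  # TRUE for every column
nlevels(a2$x1)                     # 3: one level per distinct x1 value
```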

If you only want to convert selected character columns to factors, there is a trick:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
'data.frame':   26 obs. of  3 variables:
 $ x1: chr  "a" "b" "c" "d" ...
 $ x2: chr  "A" "B" "C" "D" ...
 $ x3: chr  "A" "B" "C" "D" ...

columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
'data.frame':   26 obs. of  3 variables:
 $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ x3: chr  "A" "B" "C" "D" ...
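A variation on the trick above, assuming you want to convert every character column rather than a fixed list of names: select the columns with a predicate. It also uses list-style indexing, a3[cols], instead of a3[, cols]; with matrix-style indexing a single selected column is dropped to a plain vector, and lapply() would then iterate over its elements instead of over columns. The integer x3 column is a made-up change to show that non-character columns are left alone:

```r
a3 <- data.frame(x1 = letters, x2 = LETTERS, x3 = 1:26,
                 stringsAsFactors = FALSE)

# TRUE for columns stored as character, FALSE otherwise
is_chr <- vapply(a3, is.character, logical(1))

# list-style indexing keeps a data.frame even if only one column matches
a3[is_chr] <- lapply(a3[is_chr], as.factor)

str(a3)
```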

You can use the same approach to convert all columns:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
'data.frame':   26 obs. of  3 variables:
 $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
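The same conversion is often written with empty brackets, a3[] <- lapply(a3, as.factor). A quick check that both forms leave a data.frame with the same factor columns (make_a3 is a hypothetical helper that just rebuilds the example data):

```r
make_a3 <- function() {
  data.frame(x1 = letters, x2 = LETTERS, x3 = LETTERS,
             stringsAsFactors = FALSE)
}

b1 <- make_a3()
b1[, ] <- lapply(b1, as.factor)  # matrix-style form used above

b2 <- make_a3()
b2[] <- lapply(b2, as.factor)    # empty-bracket shorthand

is.data.frame(b2)                                  # TRUE
vapply(b2, is.factor, logical(1))                  # TRUE for every column
identical(lapply(b1, levels), lapply(b2, levels))  # TRUE
```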
