Why does as.factor return a character when used inside apply?
Question
I want to convert variables into factors using apply():
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))
a2 <- apply(a, 2,as.factor)
apply(a2, 2,class)
This results in:
x1 x2 x3
"character" "character" "character"
I don't understand why this results in character vectors instead of factor vectors.
Recommended answer
apply converts your data.frame to a character matrix. Use lapply instead:
lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
In your second command, apply likewise coerces the result to a character matrix. Using lapply instead:
a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
For a quick look at the column types, you can simply use str:
str(a)
# 'data.frame': 100 obs. of 3 variables:
# $ x1: num -1.79 -1.091 1.307 1.142 -0.972 ...
# $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
# $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
Additional explanation, following the comments:
The first thing apply does is convert its argument to a matrix. So apply(a, ...) is equivalent to apply(as.matrix(a), ...). As you can see, str(as.matrix(a)) gives you:
chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "x1" "x2" "x3"
There are no factors left, so class returns "character" for all columns.
lapply works on columns, so it gives you what you want (it does something like class(a$column_name) for each column).
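The coercion can be seen directly on a smaller example. This is only a sketch: the data frame df and its column names are made up for illustration and are not the a from the question.

```r
# Hypothetical three-row data frame, for illustration only
df <- data.frame(num = c(1.5, 2.5, 3.5),
                 fac = factor(c("a", "b", "a")))

# as.matrix() is what apply() does to its argument first; a matrix can
# hold only one type, so mixed columns collapse to character
m <- as.matrix(df)

# Classes as seen through apply() versus classes of the real columns
per_col_apply  <- apply(df, 2, class)
per_col_lapply <- vapply(df, class, character(1))

print(is.character(m))
print(per_col_apply)
print(per_col_lapply)
```

The apply call reports "character" for both columns, while the column-wise call reports the original classes.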
The help page for apply explains why combining apply and as.factor doesn't work:
In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
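A minimal sketch of the coercion described in that help text, using made-up values (the names f, v, m and res are illustrative only):

```r
# A factor passed through as.vector() comes back as plain character
f <- factor(c("low", "high", "low"))
v <- as.vector(f)

# The same thing happens to the as.factor() results inside apply():
# each column is turned into a factor, then the combined result is
# coerced back to a basic vector type before the dimensions are set
m   <- matrix(c("a", "b", "a", "b", "b", "a"), nrow = 3)
res <- apply(m, 2, as.factor)

print(class(f))
print(class(v))
print(is.factor(res))
print(is.character(res))
```

So the factors do get created inside apply, but they are lost again when the results are assembled into a matrix.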
Why sapply and as.factor don't work together is explained in the help page for sapply:
Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.
Either way, you never get a matrix of factors or a data.frame back.
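To make the difference between sapply and lapply concrete, here is a small sketch on a hypothetical two-column data frame (df, s and l are names chosen for this example):

```r
# Hypothetical data frame with two character columns
df <- data.frame(x = c("a", "b", "a"),
                 y = c("u", "v", "u"),
                 stringsAsFactors = FALSE)

s <- sapply(df, as.factor)   # results are simplified into a matrix
l <- lapply(df, as.factor)   # results stay a list, one factor per column

print(is.matrix(s))
print(is.factor(s))
print(vapply(l, is.factor, logical(1)))
```

The simplification step in sapply is what discards the factor class; lapply skips that step, so each element keeps it.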
The simple fix, as suggested in the comments, is to wrap the result in as.data.frame:
a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
# 'data.frame': 100 obs. of 3 variables:
# $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
# $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
# $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
But if you want to replace only selected character columns with factors, there is a trick:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
# 'data.frame': 26 obs. of 3 variables:
# $ x1: chr "a" "b" "c" "d" ...
# $ x2: chr "A" "B" "C" "D" ...
# $ x3: chr "A" "B" "C" "D" ...
columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
# 'data.frame': 26 obs. of 3 variables:
# $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ x3: chr "A" "B" "C" "D" ...
You can use the same approach to replace all columns:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
# 'data.frame': 26 obs. of 3 variables:
# $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
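The same replacement trick can also pick the columns by type rather than by name. This is a sketch of one possible variant, not part of the original answer; the data frame a4 and the helper vector is_chr are hypothetical names.

```r
# Hypothetical data frame mixing an integer column with character columns
a4 <- data.frame(id  = 1:3,
                 grp = c("a", "b", "a"),
                 lab = c("x", "y", "x"),
                 stringsAsFactors = FALSE)

# Logical vector marking which columns are character
is_chr <- vapply(a4, is.character, logical(1))

# Replace only those columns; a4 stays a data.frame and id is untouched
a4[is_chr] <- lapply(a4[is_chr], as.factor)

str(a4)
```

Indexing with a4[cols] rather than a4[, cols] keeps a data frame even when only one column is selected, so lapply still iterates over columns in that case.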