Why does as.factor return a character when used inside apply?


Question

I want to convert variables into factors using apply():

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))

a2 <- apply(a, 2, as.factor)
apply(a2, 2, class)

Result:

         x1          x2          x3 
"character" "character" "character" 

I don't understand why this results in character vectors instead of factor vectors.

Recommended answer

apply converts your data.frame to a character matrix before it calls the function. To check the class of each column, use lapply instead:

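# Output from R < 4.0.0, where data.frame() converted strings to factors by
# default; R >= 4.0.0 reports "character" for x2.
lapply(a, class)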
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

In the conversion step, apply likewise coerces the result of as.factor to a character matrix. lapply does not:

a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

For a quick look at the column types, you can also use str:

str(a)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: num  -1.79 -1.091 1.307 1.142 -0.972 ...
#  $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

Additional notes following the comments:

The first thing apply does is convert its argument to a matrix, so apply(a, ...) is equivalent to apply(as.matrix(a), ...). As you can see, str(as.matrix(a)) gives:

chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "x1" "x2" "x3"

There are no factors left, so class returns "character" for every column.
lapply works column by column, so it gives you what you want (it does something like class(a$column_name) for each column).
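
The coercion described above can be reproduced on a much smaller input. This is only a sketch: the data frame `df` and its column names are made up for illustration and are not part of the original post.

```r
# Hypothetical toy data frame mixing a numeric and a factor column.
df <- data.frame(num = c(1.5, 2.5, 3.5),
                 grp = factor(c("a", "b", "a")))

# as.matrix() must pick a single type for every cell; with a non-numeric
# column present, everything becomes character.
m <- as.matrix(df)
is.character(m)   # TRUE

# apply() works on that matrix, so every column looks like character.
via_apply <- apply(df, 2, class)

# lapply() iterates over the list of columns and sees each column's own class.
via_lapply <- lapply(df, class)
```

Here `via_apply` reports "character" for both columns, while `via_lapply` reports "numeric" and "factor".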

The help page for apply explains why combining apply and as.factor doesn't work:

In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
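
A short sketch of what that passage means in practice. The vector `f` and the matrix `m` are hypothetical examples chosen for illustration.

```r
f <- factor(c("low", "high", "low"))

# as.vector() drops the factor class and returns the labels as character.
v <- as.vector(f)
class(v)          # "character"

# Inside apply() the function does return factors, but the assembled result
# is coerced the same way, so it ends up as a character matrix.
m <- matrix(c("a", "b", "b", "a", "a", "b"), nrow = 3)
res <- apply(m, 2, as.factor)
is.character(res) # TRUE
```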

The help page for sapply explains why sapply and as.factor don't work together either:

Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.
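
As an illustration of that simplification step, the sketch below compares sapply with and without simplification. The data frame `df` is a hypothetical example.

```r
df <- data.frame(x = c("a", "b", "a"),
                 y = c("u", "u", "v"),
                 stringsAsFactors = FALSE)

# With the default simplify = TRUE the list of factors is collapsed into a
# matrix, which cannot hold factors, so the result is character.
simplified <- sapply(df, as.factor)
is.character(simplified)

# With simplify = FALSE the result stays a list and the factors survive.
kept <- sapply(df, as.factor, simplify = FALSE)
is.factor(kept$x)
```

With `simplify = FALSE`, sapply behaves much like lapply here, which is why lapply is the more direct choice.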

Either way, you never get a matrix of factors or a data.frame.

The simple fix is to use as.data.frame, as you wrote in your comment:

a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
#  $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

If you only want to convert selected character columns to factors, there is a trick:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: chr  "a" "b" "c" "d" ...
#  $ x2: chr  "A" "B" "C" "D" ...
#  $ x3: chr  "A" "B" "C" "D" ...

columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x3: chr  "A" "B" "C" "D" ...
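
A possible variation on this trick, not from the original answer, is to select the character columns by type instead of by name. The data frame and the helper vector `is_chr` below are made up for illustration.

```r
# Hypothetical data frame with two character columns and one integer column.
a3 <- data.frame(x1 = letters, x2 = 1:26, x3 = LETTERS,
                 stringsAsFactors = FALSE)

# vapply() returns a plain logical vector, one entry per column.
is_chr <- vapply(a3, is.character, logical(1))

# List-style indexing (a3[is_chr], without the comma) always returns a
# data.frame, even when only one column is selected.
a3[is_chr] <- lapply(a3[is_chr], as.factor)
```

The comma-free form matters when only one column matches: `a3[, "x1"]` drops to a plain vector, so lapply would then loop over individual elements instead of columns.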

You can use the same trick to convert all columns:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
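
The empty index is what keeps the object a data.frame. A small sketch of the difference, using a hypothetical two-column data frame:

```r
a3 <- data.frame(x1 = letters, x2 = LETTERS, stringsAsFactors = FALSE)

# lapply() on its own returns a plain list of factors.
as_list <- lapply(a3, as.factor)
class(as_list)    # "list"

# Assigning into a3[] fills the existing columns instead of replacing the
# object, so a3 stays a data.frame.
a3[] <- lapply(a3, as.factor)
class(a3)         # "data.frame"
```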
