Can we get factor matrices in R?


Question


It seems it is not possible to get a matrix of factors in R. Is that true? If so, why? If not, how should I do it?

f <- factor(sample(letters[1:5], 20, rep=TRUE), letters[1:5])
m <- matrix(f,4,5)
is.factor(m) # fail.

m <- factor(m,letters[1:5])
is.factor(m) # oh, yes? 
is.matrix(m) # nope. fail. 

dim(f) <- c(4,5) # aha?
is.factor(f) # yes.. 
is.matrix(f) # yes!

# but then I get a strange behavior
cbind(f,f) # is not a factor anymore
head(f,2) # doesn't give the first 2 rows but the first 2 elements of f
# should I worry about it?

Solution

In this case, it may walk like a duck and even quack like a duck, but f from:

f <- factor(sample(letters[1:5], 20, rep=TRUE), letters[1:5])
dim(f) <- c(4,5)

really isn't a matrix, even though is.matrix() claims that it strictly is one. To be a matrix as far as is.matrix() is concerned, f only needs to be a vector and have a dim attribute. By adding the attribute to f you pass the test. As you have seen, however, once you start using f as a matrix, it quickly loses the features that make it a factor (you end up working with the internal integer codes, or the dimensions get lost).
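A minimal sketch of the mismatch described above, assuming a fixed seed (the seed and the use of inherits() are additions for illustration, not part of the original question): the dim attribute is enough to satisfy is.matrix(), while the class attribute still says "factor".

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])
dim(f) <- c(4, 5)

is.matrix(f)           # TRUE: f has a dim attribute of length 2
is.factor(f)           # TRUE: the class attribute is untouched
inherits(f, "matrix")  # FALSE: S3 dispatch only sees class "factor"
```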

There are really only matrices and arrays for the atomic vector types:

  1. logical,
  2. integer,
  3. real (i.e. double),
  4. complex,
  5. string (or character), and
  6. raw

plus, as @hadley reminds me, you can also have list matrices and arrays (by setting the dim attribute on a list object). See, for example, the Matrices & Arrays section of Hadley's book, Advanced R.

Anything outside those types would be coerced to some lower type via as.vector(). This happens in matrix(f, nrow = 3) not because f fails is.atomic() (it returns TRUE for f, because f is internally stored as an integer and typeof(f) returns "integer"), but because f has a class attribute. This sets the OBJECT bit on the internal representation of f, and anything that has a class is supposed to be coerced to one of the atomic types via as.vector():

matrix <- function(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
                   dimnames = NULL) {
    if (is.object(data) || !is.atomic(data)) 
        data <- as.vector(data)
....
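The checks in that excerpt can be tried directly. This is a small sketch (the seed and the variable name m are illustrative assumptions) showing that f passes is.atomic() but is also an object, so matrix() sends it through as.vector() and the result is a character matrix.

```r
set.seed(42)  # arbitrary seed for reproducibility
f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])

typeof(f)     # "integer": the internal storage type
is.atomic(f)  # TRUE
is.object(f)  # TRUE: f carries a class attribute

m <- matrix(f, nrow = 4)  # goes through as.vector(f)
typeof(m)     # "character": the labels, not the codes
is.factor(m)  # FALSE
```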

Adding dimensions via dim<-() is a quick way to create an array without duplicating the object, but this bypasses some of the checks and balances that R would do if you coerced f to a matrix via the other methods:

matrix(f, nrow = 3) # or
as.matrix(f)

The difference shows up as soon as you use basic functions that work on matrices, or anything that relies on method dispatch. Note that after assigning dimensions to f, f is still of class "factor":

> class(f)
[1] "factor"

which explains the head() behaviour; you are not getting the head.matrix behaviour because f is not a matrix, at least as far as the S3 mechanism is concerned:

> debug(head.matrix)
> head(f) # we don't enter the debugger
[1] d c a d b d
Levels: a b c d e
> undebug(head.matrix)

and the head.default method calls [ for which there is a factor method, and hence the observed behaviour:

> debugonce(`[.factor`)
> head(f)
debugging in: `[.factor`(x, seq_len(n))
debug: {
    y <- NextMethod("[")
    attr(y, "contrasts") <- attr(x, "contrasts")
    attr(y, "levels") <- attr(x, "levels")
    class(y) <- oldClass(x)
    lev <- levels(x)
    if (drop) 
        factor(y, exclude = if (anyNA(levels(x))) 
            NULL
        else NA)
    else y
}
....

The cbind() behaviour can be explained from the documented behaviour (from ?cbind, emphasis mine):

The functions cbind and rbind are S3 generic, ...

....

In the default method, all the vectors/matrices must be atomic (see vector) or lists. Expressions are not allowed. Language objects (such as formulae and calls) and pairlists will be coerced to lists: other objects (such as names and external pointers) will be included as elements in a list result. Any classes the inputs might have are discarded (in particular, factors are replaced by their internal codes).

Again, the fact that f is of class "factor" is defeating you because the default cbind method will get called and it will strip the levels information and return the internal integer codes as you observed.
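A short sketch of what that documented behaviour looks like in practice (the seed and the name both are illustrative assumptions): the result of cbind() is a plain integer matrix of codes, with neither the class nor the levels.

```r
set.seed(42)  # arbitrary seed for reproducibility
f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])
dim(f) <- c(4, 5)

both <- cbind(f, f)
is.factor(both)  # FALSE: the class has been discarded
typeof(both)     # "integer": only the internal codes remain
levels(both)     # NULL: the labels are gone
```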

In many respects, you have to ignore or at least not fully trust what the is.foo functions tell you, because they are just using simple tests to say whether something is or is not a foo object. From a particular point of view, is.matrix() and is.atomic() are clearly wrong when it comes to f (with dimensions). They are also right in terms of their implementation, or at least their behaviour can be understood from the implementation; I think is.atomic(f) is not correct, but if by "is of an atomic type" R Core mean "type" to be the thing returned by typeof(f), then is.atomic() is right. A stricter test is is.vector(), which f fails:

> is.vector(f)
[1] FALSE

because it has attributes beyond a names attribute:

> attributes(f)
$levels
[1] "a" "b" "c" "d" "e"

$class
[1] "factor"

$dim
[1] 4 5

As to how you should get a factor matrix, well, you can't, at least if you want it to retain the factor information (the labels for the levels). One solution would be to use a character matrix, which would retain the labels:

> fl <- levels(f)
> fm <- matrix(f, ncol = 5)
> fm
     [,1] [,2] [,3] [,4] [,5]
[1,] "c"  "a"  "a"  "c"  "b" 
[2,] "d"  "b"  "d"  "b"  "a" 
[3,] "e"  "e"  "e"  "c"  "e" 
[4,] "a"  "b"  "b"  "a"  "e"

and we store the levels of f for future use in case we lose some elements of the matrix along the way.
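As a rough illustration of why the stored levels are worth keeping (the seed and the names col1 and naive are assumptions made for this sketch), a single column of the character matrix can be turned back into a factor with the full set of levels, even if not every label occurs in that column:

```r
set.seed(42)  # arbitrary seed for reproducibility
f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])

fl <- levels(f)            # keep the levels for later
fm <- matrix(f, ncol = 5)  # character matrix of labels

col1  <- factor(fm[, 1], levels = fl)  # full set of levels restored
naive <- factor(fm[, 1])               # only the labels seen in column 1

levels(col1)    # "a" "b" "c" "d" "e"
nlevels(naive)  # at most 4, since the column has only 4 entries
```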

Or work with the internal integer representation:

> (fm2 <- matrix(unclass(f), ncol = 5))
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    1    1    3    2
[2,]    4    2    4    2    1
[3,]    5    5    5    3    5
[4,]    1    2    2    1    5

and you can always get back to the levels/labels again via:

> fm2[] <- fl[fm2]
> fm2
     [,1] [,2] [,3] [,4] [,5]
[1,] "c"  "a"  "a"  "c"  "b" 
[2,] "d"  "b"  "d"  "b"  "a" 
[3,] "e"  "e"  "e"  "c"  "e" 
[4,] "a"  "b"  "b"  "a"  "e"

A data frame does not seem ideal for this, as each component of the data frame would be treated as a separate factor, whereas you seem to want to treat the array as a single factor with one set of levels.

If you really want a factor matrix, you would most likely need to create your own S3 class, plus all the methods to go with it. For example, you might store the factor matrix as a character matrix with class "factorMatrix", keeping the levels alongside it as an extra attribute. Then you would need to write [.factorMatrix, which would grab the levels, use the default [ method on the matrix, and then add the levels attribute back on again. You could write cbind and head methods as well. The list of required methods would grow quickly, but a simple implementation may suit. If you make your objects have class c("factorMatrix", "matrix") (i.e. inherit from the "matrix" class), you'll pick up all the properties/methods of the "matrix" class (which will drop the levels and other attributes), so you can at least work with the objects and see where you need to add new methods to fill out the behaviour of the class.
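A minimal sketch of such a class might look like the following. Everything here is hypothetical: the constructor factorMatrix(), its arguments, and the choice to return an ordinary factor when subsetting drops to a vector are assumptions made for illustration, not an established API. The subsetting method strips the object back to a plain character matrix, uses the default [ on it, and then re-attaches the levels.

```r
# Hypothetical constructor: a character matrix plus a "levels" attribute.
factorMatrix <- function(x, nrow, levels = NULL) {
  if (is.null(levels)) {
    levels <- if (is.factor(x)) base::levels(x) else sort(unique(as.character(x)))
  }
  m <- matrix(as.character(x), nrow = nrow)
  attr(m, "levels") <- levels
  class(m) <- c("factorMatrix", "matrix")
  m
}

# Hypothetical subsetting method: grab the levels, subset the bare
# character matrix with the default `[`, then put the levels back.
`[.factorMatrix` <- function(x, ...) {
  lev <- attr(x, "levels")
  y <- unclass(x)
  attr(y, "levels") <- NULL
  y <- y[...]
  if (is.matrix(y)) {
    attr(y, "levels") <- lev
    class(y) <- c("factorMatrix", "matrix")
    y
  } else {
    # Dimensions were dropped: fall back to an ordinary factor (a design choice).
    factor(y, levels = lev)
  }
}

set.seed(42)  # arbitrary seed for reproducibility
f  <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])
fm <- factorMatrix(f, nrow = 4)

sub  <- fm[1:2, ]  # still a factorMatrix, levels intact
col1 <- fm[, 1]    # drops to a factor with all five levels
```

Methods for cbind, print and so on are left out; as noted above, this is where the list of required methods starts to grow.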
