Can we get factor matrices in R?

Question

It seems not possible to get matrices of factor in R. Is it true? If yes, why? If not, how should I do it?

f <- factor(sample(letters[1:5], 20, replace = TRUE), letters[1:5])
m <- matrix(f,4,5)
is.factor(m) # fail.

m <- factor(m,letters[1:5])
is.factor(m) # oh, yes? 
is.matrix(m) # nope. fail. 

dim(f) <- c(4,5) # aha?
is.factor(f) # yes.. 
is.matrix(f) # yes!

# but then I get a strange behavior
cbind(f,f) # is not a factor anymore
head(f,2) # doesn't give the first 2 rows but the first 2 elements of f
# should I worry about it?


Answer

In this case, it may walk like a duck and even quack like a duck, but f from:

f <- factor(sample(letters[1:5], 20, replace = TRUE), letters[1:5])
dim(f) <- c(4,5)

really isn't a matrix, even though is.matrix() claims that it strictly is one. To be a matrix as far as is.matrix() is concerned, f only needs to be a vector and have a dim attribute. By adding the attribute to f you pass the test. As you have seen, however, once you start using f as a matrix, it quickly loses the features that make it a factor (you end up working with the levels or the dimensions get lost).
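
A minimal illustration of this point, using a small made-up factor (the values are arbitrary): is.matrix() only inspects the dim attribute, while S3 dispatch still sees nothing but the "factor" class.

```r
# A small factor with arbitrary values
f <- factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
dim(f) <- c(2, 2)

is.matrix(f)           # TRUE: is.matrix() only looks at the dim attribute
is.factor(f)           # TRUE: the class attribute is untouched
inherits(f, "matrix")  # FALSE: S3 dispatch only sees the explicit class
class(f)               # "factor"
```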

There are really only matrices and arrays for the atomic vector types:


  1. logical,
  2. integer,
  3. real (double),
  4. complex,
  5. string (or character), and
  6. raw

Plus, as @hadley reminds me, you can also have list matrices and arrays, by setting the dim attribute on a list object (see, for example, the Matrices & Arrays section of Hadley's book, Advanced R).
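
As a quick sketch (values chosen arbitrarily), here is what typeof() reports for matrices of a few of these types, plus a list matrix built simply by adding a dim attribute to a list:

```r
# Atomic matrices: typeof() reports the underlying vector type
typeof(matrix(1:4, 2))             # "integer"
typeof(matrix(c(1.5, 2.5), 1))     # "double" (the "real" type)
typeof(matrix(c("a", "b"), 1))     # "character"

# A list matrix: give an ordinary list a dim attribute
l <- list(1, "a", TRUE, NULL)
dim(l) <- c(2, 2)
is.matrix(l)    # TRUE
l[[2, 1]]       # "a" (elements are laid out column by column)
```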

Anything outside those types is coerced to one of them via as.vector(). In matrix(f, nrow = 3) this happens not because of whether f is atomic (is.atomic() returns TRUE for f, because it is stored internally as an integer vector and typeof(f) returns "integer"), but because f has a class attribute. The class attribute sets the OBJECT bit on the internal representation of f, and anything that has a class is supposed to be coerced to one of the atomic types via as.vector():

matrix <- function(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
                   dimnames = NULL) {
    if (is.object(data) || !is.atomic(data)) 
        data <- as.vector(data)
....
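
The facts this relies on can be checked directly on a small factor with arbitrary values:

```r
f <- factor(c("b", "a", "c"), levels = c("a", "b", "c"))

is.atomic(f)    # TRUE: stored internally as an integer vector
typeof(f)       # "integer"
is.object(f)    # TRUE: the class attribute sets the OBJECT bit
as.vector(f)    # "b" "a" "c": as.vector() turns a factor into character
typeof(matrix(f, nrow = 1))   # "character": matrix() went through as.vector()
```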

Adding dimensions via dim<-() is a quick way to create an array without duplicating the object, but it bypasses some of the checks and balances that R would apply if you coerced f to a matrix via the other methods:

matrix(f, nrow = 3) # or
as.matrix(f)
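
A small sketch of those two routes on a dimensionless factor (made-up values). Both go through as.vector(), so the result holds the labels as a character matrix; the level set itself is not carried along:

```r
f <- factor(c("b", "a", "c", "a"), levels = c("a", "b", "c"))

m1 <- matrix(f, nrow = 2)   # coerced via as.vector(): a 2 x 2 character matrix
m2 <- as.matrix(f)          # a 4 x 1 character matrix
typeof(m1)                  # "character"
dim(m2)                     # 4 1
attr(m1, "levels")          # NULL: no levels attribute survives
```

One caveat: as.matrix() only does this for a factor without dimensions. If f already carries a dim attribute, as.matrix.default() sees is.matrix(f) as TRUE and returns f unchanged.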

The difference shows up when you use basic functions that work on matrices, or functions that rely on method dispatch. Note that after assigning dimensions to f, f is still of class "factor":

> class(f)
[1] "factor"

This explains the head() behaviour: you are not getting the head.matrix behaviour because f is not a matrix, at least as far as the S3 mechanism is concerned:

> debug(head.matrix)
> head(f) # we don't enter the debugger
[1] d c a d b d
Levels: a b c d e
> undebug(head.matrix)

Instead, the head.default method is used, and it calls [, for which there is a factor method; hence the observed behaviour:

> debugonce(`[.factor`)
> head(f)
debugging in: `[.factor`(x, seq_len(n))
debug: {
    y <- NextMethod("[")
    attr(y, "contrasts") <- attr(x, "contrasts")
    attr(y, "levels") <- attr(x, "levels")
    class(y) <- oldClass(x)
    lev <- levels(x)
    if (drop) 
        factor(y, exclude = if (anyNA(levels(x))) 
            NULL
        else NA)
    else y
}
....

The cbind() behaviour can be explained from the documented behaviour (from ?cbind, emphasis mine):


The functions cbind and rbind are S3 generic, ...

....

In the default method, all the vectors/matrices must be atomic (see vector) or lists. Expressions are not allowed. Language objects (such as formulae and calls) and pairlists will be coerced to lists: other objects (such as names and external pointers) will be included as elements in a list result. Any classes the inputs might have are discarded (in particular, factors are replaced by their internal codes).

Again, the fact that f is of class "factor" is what defeats you: the default cbind method gets called, strips the levels information and returns the internal integer codes, as you observed.
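
A minimal reproduction with a made-up factor, together with the usual workaround of converting to character before binding:

```r
f <- factor(c("b", "a", "c"), levels = c("a", "b", "c"))

cb <- cbind(f, f)
is.factor(cb)   # FALSE
typeof(cb)      # "integer"
cb[, 1]         # 2 1 3: the internal codes, not the labels

# Converting to character first keeps the labels (but not the level set)
cc <- cbind(as.character(f), as.character(f))
cc[, 1]         # "b" "a" "c"
```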

In many respects, you have to ignore, or at least not fully trust, what the is.foo functions tell you, because they only use simple tests to say whether something is or is not a foo object. From a particular point of view, is.matrix() and is.atomic() are clearly wrong when it comes to f (with dimensions). Yet they are also right in terms of their implementation, or at least their behaviour can be understood from the implementation. I think is.atomic(f) is not correct, but if by "is of an atomic type" R Core mean "type" to be the thing returned by typeof(f), then is.atomic() is right. A stricter test is is.vector(), which f fails:

> is.vector(f)
[1] FALSE

It fails because f has attributes beyond a names attribute:

> attributes(f)
$levels
[1] "a" "b" "c" "d" "e"

$class
[1] "factor"

$dim
[1] 4 5
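
For comparison, a sketch showing that is.vector() tolerates a names attribute but nothing else (the attribute name "note" is arbitrary):

```r
x <- c(a = 1, b = 2)
is.vector(x)           # TRUE: names is the only attribute allowed
attr(x, "note") <- "extra"
is.vector(x)           # FALSE: any other attribute makes the test fail

y <- 1:4
dim(y) <- c(2, 2)
is.vector(y)           # FALSE: a dim attribute is enough
```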

As to how you should get a factor matrix: you can't, at least not if you want it to retain the factor information (the labels for the levels). One solution would be to use a character matrix, which retains the labels:

> fl <- levels(f)
> fm <- matrix(f, ncol = 5)
> fm
     [,1] [,2] [,3] [,4] [,5]
[1,] "c"  "a"  "a"  "c"  "b" 
[2,] "d"  "b"  "d"  "b"  "a" 
[3,] "e"  "e"  "e"  "c"  "e" 
[4,] "a"  "b"  "b"  "a"  "e"

We also store the levels of f in fl for future use, in case we lose some elements of the matrix along the way.
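
Keeping fl around matters when you want a factor back: factor() with the stored levels restores the full level set, including levels that no longer occur in the matrix. A small sketch with made-up data (note that the result is a plain factor, not a matrix):

```r
f <- factor(c("a", "c", "a", "c"), levels = c("a", "b", "c"))
fl <- levels(f)             # keep the full level set, including unused "b"
fm <- matrix(f, ncol = 2)   # character matrix of labels

f_back <- factor(fm, levels = fl)
levels(f_back)              # "a" "b" "c"
dim(f_back)                 # NULL: factor() drops the dimensions
```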

Alternatively, work with the internal integer representation:

> (fm2 <- matrix(unclass(f), ncol = 5))
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    1    1    3    2
[2,]    4    2    4    2    1
[3,]    5    5    5    3    5
[4,]    1    2    2    1    5

You can always get back to the levels/labels via:

> fm2[] <- fl[fm2]
> fm2
     [,1] [,2] [,3] [,4] [,5]
[1,] "c"  "a"  "a"  "c"  "b" 
[2,] "d"  "b"  "d"  "b"  "a" 
[3,] "e"  "e"  "e"  "c"  "e" 
[4,] "a"  "b"  "b"  "a"  "e"

A data frame does not seem ideal for this, as each component of the data frame would be treated as a separate factor, whereas you seem to want to treat the array as a single factor with one set of levels.
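
A quick sketch with made-up columns shows the separate level sets:

```r
df <- data.frame(x = factor(c("a", "b")),
                 y = factor(c("b", "c")))
levels(df$x)   # "a" "b"
levels(df$y)   # "b" "c": each column keeps its own level set
```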

If you really wanted to do what you want, which is have a factor matrix, you would most likely need to create your own S3 class to do this, plus all the methods to go with it. For example, you might store the factor matrix as a character matrix but with class "factorMatrix", storing the levels alongside the factor matrix as an extra attribute, say. Then you would need to write [.factorMatrix, which would grab the levels, use the default [ method on the matrix, and then add the levels attribute back on again. You could write cbind and head methods as well.

The list of required methods would grow quickly, however. Still, a simple implementation may suit, and if you make your objects have class c("factorMatrix", "matrix") (i.e. inherit from the "matrix" class), you'll pick up all the properties/methods of the "matrix" class (which will drop the levels and other attributes), so you can at least work with the objects and see where you need to add new methods to fill out the behaviour of the class.
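
A minimal sketch of that idea follows. Everything in it is hypothetical and not part of base R: the constructor factorMatrix(), the convention of a "levels" attribute on a character matrix, and the [.factorMatrix and print.factorMatrix methods. Subsetting that keeps two dimensions returns another factorMatrix; subsetting that drops to a vector returns an ordinary factor with the stored levels.

```r
# Hypothetical constructor: labels in a character matrix, levels as an attribute
factorMatrix <- function(x, nrow, ncol, lev = NULL) {
  if (is.null(lev)) {
    lev <- if (is.factor(x)) levels(x) else sort(unique(as.character(x)))
  }
  m <- matrix(as.character(x), nrow = nrow, ncol = ncol)
  attr(m, "levels") <- lev
  class(m) <- c("factorMatrix", "matrix")
  m
}

# Hypothetical [ method: grab the levels, subset the plain matrix, re-attach
`[.factorMatrix` <- function(x, ..., drop = TRUE) {
  lev <- attr(x, "levels")
  m <- unclass(x)
  attr(m, "levels") <- NULL
  y <- m[..., drop = drop]          # default matrix subsetting
  if (length(dim(y)) == 2L) {
    attr(y, "levels") <- lev        # still a matrix: keep the class
    class(y) <- oldClass(x)
    y
  } else {
    factor(y, levels = lev)         # dropped to a vector: return a factor
  }
}

# Hypothetical print method: show labels without quotes, then the levels
print.factorMatrix <- function(x, ...) {
  m <- unclass(x)
  attr(m, "levels") <- NULL
  print(m, quote = FALSE, ...)
  cat("Levels:", attr(x, "levels"), "\n")
  invisible(x)
}

fm <- factorMatrix(factor(c("a", "c", "a", "c"), levels = c("a", "b", "c")),
                   nrow = 2, ncol = 2)
fm[, 1]                  # a factor with values a, c and levels a b c
fm[1:2, 1, drop = FALSE] # a 2 x 1 factorMatrix
```

No cbind or head methods are included here; as noted above, those would be the next ones to add.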