Force `table` to include all factors from both arrays in R

Question

I am using the following R code to produce a confusion matrix comparing the true labels of some data to the output of a neural network.

t <- table(as.factor(test.labels), as.factor(nnetpredict))

However, sometimes the neural network doesn't predict any instances of a certain class, so the table isn't square (for example, the test.labels factor has 5 levels, but the nnetpredict factor has only 3). I want to make the table square by adding any missing factor levels and setting their counts to zero.

How can I do this?

Example:

> table(as.factor(a), as.factor(b))

    1 2 3 4 5 6 7 8 9 10
  1 1 0 0 0 0 0 0 1 0  0
  2 0 1 0 0 0 0 0 0 1  0
  3 0 0 1 0 0 0 0 0 0  1
  4 0 0 0 1 0 0 0 0 0  0
  5 0 0 0 0 1 0 0 0 0  0
  6 0 0 0 0 0 1 0 0 0  0
  7 0 0 0 0 0 0 1 0 0  0

In the table above there are 7 rows but 10 columns, because the factor a has only 7 levels, whereas b has 10. I want to pad the table with zeros so that the row labels and the column labels are the same and the matrix is square. For the example above, this would produce:

     1 2 3 4 5 6 7 8 9 10
  1  1 0 0 0 0 0 0 1 0  0
  2  0 1 0 0 0 0 0 0 1  0
  3  0 0 1 0 0 0 0 0 0  1
  4  0 0 0 1 0 0 0 0 0  0
  5  0 0 0 0 1 0 0 0 0  0
  6  0 0 0 0 0 1 0 0 0  0
  7  0 0 0 0 0 0 1 0 0  0
  8  0 0 0 0 0 0 0 0 0  0
  9  0 0 0 0 0 0 0 0 0  0
  10 0 0 0 0 0 0 0 0 0  0
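If only the already-computed table is at hand, it can also be padded after the fact. The sketch below is a hypothetical helper (`padSquare` is not part of base R): it takes the union of the existing row and column labels, allocates a zero matrix over that label set, and copies the observed counts in by name. It assumes every label parses as a number so that "10" sorts after "9"; the vectors `a` and `b` in the usage line are made up purely to reproduce the 7 x 10 table shown in the question.

```r
# Hypothetical helper: pad an existing (possibly non-square) table with zeros
# so that rows and columns share the same labels in the same order.
# Assumes every label can be parsed as a number (e.g. "1".."10").
padSquare <- function(tab) {
  labs <- union(rownames(tab), colnames(tab))
  labs <- labs[order(as.numeric(labs))]       # numeric order: "9" before "10"
  out <- matrix(0L, nrow = length(labs), ncol = length(labs),
                dimnames = list(labs, labs))
  out[rownames(tab), colnames(tab)] <- tab     # copy observed counts by label
  as.table(out)
}

# Made-up vectors that reproduce the 7 x 10 table from the question
a <- c(1:7, 1, 2, 3)
b <- c(1:7, 8, 9, 10)
padSquare(table(as.factor(a), as.factor(b)))
```

Indexing by label rather than by position means a missing class in the middle of the range (say, class 5 never predicted) still lands in the right column.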

The reason I need to do this is two-fold:

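  • To display it to users / in a report
  • So that I can use a function that calculates the Kappa statistic, which requires a table in this format (square, with the same row and column labels)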
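The question does not say which Kappa function is used. As a sketch, assuming unweighted Cohen's kappa, the arithmetic shows why the square shape matters: observed agreement is the diagonal total over n, and chance agreement pairs each row total with the column total of the same label. `cohenKappa` is a hypothetical name, and the 2 x 2 counts in the example are made up.

```r
# Hypothetical helper: unweighted Cohen's kappa from a square confusion table
# whose rows and columns carry the same labels in the same order.
cohenKappa <- function(tab) {
  tab <- as.matrix(tab)
  stopifnot(nrow(tab) == ncol(tab),
            identical(rownames(tab), colnames(tab)))
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Made-up counts: rows = true label, columns = predicted label
tab <- matrix(c(20, 5, 10, 15), nrow = 2,
              dimnames = list(c("a", "b"), c("a", "b")))
cohenKappa(tab)   # 0.4
```

Here po = 35/50 = 0.7 and pe = (30 * 25 + 20 * 25) / 50^2 = 0.5, giving (0.7 - 0.5) / (1 - 0.5) = 0.4. On a non-square table, `diag()` would pair mismatched labels and `rowSums(tab) * colSums(tab)` would recycle vectors of different lengths, so the helper refuses such input.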

Answer

EDIT: second round, to address the additional details in the question. I deleted my first answer because it is no longer relevant.

This produced the desired output for the test cases I tried, but I strongly advise testing it thoroughly on your real data. The approach is to collect the full set of levels from both inputs and set it as the levels of both factors before generating the table.

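squareTable <- function(x, y) {
    x <- factor(x)
    y <- factor(y)

    # Union of the levels of both inputs, so every class appears
    # as both a row and a column
    commonLevels <- union(levels(x), levels(y))

    # Sort numerically when every level looks like a number, so "10"
    # comes after "9" (a plain sort() would put "10" before "2");
    # otherwise sort as text
    asNum <- suppressWarnings(as.numeric(commonLevels))
    if (!anyNA(asNum)) {
        commonLevels <- commonLevels[order(asNum)]
    } else {
        commonLevels <- sort(commonLevels)
    }

    # Re-level both factors; classes absent from one input get zero counts
    x <- factor(x, levels = commonLevels)
    y <- factor(y, levels = commonLevels)

    table(x, y)
}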
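When the full set of classes is known in advance (for example, from the training labels), a simpler variant is to pass that set explicitly instead of deriving it from the two inputs. This also fixes the row and column order to whatever order the set is given in. `confusionTable` and its `classes` argument are hypothetical names, and the labels in the example call are made up.

```r
# Hypothetical helper: build a confusion table over a fixed, known class set.
# `classes` must list every label that can occur: values not in it become NA
# and are silently dropped by table().
confusionTable <- function(truth, predicted, classes) {
  table(truth     = factor(truth, levels = classes),
        predicted = factor(predicted, levels = classes))
}

# Made-up labels: classes 4 and 5 are never predicted
confusionTable(c(1, 2, 3, 4, 5), c(1, 2, 3, 3, 3), classes = 1:5)
```

The trade-off is that a label missing from `classes` disappears without a warning, whereas deriving the levels from the data, as `squareTable` does, cannot lose observations.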

Two test cases:

> #Test case 1
> set.seed(1)
> x <- factor(sample(0:9, 100, TRUE))
> y <- factor(sample(3:7, 100, TRUE))
> 
> table(x,y)
   y
x   3 4 5 6 7
  0 2 1 3 1 0
  1 1 0 2 3 0
  2 1 0 3 4 3
  3 0 3 6 3 2
  4 4 4 3 2 1
  5 2 2 0 1 0
  6 1 2 3 2 3
  7 3 3 3 4 2
  8 0 4 1 2 4
  9 2 1 0 0 3
> squareTable(x,y)
   y
x   0 1 2 3 4 5 6 7 8 9
  0 0 0 0 2 1 3 1 0 0 0
  1 0 0 0 1 0 2 3 0 0 0
  2 0 0 0 1 0 3 4 3 0 0
  3 0 0 0 0 3 6 3 2 0 0
  4 0 0 0 4 4 3 2 1 0 0
  5 0 0 0 2 2 0 1 0 0 0
  6 0 0 0 1 2 3 2 3 0 0
  7 0 0 0 3 3 3 4 2 0 0
  8 0 0 0 0 4 1 2 4 0 0
  9 0 0 0 2 1 0 0 3 0 0
> squareTable(y,x)
   y
x   0 1 2 3 4 5 6 7 8 9
  0 0 0 0 0 0 0 0 0 0 0
  1 0 0 0 0 0 0 0 0 0 0
  2 0 0 0 0 0 0 0 0 0 0
  3 2 1 1 0 4 2 1 3 0 2
  4 1 0 0 3 4 2 2 3 4 1
  5 3 2 3 6 3 0 3 3 1 0
  6 1 3 4 3 2 1 2 4 2 0
  7 0 0 3 2 1 0 3 2 4 3
  8 0 0 0 0 0 0 0 0 0 0
  9 0 0 0 0 0 0 0 0 0 0
> 
> #Test case 2
> set.seed(1)
> xx <- factor(sample(0:2, 100, TRUE))
> yy <- factor(sample(3:5, 100, TRUE))
> 
> table(xx,yy)
   yy
xx   3  4  5
  0  4 14  9
  1 14 15  9
  2 11 11 13
> squareTable(xx,yy)
   y
x    0  1  2  3  4  5
  0  0  0  0  4 14  9
  1  0  0  0 14 15  9
  2  0  0  0 11 11 13
  3  0  0  0  0  0  0
  4  0  0  0  0  0  0
  5  0  0  0  0  0  0
> squareTable(yy,xx)
   y
x    0  1  2  3  4  5
  0  0  0  0  0  0  0
  1  0  0  0  0  0  0
  2  0  0  0  0  0  0
  3  4 14 11  0  0  0
  4 14 15 11  0  0  0
  5  9  9 13  0  0  0
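Following the advice to test on real data, a few invariants can be checked mechanically on any squared table. This is a minimal sketch; `checkSquareTable` is a hypothetical helper, and it assumes `x` and `y` are the two label vectors, of equal length and with no missing values.

```r
# Hypothetical helper: stop with an error if a confusion table built from the
# label vectors x and y violates the properties the question asks for.
# Assumes x and y have equal length and contain no NA values.
checkSquareTable <- function(tab, x, y) {
  stopifnot(
    nrow(tab) == ncol(tab),                   # square
    identical(rownames(tab), colnames(tab)),  # same labels in the same order
    sum(tab) == length(x),                    # no observation was dropped
    all(as.character(x) %in% rownames(tab)),  # every true label has a row
    all(as.character(y) %in% colnames(tab))   # every predicted label has a column
  )
  invisible(TRUE)
}
```

For example, `checkSquareTable(squareTable(x, y), x, y)` should return silently for both test cases above, while the plain `table(x, y)` fails the first check.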
