In R, can I make the table() function return the number of NA values in a named element?

Problem description

I am using R to summarize a large amount of data for a report. I want to use lapply() with table() to generate a list of tables from which I can extract the statistics I need. There are a lot of them, so I've written a function to do it. My problem is returning the number of missing (NA) values: each table already contains that count, but I can't figure out how to tell R that I want the element of the table() result that holds it. As far as I can tell, R is "naming" that element NA, and I can't refer to it by that name.
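A minimal sketch of why the NA cell is hard to reach, assuming a single character vector `x` with the same values as column V1 of the example below: the cell's name is a missing value (`NA_character_`), not the string "NA", and R never matches a name-based subscript against a missing name. A logical index built from `is.na(names(...))` does reach it.

```r
x <- c("A", "B", "A", NA, "C")   # same values as column V1 in the example
tab <- table(x, useNA = "always")

names(tab)              # the last name is a missing value, printed as NA
"NA" %in% names(tab)    # FALSE: no cell is named with the string "NA"
anyNA(names(tab))       # TRUE: the NA cell's name is NA_character_

# Name-based subscripts never match a missing name, but a logical
# index built from is.na() on the names does select that cell.
na_count <- tab[is.na(names(tab))]
na_count
```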

I'm trying to avoid writing some complex statement where I say something like which(is.na(names(element[1]))) | names(element[1])=="var_I_want" because I feel like that's just really wordy. I was hoping there was some way to either tell R to label the NA variable in each table with a character name, or to tell it to pick the one labeled NA, but I haven't had much luck yet.
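One way to get the character label asked for here, sketched under the assumption that a 1-D table per variable is all that is needed: tabulate with `useNA = "always"`, then overwrite the missing name with a string. `table_na_named()` and its `na_label` argument are hypothetical names, not part of base R. Because it renames only the cell whose name is already NA, it does not depend on how the other levels are sorted.

```r
# Hypothetical helper: tabulate x with an NA cell, then give that cell
# a character name (default "NA") so it can be indexed like any other.
table_na_named <- function(x, na_label = "NA") {
  tab <- table(x, useNA = "always")
  # For a 1-D table, names() reads and writes dimnames[[1]]
  names(tab)[is.na(names(tab))] <- na_label
  tab
}

example <- data.frame(V1 = c("A", "B", "A", NA, "C"),
                      V3 = c("Yes", "No", "No", "Yes", "No"),
                      stringsAsFactors = FALSE)
tabs <- lapply(example, table_na_named)
tabs$V1[c("A", "NA")]     # the "A" count and the NA count
tabs$V3[c("Yes", "NA")]   # V3 has no missing values, so its NA count is 0
```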

Minimal example:

example <- data.frame(ID=c(10,20,30,40,50),
                      V1=c("A","B","A",NA,"C"),
                      V2=c("Dog","Cat",NA,"Cat","Bunny"),
                      V3=c("Yes","No","No","Yes","No"),
                      V4=c("No",NA,"No","No","Yes"),
                      V5=c("No","Yes","Yes",NA,"No"))

varlist <- c("V1","V2","V3","V4","V5")

list_o_tables <- lapply(X=example[varlist],FUN=table,useNA="always")

list(V1=list_o_tables[["V1"]]["A"],
     V2=list_o_tables[["V2"]]["Cat"],
     V3=list_o_tables[["V3"]]["Yes"],
     V4=list_o_tables[["V4"]]["Yes"],
     V5=list_o_tables[["V5"]]["Yes"])
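Why `useNA = "always"` matters for the desired output: with `useNA = "ifany"`, a column with no missing values (V3 here) gets no NA cell at all, so there would be no 0 to report. A small check using only base `table()`, with `v3` as a stand-in copy of that column:

```r
v3 <- c("Yes", "No", "No", "Yes", "No")   # column V3: no missing values

tab_ifany  <- table(v3, useNA = "ifany")   # NA cell only if an NA occurs
tab_always <- table(v3, useNA = "always")  # NA cell is always present

length(tab_ifany)                          # 2: "No" and "Yes"
length(tab_always)                         # 3: "No", "Yes" and NA
tab_always[is.na(names(tab_always))]       # the NA cell, with a count of 0
```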

What I get:

$V1
A 
2 

$V2
Cat 
  2 

$V3
Yes 
  2 

$V4
Yes 
  1 

$V5
Yes 
  2

What I'd like:

$V1
A     <NA>
2       1

$V2
Cat   <NA>
  2     1

$V3
Yes   <NA> 
  2     0

$V4
Yes   <NA> 
  1     1

$V5
Yes   <NA> 
  2     1
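A sketch that builds the structure shown above directly from `list_o_tables`, assuming one wanted level per variable. `wanted` and `pick_with_na()` are hypothetical names introduced for illustration; the NA cell is located with `is.na(names(...))` because its name is a missing value rather than the string "NA".

```r
example <- data.frame(ID=c(10,20,30,40,50),
                      V1=c("A","B","A",NA,"C"),
                      V2=c("Dog","Cat",NA,"Cat","Bunny"),
                      V3=c("Yes","No","No","Yes","No"),
                      V4=c("No",NA,"No","No","Yes"),
                      V5=c("No","Yes","Yes",NA,"No"),
                      stringsAsFactors = FALSE)
varlist <- c("V1","V2","V3","V4","V5")
list_o_tables <- lapply(X = example[varlist], FUN = table, useNA = "always")

# Hypothetical helper: the cell for `level` followed by the NA cell.
# names(tab) == level is NA for the missing name, and which() drops it.
pick_with_na <- function(tab, level) {
  tab[c(which(names(tab) == level), which(is.na(names(tab))))]
}

# One requested level per variable (hypothetical lookup vector)
wanted <- c(V1 = "A", V2 = "Cat", V3 = "Yes", V4 = "Yes", V5 = "Yes")
result <- Map(pick_with_na, list_o_tables[names(wanted)], wanted)
result
```

Map() takes the element names from its first list argument, so `result` is named V1 to V5 like the list in the question.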

Solution

This is ugly (IMHO) but it works:

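my_table <- function(x){
    # Tabulate x with an NA cell, then rename every cell so the NA cell
    # is called "NA" (a string). Assumes x is a character or numeric vector.
    setNames(table(x, useNA = "always"), c(sort(unique(x[!is.na(x)])), 'NA'))
}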

So you'd lapply this instead, and then you'd have access to the NA column.
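Applied to part of the example data, the answer's `my_table()` makes the missing-value cell reachable with the string "NA". This usage sketch repeats the function so it runs on its own, and it assumes character columns (`stringsAsFactors = FALSE` is set explicitly).

```r
my_table <- function(x){
    setNames(table(x, useNA = "always"), c(sort(unique(x[!is.na(x)])), 'NA'))
}

example <- data.frame(V1 = c("A", "B", "A", NA, "C"),
                      V3 = c("Yes", "No", "No", "Yes", "No"),
                      stringsAsFactors = FALSE)

list_o_tables <- lapply(example, my_table)
list_o_tables[["V1"]][c("A", "NA")]   # the "A" count and the NA count
list_o_tables[["V3"]][["NA"]]         # V3 has no missing values
```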

Looking more closely, this is rooted in the behavior of factor:

levels(factor(c(1,NA,2),exclude = NULL))
[1] "1" "2" NA 
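The distinction can be seen directly with base R factors: a level that is a missing value (from `exclude = NULL` or `addNA()`) is not the same thing as a level that is the two-character string "NA". A short comparison:

```r
f <- factor(c(1, NA, 2), exclude = NULL)  # NA kept as a level
levels(f)                                 # "1" "2" NA
is.na(levels(f))                          # FALSE FALSE TRUE

g <- addNA(factor(c(1, NA, 2)))           # same levels via addNA()
identical(levels(f), levels(g))           # TRUE

h <- factor(c("1", "NA", "2"))            # "NA" here is an ordinary string
levels(h)
anyNA(levels(h))                          # FALSE: no level is missing
```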

My recollection is that the distinction between a factor level of NA versus "NA" has been at the very least a source of confusion in R in the past. I feel like I've seen some debates about the merits of this on r-devel, but I can't recall for sure at the moment.

So the issue is: if you have a factor with NA values, what do you call that level? Technically, this is correct: one of the levels is "missing", not literally the string "NA". It would be nice (IMHO) if table didn't adhere to this quite so strictly, though.
