Apply factor levels to multiple columns with missing factor levels

Question

I have a dataframe with many factors and want to create statistical tables that show the distribution for each factor, including factor levels with zero observations. For instance, these data:

structure(list(engag11 = structure(c(5L, 4L, 4L), .Label = c("Strongly Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"), class = "factor"), encor11 = structure(c(1L, 1L, 1L), .Label = c("Agree", "Neither Agree or Disagree", "Strongly Agree"), class = "factor"), know11 = structure(c(3L, 
1L, 1L), .Label = c("Agree", "Neither Agree or Disagree", "Strongly Agree"), class = "factor")), .Names = c("engag11", "encor11", "know11"), row.names = c(NA, 3L), class = "data.frame")

show 3 rows, but only some of the factor levels are observed in each column. When I produce a table, I'd like to display counts not only for the observed levels, but also for levels NOT observed (such as "Strongly Disagree"). Like this:

# define the factor and levels
library(dplyr)
library(pander)
library(forcats)
eLevels <- factor(c(1, 2, 3, 4, 5), levels = 1:5,
                  labels = c("Strongly    Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"),
                  ordered = TRUE)

# apply the factor to one variable
csc2$engag11<-factor(csc2$engag11,eLevels)

t1<-table(csc2$engag11)
pander(t1)

Which results in a frequency table that shows counts for each level, including zeroes for levels not reported / observed.

But I have dozens of variables to convert. A simple lapply call of the kind recommended on Stack Overflow doesn't seem to work, such as this one:

csc2[1:3]<-lapply(csc[1:3],eLevels)

I also tried writing a simple function for this (where n is the list of columns), but it failed:

facConv <- function(df, n) {
  df$n <- factor(c(1, 2, 3, 4, 5), levels = 1:5,
                 labels = c("Strongly Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"))
  return(result)
}

Can someone offer a solution?

Recommended answer

lapply should work fine; you just need to pass it a function that calls factor():

csc2[1:3] <- lapply(csc2[1:3], function(x) factor(x, eLevels))
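
The attempt in the question fails because lapply() expects a function as its second argument, and eLevels is a factor, not a function. As a minimal, self-contained sketch of the same idea: the data frame below is rebuilt from the question's data, and lvls is a hypothetical name for a plain character vector of levels, used here as an assumption in place of the eLevels factor.

```r
# Rebuild the example data from the question (3 rows, 3 factor columns)
csc2 <- data.frame(
  engag11 = factor(c("Strongly Agree", "Agree", "Agree")),
  encor11 = factor(c("Agree", "Agree", "Agree")),
  know11  = factor(c("Strongly Agree", "Agree", "Agree"))
)

# Hypothetical name: the full level set as a plain character vector
lvls <- c("Strongly Disagree", "Disagree", "Neither A or D",
          "Agree", "Strongly Agree")

# Re-create each column as a factor that carries the full level set
csc2[1:3] <- lapply(csc2[1:3], function(x) factor(x, levels = lvls))

# Unobserved levels are kept and counted as zero
table(csc2$engag11)
```

Note that any value not present in the level set (for example "Neither Agree or Disagree" when the level is spelled "Neither A or D") is converted to NA, so the labels have to match the data exactly.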

Then you can call table() like this:

table(csc2[1])

#Strongly    Disagree             Disagree       Neither A or D                Agree       Strongly Agree 
#                   0                    0                    0                    2                    1 
table(csc2[2])

#Strongly    Disagree             Disagree       Neither A or D                Agree       Strongly Agree 
#                   0                    0                    0                    3                    0 
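
With dozens of columns, calling table() one column at a time gets tedious. One possible sketch, assuming every column shares the same level set, tabulates all columns in a single call; lvls and counts are hypothetical names, and the data frame is again rebuilt from the question's data.

```r
# Rebuild the example data from the question
csc2 <- data.frame(
  engag11 = factor(c("Strongly Agree", "Agree", "Agree")),
  encor11 = factor(c("Agree", "Agree", "Agree")),
  know11  = factor(c("Strongly Agree", "Agree", "Agree"))
)

# Hypothetical name: the shared level set
lvls <- c("Strongly Disagree", "Disagree", "Neither A or D",
          "Agree", "Strongly Agree")

# Apply the full level set to every column, keeping the data frame structure
csc2[] <- lapply(csc2, factor, levels = lvls)

# Each table has the same length, so sapply() simplifies the result
# to a matrix with one row per level and one column per variable
counts <- sapply(csc2, table)
counts
```

Because every column now has identical levels, the result is a single levels-by-variables matrix of counts, which can be passed on to a table formatter in one step.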
