Confusion between factor levels and factor labels

Question

There seems to be a difference between the levels and the labels of a factor in R. Up to now, I always thought that levels were the 'real' names of the factor levels, and labels were the names used for output (such as tables and plots). Apparently this is not the case, as the following example shows:

df <- data.frame(v=c(1,2,3),f=c('a','b','c'),
  stringsAsFactors=TRUE)  # needed in R >= 4.0 to get a factor column
str(df)
'data.frame':   3 obs. of  2 variables:
 $ v: num  1 2 3
 $ f: Factor w/ 3 levels "a","b","c": 1 2 3

df$f <- factor(df$f, levels=c('a','b','c'),
  labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))
levels(df$f)
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"

I thought that the original levels ('a','b','c') could somehow still be accessed when scripting, but this doesn't work:

> df$f=='a'
[1] FALSE FALSE FALSE

But this does:

> df$f=='Treatment A: XYZ' 
[1]  TRUE FALSE FALSE

So, my question consists of two parts:

  • What's the difference between levels and labels?

  • Is it possible to have different names for factor levels for scripting and output?

Background: For longer scripts, scripting with short factor levels seems to be much easier. However, for reports and plots, these short factor levels may not be adequate and should be replaced with more precise names.

Recommended answer

Very briefly: in the factor() function, levels are the input and labels are the output. A factor has only a levels attribute, which is set by the labels argument of factor(). This differs from the concept of labels in statistical packages such as SPSS, and can be confusing at first.
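The input/output relationship can be seen in a minimal sketch. The vector `x` and the shortened label strings are illustrative assumptions, not data from the question:

```r
# Raw input values (illustrative data, not from the question).
x <- c("a", "b", "c", "a")

# `levels` names the input values to look for;
# `labels` names what those values are called in the resulting factor.
f <- factor(x, levels = c("a", "b", "c"),
            labels = c("Treatment A", "Treatment B", "Treatment C"))

levels(f)           # the labels have become the levels of the factor
as.integer(f)       # internal integer codes, one per element
"a" %in% levels(f)  # the original input values are not stored in the factor
```

After this call, nothing in `f` refers to 'a', 'b' or 'c' any more, which is why the comparison against 'a' in the question is FALSE everywhere.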

What you do in this line of code

df$f <- factor(df$f, levels=c('a','b','c'),
  labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))

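tells R that there is a vector df$f

  • that you want to convert into a factor,
  • in which the different levels are coded as a, b and c,
  • and whose levels you want labelled as 'Treatment A: XYZ' etc.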

The factor function looks for the values a, b and c, converts them to internal integer codes, and stores the label values in the levels attribute of the factor. This attribute is used to map the internal integer codes back to the correct labels. But as you can see, there is no labels attribute.

> df <- data.frame(v=c(1,2,3),f=c('a','b','c'),
+   stringsAsFactors=TRUE)  # needed in R >= 4.0 to get a factor column
> attributes(df$f)
$levels
[1] "a" "b" "c"

$class
[1] "factor"

> df$f <- factor(df$f, levels=c('a','b','c'),
+   labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))    
> attributes(df$f)
$levels
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"

$class
[1] "factor"
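For the second part of the question, one possible approach (a sketch, not something stated in the answer above) is to keep the short levels in the data and apply the long names only when producing output. The lookup vector `display_labels` and the copy `f_out` are hypothetical names introduced for illustration:

```r
# Data with short levels, convenient for scripting.
df <- data.frame(v = c(1, 2, 3),
                 f = factor(c("a", "b", "c"), levels = c("a", "b", "c")))

# Hypothetical lookup table: names are the short levels,
# values are the long names wanted in reports and plots.
display_labels <- c(a = "Treatment A: XYZ",
                    b = "Treatment B: YZX",
                    c = "Treatment C: ZYX")

# Scripting keeps working with the short levels.
df$f == "a"

# For output, build a relabelled copy and leave df$f untouched.
f_out <- factor(df$f,
                levels = names(display_labels),
                labels = unname(display_labels))
table(f_out)
```

Because only the copy is relabelled, the rest of the script can keep comparing against 'a', 'b' and 'c', while tables and plots use `f_out`.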
