R: Assign variable labels of data frame columns

Question

I am struggling with variable labels of data.frame columns. Say I have the following data frame (part of much larger data frame):

library(Hmisc)  # provides label(), used below

data <- data.frame(age = c(21, 30, 25, 41, 29, 33),
                   sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male")))

I also have a named vector with the variable labels for this data frame:

var.labels <- c(age = "Age in Years", sex = "Sex of the participant")

I want to assign the variable labels in var.labels to the columns in the data frame data using the function label from the Hmisc package. I can do them one by one like this and check the result afterwards:

> label(data[["age"]]) <- "Age in years"
> label(data[["sex"]]) <- "Sex of the participant"
> label(data)
                 age                      sex
      "Age in years" "Sex of the participant"

The variable labels are assigned as attributes of the columns:

> attr(data[["age"]], "label")
[1] "Age in years"
> attr(data[["sex"]], "label")
[1] "Sex of the participant"

Wonderful. However, with a larger data frame, say 100 or more columns, this will not be convenient or efficient. Another option is to assign them as attributes directly:

> attr(data, "variable.labels") <- var.labels

This does not help. The variable labels are not assigned to the columns:

> label(data)
age sex
 ""  ""

Instead, they are assigned as an attribute of the data frame itself (see the last component of the list):

> attributes(data)
$names
[1] "age" "sex"

$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.frame"

$variable.labels
                 age                      sex
      "Age in Years" "Sex of the participant"

And this is not what I want. I need the variable labels as attributes of the columns. I tried to write the following function (and many others):

set.var.labels <- function(dataframe, label.vector){
  column.names <- names(dataframe)
  dataframe <- mapply(label, column.names, label.vector)
  return(dataframe)
}

And then execute it:

> set.var.labels(data, var.labels)

That did not help either. It returns the values of the vector var.labels but does not assign the variable labels. If I assign the result to a new object, that object just contains the label values as a vector.

Recommended answer

You can do this by creating a list from the named vector of var.labels and assigning that to the label values. I've used match to ensure that values of var.labels are assigned to their corresponding column in data even if the order of var.labels is different from the order of the data columns.

library(Hmisc)

var.labels = c(age="Age in Years", sex="Sex of the participant")

label(data) = as.list(var.labels[match(names(data), names(var.labels))])

label(data)
                     age                      sex 
          "Age in Years" "Sex of the participant" 
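
For comparison, here is a minimal base R sketch of the same idea without Hmisc. It relies only on what the question shows, namely that a variable label is stored as the "label" attribute of a column. The helper name set_var_labels is hypothetical and not part of any package, and Hmisc's own label<- may do more than set this attribute (for example adjusting the column class), which this sketch does not attempt.

```r
# Hypothetical helper (not part of Hmisc): attach a "label" attribute to
# every column that has an entry in the named character vector `labels`.
set_var_labels <- function(df, labels) {
  # Look labels up by column name, so the order of `labels` does not matter
  # and columns without a label are left untouched.
  for (col in intersect(names(df), names(labels))) {
    attr(df[[col]], "label") <- labels[[col]]
  }
  df
}

data <- data.frame(
  age = c(21, 30, 25, 41, 29, 33),
  sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male"))
)

# Deliberately listed in a different order than the columns of `data`
var.labels <- c(sex = "Sex of the participant", age = "Age in Years")

# The function returns a modified copy, so the result must be assigned back
data <- set_var_labels(data, var.labels)

attr(data[["age"]], "label")
attr(data[["sex"]], "label")
```

Unlike the attempt in the question, the function modifies the columns of the data frame it was given and returns that data frame, so the labels end up as column attributes rather than as a separate vector.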

Original answer

My original answer used lapply, which isn't actually necessary. Here's the original answer for archival purposes:

You can use lapply to assign the labels:

label(data) = lapply(names(data), function(x) var.labels[match(x, names(var.labels))])

lapply applies a function to each element of a list or vector. In this case the function is applied to each value of names(data) and it picks out the label value from var.labels that corresponds to the current value of names(data).
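
To see what the lapply call hands to label(), it can help to run the lookup on its own. The sketch below uses only base R and the example data from the question; the object names picked and picked_vec are made up for illustration.

```r
data <- data.frame(
  age = c(21, 30, 25, 41, 29, 33),
  sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male"))
)

# Labels listed in a different order than the columns, to show the lookup
var.labels <- c(sex = "Sex of the participant", age = "Age in Years")

# One lookup per column name: the result is a list with one element per
# column, in column order.
picked <- lapply(names(data), function(x) var.labels[match(x, names(var.labels))])
str(picked)

# match() is vectorised, so the same lookup works without lapply
picked_vec <- var.labels[match(names(data), names(var.labels))]
picked_vec
```

Both forms return the labels in the order of the columns of data rather than the order of var.labels, which is why the lapply version is not strictly needed.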

Reading through a few tutorials is a good way to get the general idea, but you'll really get the hang of it if you start using lapply in different situations and see how it behaves.
