R: Assign variable labels of data frame columns


Question

I am struggling with variable labels for data.frame columns. Say I have the following data frame (part of a much larger data frame):

data <- data.frame(
  age = c(21, 30, 25, 41, 29, 33),
  sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male"))
)

I also have a named vector with the variable labels for this data frame:

var.labels <- c(age = "Age in Years", sex = "Sex of the participant")

I want to assign the variable labels in var.labels to the columns of the data frame data using the label function from the Hmisc package. I can do them one by one like this and check the result afterwards:

> label(data[["age"]]) <- "Age in years"
> label(data[["sex"]]) <- "Sex of the participant"
> label(data)
                 age                      sex
      "Age in years" "Sex of the participant"

The variable labels are assigned as attributes of the columns:

> attr(data[["age"]], "label")
[1] "Age in years"
> attr(data[["sex"]], "label")
[1] "Sex of the participant"

Wonderful. However, with a larger data frame, say 100 or more columns, this is neither convenient nor efficient. Another option is to assign them as attributes directly:

> attr(data, "variable.labels") <- var.labels

This does not help. The variable labels are not assigned to the columns:

> label(data)
age sex
 ""  ""

Instead, they are assigned as an attribute of the data frame itself (see the last component of the list):

> attributes(data)
$names
[1] "age" "sex"

$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.frame"

$variable.labels
                 age                      sex
      "Age in Years" "Sex of the participant"

This is not what I want. I need the variable labels as attributes of the columns. I tried to write the following function (and many others):

set.var.labels <- function(dataframe, label.vector){
  column.names <- names(dataframe)
  dataframe <- mapply(label, column.names, label.vector)
  return(dataframe)
}

And then execute it:

> set.var.labels(data, var.labels)

This did not help. It returns the values of the vector var.labels but does not assign the variable labels. If I assign the result to a new object, that object just contains the variable label values as a vector.
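
One plausible reading of why the attempt above only returns the label strings: mapply hands each column name (a character string) to label, not the column itself, and nothing is ever written back into the data frame. The following is a minimal base-R sketch of a helper that does write the labels back. It assumes, as the attr() output earlier in the question suggests, that a variable label is simply a "label" attribute on the column. The name set_var_labels is hypothetical and not part of Hmisc, and unlike Hmisc's own assignment function this sketch adds no extra class to the columns.

```r
# Hypothetical helper (not part of Hmisc): sets the "label" attribute on each
# column whose name appears in label.vector, using base R only.
set_var_labels <- function(dataframe, label.vector) {
  for (nm in intersect(names(dataframe), names(label.vector))) {
    attr(dataframe[[nm]], "label") <- label.vector[[nm]]
  }
  dataframe  # return the modified copy; the caller must reassign it
}

data <- data.frame(
  age = c(21, 30, 25, 41, 29, 33),
  sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male"))
)
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")

data <- set_var_labels(data, var.labels)
attr(data[["age"]], "label")
attr(data[["sex"]], "label")
```

Because R functions work on copies of their arguments, the result has to be assigned back to data; calling the helper without reassigning leaves the original data frame unlabelled.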

Recommended answer

You can do this by creating a list from the named vector var.labels and assigning that list to the label values. I've used match to ensure that the values of var.labels are assigned to their corresponding columns in data even if the order of var.labels differs from the order of the data columns.

library(Hmisc)

var.labels = c(age="Age in Years", sex="Sex of the participant")

label(data) = as.list(var.labels[match(names(data), names(var.labels))])

label(data)
                     age                      sex 
          "Age in Years" "Sex of the participant" 

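The role of match in the answer can be seen in isolation with base R alone. This is a small sketch, not part of the original answer; col.names is a stand-in for names(data), and the labels are deliberately listed in the opposite order to the columns.

```r
# Labels deliberately listed in a different order than the columns.
var.labels <- c(sex = "Sex of the participant", age = "Age in Years")
col.names <- c("age", "sex")  # stands in for names(data)

# match() returns, for each column name, its position in names(var.labels).
idx <- match(col.names, names(var.labels))
idx

# Indexing by those positions reorders the labels to follow the column order.
aligned <- var.labels[idx]
aligned

# A column name with no entry in var.labels yields NA rather than an error.
match(c("age", "height"), names(var.labels))
```

The NA case is worth checking before assigning labels to a wide data frame, since a column missing from var.labels would otherwise receive an NA label.
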
Original answer

My original answer used lapply, which isn't actually necessary. Here's the original answer for archival purposes:

You can assign the labels using lapply:

label(data) = lapply(names(data), function(x) var.labels[match(x, names(var.labels))])

lapply applies a function to each element of a list or vector. In this case the function is applied to each value of names(data), and it picks out the label from var.labels that corresponds to the current value of names(data).
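
To see what the lapply call produces before it is handed to label, the lookup can be run on its own in base R. This is an illustrative sketch only; col.names stands in for names(data).

```r
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")
col.names <- c("age", "sex")  # stands in for names(data)

# lapply() calls the function once per column name and collects the
# results in a list with one element per name.
picked <- lapply(col.names, function(x) var.labels[match(x, names(var.labels))])

length(picked)   # one list element per column name
picked[[1]]      # the label looked up for "age"
picked[[2]]      # the label looked up for "sex"
```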

Reading through a few tutorials is a good way to get the general idea, but you'll really get the hang of it if you start using lapply in different situations and see how it behaves.
