Loop through a character vector to use in a function

Problem description

I am conducting a method comparison study, comparing measurements from two different systems. My dataset has a large number of columns, each holding a variable measured by one of the two systems.

aX and bX are both measures of X, but from systems a and b respectively. I have about 80 pairs of variables like this.

A simplified version of my data looks like this:

set.seed(1)
df <- data.frame(
  ID = as.factor(rep(1:2, each=10)),
  aX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
  bX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
  aY = rep(1:10+rnorm(10,mean=1,sd=0.5), 2),
  bY = rep(1:10-rnorm(10,mean=1,sd=0.5),2))

head(df)

  ID       aX       bX       aY         bY
1  1 1.686773 2.755891 2.459489 -0.6793398
2  1 3.091822 3.194922 3.391068  1.0513939
3  1 3.582186 3.689380 4.037282  1.8061642
4  1 5.797640 3.892650 4.005324  3.0269025
5  1 6.164754 6.562465 6.309913  4.6885298
6  1 6.589766 6.977533 6.971936  5.2074973

I am trying to loop through the elements of a character vector, and use the elements to point to columns in the dataframe. But I keep getting error messages when I try to call functions with variable names generated in the loop.

For simplicity, I have changed the loop to include a linear model as this produces the same type of error as I have in my original script.

#This line is only included to show that
#the formula used in the loop works when
#called directly with the "real" column names

(broom::glance(lm(aX~bX, data = df)))$r.squared

[1] 0.9405218

#Now I try the loop

varlist <- c("X", "Y")

for(i in 1:length(varlist)){
  aVAR <- paste0("a", varlist[i])
  bVAR <- paste0("b", varlist[i]) 

  #aVAR and bVAR appear to hold names identical to the column names in the df dataframe
  print(c(aVAR, bVAR))

  #Try the formula with the loop variable names
  print((broom::glance(lm(aVAR~bVAR, data = df)))$r.squared)
  }

The error messages I get when calling functions from inside the loop vary according to the function I am calling. The common denominator for all the errors is that they occur when I try to use the character vector (varlist) to pick out specific columns.

Example of error messages:

rmcorr(ID, aVAR, bVAR, df)

Error in rmcorr(ID, aVAR, bVAR, df) : 
  'Measure 1' and 'Measure 2' must be numeric

or

broom::glance(lm(aVAR~bVAR, data = df))

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion

Can you help me understand what goes wrong in the loop? Or suggest and show another way to accomplish what I am trying to do?

Solution

Variables aren't evaluated in formulas (the things with ~).

You can type

bert ~ ernie

and not get an error even if variables named bert and ernie do not exist. A formula stores relationships between symbols/names and does not attempt to evaluate them. Also note that we are not using quotes here. Variable names (or symbols) are not interchangeable with character values (i.e. aX is very different from "aX").
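To make this concrete, here is a small sketch (the object name f is only for illustration) showing that a formula keeps the symbols exactly as typed instead of looking up their values:

```r
# No objects named bert or ernie exist, yet this creates a formula without error
f <- bert ~ ernie
class(f)      # "formula"
all.vars(f)   # "bert" "ernie"

# The same applies to the loop variables from the question: the formula
# stores the symbols aVAR and bVAR themselves, not the strings they contain
aVAR <- "aX"
bVAR <- "bX"
all.vars(aVAR ~ bVAR)   # "aVAR" "bVAR"
```

So lm(aVAR ~ bVAR, data = df) never refers to the columns aX and bX. Since df has no columns called aVAR or bVAR, lm() most likely falls back on the one-element character vectors of those names, which would explain the coercion warning and the contrasts error shown in the question.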

So when putting together a formula from string values, I suggest you use the reformulate() function. It takes a character vector of term names for the right-hand side and an optional response for the left-hand side. So you would create the same formula with

reformulate("ernie", "bert")
# bert ~ ernie

And you can use that with your lm

lm(reformulate(bVAR, aVAR), data = df)
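Put together with the loop from the question, this might look like the sketch below. It recreates the question's example data, and it makes two assumptions that are not in the original post: it uses base R's summary() instead of broom::glance() so that it runs without extra packages, and it collects the results in a named vector with the hypothetical name r_squared.

```r
# Recreate the simplified data from the question
set.seed(1)
df <- data.frame(
  ID = as.factor(rep(1:2, each = 10)),
  aX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  aY = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bY = rep(1:10 - rnorm(10, mean = 1, sd = 0.5), 2))

varlist <- c("X", "Y")

# r_squared is a hypothetical name; one R-squared value per variable pair
r_squared <- vapply(varlist, function(v) {
  aVAR <- paste0("a", v)
  bVAR <- paste0("b", v)
  # reformulate() builds e.g. aX ~ bX from the two strings
  fit <- lm(reformulate(bVAR, response = aVAR), data = df)
  summary(fit)$r.squared
}, numeric(1))

print(r_squared)
```

For functions that expect the measurement vectors themselves rather than a formula, df[[aVAR]] extracts the column whose name is stored in aVAR. Whether a particular function accepts that depends on its interface.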
