R: specifying variable name in function parameter for a function of general (universal) use


Question

Here is my small function and data. Please note that I want to design a function for general use, not just for my own use.

dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)

myfun <- function (dataframe, varA, varB) {
              daf2 <- data.frame (A = dataframe$A*dataframe$B, 
              B= dataframe$C*dataframe$D)
              anv1 <- lm(varA ~ varB, daf2)
              print(anova(anv1)) 
             }             

myfun (dataframe = dataf, varA = A, varB = B)

Error in eval(expr, envir, enclos) : object 'A' not found

It works when I specify data$variable, but I do not want to require that, because it forces the user to write both the data frame and the variable name in the function call.

 myfun (dataframe = dataf, varA = dataf$A, varB = dataf$B)
Analysis of Variance Table

Response: varA
          Df Sum Sq Mean Sq    F value    Pr(>F)    
varB       1   82.5    82.5 1.3568e+33 < 2.2e-16 ***
Residuals  8    0.0     0.0                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Warning message:
In anova.lm(anv1) :
  ANOVA F-tests on an essentially perfect fit are unreliable

What is the best practice in this situation? Can I attach the data frame inside the function? What could be the disadvantages or potential conflicts/dangers of doing so? See the "masked" message in the output. I believe that once it is attached it will remain attached for the remainder of the session, right? The function provided here is just an example; I need more downstream analysis in which the variable names from different data frames can be, or should be, identical. I am looking for a programmer's solution to this.

myfun <- function (dataframe, varA, varB) {
              attach(dataframe)
                 daf2 <- data.frame (A = A*B, B= C*D)
              anv1 <- lm(varA ~ varB, daf2)
              return(anova(anv1))
             }             

myfun (dataframe = dataf, varA = A, varB = B)

The following object(s) are masked from 'dataframe (position 3)':

    A, B, C, D
Analysis of Variance Table

Response: varA
          Df Sum Sq Mean Sq    F value    Pr(>F)    
varB       1   82.5    82.5 1.3568e+33 < 2.2e-16 ***
Residuals  8    0.0     0.0                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Warning message:
In anova.lm(anv1) :
  ANOVA F-tests on an essentially perfect fit are unreliable


Recommended answer

Let's investigate your original function and call (see the comments I added), assuming you mean to pass the names of your columns of interest to the function:

myfun <- function (dataframe, varA, varB) {
              #on this next line, you use A and B. But this should be what is
              #passed in as varA and varB, no?
              daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
              #so, as a correction, we need:
              colnames(daf2)<-c(varA, varB)
              #the first argument to lm is a formula. If you use it like this,
              #it refers to columns literally _named_ varA and varB, not to the
              #columns named by the _contents_ of varA and varB!!
              anv1 <- lm(varA ~ varB, daf2)
              #so, what we really want is to build a formula from the contents
              #of varA and varB: we have to do this by building up a character string:
              frm<-paste(varA, varB, sep="~")
              anv1 <- lm(formula(frm), daf2)
              print(anova(anv1)) 
             }             
#here, you pass A and B, because you are used to being able to do that in a formula
#(like in lm). But in a formula, there is a great deal of work done to make that
#happen, that doesn't work for most of the rest of R, so you need to pass the names
#again as character strings:
myfun (dataframe = dataf, varA = A, varB = B)
#becomes:
myfun (dataframe = dataf, varA = "A", varB = "B")

Note: in the above, I left the original code in place, so you may have to remove some of it to avoid the errors you were originally getting. The essence of your problem is that you should always pass column names as character strings, and use them as such. This is one of the places where the syntactic sugar of formulas in R gets people into bad habits and misunderstandings...
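As a variation on the paste() approach, here is a minimal sketch that builds the same kind of formula with reformulate() from the stats package and checks the column names first. The function name fit_by_name and its argument names are hypothetical, chosen only for illustration; this is not part of the original answer.

```r
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

# fit_by_name is a hypothetical name used for illustration only.
fit_by_name <- function(dataframe, response, predictor) {
  # Fail early if the names are not strings or not columns of the data frame.
  stopifnot(is.character(response), is.character(predictor),
            all(c(response, predictor) %in% names(dataframe)))
  # reformulate() builds the formula `response ~ predictor` from strings.
  frm <- reformulate(predictor, response = response)
  lm(frm, data = dataframe)
}

fit <- fit_by_name(dataf, "A", "D")
formula(fit)   # the formula that was actually fitted
```

The up-front check turns a misspelled column name into an immediate error at the top of the function instead of a less obvious one from inside lm().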

Now, as for an alternative: the only place the variable names are actually used is in the formula. As such, you can simplify matters further, if you don't mind some slight cosmetic differences in the results that you can clean up later: there is no need to pass along the column names at all!

myfun <- function (dataframe) {
              daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
              #now we know that columns A and B simply exist in data.frame daf2!!
              anv1 <- lm(A ~ B, daf2)
              print(anova(anv1))
             }             

As a final piece of advice: I would refrain from calling print on your last statement. If you leave it out and call the function directly from the R command line, R prints the result for you anyway. As an added advantage, you can do further work with the object returned from your function.
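To illustrate that last point, here is a small sketch based on the simplified one-argument version, with the print() call dropped. The variable names tab and p_value are assumptions made for this example.

```r
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

myfun <- function(dataframe) {
  daf2 <- data.frame(A = dataframe$A * dataframe$B,
                     B = dataframe$C * dataframe$D)
  # The value of the last expression is returned; there is no print() call.
  anova(lm(A ~ B, daf2))
}

tab <- myfun(dataf)              # assigned, so nothing is printed
p_value <- tab[["Pr(>F)"]][1]    # p-value for the term B, ready for reuse
tab                              # typing the name prints the table on demand
```

Because the anova table is an ordinary returned object, later steps can pull out individual columns instead of parsing printed output.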

Cleaned-up function with a trial run:

dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)
myfun <- function(dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), dataframe)
  anova(anv1)
}
myfun(dataframe = dataf, varA = "A", varB = "B")
myfun(dataframe = dataf, varA = "A", varB = "D")
myfun(dataframe = dataf, varA = "B", varB = "C")
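
On the attach() part of the question, which the answer above does not address directly: attach() adds the data frame to the search path, and it stays there after the function returns until detach() is called. The sketch below shows this and contrasts it with with(), which evaluates an expression inside the data frame without touching the search path. The helper names leaky and tidy are hypothetical, used only for this illustration.

```r
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

# Hypothetical helper: attaches inside the function and never detaches.
leaky <- function(dataframe) {
  attach(dataframe)
  mean(A * B)
}

# Hypothetical helper: with() looks up A and B in the data frame only.
tidy <- function(dataframe) {
  with(dataframe, mean(A * B))
}

before <- search()
r1 <- tidy(dataf)
unchanged <- identical(search(), before)   # with() left the search path alone

r2 <- leaky(dataf)
leaked <- "dataframe" %in% search()        # the attached copy is still there

# Clean up by hand; the entry is named after the function argument.
detach("dataframe", character.only = TRUE)
```

Calling leaky() a second time without the detach() would attach another copy and produce the "masked" message shown in the question, which is one reason to prefer with() or the data argument of lm() inside a general-purpose function.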
