Functionally creating variables using string names

Problem description

I'm trying to generate a function to create a bunch of columns on a data frame that have the same naming conventions and use the same logic. Unfortunately, I've bumped into some weird behavior when creating the variables, and I am hopeful someone else can explain what's going on here.

df <- data.frame(var1 = c(1,2,3), var2 = c(3,4,5), var3 = c("foo", "bar", "baz"))

DoesNotWork <- function(df, varname){
  df[paste(varname, "_square", sep = "")] <- df[varname]^2
  return(df)
}

dfBad <- DoesNotWork(df, "var1")

dfBad
      var1 var2 var3 var1
  1    1    3  foo    1
  2    2    4  bar    4
  3    3    5  baz    9

dfBad here has two variables called var1, rather than one variable called var1 and one called var1_square as I had hoped.

The function below hacks around this problem by first copying the original variable to the new variable name and then squaring only the new variable. That is sort of obnoxious, and I'm not sure what would happen if I needed logic that uses multiple variables.

Works <- function(df, varname){
   df[paste(varname, "_square", sep = "")] <- df[varname]
   df[paste(varname, "_square", sep = "")] <- df[paste(varname, "_square", sep = "")]^2
   return(df)
}

dfGood <- Works(df, "var1")

dfGood
      var1 var2 var3 var1_square
  1    1    3  foo           1
  2    2    4  bar           4
  3    3    5  baz           9

Any guidance here would be greatly appreciated, especially if there's a nicer way to switch between strings for variable names and references to the column-objects.

Solution

You're missing the commas.

df <- data.frame(var1 = c(1,2,3), var2 = c(3,4,5), var3 = c("foo", "bar", "baz"))

NowItWorks <- function(df, varname){
  df[,paste(varname, "_square", sep = "")] <- df[,varname]^2
  return(df)
}

NowItWorks(df, "var1")

>  var1 var2 var3 var1_square
 1    1    3  foo           1
 2    2    4  bar           4
 3    3    5  baz           9
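
The comma matters because df[, varname] with a single column name drops to a plain vector, while df[varname] keeps a one-column data.frame. A quick check along those lines, reusing the same df as above (the variable name squared is only for illustration):

```r
df <- data.frame(var1 = c(1, 2, 3), var2 = c(3, 4, 5),
                 var3 = c("foo", "bar", "baz"))

# List-style indexing (no comma) keeps the data.frame wrapper
class(df["var1"])      # "data.frame"

# Matrix-style indexing (with comma) on one column drops to a vector
class(df[, "var1"])    # "numeric"

# Squaring a plain vector gives a plain vector, so there are no
# dimnames that could leak into the new column
squared <- df[, "var1"]^2
is.null(dim(squared))  # TRUE
```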

EDIT: OK, so my answer above does work, but it doesn't really explain why your first function fails while the second one works.

For example, the same function with multiplication instead of exponentiation works as expected:

MultiplicationWorks <- function(df, varname){
  df[paste(varname, "_square", sep = "")] <- df[varname]*2
  return(df)
}

The same is true of all the other arithmetic operators except exponentiation. If we look at the source code of the data.frame operators, we see this interesting bit at the bottom:

Ops.data.frame

...
if (.Generic %in% c("+", "-", "*", "/", "%%", "%/%")) {
    names(value) <- cn
    data.frame(value, row.names = rn, check.names = FALSE,
        check.rows = FALSE)
}
else matrix(unlist(value, recursive = FALSE, use.names = FALSE),
    nrow = nr, dimnames = list(rn, cn))
...

Basically this says that if the operator is one of those listed, the result is returned as a data.frame with the given names; otherwise it is returned as a matrix with the given names. For some reason, "^" is the only arithmetic operator not listed. We can confirm this pretty easily:

df <- data.frame(var1 = c(1,2,3), var2 = c(3,4,5), var3 = c("foo", "bar", "baz"))

class(df["var1"]*2)

>[1] "data.frame"

class(df["var1"]^2)

>[1] "matrix"

With exponentiation, and only with exponentiation, the dimnames of the resulting matrix overrule the new column name of your data.frame when you assign it. R is weird. Comically, this means you could also get your code to work by wrapping as.data.frame() around the exponentiation part.
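
As a minimal sketch of that workaround, here is the original function with as.data.frame() wrapped around the exponentiation, next to a variant that uses [[ ]] to read and write the column as a plain vector. The helper names WrapAsDataFrame and UseDoubleBracket are made up for illustration and are not from the question:

```r
df <- data.frame(var1 = c(1, 2, 3), var2 = c(3, 4, 5),
                 var3 = c("foo", "bar", "baz"))

# Hypothetical helper: same as DoesNotWork, except the result of ^ is
# converted back to a data.frame before it is assigned
WrapAsDataFrame <- function(df, varname) {
  df[paste(varname, "_square", sep = "")] <- as.data.frame(df[varname]^2)
  return(df)
}

# Hypothetical helper: [[ ]] extracts and assigns the column as a vector,
# so no matrix is ever stored in the data.frame
UseDoubleBracket <- function(df, varname) {
  df[[paste(varname, "_square", sep = "")]] <- df[[varname]]^2
  return(df)
}

names(WrapAsDataFrame(df, "var1"))
names(UseDoubleBracket(df, "var1"))
```

Either version should leave a plain numeric column called var1_square. The [[ ]] form is probably the closest thing to the "nicer way" of going from a string to the column object that the question asks about.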

If you want to see something really strange using your initial function:

❥ names(dfBad)
[1] "var1"        "var2"        "var3"        "var1_square"
❥ dfBad
  var1 var2 var3 var1
1    1    3  foo    1
2    2    4  bar    4
3    3    5  baz    9
❥ str(dfBad)
'data.frame':   3 obs. of  4 variables:
 $ var1       : num  1 2 3
 $ var2       : num  3 4 5
 $ var3       : Factor w/ 3 levels "bar","baz","foo": 3 1 2
 $ var1_square: num [1:3, 1] 1 4 9
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "var1"

R knows the column's correct name, but shows you the name of the matrix you stuck into it.
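
On the question's worry about logic that draws on several variables: the same string-based approach extends naturally when every column is read and written with [[ ]]. A small sketch, where the helper AddProduct and the "_product" naming scheme are assumptions made for this example:

```r
df <- data.frame(var1 = c(1, 2, 3), var2 = c(3, 4, 5))

# Hypothetical helper: builds a new column from two existing columns,
# both referenced by their string names
AddProduct <- function(df, var_a, var_b) {
  new_name <- paste(var_a, var_b, "product", sep = "_")
  # [[ ]] returns plain vectors, so the arithmetic never goes through
  # Ops.data.frame and no matrix ends up inside the data.frame
  df[[new_name]] <- df[[var_a]] * df[[var_b]]
  return(df)
}

dfProduct <- AddProduct(df, "var1", "var2")
names(dfProduct)
```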
