R: looping through column names


Question

I am a Stata user trying to switch to R and having the usual beginner's struggle. I have been trying (and failing) to write a loop for a few days and I now surrender. What I want to do (in a loop):


  • start from a list of variable names

  • create new variable(s)

  • recode the new variable(s) based on the values of existing variables

  • possibly do so using the dplyr syntax, but this is not essential, only for consistency with the rest of my code.

Here is a stylised example of what I am trying to do. In my actual data, the x.x and x.y variables originate from a join function applied to 2 existing data frames.

N <- 1000
  df  <- data.frame(x1 = rnorm(N),
x2.x = rnorm(N)+2,x2.y = rnorm(N)-2,
x3.x = rnorm(N)+3,x3.y = rnorm(N)-3)

varlist <- c("x2","x3")
lapply(varlist, function(x) {
   df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing x.x and x.y
  })

When I run the lapply part of the code I get the error message:



Error: unexpected '}' in: " df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing x.x and x.y }"

even though it should be expected... I am sure there are a number of mistakes in my code, partly because I am used to macros in Stata, for which there is no direct equivalent in R. Anyway, if you can point me in the right direction it would be fantastic!

Recommended answer

The reason your code doesn't work is that paste0(x, ".y") literally pastes the value of x together with ".y" and returns a character string such as "x2.y". That is all it does; it never tells R to pull that column out of the data. (The syntax error itself comes from the unclosed parenthesis of mutate(, but closing it would still leave this problem.)
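To make the difference concrete, here is a minimal sketch on a tiny made-up data frame. The name toy and its values are hypothetical and not part of the question's data. It contrasts the strings that paste0() builds with the column values you get once those strings are used to index the data frame.

```r
# Tiny illustrative data frame (hypothetical, not the question's data)
toy <- data.frame(x1 = c(-1, 1), x2.x = c(10, 20), x2.y = c(-10, -20))

v <- "x2"

# paste0() only builds a character string; it does not look anything up
name_y <- paste0(v, ".y")
name_y      # "x2.y"

# So ifelse() on these strings returns strings, not column values
as_strings <- ifelse(toy$x1 < 0, paste0(v, ".y"), paste0(v, ".x"))
as_strings  # "x2.y" "x2.x"

# Using the strings to index the data frame retrieves the actual columns
as_values <- ifelse(toy$x1 < 0, toy[, paste0(v, ".y")], toy[, paste0(v, ".x")])
as_values   # -10 20
```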

What you should actually do is subset the data by the column name that paste0(x, ".y") generates. For example, to get the column x2.y you can write

df[, paste0(varlist[1], ".y")]
## and of course the same can be done for second item of varlist
# df[, paste0(varlist[2], ".y")]

Now we know how to subset columns by a variable name. Because you want to learn how to write this as a loop, we can replace the index in varlist[1] (and varlist[2]) with a 'looping' variable.

Here are two ways to do it, one using a for loop and the other using sapply.

for(i in varlist){
  df[, i] <- ifelse(df[, "x1"] < 0, df[, paste0(i, ".y")], df[, paste0(i, ".x")])
}

head(df)
#            x1       x2.x       x2.y     x3.x       x3.y         x2        x3
# 1 -0.56047565  1.0042013 -2.5116037 2.849693 -2.8034502 -2.5116037 -2.803450
# 2 -0.23017749  0.9600450 -1.7630621 2.672243 -2.3498868 -1.7630621 -2.349887
# 3  1.55870831  1.9820198 -2.5415892 1.551835 -2.3289958  1.9820198  1.551835
# 4  0.07050839  1.8678249 -0.7807724 2.302715 -4.2841578  1.8678249  2.302715
# 5  0.12928774 -0.5493428 -1.8258641 5.598490 -5.0261096 -0.5493428  5.598490
# 6  1.71506499  3.0405735 -2.6152683 2.962585 -0.7946739  3.0405735  2.962585
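Since the question mentions dplyr, the same loop can probably be written with mutate() as well. This is a sketch, not part of the original answer. It assumes dplyr is installed in a version that supports tidy evaluation, and it rebuilds the example data so the block runs on its own. The `!!v :=` form sets the new column's name from the string in v, and the .data pronoun looks up existing columns by name.

```r
library(dplyr)

# Rebuild the example data so this block is self-contained
set.seed(123)
N <- 1000
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

for (v in varlist) {
  df <- df %>%
    # `!!v :=` names the new column after the string held in v;
    # .data[[...]] fetches an existing column by its (computed) name
    mutate(!!v := ifelse(x1 < 0,
                         .data[[paste0(v, ".y")]],
                         .data[[paste0(v, ".x")]]))
}

head(df)
```

The base R loop above needs no extra packages; this variant mainly helps if the rest of the code is already a dplyr pipeline.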



sapply

You can also do this with one of the *apply functions. Here I'm using sapply because it 'simplifies' the result into a matrix whose columns can be assigned in one step (whereas lapply would return a list).

df[, varlist] <- sapply(varlist, function(x){
   ifelse(df[, "x1"] < 0, df[, paste0(x, ".y")], df[, paste0(x, ".x")])
})

head(df)
#            x1       x2.x       x2.y     x3.x       x3.y         x2        x3
# 1 -0.56047565  1.0042013 -2.5116037 2.849693 -2.8034502 -2.5116037 -2.803450
# 2 -0.23017749  0.9600450 -1.7630621 2.672243 -2.3498868 -1.7630621 -2.349887
# 3  1.55870831  1.9820198 -2.5415892 1.551835 -2.3289958  1.9820198  1.551835
# 4  0.07050839  1.8678249 -0.7807724 2.302715 -4.2841578  1.8678249  2.302715
# 5  0.12928774 -0.5493428 -1.8258641 5.598490 -5.0261096 -0.5493428  5.598490
# 6  1.71506499  3.0405735 -2.6152683 2.962585 -0.7946739  3.0405735  2.962585
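That lapply returns a list need not rule it out. A data frame is itself a list of columns, so a list of equal-length vectors can be assigned to several columns at once. The following is a minimal sketch on the same example data, rebuilt here so the block is self-contained. It is an alternative to the sapply version, not something the original answer proposes.

```r
# Rebuild the example data so this block is self-contained
set.seed(123)
N <- 1000
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

# lapply() returns one vector per name in varlist; assigning that list
# to df[varlist] creates (or overwrites) one column per list element
df[varlist] <- lapply(varlist, function(x) {
  ifelse(df[["x1"]] < 0, df[[paste0(x, ".y")]], df[[paste0(x, ".x")]])
})

head(df)
```

Unlike sapply, this avoids the intermediate matrix, which can matter when the new columns would not all share one type.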






Data

set.seed(123)   ## setting the seed as we're sampling
N <- 1000
df  <- data.frame(x1 = rnorm(N),
                  x2.x = rnorm(N)+2,x2.y = rnorm(N)-2,
                  x3.x = rnorm(N)+3,x3.y = rnorm(N)-3)
