Parse variable names in R vs Stata


Problem description

I have a family of variable names that differ only in their last four characters (the year), and I would like to create all the variables in this family at once.

In Stata I would simply do this:

forvalues n = 1991(1)1995 {
    gen comp`n' = (year_begin < `n' & (year_end > `n' | year_end == .))
}

Here's what I'm doing in R:

data$comp1991<-ifelse(year(data$date_begin)<1991 & (year(data$date_end)>1991|is.na(data$date_end)),1,0)

data$comp1992<-ifelse(year(data$date_begin)<1992 & (year(data$date_end)>1992|is.na(data$date_end)),1,0)

data$comp1993<-ifelse(year(data$date_begin)<1993 & (year(data$date_end)>1993|is.na(data$date_end)),1,0)

data$comp1994<-ifelse(year(data$date_begin)<1994 & (year(data$date_end)>1994|is.na(data$date_end)),1,0)

data$comp1995<-ifelse(year(data$date_begin)<1995 & (year(data$date_end)>1995|is.na(data$date_end)),1,0)

So in Stata I really have only one line of code, whereas in R I need to repeat the line over and over, changing `n' manually.

Is there a way to do this more efficiently in R? (I am thinking of some combination of a loop with eval(parse()), but I'm not sure.) Any ideas would be appreciated.

Recommended answer

To elaborate on some of the comments, the closest equivalent of the Stata loop you provided would be:

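library(lubridate)  # assumed source of year(); the question uses it without showing an import
for (n in seq(1991, 1995)) {
    # n plays the role of the Stata macro `n' on both sides of the comparison
    data[[paste0('comp', n)]] <- year(data$date_begin) < n & (year(data$date_end) > n | is.na(data$date_end))
}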

The conditional expression returns 0 and 1 in Stata, but FALSE and TRUE in R. In practice this rarely matters: R treats TRUE as 1 and FALSE as 0 in arithmetic, so you can sum or average the new columns just as you would in Stata.
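To make the TRUE/FALSE versus 1/0 point concrete, here is a minimal sketch on made-up data. The data frame `toy` and the helper `yr()` are assumptions for illustration only; `yr()` is a base-R stand-in for `year()` so the example runs without extra packages. Wrapping the condition in `as.integer()` gives Stata-style 0/1 columns if you prefer them.

```r
# Toy data (hypothetical); the column names follow the question.
toy <- data.frame(
  date_begin = as.Date(c("1989-03-01", "1992-06-15", "1990-01-10")),
  date_end   = as.Date(c("1993-12-31", NA, "1991-05-20"))
)

# Base-R stand-in for year(), so the sketch has no package dependencies.
yr <- function(x) as.integer(format(x, "%Y"))

for (n in 1991:1995) {
  flag <- yr(toy$date_begin) < n & (yr(toy$date_end) > n | is.na(toy$date_end))
  toy[[paste0("comp", n)]] <- as.integer(flag)  # store 0/1, as Stata would
}

# Counting matches gives the same result for the logical and the 0/1 version.
n_1992_logical <- sum(yr(toy$date_begin) < 1992 &
                      (yr(toy$date_end) > 1992 | is.na(toy$date_end)))
n_1992_integer <- sum(toy$comp1992)
```

Note that a missing `date_end` does not produce NA here, because `NA | TRUE` evaluates to TRUE in R; a missing `date_begin`, however, could still yield NA where Stata's `gen` would produce a 0 or 1.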

If you want to make the loop even more similar to the Stata code, you could clean up some of the repetitive references to the object data by using the data.table package:

library(data.table)
data <- data.table(data)
for (n in seq(1991, 1995)) {
    # := adds the column by reference; columns are used without the data$ prefix
    data[, (paste0('comp', n)) := year(date_begin) < n & (year(date_end) > n | is.na(date_end))]
}
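On the eval(parse()) idea from the question: neither loop above needs it, because column names can be built as plain strings with paste0(). As a further sketch, not part of the original answer, the same columns can be created without an explicit loop in base R. The data frame `toy` and the helper `yr()` are hypothetical, and `yr()` again stands in for `year()`.

```r
# Toy data (hypothetical); the column names follow the question.
toy <- data.frame(
  date_begin = as.Date(c("1989-03-01", "1992-06-15")),
  date_end   = as.Date(c("1993-12-31", NA))
)

# Base-R stand-in for year(), to keep the sketch dependency-free.
yr <- function(x) as.integer(format(x, "%Y"))

years <- 1991:1995

# Build one logical vector per year, then attach them all in one assignment.
flags <- lapply(years, function(n) {
  yr(toy$date_begin) < n & (yr(toy$date_end) > n | is.na(toy$date_end))
})
toy[paste0("comp", years)] <- flags

# Names of the newly created columns.
new_cols <- names(toy)[-(1:2)]
```

Whether this reads better than the for loop is a matter of taste; the loop stays closer to the Stata original.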
