R equivalent of Stata's for-loop over macros

Problem description

I have a variable x that lies in (0, 1]. I want to generate 10 dummy variables, one for each tenth of that range. For example, x_0_10 takes the value 1 if x is between 0 and 0.1, x_10_20 takes the value 1 if x is between 0.1 and 0.2, and so on.

The Stata code to do this is something like:

forval p=0(10)90 {
    local Next=`p'+10
    gen x_`p'_`Next'=0
    replace x_`p'_`Next'=1 if x<=`Next'/100 & x>`p'/100
}

I am new to R. How can I do the same thing in R?

Recommended answer

cut is your friend here. Its output is a factor, which R automatically expands into dummy variables when it is used in a model. With an intercept in the model, the first bin is absorbed as the baseline, so 9 of the 10 bins appear as coefficients.

set.seed(2932)

x = runif(1e4)
y = 3 + 4 * x + rnorm(1e4)

x_cut = cut(x, 0:10/10, include.lowest = TRUE)

summary(lm(y ~ x_cut))
# Call:
# lm(formula = y ~ x_cut)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.7394 -0.6888  0.0028  0.6864  3.6742 
# 
# Coefficients:
#                Estimate Std. Error t value Pr(>|t|)    
# (Intercept)     3.16385    0.03243  97.564   <2e-16 ***
# x_cut(0.1,0.2]  0.43932    0.04551   9.654   <2e-16 ***
# x_cut(0.2,0.3]  0.85555    0.04519  18.933   <2e-16 ***
# x_cut(0.3,0.4]  1.26441    0.04588  27.556   <2e-16 ***
# x_cut(0.4,0.5]  1.66181    0.04495  36.970   <2e-16 ***
# x_cut(0.5,0.6]  2.04538    0.04574  44.714   <2e-16 ***
# x_cut(0.6,0.7]  2.44771    0.04533  53.999   <2e-16 ***
# x_cut(0.7,0.8]  2.80875    0.04591  61.182   <2e-16 ***
# x_cut(0.8,0.9]  3.22323    0.04545  70.919   <2e-16 ***
# x_cut(0.9,1]    3.60092    0.04564  78.897   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 1.011 on 9990 degrees of freedom
# Multiple R-squared:  0.5589,  Adjusted R-squared:  0.5585 
# F-statistic:  1407 on 9 and 9990 DF,  p-value: < 2.2e-16
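
If explicit 0/1 columns are needed rather than a factor, the Stata loop can also be translated more or less literally. The following is a minimal sketch, not part of the original answer; the data frame df and the seed are hypothetical and chosen only for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
df = data.frame(x = runif(20))   # hypothetical data frame with x in (0, 1)

for (p in seq(0, 90, by = 10)) {
  nxt = p + 10
  # 1 if p/100 < x <= nxt/100, else 0 (same condition as the Stata replace)
  df[[sprintf("x_%d_%d", p, nxt)]] = as.integer(df$x > p / 100 & df$x <= nxt / 100)
}

head(df[, 1:4])
```

Here sprintf plays the role of Stata's macro substitution when building the column names. For modelling, the factor returned by cut is usually more convenient than ten separate columns.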

See ?cut for more customizations.
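
As one example of such a customization, the labels argument of cut can reproduce the x_0_10 style names from the question, and model.matrix can turn the factor into an explicit matrix of dummies. This is a sketch under assumptions: the names breaks, labs and dummies, and the sample data, are illustrative and not from the original answer.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
x = runif(20)      # hypothetical sample data

breaks = seq(0, 100, by = 10)
# Build labels x_0_10, x_10_20, ..., x_90_100 from consecutive break pairs
labs = sprintf("x_%d_%d", head(breaks, -1), tail(breaks, -1))

x_cut = cut(x, breaks / 100, labels = labs, include.lowest = TRUE)

# "- 1" drops the intercept so that all 10 levels get their own column
dummies = model.matrix(~ x_cut - 1)
colnames(dummies) = levels(x_cut)

head(dummies[, 1:3])
```

Note that include.lowest = TRUE places x = 0 in the first bin, whereas the Stata condition x > 0 would leave such an observation with all dummies equal to 0. This only matters if x can be exactly 0.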

You can also pass cut directly on the right-hand side of the formula, which makes predict a bit easier to use, because the same binning is then applied to new data automatically:

reg = lm(y ~ cut(x, 0:10/10, include.lowest = TRUE))
idx = sample(length(x), 500)
plot(x[idx], y[idx])

x_grid = seq(0, 1, length.out = 500L)
lines(x_grid, predict(reg, data.frame(x = x_grid)), 
      col = 'red', lwd = 3L, type = 's')
