R equivalent of Stata's for-loop over macros
Question
I have a variable x that lies between 0 and 1, i.e. in (0,1]. I want to generate 10 dummy variables for 10 deciles of variable x. For example, x_0_10 takes value 1 if x is between 0 and 0.1, x_10_20 takes value 1 if x is between 0.1 and 0.2, ...
The Stata code to do the above is something like this:
forval p=0(10)90 {
local Next=`p'+10
gen x_`p'_`Next'=0
replace x_`p'_`Next'=1 if x<=`Next'/100 & x>`p'/100
}
Now, I am new to R and I wonder how I can do the above in R.
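Recommended answer
cut is your friend here; its output is a factor, which R will auto-expand into dummy variables when it is used in a model. With an intercept in the model, the first level serves as the reference category, so the regression below reports nine dummy coefficients plus the intercept.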
set.seed(2932)
x = runif(1e4)
y = 3 + 4 * x + rnorm(1e4)
x_cut = cut(x, 0:10/10, include.lowest = TRUE)
summary(lm(y ~ x_cut))
# Call:
# lm(formula = y ~ x_cut)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -3.7394 -0.6888  0.0028  0.6864  3.6742
#
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)      3.16385    0.03243  97.564   <2e-16 ***
# x_cut(0.1,0.2]   0.43932    0.04551   9.654   <2e-16 ***
# x_cut(0.2,0.3]   0.85555    0.04519  18.933   <2e-16 ***
# x_cut(0.3,0.4]   1.26441    0.04588  27.556   <2e-16 ***
# x_cut(0.4,0.5]   1.66181    0.04495  36.970   <2e-16 ***
# x_cut(0.5,0.6]   2.04538    0.04574  44.714   <2e-16 ***
# x_cut(0.6,0.7]   2.44771    0.04533  53.999   <2e-16 ***
# x_cut(0.7,0.8]   2.80875    0.04591  61.182   <2e-16 ***
# x_cut(0.8,0.9]   3.22323    0.04545  70.919   <2e-16 ***
# x_cut(0.9,1]     3.60092    0.04564  78.897   <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.011 on 9990 degrees of freedom
# Multiple R-squared: 0.5589, Adjusted R-squared: 0.5585
# F-statistic: 1407 on 9 and 9990 DF, p-value: < 2.2e-16
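If you do want the literal 0/1 columns that the Stata loop creates, rather than a factor, a line-by-line translation might look like the sketch below. The data frame name df and the simulated x are assumptions made for illustration; the column names follow the x_0_10 pattern from the question.

```r
# Hypothetical data: a data frame `df` with a column `x` in (0, 1]
set.seed(1)
df <- data.frame(x = runif(20))

for (p in seq(0, 90, by = 10)) {
  nxt <- p + 10
  # Build the column name, e.g. "x_0_10", much like Stata's macro expansion
  col <- paste0("x_", p, "_", nxt)
  # 1 if p/100 < x <= nxt/100, otherwise 0 (same condition as the Stata replace)
  df[[col]] <- as.integer(df$x > p / 100 & df$x <= nxt / 100)
}

head(df[, 1:4])
```

Since every x falls into exactly one bin, each row has a single 1 across the ten new columns.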
See ?cut for more customizations.
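As one example of such a customization, the labels argument of cut can give the factor levels Stata-style names, and model.matrix can turn the factor into an explicit matrix of dummies. This is only a sketch: the x_0_10 naming scheme is borrowed from the question, and the simulated x is an assumption.

```r
set.seed(1)
x <- runif(20)  # hypothetical data in (0, 1)

# Level names such as "x_0_10", "x_10_20", ..., "x_90_100"
labs <- paste0("x_", seq(0, 90, by = 10), "_", seq(10, 100, by = 10))
x_cut <- cut(x, breaks = 0:10/10, labels = labs, include.lowest = TRUE)

# `- 1` drops the intercept so every level gets its own 0/1 column
dummies <- model.matrix(~ x_cut - 1)
# model.matrix prefixes column names with "x_cut"; restore the plain labels
colnames(dummies) <- levels(x_cut)

head(dummies)
```

Keeping the factor and letting lm expand it is usually more convenient; the explicit matrix is mainly useful when a function requires numeric inputs.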
You can also pass cut directly in the RHS of the formula, which would make using predict a bit easier:
reg = lm(y ~ cut(x, 0:10/10, include.lowest = TRUE))
idx = sample(length(x), 500)
plot(x[idx], y[idx])
x_grid = seq(0, 1, length.out = 500L)
lines(x_grid, predict(reg, data.frame(x = x_grid)),
col = 'red', lwd = 3L, type = 's')