Extract variables in formula from a data frame
Problem description
I have a formula that contains some terms and a data frame (the output of an earlier model.frame() call) that contains all of those terms and some more. I want the subset of the model frame that contains only the variables that appear in the formula.
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)`=1:4,
`log(1+Days)`=1:4,
x=1:4,
y=1:4,
z=1:4,
check.names=FALSE)
The desired result is fr minus the z column (fr[,1:4] is cheating -- I need a programmatic solution ...)
Some strategies that don't work:
fr[all.vars(ff)]
## Error in `[.data.frame`(fr, all.vars(ff)) : undefined columns selected
(because all.vars() returns "Reaction", not "log(Reaction)")
stripwhite <- function(x) gsub("(^ +| +$)","",x)
vars <- stripwhite(unlist(strsplit(as.character(ff)[-1],"\\+")))
fr[vars]
## Error in `[.data.frame`(fr, vars) : undefined columns selected
(because splitting on + spuriously splits the log(1+Days) term).
I've been thinking about walking down the parse tree of the formula:
ff[[3]] ## log(1 + Days) + x + y
ff[[3]][[1]] ## `+`
ff[[3]][[2]] ## log(1 + Days) + x
but I haven't got a solution put together, and it seems like I'm going down a rabbit hole. Ideas?
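For what it's worth, the parse-tree idea can be sketched as a small recursive function. This is only a minimal illustration, assuming the right-hand side is a plain sum of terms (no `*`, `:`, `-`, or parenthesised groups); `split_plus` is a hypothetical helper name, not an existing R function.

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)`=1:4,
                 `log(1+Days)`=1:4,
                 x=1:4, y=1:4, z=1:4,
                 check.names=FALSE)

## Hypothetical helper: recursively split an expression on top-level binary `+`.
## Calls such as log(1 + Days) are left intact because their head is `log`, not `+`.
split_plus <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3) {
    c(split_plus(e[[2]]), split_plus(e[[3]]))
  } else {
    list(e)
  }
}

## Response (ff[[2]]) plus the additive pieces of the right-hand side (ff[[3]])
pieces <- c(list(ff[[2]]), split_plus(ff[[3]]))

## deparse() inserts spaces ("log(1 + Days)"), so strip them to match the column names
vars <- gsub(" ", "",
             vapply(pieces, function(e) paste(deparse(e), collapse=""), character(1)))
sub_fr <- fr[vars]
```

Because it only understands `+`, this sketch would need extending before it could cope with interactions or other formula operators.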
Recommended answer
This should work:
> fr[gsub(" ","",rownames(attr(terms.formula(ff), "factors")))]
log(Reaction) log(1+Days) x y
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
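To see why the one-liner works, it may help to look at the intermediate values. The sketch below uses the same `ff` and `fr` as the question; the names `tt`, `vars_raw`, and `vars` are just illustrative.

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)`=1:4,
                 `log(1+Days)`=1:4,
                 x=1:4, y=1:4, z=1:4,
                 check.names=FALSE)

tt <- terms(ff)

## The "factors" attribute is a variables-by-terms matrix; its row names are
## the deparsed variables, including the response.
vars_raw <- rownames(attr(tt, "factors"))
## vars_raw contains "log(1 + Days)" (deparsing adds spaces around `+`)

## Remove the spaces so the names match the columns of fr
vars <- gsub(" ", "", vars_raw)
sub_fr <- fr[vars]
```

Note that stripping every space assumes none of the column names in `fr` legitimately contain spaces.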
And props to Roman Luštrik for pointing me in the right direction.
Looks like you could pull it out of the "variables" attribute as well:
fr[gsub(" ","",attr(terms(ff),"variables")[-1])]
Edit 2: Found first problem case, involving I() or offset():
ff <- I(log(Reaction)) ~ I(log(1+Days)) + x + y
fr[gsub(" ","",attr(terms(ff),"variables")[-1])]
Those would be pretty easy to correct with regex, though. BUT, if you had a situation like the one in the question, where a variable is named, e.g., log(x) and is used in a formula alongside something like I(log(y)) for variable y, this will get really messy.
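A minimal sketch of the kind of regex correction meant here, assuming the data frame columns are named without the I() or offset() wrapper (as in the question's `fr`). `strip_wrapper` is a hypothetical helper, and it only removes a single outermost wrapper.

```r
ff2 <- I(log(Reaction)) ~ I(log(1+Days)) + x + y
fr <- data.frame(`log(Reaction)`=1:4,
                 `log(1+Days)`=1:4,
                 x=1:4, y=1:4, z=1:4,
                 check.names=FALSE)

## Deparsed variable names with spaces removed, e.g. "I(log(1+Days))"
vars <- gsub(" ", "", rownames(attr(terms(ff2), "factors")))

## Hypothetical helper: drop one outer I(...) or offset(...) wrapper if present
strip_wrapper <- function(v) sub("^(I|offset)\\((.*)\\)$", "\\2", v)

vars_clean <- strip_wrapper(vars)
sub_fr <- fr[vars_clean]
```

This does not resolve the ambiguous case described above: if `fr` had one column literally named `log(x)` and the formula also used `I(log(y))`, stripping the wrapper would no longer tell you which names refer to existing columns and which to transformations.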