Extract variables in a formula from a data frame

Question

I have a formula that contains some terms and a data frame (the output of an earlier model.frame() call) that contains all of those terms and some more. I want the subset of the model frame that contains only the variables that appear in the formula.

ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)`=1:4,
                 `log(1+Days)`=1:4,
                 x=1:4,
                 y=1:4,
                 z=1:4,
                 check.names=FALSE)

The desired result is fr minus the z column (fr[,1:4] is cheating -- I need a programmatic solution ...)

Some strategies that don't work:

fr[all.vars(ff)]
## Error in `[.data.frame`(fr, all.vars(ff)) : undefined columns selected

(because all.vars() returns "Reaction", not "log(Reaction)")
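
To see the mismatch concretely, this small illustration reuses the question's ff and fr. It compares what all.vars() returns with the column names of fr:

```r
# Reproduce the objects from the question
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)` = 1:4, `log(1+Days)` = 1:4,
                 x = 1:4, y = 1:4, z = 1:4, check.names = FALSE)

# all.vars() walks into every call and keeps only the bare symbols
bare <- all.vars(ff)
bare                        # c("Reaction", "Days", "x", "y")

# Two of those symbols are not column names of fr, which is why
# fr[all.vars(ff)] fails with "undefined columns selected"
setdiff(bare, names(fr))    # c("Reaction", "Days")
```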

stripwhite <- function(x) gsub("(^ +| +$)","",x)
vars <- stripwhite(unlist(strsplit(as.character(ff)[-1],"\\+")))
fr[vars]
## Error in `[.data.frame`(fr, vars) : undefined columns selected

(because splitting on + spuriously splits the log(1+Days) term).

I've been thinking about walking down the parse tree of the formula:

ff[[3]]       ## log(1 + Days) + x + y
ff[[3]][[1]]  ## `+`
ff[[3]][[2]]  ## log(1 + Days) + x

but I haven't got a solution put together, and it seems like I'm going down a rabbit hole. Ideas?
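
The parse-tree idea stays manageable if the walk only descends through binary `+` calls and treats everything else as a whole term. Here is a minimal sketch. It assumes the right-hand side uses only `+`; `*`, `:`, `-`, `.` or `|` would each need their own handling. rhs_terms() and formula_columns() are hypothetical helpers, not base R functions:

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)` = 1:4, `log(1+Days)` = 1:4,
                 x = 1:4, y = 1:4, z = 1:4, check.names = FALSE)

# Hypothetical helper: split an expression on top-level `+` calls only.
# log(1 + Days) is a call to log(), so the `+` inside it is never reached.
rhs_terms <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("+")) && length(expr) == 3) {
    # Binary `+`: recurse into the left and right operands
    c(rhs_terms(expr[[2]]), rhs_terms(expr[[3]]))
  } else {
    # Any other symbol or call is kept as a single term
    list(expr)
  }
}

# Hypothetical helper: deparse the response (if present) and each RHS term
formula_columns <- function(f) {
  pieces <- c(if (length(f) == 3) list(f[[2]]), rhs_terms(f[[length(f)]]))
  vapply(pieces, function(e) paste(deparse(e), collapse = ""), character(1))
}

cols <- formula_columns(ff)   # c("log(Reaction)", "log(1 + Days)", "x", "y")

# deparse() puts spaces around `+`, so compare with spaces removed on both sides
keep <- names(fr)[gsub(" ", "", names(fr)) %in% gsub(" ", "", cols)]
fr[keep]
```

Because the split happens on the parsed call rather than on the text, log(1 + Days) stays intact. The result keeps the column order of fr, not the order of the formula.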

Recommended answer

This should work:

> fr[gsub(" ","",rownames(attr(terms.formula(ff), "factors")))]
  log(Reaction) log(1+Days) x y
1             1           1 1 1
2             2           2 2 2
3             3           3 3 3
4             4           4 4 4
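
Wrapped up as a reusable function, the same terms.formula() idea might look like the sketch below. select_formula_vars() is a hypothetical name.

- It strips spaces from both the deparsed variables and the column names. It should therefore also cope with columns named the way model.frame() deparses them, e.g. "log(1 + Days)".
- If a variable is missing, it stops with a specific message instead of the generic "undefined columns selected" error.
- It assumes the formula has at least one right-hand-side term, so that the "factors" matrix is non-empty.

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)` = 1:4, `log(1+Days)` = 1:4,
                 x = 1:4, y = 1:4, z = 1:4, check.names = FALSE)

# Hypothetical wrapper around the terms.formula() approach
select_formula_vars <- function(formula, frame) {
  # Row names of the "factors" matrix are the deparsed variables,
  # response included, e.g. "log(1 + Days)" with spaces
  vars <- rownames(attr(terms(formula), "factors"))
  squash <- function(s) gsub(" ", "", s)
  idx <- match(squash(vars), squash(names(frame)))
  if (anyNA(idx)) {
    stop("not found in frame: ", paste(vars[is.na(idx)], collapse = ", "))
  }
  frame[idx]
}

sub_fr <- select_formula_vars(ff, fr)
names(sub_fr)   # c("log(Reaction)", "log(1+Days)", "x", "y")
```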

And props to Roman Luštrik for pointing me in the right direction.

Looks like you could pull it out of the "variables" attribute as well:

fr[gsub(" ","",attr(terms(ff),"variables")[-1])]
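
The "variables" attribute is an unevaluated call, list(log(Reaction), log(1 + Days), x, y), so the [-1] is what drops the leading list. gsub() otherwise coerces the call with as.character() implicitly. Writing that step out explicitly makes the mechanics easier to follow, as in this small sketch:

```r
ff <- log(Reaction) ~ log(1+Days) + x + y
fr <- data.frame(`log(Reaction)` = 1:4, `log(1+Days)` = 1:4,
                 x = 1:4, y = 1:4, z = 1:4, check.names = FALSE)

vars_call <- attr(terms(ff), "variables")  # the call list(log(Reaction), log(1 + Days), x, y)
vars <- as.character(vars_call)[-1]        # deparse each element, then drop "list"
vars                                       # c("log(Reaction)", "log(1 + Days)", "x", "y")

# Remove the spaces deparse() added, then select the columns
sub_fr <- fr[gsub(" ", "", vars)]
```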

Edit 2: Found first problem case, involving I() or offset():

ff <- I(log(Reaction)) ~ I(log(1+Days)) + x + y
fr[gsub(" ","",attr(terms(ff),"variables")[-1])]

Those would be fairly easy to correct with a regex, though. But if, as in the question, a column is literally named something like log(x), and the formula also uses something like I(log(y)) for a variable y, this gets really messy.
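
One way to do that regex correction assumes the only wrappers to undo are a single outer I() or offset(). Try the deparsed variable first, and fall back to the unwrapped name only when there is no exact match. match_col() is a hypothetical helper. Because it prefers exact matches, a real model-frame column such as "I(log(y))" is found directly. It does not resolve every ambiguity described above, and any term that still fails to match is left as NA:

```r
fr <- data.frame(`log(Reaction)` = 1:4, `log(1+Days)` = 1:4,
                 x = 1:4, y = 1:4, z = 1:4, check.names = FALSE)
ff2 <- I(log(Reaction)) ~ I(log(1+Days)) + x + y

# Hypothetical helper: match deparsed variables to column names, ignoring
# spaces; unmatched ones are retried with one outer I( ) or offset( ) removed
match_col <- function(v, frame_names) {
  squash <- function(s) gsub(" ", "", s)
  hit <- match(squash(v), squash(frame_names))
  miss <- is.na(hit)
  unwrapped <- sub("^(I|offset)\\((.*)\\)$", "\\2", v[miss])
  hit[miss] <- match(squash(unwrapped), squash(frame_names))
  hit
}

vars <- as.character(attr(terms(ff2), "variables"))[-1]
idx <- match_col(vars, names(fr))   # 1 2 3 4
fr[idx]
```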
