R: generate all possible interaction variables
Question
I have a data frame with variables, say a, b, c, d:
dat <- data.frame(a=runif(1e5), b=runif(1e5), c=runif(1e5), d=runif(1e5))
and would like to generate all possible two-way interaction terms between the columns, that is: ab, ac, ad, bc, bd, cd. In reality my data frame has over 100 columns, so I cannot code this by hand. What is the most efficient way to do this (noting that I do not want both ab and ba)?
Answer
What do you plan to do with all these interaction terms? There are several options; which one is best depends on what you are trying to do.
If you want to pass the interactions to a modeling function like lm or aov, then it is very simple: just use the .^2 syntax:
fit <- lm(y ~ .^2, data = mydf)
The above will call lm and tell it to fit all the main effects and all two-way interactions for the variables in mydf, excluding y.
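To see what .^2 expands to without reading model output, you can ask terms() for the term labels. This is a sketch that assumes a small all-numeric data frame shaped like the one in the question plus a made-up response column y; mydf in the answer is the asker's own data, so the values here are purely illustrative.

```r
# Illustrative data: the four predictors from the question plus a
# hypothetical response column y (not part of the asker's data).
set.seed(1)
mydf <- data.frame(a = runif(100), b = runif(100), c = runif(100),
                   d = runif(100), y = runif(100))

# '.' means "every column except the response", and ^2 keeps all main
# effects plus every two-way interaction, each unordered pair only once
# (a:b appears, b:a does not).
tt <- terms(y ~ .^2, data = mydf)
term_labels <- attr(tt, "term.labels")
print(term_labels)

# The fitted model has one coefficient per term plus the intercept:
# 1 + 4 main effects + choose(4, 2) = 6 interactions = 11.
fit <- lm(y ~ .^2, data = mydf)
length(coef(fit))
```

With p predictors, .^2 adds choose(p, 2) interaction terms, so the model grows quickly as columns are added.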
If for some reason you really want to compute all the interactions explicitly, you can use model.matrix:
tmp <- model.matrix(~ .^2, data = iris)
This will include a column for the intercept and columns for the main effects, but you can drop those if you don't want them.
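One way to drop those columns, sketched here on an all-numeric data frame like the one in the question, is to keep only the columns whose names contain ":", which is how model.matrix names interaction terms. (The iris call above also includes the factor Species, which model.matrix expands into dummy columns, so its column names look different.)

```r
set.seed(1)
dat <- data.frame(a = runif(1e3), b = runif(1e3), c = runif(1e3), d = runif(1e3))

# Full design matrix: intercept, 4 main effects, 6 two-way interactions.
mm <- model.matrix(~ .^2, data = dat)

# Interaction columns are named "a:b", "a:c", ...; keep only those.
inter <- mm[, grepl(":", colnames(mm), fixed = TRUE), drop = FALSE]

dim(inter)
colnames(inter)

# Each interaction column is the elementwise product of its two parents.
all.equal(unname(inter[, "a:b"]), dat$a * dat$b)
```

drop = FALSE keeps the result a matrix even if only one interaction column survives the filter.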
If you need something other than a model, you can use the combn function, as @akrun mentions in the comments.
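A minimal sketch of the combn route, assuming every column is numeric and that you want plain elementwise products named by pasting the two column names together (ab, ac, ...), as in the question. combn(names(dat), 2) enumerates each unordered pair exactly once, so ab is produced but ba is not.

```r
set.seed(1)
dat <- data.frame(a = runif(1e5), b = runif(1e5), c = runif(1e5), d = runif(1e5))

# 2 x choose(ncol(dat), 2) character matrix; each column is one unordered pair.
combos <- combn(names(dat), 2)

# Multiply the two columns named in each pair.
inter <- lapply(seq_len(ncol(combos)), function(j) {
  dat[[combos[1, j]]] * dat[[combos[2, j]]]
})
names(inter) <- paste0(combos[1, ], combos[2, ])
inter <- as.data.frame(inter)

names(inter)

# Bind the interaction columns onto the original data.
dat_full <- cbind(dat, inter)
```

With the 100+ columns mentioned in the question this creates choose(100, 2) = 4950 new columns; at 1e5 rows of doubles that is about 4950 × 1e5 × 8 bytes ≈ 4 GB, so check available memory before materializing them all.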