How to match a data frame of variable names and another with data for a regression?
Question
I have two data frames:
x = data.frame(Var1= c("A", "B", "C", "D","E"),Var2=c("F","G","H","I","J"),
Value= c(11, 12, 13, 14,18))
y = data.frame(A= c(11, 12, 13, 14,18), B= c(15, 16, 17, 14,18),C= c(17, 22, 23, 24,18), D= c(11, 12, 13, 34,18),E= c(11, 5, 13, 55,18), F= c(8, 12, 13, 14,18),G= c(7, 5, 13, 14,18),
H= c(8, 12, 13, 14,18), I= c(9, 5, 13, 14,18), J= c(11, 12, 13, 14,18))
Var3 <- rep("time", each=length(x$Var1))
x=cbind(x,Var3)
time=seq(1:length(y[,1]))
y=cbind(y,time)
> x
  Var1 Var2 Value Var3
1    A    F    11 time
2    B    G    12 time
3    C    H    13 time
4    D    I    14 time
5    E    J    18 time
> y
   A  B  C  D  E  F  G  H  I  J time
1 11 15 17 11 11  8  7  8  9 11    1
2 12 16 22 12  5 12  5 12  5 12    2
3 13 17 23 13 13 13 13 13 13 13    3
4 14 14 24 34 55 14 14 14 14 14    4
5 18 18 18 18 18 18 18 18 18 18    5
Looking at the x data frame, the first row holds the variables A and F. I want to select these two variables in the y data frame and fit a simple regression, lm(A ~ F, data = y), saving the result in the first position of a list. I will then do the same with the second row of x, fitting the regression lm(B ~ G, data = y).
How can I match the variable names in x to the data in y for a regression?
Revised question: how about a more complicated regression, Var1 ~ Var2 + Var3?
Recommended answer
x = data.frame(Var1= c("A", "B", "C", "D","E"),
Var2=c("F","G","H","I","J"),
Value= c(11, 12, 13, 14,18))
y = data.frame(A= c(11, 12, 13, 14,18),
B= c(15, 16, 17, 14,18),
C= c(17, 22, 23, 24,18),
D= c(11, 12, 13, 34,18),
E= c(11, 5, 13, 55,18),
F= c(8, 12, 13, 14,18),
G= c(7, 5, 13, 14,18),
H= c(8, 12, 13, 14,18),
I= c(9, 5, 13, 14,18),
J= c(11, 12, 13, 14,18))
We can use
fitmodel <- function (RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
data = quote(y)))
modList <- Map(fitmodel, as.character(x$Var2), as.character(x$Var1))
modList[[1]] ## for example
#Call:
#lm(formula = A ~ F, data = y)
#
#Coefficients:
#(Intercept)            F
#     4.3500       0.7115
Remarks:
The use of do.call ensures that reformulate is evaluated before the formula is passed to lm, so the call stored in the model contains the actual formula (A ~ F) rather than the unevaluated expression reformulate(RHS, LHS). This is desirable because it allows functions like update to work correctly on the model object. See "Showing string in formula and not as variable in lm fit". For a comparison:
oo <- Map(function (RHS, LHS) lm(reformulate(RHS, LHS), data = y),
as.character(x$Var2), as.character(x$Var1))
oo[[1]]
#Call:
#lm(formula = reformulate(RHS, LHS), data = y)
#
#Coefficients:
#(Intercept)            F
#     4.3500       0.7115
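To make the difference between the two stored calls concrete, here is a minimal, self-contained sketch. It uses a reduced copy of y with only the columns A, F and G, and the names fit, lazy_fit and fit2 are introduced here purely for illustration. It assumes y is visible from the place where update is called.

```r
# Reduced copy of the data: only the columns needed for this illustration.
y <- data.frame(A = c(11, 12, 13, 14, 18),
                F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18))

# Same helper as in the answer: the formula is built before lm() is called.
fitmodel <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}

fit <- fitmodel("F", "A")
# Direct call: lm() records the expression reformulate(RHS, LHS) as written.
lazy_fit <- (function(RHS, LHS) lm(reformulate(RHS, LHS), data = y))("F", "A")

fit$call$formula       # the actual formula, A ~ F
lazy_fit$call$formula  # the unevaluated expression, reformulate(RHS, LHS)

# update() re-evaluates the stored call; with the real formula stored,
# adding a term works as expected.
fit2 <- update(fit, . ~ . + G)
formula(fit2)          # A ~ F + G
```

Both fits have identical coefficients; only the recorded call differs, which is what later re-evaluation relies on.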
The as.character calls on x$Var1 and x$Var2 are necessary because these two variables are currently factors, not strings, and reformulate can't use factors. If you put stringsAsFactors = FALSE in data.frame when you build x, there is no such issue. (Since R 4.0.0, stringsAsFactors = FALSE is the default, so the columns are already character vectors and as.character is harmless.)
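As a small illustration of the factor versus character point, the following sketch builds a two-row version of x both ways. The names x_chr, x_fac, f_chr and f_fac are hypothetical and used only here.

```r
# Character columns, requested explicitly (this is the default since R 4.0.0).
x_chr <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"),
                    stringsAsFactors = FALSE)
# Factor columns, forced for contrast (this was the default before R 4.0.0).
x_fac <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"),
                    stringsAsFactors = TRUE)

is.character(x_chr$Var2)  # TRUE
is.factor(x_fac$Var2)     # TRUE

# Character columns can go straight into reformulate();
# factor columns are converted with as.character() first.
f_chr <- reformulate(x_chr$Var2[1], response = x_chr$Var1[1])
f_fac <- reformulate(as.character(x_fac$Var2[1]),
                     response = as.character(x_fac$Var1[1]))
f_chr  # A ~ F
```

Keeping the as.character calls in place makes the code behave the same under either default.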
Does it work for you? Isn't it supposed to have a "for" loop?
The Map function hides that "for" loop. It is a wrapper around the mapply function. The *apply family of functions in R is syntactic sugar.
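For readers who prefer to see the loop spelled out, the following is a rough equivalent of the Map call written as an explicit "for" loop. It is a sketch on a reduced two-row copy of the data, not the answer's own code.

```r
# Reduced two-row copy of x and the matching columns of y.
x <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"),
                stringsAsFactors = FALSE)
y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                F = c(8, 12, 13, 14, 18),  G = c(7, 5, 13, 14, 18))

modList <- vector("list", nrow(x))  # pre-allocate one slot per row of x
for (i in seq_len(nrow(x))) {
  # Build the formula for row i, e.g. A ~ F for the first row.
  f <- reformulate(x$Var2[i], response = x$Var1[i])
  modList[[i]] <- do.call("lm", list(formula = f, data = quote(y)))
}
names(modList) <- x$Var2  # Map names its result after its first argument

coef(modList[[1]])        # intercept and slope of A ~ F
```

The loop and the Map version produce the same list of models; Map simply saves the bookkeeping of allocating and naming the list.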
Your original question constructs a model formula of the form Var1 ~ Var2.
Your new question needs Var1 ~ Var2 + Var3:
x$Var3 <- rep("time", each=length(x$Var1))
y$time <- seq(1:length(y[,1]))
## collect multiple RHS variables (using concatenation function `c`)
RHS <- Map(base::c, as.character(x$Var2), as.character(x$Var3))
#str(RHS)
#List of 5 ## oh this list has names! annoying!!
# $ F: chr [1:2] "F" "time"
# $ G: chr [1:2] "G" "time"
# $ H: chr [1:2] "H" "time"
# $ I: chr [1:2] "I" "time"
# $ J: chr [1:2] "J" "time"
LHS <- as.character(x$Var1)
modList <- Map(fitmodel, RHS, LHS) ## `fitmodel` function unchanged
modList[[1]] ## for example
#Call:
#lm(formula = A ~ F + time, data = y)
#
#Coefficients:
#(Intercept)            F         time
#        5.6          0.5          0.5
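If the names on the RHS list are unwanted (the annoyance noted in the code comment above), one possible variation is to call mapply directly with USE.NAMES = FALSE and name the models by their response variable afterwards. This is a sketch on a reduced two-row copy of the data and is not part of the original answer.

```r
# Reduced two-row copy of x (Var3 is recycled to "time" for every row)
# and the matching columns of y.
x <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"), Var3 = "time",
                stringsAsFactors = FALSE)
y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                F = c(8, 12, 13, 14, 18),  G = c(7, 5, 13, 14, 18),
                time = 1:5)

# Same helper as in the answer.
fitmodel <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}

# One character vector of predictors per row of x, without list names.
RHS <- mapply(c, x$Var2, x$Var3, SIMPLIFY = FALSE, USE.NAMES = FALSE)
modList <- Map(fitmodel, RHS, x$Var1)
names(modList) <- x$Var1  # name each model by its response variable

coef(modList$A)           # coefficients of A ~ F + time
```

Naming the list by response makes lookups such as modList$A possible, which the predictor-based names do not allow when several rows share a predictor.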