How to match a data frame of variable names and another with data for a regression?


Question

I have two data frames:

x <- data.frame(Var1 = c("A", "B", "C", "D", "E"),
                Var2 = c("F", "G", "H", "I", "J"),
                Value = c(11, 12, 13, 14, 18))

y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
                E = c(11, 5, 13, 55, 18),  F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18),   H = c(8, 12, 13, 14, 18),
                I = c(9, 5, 13, 14, 18),   J = c(11, 12, 13, 14, 18))

## add a constant variable name to x and a time index to y
x$Var3 <- rep("time", nrow(x))
y$time <- seq_len(nrow(y))


> x
  Var1 Var2 Value Var3
1    A    F    11 time
2    B    G    12 time
3    C    H    13 time
4    D    I    14 time
5    E    J    18 time
> y
   A  B  C  D  E  F  G  H  I  J time
1 11 15 17 11 11  8  7  8  9 11    1
2 12 16 22 12  5 12  5 12  5 12    2
3 13 17 23 13 13 13 13 13 13 13    3
4 14 14 24 34 55 14 14 14 14 14    4
5 18 18 18 18 18 18 18 18 18 18    5

The first row of data frame x holds the variable names A and F. I want to select these two variables in data frame y, fit a simple regression, lm(A ~ F, data = y), and save the result in the first position of a list. I then want to do the same for the second row of x, fitting lm(B ~ G, data = y), and so on for the remaining rows.

How can I match the variable names in x to the data in y for a regression?

Revised question: what about a more complicated regression, Var1 ~ Var2 + Var3?

Recommended answer

x = data.frame(Var1= c("A", "B", "C", "D","E"),
               Var2=c("F","G","H","I","J"),
               Value= c(11, 12, 13, 14,18))

y = data.frame(A= c(11, 12, 13, 14,18),
               B= c(15, 16, 17, 14,18),
               C= c(17, 22, 23, 24,18),
               D= c(11, 12, 13, 34,18),
               E= c(11, 5, 13, 55,18),
               F= c(8, 12, 13, 14,18),
               G= c(7, 5, 13, 14,18),
               H= c(8, 12, 13, 14,18), 
               I= c(9, 5, 13, 14,18),
               J= c(11, 12, 13, 14,18))

We can use

fitmodel <- function (RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
                                              data = quote(y)))

modList <- Map(fitmodel, as.character(x$Var2), as.character(x$Var1))

modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F, data = y)
#
#Coefficients:
#(Intercept)            F  
#     4.3500       0.7115  
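
To see the two building blocks in isolation, the sketch below first calls reformulate on plain strings and then pulls the slope out of every fitted model. It reuses x, y and fitmodel from the answer. The names f and slopes, the relabelling of modList, and the use of stringsAsFactors = FALSE (so that as.character is not needed) are assumptions made for this illustration only.

```r
x <- data.frame(Var1 = c("A", "B", "C", "D", "E"),
                Var2 = c("F", "G", "H", "I", "J"),
                Value = c(11, 12, 13, 14, 18),
                stringsAsFactors = FALSE)  # assumption: keep names as strings

y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
                E = c(11, 5, 13, 55, 18),  F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18),   H = c(8, 12, 13, 14, 18),
                I = c(9, 5, 13, 14, 18),   J = c(11, 12, 13, 14, 18))

# reformulate() turns character strings into a formula object
f <- reformulate("F", response = "A")
deparse(f)  # "A ~ F"

# Same helper as in the answer: the formula is built before lm() sees it
fitmodel <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}
modList <- Map(fitmodel, x$Var2, x$Var1)

# Label each model by its formula instead of by the RHS variable name
names(modList) <- paste(x$Var1, "~", x$Var2)

# Slope (second coefficient) of every model, as a named numeric vector
slopes <- vapply(modList, function(m) coef(m)[[2]], numeric(1))
print(slopes)
```

The first slope should agree with the 0.7115 printed above for A ~ F.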

Remarks:

  1. do.call is used so that reformulate(RHS, LHS) is evaluated before it is passed to lm; the call stored in the model object then shows the actual formula (A ~ F) rather than the unevaluated expression. This is desirable because it allows functions like update to work correctly on the model object. See "Showing string in formula and not as variable in lm fit". For comparison:

oo <- Map(function (RHS, LHS) lm(reformulate(RHS, LHS), data = y),
          as.character(x$Var2), as.character(x$Var1))
oo[[1]]
#Call:
#lm(formula = reformulate(RHS, LHS), data = y)
#
#Coefficients:
#(Intercept)            F  
#     4.3500       0.7115  

  2. The as.character calls on x$Var1 and x$Var2 are necessary when these two columns are "factor" variables rather than character strings, because reformulate can't use factors. If you pass stringsAsFactors = FALSE to data.frame when you build x (this is the default since R 4.0.0), there is no such issue.
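
The difference between the two storage types can be checked directly. The following is a minimal sketch; x_fac, x_chr, f_fac and f_chr are hypothetical names introduced here, and only the first two rows of the name columns are reproduced.

```r
# Assumption: two small copies of the name columns of x, one per storage type
x_fac <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"),
                    stringsAsFactors = TRUE)
x_chr <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"),
                    stringsAsFactors = FALSE)

class(x_fac$Var1)  # "factor"
class(x_chr$Var1)  # "character"

# With factor columns, convert explicitly before building the formula
f_fac <- reformulate(as.character(x_fac$Var2[1]),
                     response = as.character(x_fac$Var1[1]))

# With character columns, the values can be used directly
f_chr <- reformulate(x_chr$Var2[1], response = x_chr$Var1[1])

identical(deparse(f_fac), deparse(f_chr))  # TRUE, both are A ~ F
```

Keeping the as.character calls is harmless when the columns are already character, so the answer's code works in either case.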

    Does it work for you? Isn't it supposed to have a "for" loop?

    The Map function hides that "for" loop. It is a wrapper around the mapply function. The *apply family of functions in R is syntactic sugar for such loops.
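
For readers who prefer to see the loop spelled out, here is a rough equivalent of the Map call written as an explicit "for" loop. It is a sketch rather than the answer's own code: the loop variable i and the temporary f are illustrative, and stringsAsFactors = FALSE is assumed so the names are plain strings.

```r
x <- data.frame(Var1 = c("A", "B", "C", "D", "E"),
                Var2 = c("F", "G", "H", "I", "J"),
                stringsAsFactors = FALSE)  # assumption: names stored as strings

y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
                E = c(11, 5, 13, 55, 18),  F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18),   H = c(8, 12, 13, 14, 18),
                I = c(9, 5, 13, 14, 18),   J = c(11, 12, 13, 14, 18))

# Pre-allocate the result list, one slot per row of x
modList <- vector("list", nrow(x))

for (i in seq_len(nrow(x))) {
  # Build the formula for row i, e.g. A ~ F for i = 1
  f <- reformulate(x$Var2[i], response = x$Var1[i])
  # Fit the model; do.call keeps the evaluated formula in the stored call
  modList[[i]] <- do.call("lm", list(formula = f, data = quote(y)))
}

formula(modList[[2]])  # B ~ G
```

The loop and the Map version fit the same models; Map is simply more compact and also names the resulting list.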

    Your original question constructs a model formula of the form Var1 ~ Var2.

    Your new question needs Var1 ~ Var2 + Var3.

    x$Var3 <- rep("time", nrow(x))
    y$time <- seq_len(nrow(y))
    
    ## collect multiple RHS variables (using concatenation function `c`)
    RHS <- Map(base::c, as.character(x$Var2), as.character(x$Var3))
    #str(RHS)
    #List of 5  ## oh this list has names! annoying!!
    # $ F: chr [1:2] "F" "time"
    # $ G: chr [1:2] "G" "time"
    # $ H: chr [1:2] "H" "time"
    # $ I: chr [1:2] "I" "time"
    # $ J: chr [1:2] "J" "time"
    LHS <- as.character(x$Var1)
    modList <- Map(fitmodel, RHS, LHS)  ## `fitmodel` function unchanged
    modList[[1]]  ## for example
    #Call:
    #lm(formula = A ~ F + time, data = y)
    #
    #Coefficients:
    #(Intercept)            F         time  
    #        5.6          0.5          0.5  
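
The multi-term case works because reformulate accepts a character vector of term labels and joins them with "+". The sketch below shows that step on its own and, as one possible way around the list names noted in the comments above, drops the names from RHS and labels the models by their response variable instead. The names f and rhs_list and the relabelling are assumptions for illustration; stringsAsFactors = FALSE is assumed so that as.character can be omitted.

```r
x <- data.frame(Var1 = c("A", "B", "C", "D", "E"),
                Var2 = c("F", "G", "H", "I", "J"),
                stringsAsFactors = FALSE)  # assumption: names stored as strings
x$Var3 <- rep("time", nrow(x))

y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
                E = c(11, 5, 13, 55, 18),  F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18),   H = c(8, 12, 13, 14, 18),
                I = c(9, 5, 13, 14, 18),   J = c(11, 12, 13, 14, 18))
y$time <- seq_len(nrow(y))

# A character vector of term labels becomes a sum of terms
f <- reformulate(c("F", "time"), response = "A")
deparse(f)  # "A ~ F + time"

fitmodel <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}

# One character vector of RHS terms per row of x, without list names
rhs_list <- unname(Map(c, x$Var2, x$Var3))

modList <- Map(fitmodel, rhs_list, x$Var1)
names(modList) <- x$Var1  # label each model by its response variable

coef(modList[["A"]])
```

The coefficients for the first model should match the 5.6, 0.5 and 0.5 shown above.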
    
