How to automate linear regression of multiple rows in a loop and plot using R


Problem description

I am working with two data frames and trying to automate a workflow that I currently do by hand.

ID <- c("ID101","ID102","ID103","ID104","ID105","ID106","ID107","ID108","ID109","ID110")
A <- c(420,440,490,413,446,466,454,433,401,414)
B <- c(230,240,295,253,266,286,254,233,201,214)
C <- c(20,40,90,13,46,66,54,33,61,14)
D <- c(120,140,190,113,146,166,154,133,101,114)
E <- c(38,34,33,56,87,31,12,44,68,91)
F <- c(938,934,973,956,987,931,962,944,918,921)
df1 <- data.frame(ID,A,B,C,D,E,F)

Upstream <- c("A","C","E")
Downstream <- c("B","D","F")
df2 <- data.frame(Upstream,Downstream)

I currently run a simple linear regression between the upstream and downstream data and plot the residuals alongside it. This is how I do it manually:

# df1 is the data frame defined above (the original snippet called it df, which is undefined)
fit <- lm(A ~ B, data = df1)

# Build a plotmath label holding the fitted equation, R^2 and RMSE
lm_eqn <- function(df){
  m <- lm(A ~ B, df)
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(R)^2~"="~r2* "," ~~ RMSE ~"="~rmse, 
                   list(a = format(coef(m)[1], digits = 2), 
                        b = format(coef(m)[2], digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3),
                        rmse = round(sqrt(mean(resid(m)^2, na.rm = TRUE)), 3)))
  as.character(as.expression(eq))
}

library(ggplot2)
library(grid)
library(gridExtra)

# Left panel: scatter plot with a fitted line and the equation label
p1 <- ggplot(df1, aes(x = A, y = B)) +
  geom_point(colour = "red", size = 3) +
  geom_smooth(method = lm) +
  geom_text(aes(size = 10), x = -Inf, hjust = -1, y = Inf, vjust = 1,
            label = lm_eqn(df1), parse = TRUE, show.legend = FALSE)
# Right panel: residuals of fit against the predictor B
p2 <- ggplot(df1, aes(x = B, y = resid(fit))) +
  ylab("Residuals") +
  geom_point(shape = 1, colour = "red", size = 3) +
  geom_smooth(method = "lm")
grid.arrange(p1, p2, ncol = 2, top = textGrob("Regression data", 
                                              gp = gpar(cex = 1.5, fontface = "bold")))

I get this plot

I then redo this manually for the next row in df2 (C and D), and change the parameters by hand again for the row after that (E and F).

How can I wrap this logic in a function, or otherwise automate it, so that I run it only once and get three plots, one each for (A & B), (C & D), and (E & F)?

Please let me know if I have not been clear about what I want. Ideally, I am looking for a way to write this so that I do not need to enter the column names (A, B, C, D, E, F) by hand in each place every time I run it. Any pointers on how to solve this would be appreciated.

Solution

You can use apply() over the rows of df2, building the model formula with as.formula() and the plot aesthetics with aes_string():

library(ggplot2)
library(grid)
library(gridExtra)

# apply() passes each row of df2 as a named character vector d,
# so d["Upstream"] and d["Downstream"] are column names of df1.
# invisible() only suppresses printing of apply()'s return value.
invisible(apply(df2, 1, function(d)
        {

        # Same model as the manual version, with the formula built from the column names
        form <- as.formula(paste(d["Upstream"], " ~ ", d["Downstream"]))
        fit <- lm(form, data = df1)

        lm_eqn <- function(df){
                m <- lm(form, df)
                eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(R)^2~"="~r2* "," ~~ RMSE ~"="~rmse, 
                                 list(a = format(coef(m)[1], digits = 2), 
                                      b = format(coef(m)[2], digits = 2),
                                      r2 = format(summary(m)$r.squared, digits = 3),
                                      rmse = round(sqrt(mean(resid(m)^2, na.rm = TRUE)), 3)))
                as.character(as.expression(eq))
        }

        # aes_string() accepts column names as strings (soft-deprecated in ggplot2 3.0.0).
        # Note: the model regresses Upstream on Downstream, while this panel puts Upstream on the x axis.
        p1 <- ggplot(df1, aes_string(x = d["Upstream"], y = d["Downstream"])) +
                geom_point(colour = "red", size = 3) +
                geom_smooth(method = lm) +
                geom_text(aes(size = 10), x = -Inf, hjust = -1, y = Inf, vjust = 1,
                          label = lm_eqn(df1), parse = TRUE, show.legend = FALSE)
        p2 <- ggplot(df1, aes_string(x = d["Downstream"], y = resid(fit))) +
                ylab("Residuals") +
                geom_point(shape = 1, colour = "red", size = 3) +
                geom_smooth(method = "lm")
        # One two-panel page is drawn per row of df2
        grid.arrange(p1, p2, ncol = 2, top = textGrob("Regression data", 
                                                      gp = gpar(cex = 1.5, fontface = "bold")))
        }))
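
To check the fitted numbers separately from the plots, the same row-by-row idea can be sketched in base R alone. This is only an illustration, not part of the original answer: the helper name fit_pair and the objects fits and fit_stats are hypothetical. It assumes the columns of df2 are character vectors (hence stringsAsFactors = FALSE), and it builds each formula with reformulate() instead of paste() plus as.formula().

```r
# Sketch only: fit_pair, fits and fit_stats are illustrative names,
# not part of the original answer.
df1 <- data.frame(
  ID = paste0("ID", 101:110),
  A = c(420, 440, 490, 413, 446, 466, 454, 433, 401, 414),
  B = c(230, 240, 295, 253, 266, 286, 254, 233, 201, 214),
  C = c(20, 40, 90, 13, 46, 66, 54, 33, 61, 14),
  D = c(120, 140, 190, 113, 146, 166, 154, 133, 101, 114),
  E = c(38, 34, 33, 56, 87, 31, 12, 44, 68, 91),
  F = c(938, 934, 973, 956, 987, 931, 962, 944, 918, 921)
)
df2 <- data.frame(Upstream = c("A", "C", "E"),
                  Downstream = c("B", "D", "F"),
                  stringsAsFactors = FALSE)

# Fit response ~ predictor on data, with both columns given by name.
fit_pair <- function(response, predictor, data) {
  lm(reformulate(predictor, response = response), data = data)
}

# One model per row of df2, with Upstream as the response (as in the answer above).
fits <- Map(fit_pair, df2$Upstream, df2$Downstream,
            MoreArgs = list(data = df1))
names(fits) <- paste(df2$Upstream, df2$Downstream, sep = " ~ ")

# Collect the quantities that appear in the plot label.
fit_stats <- do.call(rbind, lapply(names(fits), function(nm) {
  m <- fits[[nm]]
  data.frame(model = nm,
             intercept = unname(coef(m)[1]),
             slope = unname(coef(m)[2]),
             r2 = summary(m)$r.squared,
             rmse = sqrt(mean(resid(m)^2)))
}))
print(fit_stats)
```

Each element of fits is an ordinary lm object, so resid(fits[[1]]) gives the residuals that the second panel plots for the first row of df2.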
