How to automate linear regression of multiple rows in a loop and plot using R

Views: 247 | Published: 2018/4/17 17:58:29 | Tags: r, function, loops


Problem description


I am working with two data frames and trying to automate what I currently do by hand.

ID <- c("ID101","ID102","ID103","ID104","ID105","ID106","ID107","ID108","ID109","ID110")
A <- c(420,440,490,413,446,466,454,433,401,414)
B <- c(230,240,295,253,266,286,254,233,201,214)
C <- c(20,40,90,13,46,66,54,33,61,14)
D <- c(120,140,190,113,146,166,154,133,101,114)
E <- c(38,34,33,56,87,31,12,44,68,91)
F <- c(938,934,973,956,987,931,962,944,918,921)
df1 <- data.frame(ID,A,B,C,D,E,F)

Upstream <- c("A","C","E")
Downstream <- c("B","D","F")
df2 <- data.frame(Upstream,Downstream)

I am currently running a simple linear regression between the upstream and downstream data and plotting its residuals alongside it. The way I do it manually is:

fit <- lm(A ~ B, data=df1)

lm_eqn <- function(df){
  m <- lm(A ~ B, df);
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(R)^2~"="~r2* "," ~~ RMSE ~"="~rmse, 
                   list(a = format(coef(m)[1], digits = 2), 
                        b = format(coef(m)[2], digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3),
                        rmse = round(sqrt(mean(resid(m)^2,na.rm=TRUE)), 3)))
  as.character(as.expression(eq));
}

library(ggplot2)
library(grid)
library(gridExtra)

p1 <- ggplot(df1, aes(x=A, y=B)) + geom_point(colour="red",size = 3) + geom_smooth(method=lm) + geom_text(aes(size=10),x = -Inf, hjust = -1, y = Inf, vjust = 1, label = lm_eqn(df1), parse = TRUE,show.legend = FALSE)
p2 <-  ggplot(df1, aes(x=B, y=resid(fit))) + ylab("Residuals") + geom_point(shape=1,colour="red",size = 3) + geom_smooth(method = "lm")
grid.arrange(p1, p2, ncol=2,top=textGrob("Regression data", 
                                         gp=gpar(cex=1.5, fontface="bold")))

I get this plot

I redo this manually for the next row in df2 which is C & D and then manually change the parameters again for the next row which is E & F.

How can I use functions or automate this logic so that I run only one time and get the 3 plots, one for each (A&B), (C&D), (E&F).

Please let me know if I am not clear about what I want. Ideally, I am looking for a way to write the code so that I don't need to enter the values (A, B, C, D, E, F) manually in each place every time I run it. Please provide some direction on how to solve this.

Solution

You can use apply() on each row of df2, building the model formula with as.formula() and the plot aesthetics with aes_string():

library(ggplot2)
library(grid)
library(gridExtra)

# d is a named character vector holding one Upstream/Downstream pair from df2.
apply(df2, 1, function(d)
        {
        # Build the formula from the column names, e.g. A ~ B.
        fml <- as.formula(paste(d["Upstream"], " ~ ", d["Downstream"]))
        fit <- lm(fml, data=df1)

        # Format the fitted equation, R^2 and RMSE as a plotmath label.
        lm_eqn <- function(df){
                m <- lm(fml, df)
                eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(R)^2~"="~r2* "," ~~ RMSE ~"="~rmse, 
                                 list(a = format(coef(m)[1], digits = 2), 
                                      b = format(coef(m)[2], digits = 2),
                                      r2 = format(summary(m)$r.squared, digits = 3),
                                      rmse = round(sqrt(mean(resid(m)^2,na.rm=TRUE)), 3)))
                as.character(as.expression(eq))
        }

        # p1: scatter plot with fitted line and equation label; p2: residuals.
        p1 <- ggplot(df1, aes_string(x=d["Upstream"], y=d["Downstream"])) + geom_point(colour="red",size = 3) + geom_smooth(method=lm) + geom_text(aes(size=10),x = -Inf, hjust = -1, y = Inf, vjust = 1, label = lm_eqn(df1), parse = TRUE,show.legend = FALSE)
        p2 <-  ggplot(df1, aes_string(x=d["Downstream"], y=resid(fit))) + ylab("Residuals") + geom_point(shape=1,colour="red",size = 3) + geom_smooth(method = "lm")
        grid.arrange(p1, p2, ncol=2,top=textGrob("Regression data", 
                                                 gp=gpar(cex=1.5, fontface="bold")))
        })
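
To make the formula-building step easier to check in isolation, here is a minimal base-R sketch of the same loop without the plotting. It is not part of the original answer: fit_pair() and results are hypothetical names introduced for illustration, and reformulate() is used as an alternative to pasting the formula together as a string. The data are the ones given in the question.

```r
# Data as given in the question.
df1 <- data.frame(
  ID = paste0("ID", 101:110),
  A = c(420, 440, 490, 413, 446, 466, 454, 433, 401, 414),
  B = c(230, 240, 295, 253, 266, 286, 254, 233, 201, 214),
  C = c(20, 40, 90, 13, 46, 66, 54, 33, 61, 14),
  D = c(120, 140, 190, 113, 146, 166, 154, 133, 101, 114),
  E = c(38, 34, 33, 56, 87, 31, 12, 44, 68, 91),
  F = c(938, 934, 973, 956, 987, 931, 962, 944, 918, 921)
)
df2 <- data.frame(Upstream = c("A", "C", "E"),
                  Downstream = c("B", "D", "F"),
                  stringsAsFactors = FALSE)

# fit_pair() is a hypothetical helper (not from the original answer).
# It regresses the `up` column on the `down` column, i.e. up ~ down,
# and returns the quantities shown in the plot label as one data frame row.
fit_pair <- function(up, down, data) {
  model <- lm(reformulate(down, response = up), data = data)
  data.frame(
    upstream   = up,
    downstream = down,
    intercept  = unname(coef(model)[1]),
    slope      = unname(coef(model)[2]),
    r2         = summary(model)$r.squared,
    rmse       = sqrt(mean(resid(model)^2)),
    stringsAsFactors = FALSE
  )
}

# Map() walks the two columns of df2 in parallel, one pair per call.
results <- do.call(
  rbind,
  Map(fit_pair, df2$Upstream, df2$Downstream, MoreArgs = list(data = df1))
)
rownames(results) <- NULL
print(results)
```

Each row of results corresponds to one row of df2, so the same Map() call could return a pair of plots instead of a summary row. One caveat if you adapt the plotting code: in recent ggplot2 releases aes_string() is deprecated, and aes(x = .data[[name]]) is the documented way to refer to a column by its name as a string.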
