How to automate linear regression of multiple rows in a loop and plot using R
Problem description
I am working with two data frames and trying to automate an analysis that I currently do by hand.
ID <- c("ID101","ID102","ID103","ID104","ID105","ID106","ID107","ID108","ID109","ID110")
A <- c(420,440,490,413,446,466,454,433,401,414)
B <- c(230,240,295,253,266,286,254,233,201,214)
C <- c(20,40,90,13,46,66,54,33,61,14)
D <- c(120,140,190,113,146,166,154,133,101,114)
E <- c(38,34,33,56,87,31,12,44,68,91)
F <- c(938,934,973,956,987,931,962,944,918,921)
df1 <- data.frame(ID,A,B,C,D,E,F)
Upstream <- c("A","C","E")
Downstream <- c("B","D","F")
df2 <- data.frame(Upstream,Downstream)
I am currently running a simple linear regression between the upstream and downstream data and plotting its residuals alongside the fit. The way I do it manually is:
library(ggplot2)
library(grid)
library(gridExtra)

# df1 is the data frame defined above (the snippet originally referred to an undefined `df`)
fit <- lm(A ~ B, data = df1)

# Build a plotmath label with the fitted equation, R^2 and RMSE
lm_eqn <- function(df){
  m <- lm(A ~ B, df)
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(R)^2~"="~r2* "," ~~ RMSE ~"="~rmse,
                   list(a = format(coef(m)[1], digits = 2),
                        b = format(coef(m)[2], digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3),
                        rmse = round(sqrt(mean(resid(m)^2, na.rm = TRUE)), 3)))
  as.character(as.expression(eq))
}

# Scatter plot with fitted line and equation label.
# show.legend must be spelled FALSE: F has been overwritten by the data vector F above.
p1 <- ggplot(df1, aes(x = A, y = B)) + geom_point(colour = "red", size = 3) + geom_smooth(method = lm) + geom_text(aes(size = 10), x = -Inf, hjust = -1, y = Inf, vjust = 1, label = lm_eqn(df1), parse = TRUE, show.legend = FALSE)
# Residuals of the fit against B
p2 <- ggplot(df1, aes(x = B, y = resid(fit))) + ylab("Residuals") + geom_point(shape = 1, colour = "red", size = 3) + geom_smooth(method = "lm")
grid.arrange(p1, p2, ncol = 2, top = textGrob("Regression data",
             gp = gpar(cex = 1.5, fontface = "bold")))
I get this plot
I then redo this manually for the next row in df2, which is C & D, and manually change the parameters again for the row after that, which is E & F.
How can I use functions or otherwise automate this logic so that I run the code only once and get the 3 plots, one for each pair: (A & B), (C & D), (E & F)?
Please let me know if I am not clear about what I want. Ideally I am looking for a way to code this up so that I don't need to enter the values (A, B, C, D, E, F) manually in the respective places every time I run it. Please provide some directions on how to solve this.
You can use apply() on each row of df2, using as.formula() and aes_string():
library(ggplot2)
library(grid)
library(gridExtra)

# Loop over the rows of df2; d is a named character vector (Upstream, Downstream)
apply(df2, 1, function(d)
{
  # Build the formula from the column names in this row, e.g. A ~ B
  f <- as.formula(paste(d["Upstream"], " ~ ", d["Downstream"]))
  fit <- lm(f, data = df1)
  lm_eqn <- function(df){
    m <- lm(f, df)
    eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(R)^2~"="~r2* "," ~~ RMSE ~"="~rmse,
                     list(a = format(coef(m)[1], digits = 2),
                          b = format(coef(m)[2], digits = 2),
                          r2 = format(summary(m)$r.squared, digits = 3),
                          rmse = round(sqrt(mean(resid(m)^2, na.rm = TRUE)), 3)))
    as.character(as.expression(eq))
  }
  # aes_string() accepts column names as strings (deprecated since ggplot2 3.0.0)
  p1 <- ggplot(df1, aes_string(x = d["Upstream"], y = d["Downstream"])) + geom_point(colour = "red", size = 3) + geom_smooth(method = lm) + geom_text(aes(size = 10), x = -Inf, hjust = -1, y = Inf, vjust = 1, label = lm_eqn(df1), parse = TRUE, show.legend = FALSE)
  p2 <- ggplot(df1, aes_string(x = d["Downstream"], y = resid(fit))) + ylab("Residuals") + geom_point(shape = 1, colour = "red", size = 3) + geom_smooth(method = "lm")
  grid.arrange(p1, p2, ncol = 2, top = textGrob("Regression data",
               gp = gpar(cex = 1.5, fontface = "bold")))
})
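As a possible variation on the same idea, here is a minimal sketch that moves the per-pair work into a named helper and avoids the deprecated aes_string() by using the .data pronoun (this assumes ggplot2 3.0.0 or later). The helper name plot_pair, the plain-text label and the panel titles are illustrative choices, not part of the original answer. Note one deliberate difference: the sketch puts the response (Upstream) on the y-axis so that the drawn line matches the model Upstream ~ Downstream, whereas the code above plots Upstream on the x-axis.

```r
library(ggplot2)
library(grid)
library(gridExtra)

# Same example data as in the question
df1 <- data.frame(
  ID = sprintf("ID%d", 101:110),
  A = c(420, 440, 490, 413, 446, 466, 454, 433, 401, 414),
  B = c(230, 240, 295, 253, 266, 286, 254, 233, 201, 214),
  C = c(20, 40, 90, 13, 46, 66, 54, 33, 61, 14),
  D = c(120, 140, 190, 113, 146, 166, 154, 133, 101, 114),
  E = c(38, 34, 33, 56, 87, 31, 12, 44, 68, 91),
  F = c(938, 934, 973, 956, 987, 931, 962, 944, 918, 921)
)
df2 <- data.frame(Upstream = c("A", "C", "E"),
                  Downstream = c("B", "D", "F"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: fit `up ~ down` and build the fit and residual panels
plot_pair <- function(up, down, data) {
  fit <- lm(reformulate(down, response = up), data = data)
  rmse <- sqrt(mean(resid(fit)^2))
  # Plain-text label (an assumption; the original uses a plotmath expression)
  label <- sprintf("y = %.2f + %.2f x, R2 = %.3f, RMSE = %.3f",
                   coef(fit)[1], coef(fit)[2], summary(fit)$r.squared, rmse)
  res <- data.frame(x = data[[down]], residual = resid(fit))

  # Scatter plot with fitted line; .data[[...]] looks up a column by its name
  p1 <- ggplot(data, aes(x = .data[[down]], y = .data[[up]])) +
    geom_point(colour = "red", size = 3) +
    geom_smooth(method = "lm", formula = y ~ x) +
    annotate("text", x = -Inf, y = Inf, hjust = -0.05, vjust = 1.5, label = label)

  # Residuals against the predictor
  p2 <- ggplot(res, aes(x = x, y = residual)) +
    geom_point(shape = 1, colour = "red", size = 3) +
    geom_smooth(method = "lm", formula = y ~ x) +
    labs(x = down, y = "Residuals")

  # arrangeGrob() builds the two-panel layout without drawing it
  arrangeGrob(p1, p2, ncol = 2,
              top = textGrob(paste("Regression data:", up, "~", down),
                             gp = gpar(cex = 1.5, fontface = "bold")))
}

# One two-panel figure per row of df2
panels <- Map(plot_pair, df2$Upstream, df2$Downstream,
              MoreArgs = list(data = df1))

# Draw each figure on its own page
for (g in panels) {
  grid.newpage()
  grid.draw(g)
}
```

Because the figures are returned as a list rather than drawn inside the loop, they can also be saved individually, for example with ggsave("A_B.png", panels[["A"]]).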