Multiple plot function in R

Problem description

I am new to R and have spent the past week trying to find a solution to this via Google and forums. My problem: I have a data set that I need to plot against age. There are over 1000 variables, each with measurements taken under 40 different conditions. It looks like this:

Age   Variable1   Variable2   ...   Variable1000
(one column per variable, with the measurements running down the rows)

What I need to do is plot the condition (age) against each of the variable columns and output each one as a separate plot (they are all just scatterplots). In addition, I want the output limited to only those variables whose trend line has a positive coefficient.

So currently I have this very ugly code that is essentially a rough draft of what I really need.

plotest <- function(lung){
  # need to add the condition of abline function coefficient > 0 before plotting    
  plot(lung$Age, lung$hsa.let.7a.1, xlab = "Age", ylab = "miRNA")
  abline(lm(lung$hsa.let.7a.1 ~ lung$Age), col= "red")
  return(plot)
}
par(mfrow=c(2,2))
for (i in lung{plotest(i)})

I know this is mostly wrong. Sorry that all of it is so horrendous.

Could anyone point me to any sources I might have overlooked on how to specify ranges in such a large data set, and on function syntax? I have done some Python, but I find R much more confusing in this regard...

Thanks, Paul
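On the question of specifying ranges: a minimal sketch of a few standard base-R ways to pick out the variable columns, whether by position, by a position range, by name, or by a name pattern. The `lung` data frame below is a small made-up stand-in (an `Age` column plus three hypothetical `Variable` columns), not the asker's real data.

```r
# Made-up stand-in for the question's data: Age plus three hypothetical variable columns
set.seed(1)
lung <- data.frame(Age = seq(20, 80, length.out = 40),
                   Variable1 = rnorm(40),
                   Variable2 = rnorm(40),
                   Variable3 = rnorm(40))

# By position: every column name except the first
vars_by_pos <- names(lung)[-1]

# By a position range: column 2 through the last column
vars_range <- names(lung)[2:ncol(lung)]

# By name: everything that is not Age
vars_by_name <- setdiff(names(lung), "Age")

# By a name pattern (regular expression)
vars_by_pattern <- grep("^Variable", names(lung), value = TRUE)

# Subsetting the data frame itself works the same way
measurements <- lung[, vars_by_name]

# Looping over the selected columns by name; lung[[col]] extracts one column as a vector
for (col in vars_by_name) {
  cat(col, "mean:", mean(lung[[col]]), "\n")
}
```

Selecting by name (`setdiff` or `grep`) keeps working if columns are added or reordered, which position ranges such as `2:ncol(lung)` do not.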

Recommended answer

This should come pretty close to what you're asking for, although what you're going to do with 1000 graphs is beyond me.

# make up some data
x <- seq(1,10,len=100)
set.seed(1)    # for reproducible example
df <- data.frame(x,y1=1+2*x+rnorm(100), 
                   y2=3-4*x+rnorm(100),
                   y3=2+0.001*x+rnorm(100))

# this does the work...
lapply(colnames(df)[-1],function(col){
  form <- formula(paste(col,"x",sep="~"))
  fit  <- lm(form,df)
  if (coef(fit)[2] >0) {
    plot(form,df)
    abline(fit)
  }
})

Your code was not that far off. This example takes every column name except the first (colnames(df)[1]) and passes them to the function one at a time. The function builds a formula from that column name and x (the name of the first column), calls lm(...), checks whether the coefficient of x is > 0, and if so plots the data and the best-fit line.
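For comparison, a sketch of the question's `plotest` draft with its problems fixed: the function takes the data and a column name instead of a hard-coded column, fits the trend line before plotting so the slope can be checked, and the loop iterates over column names with valid syntax. The `lung` data here is made up for illustration; `hsa.let.7a.1` is the column named in the question, while `hsa.let.7a.2` is a hypothetical second column.

```r
# Made-up stand-in for the question's data: Age plus two hypothetical miRNA columns
set.seed(1)
lung <- data.frame(Age = seq(20, 80, length.out = 40))
lung$hsa.let.7a.1 <- 0.05 * lung$Age + rnorm(40)   # made-up rising trend
lung$hsa.let.7a.2 <- -0.05 * lung$Age + rnorm(40)  # made-up falling trend

plotest <- function(data, col) {
  # Fit the trend line first, so its slope can be checked before plotting
  fit <- lm(data[[col]] ~ data$Age)
  slope <- coef(fit)[[2]]
  if (slope > 0) {
    plot(data$Age, data[[col]], xlab = "Age", ylab = col, main = col)
    abline(fit, col = "red")
  }
  # Report (invisibly) whether a plot was drawn for this column
  invisible(slope > 0)
}

par(mfrow = c(2, 2))
# Call plotest once per variable column; the result is a named logical vector
drawn <- vapply(setdiff(names(lung), "Age"),
                function(col) plotest(lung, col), logical(1))
```

Returning `slope > 0` is a design choice so the caller can see which columns were plotted; the original `return(plot)` returned the `plot` function itself rather than a plot.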

Look up the documentation on formula(...), lm(...), and coef(...). Note that this example includes a variable, y3, whose slope is positive but not significantly different from 0. You should think about how you want to deal with that situation.
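One possible way to handle that situation, sketched under assumptions: compute each variable's slope and p-value first, keep only slopes that are both positive and significant, and write one plot per page to a PDF file instead of the screen. The 0.05 cutoff, the Benjamini-Hochberg adjustment via `p.adjust` (relevant when testing around 1000 variables at once), and the output file name are all assumptions, not part of the original answer.

```r
# Same made-up data as in the answer above
x <- seq(1, 10, len = 100)
set.seed(1)
df <- data.frame(x, y1 = 1 + 2*x + rnorm(100),
                    y2 = 3 - 4*x + rnorm(100),
                    y3 = 2 + 0.001*x + rnorm(100))

# Slope and its p-value for every variable column (rows: y1, y2, y3)
stats <- t(vapply(colnames(df)[-1], function(col) {
  coefs <- summary(lm(df[[col]] ~ df$x))$coefficients
  c(slope = coefs[2, "Estimate"], p = coefs[2, "Pr(>|t|)"])
}, c(slope = 0, p = 0)))

# Assumed rule: positive slope and BH-adjusted p-value below 0.05
adj_p <- p.adjust(stats[, "p"], method = "BH")
keep <- rownames(stats)[stats[, "slope"] > 0 & adj_p < 0.05]

# One page per kept variable in a single PDF (hypothetical file name)
pdf_file <- file.path(tempdir(), "positive_trends.pdf")
pdf(pdf_file)
for (col in keep) {
  plot(df$x, df[[col]], xlab = "x", ylab = col, main = col)
  abline(lm(df[[col]] ~ df$x), col = "red")
}
invisible(dev.off())
```

Separating the statistics from the plotting also makes it easy to inspect `stats` (for example, sorted by slope) before deciding which of the 1000 plots are worth producing.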
