In R, plotting wide-form data with ggplot2 or base plot: is there a way to use ggplot2 without melting the wide-form data frame?

Question

I have a data frame that looks like this (though thousands of times larger).

df <- data.frame(sample(1:100, 10, replace = FALSE),
                 sample(1:100, 10, replace = FALSE),
                 runif(10, 0, 1), runif(10, 0, 1), runif(10, 0, 1),
                 rep(c("none", "summer", "winter", "spring", "allyear"), 2))
names(df) <- c("Mother", "ID", "Wavelength1", "Wavelength2", "Wavelength3", "WaterTreatment")
df

   Mother ID Wavelength1 Wavelength2 Wavelength3 WaterTreatment
1       2 34   0.9143670  0.03077356  0.82859497           none
2      24 75   0.6173382  0.05958151  0.66552338         summer
3      62 77   0.2655572  0.63731302  0.30267893         winter
4      30 98   0.9823510  0.45690437  0.40818031         spring
5       4 11   0.7503750  0.93737900  0.24909228        allyear
6      55 76   0.6451885  0.60138475  0.86044856           none
7      97 21   0.5711019  0.99732068  0.04706894         summer
8      87 14   0.7699293  0.81617911  0.18940531         winter
9      92 30   0.5855559  0.70152698  0.73375917         spring
10     93 44   0.1040359  0.85259166  0.37882469        allyear

I want to plot the values of Wavelength1 to Wavelength3 on the y axis against the wavelength (1, 2, 3) on the x axis, with one line per row. I have two ways of doing this:

First method, which works but uses base graphics and requires more code than should be necessary:

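colors <- c("red", "blue", "green", "orange", "yellow")
# Empty canvas: x spans the three wavelengths, y the [0, 1] range of the values
plot(0, 0, xlim = c(1, 3), ylim = c(0, 1), type = "n",
     xlab = "Wavelength", ylab = "Value")
for (i in seq_len(nrow(df))) {
  # Pick a colour index from this row's water treatment
  if        (df$WaterTreatment[i] == "none") {
    a <- 1
  } else if (df$WaterTreatment[i] == "allyear") {
    a <- 2
  } else if (df$WaterTreatment[i] == "summer") {
    a <- 3
  } else if (df$WaterTreatment[i] == "winter") {
    a <- 4
  } else if (df$WaterTreatment[i] == "spring") {
    a <- 5
  }
  # One line per row: columns 3:5 (Wavelength1..3) drawn at x = 1, 2, 3
  lines(1:3, as.numeric(df[i, 3:5]), col = colors[a])
}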
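Base graphics can do the same job with less code: `matplot()` draws one line per column of a matrix, so transposing the three wavelength columns yields one line per row of the data frame. This is a minimal sketch, assuming seeded hypothetical data in the question's layout and the same treatment-to-colour assignment as the loop; the name `treatment_colours` is an illustrative choice, not something from the question.

```r
# Hypothetical data in the question's layout (seeded so the example is reproducible)
set.seed(42)
df <- data.frame(
  Mother = sample(1:100, 10), ID = sample(1:100, 10),
  Wavelength1 = runif(10), Wavelength2 = runif(10), Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)

# Same treatment -> colour assignment as the loop, as a named lookup vector
# (hypothetical name, replaces the if/else chain)
treatment_colours <- c(none = "red", allyear = "blue", summer = "green",
                       winter = "orange", spring = "yellow")

# matplot() draws one line per *column* of y, so transpose:
# each row of df becomes one column, i.e. one line over x = 1, 2, 3
y <- t(as.matrix(df[, c("Wavelength1", "Wavelength2", "Wavelength3")]))

# as.character() guards against WaterTreatment being a factor (R < 4.0)
line_cols <- unname(treatment_colours[as.character(df$WaterTreatment)])

matplot(1:3, y, type = "l", lty = 1, col = line_cols,
        xlab = "Wavelength", ylab = "Value")
```

A side effect of the lookup vector: an unmatched label (such as a misspelled treatment) gives an `NA` colour, so that line is simply not drawn, whereas the if/else chain silently reuses the previous row's colour index.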

Second method: I melt the data into long form and then use ggplot2. The resulting plot is wrong: it draws one line per water treatment instead of one line per Mother/ID (the unique identifier, i.e. each row of the original data frame).

library(reshape2)
library(ggplot2)
# Long form: one row per (Mother, ID, WaterTreatment, wavelength) combination
df_m <- melt(df, id.vars = c("Mother", "ID", "WaterTreatment"))
df_m$variable <- as.numeric(df_m$variable)  # Wavelength1..3 -> 1..3
qplot(x = variable, y = value, data = df_m, colour = WaterTreatment, geom = "line")

There is probably something simple I'm missing about ggplot2 that would fix the plotting of the lines. I'm a newbie with ggplot, but am working to get more familiar with it and would like to use it in this application.

But more broadly, is there an efficient way to plot this type of wide form data in ggplot2? The time it takes to transform/melt the data is enormous and I'm wondering if it is worth it, or if there is some kind of work around that can eliminate the redundant cells created when melting.
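On the cost of melting: the question already loads data.table, whose `melt()` method for data.table objects is implemented in C and is generally much faster than `reshape2::melt()` on large inputs. The sketch below assumes that package and uses seeded hypothetical data; no timings were measured here. It does not eliminate the repeated identifier cells, since long form inherently repeats Mother, ID and WaterTreatment once per wavelength.

```r
library(data.table)

# Hypothetical wide data in the question's layout
set.seed(42)
df <- data.frame(
  Mother = sample(1:100, 10), ID = sample(1:100, 10),
  Wavelength1 = runif(10), Wavelength2 = runif(10), Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)

# setDT() converts to a data.table by reference (no copy of the columns)
dt <- setDT(df)

# data.table's own melt method: one row per (row of df, wavelength column)
long <- melt(dt,
             id.vars = c("Mother", "ID", "WaterTreatment"),
             measure.vars = c("Wavelength1", "Wavelength2", "Wavelength3"),
             variable.name = "wavelength", value.name = "value")

# Factor levels Wavelength1..3 -> integers 1..3 for a numeric x axis
long[, wavelength := as.integer(wavelength)]
```

The result can be passed to `ggplot()` exactly like `df_m`.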

Thanks for your help. If this question needs more clarity, please let me know and I can edit it.

Answer

It looks like you want a separate line for each ID, but you want the lines colored based on the value of WaterTreatment. If so, you can do it like this in ggplot:

ggplot(df_m, aes(x=variable, y=value, group=ID, colour=WaterTreatment)) + 
       geom_line() + geom_point()
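To see why `group = ID` fixes the plot, compare the groups ggplot2 computes with and without it. `layer_data()` returns the computed data for a layer, including its `group` column. This is a sketch on seeded hypothetical data, assuming reshape2 and ggplot2 are installed; the names `p_bad` and `p_good` are illustrative.

```r
library(reshape2)
library(ggplot2)

# Hypothetical data in the question's layout
set.seed(42)
df <- data.frame(
  Mother = sample(1:100, 10), ID = sample(1:100, 10),
  Wavelength1 = runif(10), Wavelength2 = runif(10), Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)
df_m <- melt(df, id.vars = c("Mother", "ID", "WaterTreatment"))
df_m$variable <- as.numeric(df_m$variable)

# No group aesthetic: groups come from the discrete colour variable alone
p_bad <- ggplot(df_m, aes(x = variable, y = value, colour = WaterTreatment)) +
  geom_line()

# group = ID: one group (one line) per original row, still coloured by treatment
p_good <- ggplot(df_m, aes(x = variable, y = value, group = ID,
                           colour = WaterTreatment)) +
  geom_line()

n_groups_bad  <- length(unique(layer_data(p_bad)$group))
n_groups_good <- length(unique(layer_data(p_good)$group))
```

With this layout, `n_groups_bad` is 5 (one per treatment) and `n_groups_good` is 10 (one per row, because the sampled IDs are distinct).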

You can also use faceting to make the different levels of WaterTreatment easier to see:

ggplot(df_m, aes(x=variable, y=value, group=ID, colour=WaterTreatment)) + 
    geom_line() + geom_point() + 
    facet_grid(WaterTreatment ~ .)

To answer your general question: ggplot is set up to work most easily and powerfully with a "long" (i.e., melted) data frame. I guess you could work with a "wide" data frame and plot separate layers for each combination of factors you want to plot. But that would be a lot of extra work compared to a single melt command to get your data into the right format.
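For comparison, here is roughly what the "separate layers" approach looks like on the wide data frame: one `geom_segment()` layer per pair of adjacent wavelength columns, with constant x positions. This is a sketch on seeded hypothetical data, assuming ggplot2 is installed. It draws the same picture for three wavelengths, but every extra wavelength column needs another layer, which is why a single melt is usually the better trade.

```r
library(ggplot2)

# Hypothetical wide data in the question's layout
set.seed(42)
df <- data.frame(
  Mother = sample(1:100, 10), ID = sample(1:100, 10),
  Wavelength1 = runif(10), Wavelength2 = runif(10), Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)

# One segment layer per adjacent column pair; y/yend come straight from the
# wide columns, x/xend are constants for the wavelength positions
p_wide <- ggplot(df, aes(colour = WaterTreatment)) +
  geom_segment(aes(x = 1, xend = 2, y = Wavelength1, yend = Wavelength2)) +
  geom_segment(aes(x = 2, xend = 3, y = Wavelength2, yend = Wavelength3)) +
  labs(x = "Wavelength", y = "Value")

print(p_wide)
```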
