How to melt and plot multiple datasets over different ranges on the same set of axes?

Problem description

This is my first time posting here, so I hope my question is clear and appropriate. I have a data set whose head looks like this:

   wl     ex421  wl     ex309  wl      ex284  wl      ex347
1 431 0.6168224 321 0.1267943 301 0.06392694 361 0.15220484
2 432 0.6687435 322 0.2416268 302 0.05631659 362 0.08961593
3 433 0.6583593 323 0.4665072 303 0.05327245 363 0.13134187
4 434 0.6832814 324 0.3576555 304 0.00000000 364 0.32432432
5 435 0.6427830 325 0.2194976 305 0.12328767 365 0.50308203
6 436 0.7393562 326 0.1866029 306 0.08675799 366 0.34660977

and so on. The 'wl' columns represent wavelength, and there are four different ranges. The other four columns hold the (normalized) measurements taken over those 'wl' ranges. The ranges also differ in length, and all of them partially overlap somewhere in the middle of the dataset. What I need is a plot showing all four sets of 'ex###' data on the same set of axes, each plotted over its own range. The x-axis needs to accommodate all four 'wl' ranges. However, I haven't succeeded yet.

When I had to plot multiple sets of data like this in the past I just melted the data and it always worked. Something like this:

df_melt <- melt(df, id.var = 'wl')

And then I'd plot it like this:

fluor_plt <- ggplot(fluor_ref2_melt, aes(x = wl, y = value, color = variable)) +
  geom_point(shape = 1, fill = NA) +
  geom_path(data = fluor_ref2_melt, size = 1) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  scale_colour_manual(values = colvec)

However, because I have multiple columns with the name 'wl', which also have different ranges, what happens is that R only takes the first 'wl' column and discards all the other ones. It then basically shifts all the 'ex###' values into that range by using the row index... so I get a plot of the frame below:

   wl     ex421    ex309    ex284      ex347
1 431 0.6168224 0.1267943 0.06392694 0.15220484
2 432 0.6687435 0.2416268 0.05631659 0.08961593
3 433 0.6583593 0.4665072 0.05327245 0.13134187
4 434 0.6832814 0.3576555 0.00000000 0.32432432
5 435 0.6427830 0.2194976 0.12328767 0.50308203
6 436 0.7393562 0.1866029 0.08675799 0.34660977

Needless to say, this is entirely wrong. One way I tried to work around the issue was to go into Excel and manually move columns up and down, so that each row of the data frame corresponds to one 'wl' value, whether or not any measured values are associated with it. This got rid of the values being 'shifted', but R still discards the 'wl' columns after the first one. Instead of an entirely wrong plot, I now get a section of the right one: the first set of observations (ex421) is plotted over its entire range, and pieces of the others appear where the ranges overlap.

I've looked at some similar cases that were asked about here in the past, like this one: Reshape data frame from wide to long with re-occuring column names in R. But I'm new to R and I don't think I fully understood the proposed solutions. I didn't succeed in reshaping my data the way I want (keeping different 'wl' ranges for different sets), and I had no idea which arguments to give to ggplot afterwards. I've tried using data.table, but then I don't know what to give it for value.name and variable.name.

To reiterate, what I want to achieve is what one would get from plotting the four datasets in the spreadsheet by making a single Scatter plot in Excel and adding four different series to it.

Any input would be greatly appreciated!

Solution

Here I load a data frame with your data, making sure to allow repeated names with check.names = F; otherwise read.table would rename the wl columns to make them distinct:

df <- read.table(
  header = TRUE, check.names = FALSE,
  stringsAsFactors = FALSE,
  text = "   wl     ex421  wl     ex309  wl      ex284  wl      ex347
 431 0.6168224 321 0.1267943 301 0.06392694 361 0.15220484
 432 0.6687435 322 0.2416268 302 0.05631659 362 0.08961593
 433 0.6583593 323 0.4665072 303 0.05327245 363 0.13134187
 434 0.6832814 324 0.3576555 304 0.00000000 364 0.32432432
 435 0.6427830 325 0.2194976 305 0.12328767 365 0.50308203
 436 0.7393562 326 0.1866029 306 0.08675799 366 0.34660977")
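
To see what the check.names argument changes, here is a small sketch that reads the first two rows of the sample data both ways and prints the resulting column names. The object names raw_text, df_checked and df_raw are made up for this illustration and are not part of the original answer.

```r
# Two rows of the question's sample data, with the repeated 'wl' header.
raw_text <- "wl ex421 wl ex309 wl ex284 wl ex347
431 0.6168224 321 0.1267943 301 0.06392694 361 0.15220484
432 0.6687435 322 0.2416268 302 0.05631659 362 0.08961593"

# Default behaviour (check.names = TRUE): duplicated names are made unique,
# so the later 'wl' columns are renamed instead of kept as 'wl'.
df_checked <- read.table(text = raw_text, header = TRUE)

# check.names = FALSE keeps the header exactly as written in the file.
df_raw <- read.table(text = raw_text, header = TRUE, check.names = FALSE)

print(names(df_checked))
print(names(df_raw))
```

With the names left untouched, every wavelength column is still called wl, which is what lets the stacked pieces line up in a single wl column.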

Then here's a way to reshape, by just stacking subsets of the data. Since there weren't too many column pairs, I thought a semi-manual method would be ok. It preserves the distinct column headers so we can gather those into long form and map to color like in your plot.

library(tidyverse)
df2 <- bind_rows(
  df[1:2],
  df[3:4],
  df[5:6],
  df[7:8]
) %>%
  gather(variable, value, -wl) %>%
  drop_na()


ggplot(df2, aes(x=wl,y=value,color=variable)) + 
  geom_point(shape = 1, fill = NA) + 
  geom_path(size = 1) +
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank())
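
The stacking above lists each column pair by hand. If there were many more pairs, the same idea could probably be written as a loop over column positions. The following is only a sketch in base R under two assumptions: the columns always alternate wavelength, measurement, and shorter ranges are padded with NA. The helper name stack_pairs is hypothetical, and the NA entries in the third row are inserted here purely to imitate such padding.

```r
# Sample rows from the question; the NA pair in the last row is an assumption
# added to mimic a shorter range that has been padded.
df <- read.table(
  header = TRUE, check.names = FALSE, stringsAsFactors = FALSE,
  text = "wl ex421 wl ex309 wl ex284 wl ex347
431 0.6168224 321 0.1267943 301 0.06392694 361 0.15220484
432 0.6687435 322 0.2416268 302 0.05631659 362 0.08961593
433 0.6583593 323 0.4665072 NA  NA         363 0.13134187")

# Hypothetical helper: treat columns 1-2, 3-4, ... as (wavelength, measurement)
# pairs and stack them into one long data frame.
stack_pairs <- function(data) {
  wl_cols <- seq(1, ncol(data), by = 2)
  pieces <- lapply(wl_cols, function(i) {
    data.frame(
      wl = data[[i]],                  # positional access avoids the duplicate names
      variable = names(data)[i + 1],   # measurement column name, e.g. "ex421"
      value = data[[i + 1]],
      stringsAsFactors = FALSE
    )
  })
  long <- do.call(rbind, pieces)
  # Drop padding rows where a shorter range has no data.
  long[!is.na(long$wl) & !is.na(long$value), ]
}

long <- stack_pairs(df)
print(long)
```

The result has the same wl, variable and value columns as df2, so it should work with the same ggplot call.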
