Generate multiple serial graphs/scatterplots from data in two dataframes


Problem description

I have two dataframes, Tg and Pf, each with 127 columns. Every column has at least one row and can have up to several thousand. All values are between 0 and 1, and there are some missing values (empty cells). Here is a small subset:

Tg
Tg1 Tg2 Tg3 ... Tg127
0.9 0.5 0.4     0
0.9 0.3 0.6     0
0.4 0.6 0.6     0.3
0.1 0.7 0.6     0.4
0.1 0.8
0.3 0.9
    0.9
    0.6
    0.1

Pf
Pf1 Pf2 Pf3 ... Pf127
0.9 0.5 0.4    1
0.9 0.3 0.6    0.8 
0.6 0.6 0.6    0.7
0.4 0.7 0.6    0.5
0.1     0.6    0.5
0.3
0.3
0.3

Note that some cells are empty, and the vector lengths for the same subset (i.e. 1 to 127) can differ widely and are rarely exactly the same. I want to generate 127 graphs, one for each of the 127 pairs of vectors (i.e. graph 1 is for col 1 from each dataframe, graph 2 is for col 2 from each dataframe, etc.).

Hope that makes sense. I'm looking forward to your assistance as I don't want to make those graphs one by one... Thanks!

Solution

Here is an example to get you started (data at https://gist.github.com/1349300). For further tweaking, check out the excellent ggplot2 documentation that is all over the web.

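library(ggplot2)
library(reshape2)  # melt()
library(plyr)      # ddply(), join(), summarize()

# Load data (fill=TRUE pads short rows with blank fields, read as NA)
Tg = read.table('Tg.txt', header=TRUE, fill=TRUE, sep=' ')
Pf = read.table('Pf.txt', header=TRUE, fill=TRUE, sep=' ')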

# Format data
Tg$x        = as.numeric(rownames(Tg))
Tg          = melt(Tg, id.vars='x')
Tg$source   = 'Tg'
Tg$variable = factor(as.numeric(gsub('Tg(.+)', '\\1', Tg$variable)))

Pf$x        = as.numeric(rownames(Pf))
Pf          = melt(Pf, id.vars='x')
Pf$source   = 'Pf'
Pf$variable = factor(as.numeric(gsub('Pf(.+)', '\\1', Pf$variable)))

# Stack data
data = rbind(Tg, Pf)

# Plot
dev.new(width=5, height=4)
p = ggplot(data=data, aes(x=x)) +
  geom_line(aes(y=value, group=source, color=source)) +
  facet_wrap(~variable)
p
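
With 127 facets on one page the panels can get very small. One possible alternative, sketched here on made-up toy data rather than the real files, is to write one plot per column pair to its own page of a multi-page PDF. The `toy` frame, the `out.file` path, and the page titles are assumptions for illustration; the column layout mirrors the stacked long-format data frame built above.

```r
library(ggplot2)

# Toy stand-in for the stacked long-format data frame
# (columns: x, variable, value, source); the values are made up.
toy <- data.frame(
  x        = rep(1:4, times = 4),
  variable = factor(rep(c(1, 2), each = 8)),
  value    = c(0.9, 0.9, 0.4, 0.1,  0.9, 0.9, 0.6, 0.4,
               0.5, 0.3, 0.6, 0.7,  0.5, 0.3, 0.6, 0.7),
  source   = rep(rep(c('Tg', 'Pf'), each = 4), times = 2)
)

# One plot per level of `variable`, each on its own PDF page
out.file <- file.path(tempdir(), 'pairs.pdf')  # hypothetical output path
pdf(out.file, width = 5, height = 4)
for (v in levels(toy$variable)) {
  p.v <- ggplot(toy[toy$variable == v, ],
                aes(x = x, y = value, color = source)) +
    geom_line() +
    ggtitle(paste('Column', v))
  print(p.v)  # ggplot objects must be printed explicitly inside a loop
}
invisible(dev.off())
```

The same loop should work on the real stacked data frame by swapping it in for `toy`.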


Highlighting the area between the lines

First, interpolate the data onto a finer grid. This way the ribbon will follow the actual envelope of the lines, rather than just where the original data points were located.

data = ddply(data, c('variable', 'source'), function(x) data.frame(approx(x$x, x$value, xout=seq(min(x$x), max(x$x), length.out=100))))
names(data)[4] = 'value'
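
To see what the interpolation step does on its own, here is a minimal sketch of `approx()` on a single made-up series (the values are illustrative, not taken from the gist). It resamples four points onto a grid of seven evenly spaced x positions using linear interpolation.

```r
# Made-up values; x plays the role of the row index used in the answer
x <- 1:4
y <- c(0.9, 0.9, 0.4, 0.1)

# Resample onto a finer, evenly spaced grid between min(x) and max(x)
fine <- approx(x, y, xout = seq(min(x), max(x), length.out = 7))

fine$x  # 1.0 1.5 2.0 2.5 3.0 3.5 4.0
fine$y  # 0.90 0.90 0.90 0.65 0.40 0.25 0.10
```

`approx()` returns a list with components `x` and `y`, which is why the answer renames the fourth column of the `ddply` result to `value`.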

Next, calculate the data needed for geom_ribbon - namely ymax and ymin.

ribbon.data = ddply(data, c('variable', 'x'), summarize, ymin=min(value), ymax=max(value))

Now it is time to plot. Notice how we've added a new ribbon layer, for which we've substituted our new ribbon.data frame.

dev.new(width=5, height=4)
p + geom_ribbon(aes(ymin=ymin, ymax=ymax),  alpha=0.3, data=ribbon.data)


Dynamic coloring between the lines

The trickiest variation is if you want the coloring to vary based on the data. For that, you currently must create a new grouping variable to identify the different segments. Here, for example, we might use a function that indicates when the "Tg" group is on top:

GetSegs <- function(x) {
  segs = x[x$source=='Tg', ]$value > x[x$source=='Pf', ]$value
  segs.rle = rle(segs)

  on.top = ifelse(segs, 'Tg', 'Pf')
  on.top[is.na(on.top)] = 'Tg'

  group = rep.int(1:length(segs.rle$lengths), times=segs.rle$lengths)
  group[is.na(segs)] = NA

  data.frame(x=unique(x$x), group, on.top)
}
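
The grouping trick inside GetSegs can be looked at in isolation. This small sketch uses a made-up logical vector (an assumption for illustration) to show how `rle()` run lengths become one group id per contiguous segment, which is what lets each ribbon piece be drawn and filled separately.

```r
# TRUE where "Tg" would be above "Pf" (made-up example values)
segs <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# Run-length encoding: lengths of consecutive identical values
segs.rle <- rle(segs)
segs.rle$lengths  # 2 3 1

# Expand back to one group id per point: a new id each time the order flips
group <- rep.int(seq_along(segs.rle$lengths), times = segs.rle$lengths)
group  # 1 1 2 2 2 3
```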

Now we apply it and merge the results back with our original ribbon data.

groups = ddply(data, 'variable', GetSegs)
ribbon.data = join(ribbon.data, groups)

For the plot, the key is that we now specify a grouping aesthetic to the ribbon geom.

dev.new(width=5, height=4)
p + geom_ribbon(aes(ymin=ymin, ymax=ymax, group=group, fill=on.top),  alpha=0.3, data=ribbon.data)

Code is available together at: https://gist.github.com/1349300
