R, ggplot, separate mean by range of x value

Views: 217 | Published: 2018/4/25 21:49:38 | Tags: r ggplot2 mean


Problem description

I have a set of data that looks like this:
  CHROM   POS GT DIFF
1 chr01 14653 CT  254
2 chr01 14907 AG  254
3 chr01 14930 AG   23
4 chr01 15190 GA  260
5 chr01 15211 TG   21
6 chr01 16378 TC 1167
POS ranges from 1xxxx to 1xxxxxxx, and CHROM is a categorical variable with the values "chr01" to "chr22" and "chrX".

I want to plot a scatterplot:

y (DIFF) vs. x (POS)
with panels separated by CHROM
grouped by GT (a different colour for each GT)

I'm creating a ggplot with a running average (though this is not time-series data).

What I want is the average of DIFF for every 1,000,000-wide range of POS, by GT.

For example,

for x in the range 1 to 1,000,000, DIFF average = _____
for x in the range 1,000,001 to 2,000,000, DIFF average = _____

and I want to plot these averages as horizontal lines on the ggplot (coloured by GT).
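The 1,000,000-wide ranges described above can be computed directly from POS. Here is a minimal sketch in base R, assuming the ranges are meant to be 1 to 1,000,000, then 1,000,001 to 2,000,000, and so on. The helper name `pos_bin` and the columns `bin_start` and `bin_end` are made up for illustration and do not come from the question.

```r
# Hypothetical helper: map a position to a 0-based bin index so that
# 1..1,000,000 -> 0, 1,000,001..2,000,000 -> 1, and so on.
pos_bin <- function(pos, width = 1e6) {
  floor((pos - 1) / width)
}

pos  <- c(1, 14653, 1000000, 1000001, 2000000, 2000001)
bins <- pos_bin(pos)

# Lower and upper edge of each bin, which can later serve as the
# endpoints of a horizontal mean line
bin_start <- bins * 1e6 + 1
bin_end   <- (bins + 1) * 1e6

data.frame(pos, bins, bin_start, bin_end)
```

Note that a plain `floor(pos / 1e6)` would place position 1,000,000 in the second bin rather than the first. The two versions differ only at exact multiples of the width, so either may be acceptable depending on how the ranges are defined.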

What I have so far, before applying your function:


After applying your function:

(image: https://i.stack.imgur.com/4w5PU.jpg)

I tried to apply your solution to what I already have, and ran into some problems:


There are different panels, so the mean values differ from panel to panel, but when I apply your code the horizontal mean lines in every panel are identical to those in the first panel.
The x-axis range differs between panels, so when I apply your function it fills the extra range with the previous horizontal mean line.


Here is my previous code:
library(ggplot2)
library(scales)  # trans_breaks() and pretty_breaks() come from scales
ggplot(data1, aes(x=POS,y=DIFF,colour=GT)) +
  geom_point() +
  facet_grid(~ CHROM,scales="free_x",space="free_x") + 
  theme(strip.text.x = element_text(size=40),
        strip.background = element_rect(color='lightblue',fill='lightblue'),
        legend.position="top",
        legend.title = element_text(size=40,colour="darkblue"),
        legend.text = element_text(size=40),
        legend.key.size = unit(2.5, "cm")) +
  guides(fill = guide_legend(title.position="top",
                             title = "Legend:GT='REF'+'ALT'"),
         shape = guide_legend(override.aes=list(size=10))) +
  scale_y_log10(breaks=trans_breaks("log10", function(x) 10^x, n=10)) + 
  scale_x_continuous(breaks = pretty_breaks(n=3))

Solution
This was tougher than I expected! This should at least get you started, though:
# It saves a lot of headaches to just make factors as you need them
options(stringsAsFactors = FALSE)



library(ggplot2)
library(plyr)

# Here's some made-up data - it always helps if you can post a subset of
# your real data, though. The dput() function is really useful for that.
dat <- data.frame(POS = seq(1, 1e7, by = 1e4))


# Add random GT value
dat$GT <- sample(x = c("CT", "AG", "GA", "TG", "TC"),
                 size = nrow(dat),
                 replace = TRUE)

# Group by millions - there are several ways to do this that I can 
# never remember, but here's a simple way to split by millions
dat$POSgroup <- floor(dat$POS / 1e6)


# Add an arbitrary DIFF value
dat$DIFF <- rnorm(n = nrow(dat),
                  mean = 200 * dat$POSgroup,
                  sd = 300)



# Aggregate the data by GT and POS-group
# Ideally, you'd do this inside of the plot using stat_summary,
# but I couldn't get that to work. Using two datasets in a plot 
# is okay, though.
datsum <- ddply(dat, .variables = "POSgroup", .fun = function(x) {

    # Calculate the mean DIFF value for each GT group in this POSgroup
    meandiff <- ddply(x, .variables = "GT", .fun = summarise, ymean = mean(DIFF))

    # Add the center of the POSgroup range as the x position
    meandiff$center <- (x$POSgroup[1] * 1e6) + 0.5e6

    # Return the results
    meandiff

})


# On the plot, these results will be grouped by both POS and GT - but
# ggplot will only accept one vector for grouping. So make a combination.
datsum$combogroup <- paste(datsum$GT, datsum$POSgroup)


# Plot it
ggplot() +

    # First, a layer for the points themselves
    # Large numbers of points can get pretty slow - you might try getting
    # the plot to work with a subsample (~1000) and then add in the rest of
    # your data
    geom_point(data = dat, 
               aes(x = POS, y = DIFF, color = as.factor(GT))) +

    # Then another layer for the means. There are a variety of geoms you could
    # use here, but a crossbar with ymin and ymax both set to the group mean
    # is a simple one (in ggplot2 >= 3.4.0, linewidth = 1 replaces size = 1)
    geom_crossbar(data = datsum, aes(x = center, 
                                     y = ymean, 
                                     ymin = ymean, 
                                     ymax = ymean, 
                                     color = as.factor(GT),
                                     group = combogroup),
                  size = 1) +


    # Some other niceties
    scale_x_continuous(breaks = seq(0, 1e7, by = 1e6)) +
    labs(x = "POS", y = "DIFF", color = "GT") +
    theme_bw()
Which results in this:



There's probably a more straightforward way to do this, but I don't know it. Hope this helps.
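One possible way to address the two follow-up problems raised in the question (identical mean lines in every panel, and lines extending over ranges with no data) is to include CHROM in the summary and to draw each mean as a segment that spans only its own bin. The following is a sketch under stated assumptions, not the method of the answer above. The data are simulated, the column names follow the question, and `make_chrom`, `bin`, `xstart` and `xend` are names introduced here for illustration.

```r
library(ggplot2)

set.seed(1)

# Simulated stand-in for the question's data: two chromosomes of
# different length (hypothetical values, not real data)
make_chrom <- function(chrom, max_pos, n) {
  data.frame(CHROM = chrom,
             POS   = sort(sample.int(max_pos, n)),
             GT    = sample(c("CT", "AG", "GA"), n, replace = TRUE),
             DIFF  = rexp(n, rate = 1 / 200) + 1)
}
dat <- rbind(make_chrom("chr01", 3e6, 300),
             make_chrom("chr02", 2e6, 200))

# 0-based index of the 1,000,000-wide range each position falls in
dat$bin <- floor((dat$POS - 1) / 1e6)

# Mean DIFF per chromosome, bin and genotype; only combinations that
# actually occur in the data produce a row
datsum <- aggregate(DIFF ~ CHROM + bin + GT, data = dat, FUN = mean)
names(datsum)[names(datsum) == "DIFF"] <- "ymean"

# Each mean line covers just its own bin
datsum$xstart <- datsum$bin * 1e6 + 1
datsum$xend   <- (datsum$bin + 1) * 1e6

p <- ggplot(dat, aes(x = POS, y = DIFF, colour = GT)) +
  geom_point(alpha = 0.4) +
  # The summary keeps its CHROM column, so every facet gets its own means
  geom_segment(data = datsum,
               aes(x = xstart, xend = xend, y = ymean, yend = ymean)) +
  facet_grid(~ CHROM, scales = "free_x", space = "free_x") +
  scale_y_log10()
p
```

Because the summary data frame carries CHROM, `facet_grid()` assigns each segment to the matching panel, and because a segment has explicit start and end positions, no line is drawn over a range that has no data. The means here are computed on the raw DIFF values and only displayed on a log axis; whether that is the desired average is a choice left to the reader.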
