Maximum plot points in R?


Problem description

I have come across a number of situations where I want to plot more points than I really ought to. The main holdup is that when I share my plots with people or embed them in papers, they occupy too much space. It's very straightforward to randomly sample rows in a data frame.

If I want a truly random sample for a point plot, it's easy to say:

qplot(x, y, data = myDf[sample(nrow(myDf), 1000), ])
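The one-liner above fails if the data frame has fewer than 1000 rows and gives a different sample on every run. A minimal sketch of a guarded version follows; `sample_rows` is a hypothetical helper name and the `myDf` built here is stand-in data, not anything from the original post.

```r
# Hypothetical helper: draw at most n rows uniformly at random, without replacement.
sample_rows <- function(df, n = 1000) {
  n <- min(n, nrow(df))                        # never ask for more rows than exist
  df[sample.int(nrow(df), n), , drop = FALSE]  # keep the result a data frame
}

set.seed(42)                                   # make the sample reproducible
myDf <- data.frame(x = rnorm(5000), y = rnorm(5000))  # stand-in data (assumption)
small <- sample_rows(myDf, 1000)
nrow(small)
```

The sampled data frame can then be passed to the plotting call in place of `myDf`.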

However, I was wondering whether there are more effective (ideally canned) ways to limit the number of plot points so that the actual data is still accurately reflected in the plot. Here is an example. Suppose I am plotting something like the CCDF of a heavy-tailed distribution, e.g.

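library(ggplot2)

ccdf <- function(myList, density = FALSE)
{
  # Generates the CCDF (as counts) of a list or vector.
  # `density` is unused; it is kept only to preserve the original signature.
  freqs <- table(unlist(myList))        # frequency of each distinct value
  X <- rev(as.numeric(names(freqs)))    # distinct values, largest first
  Y <- cumsum(rev(as.vector(freqs)))    # number of observations >= X
  data.frame(x = X, count = Y)
}
qplot(x, count, data = ccdf(rlnorm(10000, 3, 2.4)), log = 'xy')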
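To make the output of the CCDF function concrete, here is a small worked example on four values. `ccdf_counts` is a hypothetical, compact restatement of the same idea (counts rather than proportions) so that the snippet runs on its own.

```r
# Hypothetical compact restatement of the CCDF computation (counts, not proportions).
ccdf_counts <- function(v) {
  freqs <- table(v)                                   # distinct values, sorted ascending
  data.frame(x = rev(as.numeric(names(freqs))),       # distinct values, largest first
             count = cumsum(rev(as.vector(freqs))))   # number of observations >= x
}

res <- ccdf_counts(c(1, 2, 2, 3))
res
#   x count
# 1 3     1
# 2 2     3
# 3 1     4
```

Each row gives a distinct value and how many observations are greater than or equal to it, so a sample of 10000 distinct values yields 10000 plotted points.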

This will produce a plot where the points become increasingly dense along the x and y axes. Here it would be ideal to have fewer samples plotted for large x or y values.

Does anybody have any tips or suggestions for dealing with similar issues?

Thanks, -e

Recommended answer

Here is one possible solution for downsampling a plot with respect to the x-axis when that axis is log-transformed. It takes the log of x, rounds that quantity to form bins, and keeps the row with the median x value in each bin:

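downsampled_qplot <- function(x, y, data, rounding = 0, ...) {
  # Assumes log = 'xy' or log = 'x'. The x and y arguments are not used:
  # the columns data$x and data$count are plotted.
  group <- factor(round(log(data$x), rounding))   # bin rows by rounded log(x)
  # Keep the row holding the median x in each bin. nrow(X) counts rows;
  # length(X) would count the columns of the data frame instead.
  d <- do.call(rbind, by(data, group,
    function(X) X[order(X$x)[ceiling(nrow(X) / 2)], ]))
  qplot(x, count, data = d, ...)
}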
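The function above bins on x only, while the question also asks about large y values. One possible extension, offered as a sketch rather than as part of the original answer, bins on both log(x) and log(count) and returns the reduced data frame without plotting. `downsample_log_xy` is a hypothetical name, the column names `x` and `count` follow the CCDF data frame used in this thread, and both columns are assumed to be strictly positive.

```r
# Hypothetical helper: keep one row per (rounded log x, rounded log count) cell.
downsample_log_xy <- function(data, rounding = 0) {
  bin <- interaction(round(log(data$x), rounding),
                     round(log(data$count), rounding),
                     drop = TRUE)                    # drop cells that contain no rows
  keep <- tapply(seq_len(nrow(data)), bin, function(i) {
    i[order(data$x[i])][ceiling(length(i) / 2)]      # row with the median x in this cell
  })
  data[sort(as.vector(keep)), , drop = FALSE]
}

set.seed(1)
v <- sort(rlnorm(10000, 3, 2.4), decreasing = TRUE)
full <- data.frame(x = v, count = seq_along(v))      # stand-in CCDF counts (assumption)
small <- downsample_log_xy(full, rounding = 1)
c(full = nrow(full), small = nrow(small))
```

The returned data frame can be handed to `qplot(x, count, data = small, log = 'xy')` in the same way as the original data.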

Using the definition of ccdf() from above, we can then compare the original plot of the CCDF of the distribution with the downsampled versions:

myccdf=ccdf(rlnorm(10000,3,2.4))

qplot(x,count,data=myccdf,log='xy',main='original')

downsampled_qplot(x,count,data=myccdf,log='xy',rounding=1,main='rounding = 1')

downsampled_qplot(x,count,data=myccdf,log='xy',rounding=0,main='rounding = 0')

In PDF format, the original plot takes up 640K, and the downsampled versions occupy 20K and 8K, respectively.
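A size comparison of this kind can be reproduced roughly with base R graphics. The sketch below is an illustration under stated assumptions: `pdf_size` is a hypothetical helper, the data are stand-ins, the reduced set is made by keeping every 100th row rather than by the binning shown earlier, and base `plot()` is used instead of `qplot()`, so the byte counts will not match the figures quoted above.

```r
# Hypothetical helper: write a log-log scatter plot to a temporary PDF and return its size in bytes.
pdf_size <- function(df) {
  f <- tempfile(fileext = ".pdf")
  pdf(f)
  plot(df$x, df$count, log = "xy")
  invisible(dev.off())            # close the device so the file is flushed to disk
  size <- file.size(f)
  unlink(f)                       # remove the temporary file
  size
}

set.seed(1)
v <- sort(rlnorm(10000, 3, 2.4), decreasing = TRUE)
full <- data.frame(x = v, count = seq_along(v))    # stand-in CCDF counts (assumption)
thinned <- full[seq(1, nrow(full), by = 100), ]    # crude stand-in for a downsampled set
sizes <- c(full = pdf_size(full), thinned = pdf_size(thinned))
sizes
```

Because a PDF scatter plot stores one drawing instruction per point, the file size shrinks roughly with the number of points kept.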
