How to find byte sizes of R figures on pages?


Problem description

I would like to monitor the basic quality of the figures R produces on individual pages, for example the byte size of each page. At the moment I can only do quality assurance on average pages; see the sections below. I think there must be something built in for this task that is better than average measures.

The code below produces 4 pages in Rplots.pdf. I would like to know the byte size of each page in the output; any other per-page statistics are also welcome. Basic memory monitoring by object is possible, but I would like the measurement to correspond to the output in the PDF.

# https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.html
require(stats) # for lowess, rpois, rnorm
plot(cars)
lines(lowess(cars))

plot(sin, -pi, 2*pi) # see ?plot.function

## Discrete Distribution Plot:
plot(table(rpois(100, 5)), type = "h", col = "red", lwd = 10,
     main = "rpois(100, lambda = 5)")

## Simple quantiles/ECDF, see ecdf() {library(stats)} for a better one:
plot(x <- sort(rnorm(47)), type = "s", main = "plot(x, type = \"s\")")
points(x, cex = .5, col = "dark red")

## TODO: summarise here the byte size of each figure (pages 1-4)
# Output: Rplots.pdf with 4 pages; I want to know the size of each page in bytes

I currently do the basic quality assurance on the command line, but I would like to move some of it into R so that I notice bugs sooner.

Expected output: byte size, for instance like the size column of ls -l

Limitations

  • Requirement of homogeneity of the data across pages. This method only works if all pages come from the same sample. Otherwise it is troublesome, because it yields only an average and does not describe the individual pages.

Other possible weaknesses

  • PDF elements and metadata. The method treats the PDF file as a whole instead of focusing on the graphic objects themselves. This limits the use of absolute values, because the file size also includes headers and other metadata that are unrelated to the graphic objects.

Code

filename <- "main.pdf"
filesize <- file.size(filename)
# http://unix.stackexchange.com/q/331175/16920
pages <- Rpoppler::PDF_info(filename)$Pages 

# print page size (= filesize / pages)
pagesize <- filesize / pages

## data of example file (filesize, pages, pagesize)
filesize: num 7350960
pages:    int 62
pagesize: num 118564

Input: any 62-page document
Output: average size of an individual page (118564 bytes)

This produces output, but the input cannot easily be changed to the PDF file you actually want to analyse:

     files                             size_bytes 
[1,] "./test_page_size_pdf/page01.pdf" "4,123,942"
[2,] "./test_page_size_pdf/page02.pdf" "    4,971"
[3,] "./test_page_size_pdf/page03.pdf" "    4,672"
[4,] "./test_page_size_pdf/page04.pdf" "    5,370"

Input: any 64-page document
Expected output: 67 (= 64 + 3) pages analysed, not just 4
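
The four page sizes listed above also illustrate the homogeneity limitation: one large page dominates the mean, so the average describes none of the pages well. A minimal sketch of that arithmetic follows; the sizes are copied from the table, and the factor of 10 used to flag a page is an arbitrary illustrative threshold, not something from the original post.

```r
# Per-page sizes in bytes, copied from the output table above
size_bytes <- c(page01 = 4123942, page02 = 4971, page03 = 4672, page04 = 5370)

# The mean is pulled up by page01; the median stays close to a typical page
mean_size   <- mean(size_bytes)
median_size <- median(size_bytes)
print(c(mean = mean_size, median = median_size))

# Flag pages much larger than the median (threshold of 10x is an assumption)
outliers <- names(size_bytes)[size_bytes > 10 * median_size]
print(outliers)
```

Comparing each page against the median rather than the mean is one way to spot an unusually heavy figure without assuming that all pages come from the same sample.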

R: 3.3.2
OS: Debian 8.5

Recommended answer

Download and install the pdftk utility if it is not already on your system, and then try one of the following alternatives from within R.

1) This alternative splits the PDF into one file per page and returns a data frame with the page file sizes in bytes, along with other file information.

myfile <- "Rplots.pdf"
system(paste("pdftk", myfile, "burst"))
file.info(Sys.glob("pg_*.pdf"))

It will also generate a file doc_data.txt with some miscellaneous information that may or may not be of interest.
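
If the extra files are unwanted in the project directory, alternative (1) can be wrapped so that pdftk works in a scratch directory. The following is only a sketch under stated assumptions: pdftk must be on the PATH, and `page_sizes_burst()` is a hypothetical helper name that does not come from the answer.

```r
# Sketch: burst a PDF inside a scratch directory and report per-page file sizes.
# page_sizes_burst() is a hypothetical helper; it assumes pdftk is on the PATH.
page_sizes_burst <- function(pdf_file) {
  if (!nzchar(Sys.which("pdftk"))) stop("pdftk not found on PATH")
  pdf_file <- normalizePath(pdf_file, mustWork = TRUE)

  out_dir <- tempfile("burst_")
  dir.create(out_dir)

  # Work inside the scratch directory so that side files such as
  # doc_data.txt stay out of the project directory
  old_wd <- setwd(out_dir)
  on.exit(setwd(old_wd))

  status <- system2("pdftk",
                    c(shQuote(pdf_file), "burst", "output", "pg_%04d.pdf"))
  if (status != 0) stop("pdftk exited with status ", status)

  files <- sort(Sys.glob(file.path(out_dir, "pg_*.pdf")))
  data.frame(page = seq_along(files), bytes = file.size(files))
}

# Usage (not run here because it needs pdftk):
# page_sizes_burst("Rplots.pdf")
```

Each burst file carries its own PDF header and metadata, so the reported sizes are better suited to comparing pages with each other than to measuring the graphic content in absolute terms.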

1a) This alternative will not generate any files. It simply returns the sizes of the pages in bytes (as counted by wc -c) as a numeric vector.

myfile <- "Rplots.pdf"
pages <- as.numeric(read.dcf(pipe(paste("pdftk", myfile, "dump_data")))[, "NumberOfPages"])
cmds <- sprintf("pdftk %s cat %d output - | wc -c", myfile, seq_len(pages))
unname(sapply(cmds, function(cmd) scan(pipe(cmd), quiet = TRUE)))

The above should work if pdftk and wc are on your path. Note that on Windows wc is included in the Rtools distribution and is typically located at "C:\\Rtools\\bin\\wc" once Rtools is installed.
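
Where wc is not available, the bytes can also be counted inside R by reading the command's output from a binary pipe. This is a sketch rather than part of the original answer; `count_bytes()` is a hypothetical helper name and the chunk size is an arbitrary choice.

```r
# Sketch: count the bytes a shell command writes to stdout, without wc.
# count_bytes() is a hypothetical helper, not part of any package.
count_bytes <- function(cmd, chunk = 65536L) {
  con <- pipe(cmd, open = "rb")      # binary mode, so PDF bytes are not altered
  on.exit(close(con))
  total <- 0
  repeat {
    buf <- readBin(con, what = "raw", n = chunk)
    if (length(buf) == 0L) break     # end of the command's output
    total <- total + length(buf)
  }
  total
}

# Usage with pdftk (not run here because it needs pdftk and Rplots.pdf):
# cmds <- sprintf("pdftk %s cat %d output -", "Rplots.pdf", 1:4)
# vapply(cmds, count_bytes, numeric(1), USE.NAMES = FALSE)
```

The pdftk commands in the usage comment are the same as in (1a) with the `| wc -c` part removed, so only pdftk needs to be on the path.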

2) This alternative is similar to (1) but uses the animation package:

library(animation)

ani.options(pdftk = "/path/to/pdftk")
pdftk("Rplots.pdf", "burst", "pg_%04d.pdf", "")
file.info(Sys.glob("pg_*.pdf"))
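
If the plotting code itself can be changed, a pdftk-free variant may also be worth trying: the base `pdf()` device writes one file per page when `onefile = FALSE` and the file name contains an integer format. This is a sketch that differs from the answer above; the scratch directory and the `page%02d.pdf` naming are illustrative assumptions.

```r
# Sketch: write one PDF file per page with onefile = FALSE, then report sizes.
out_dir <- tempfile("pages_")   # scratch directory (illustrative choice)
dir.create(out_dir)

pdf(file = file.path(out_dir, "page%02d.pdf"), onefile = FALSE)
plot(cars)
lines(lowess(cars))
plot(sin, -pi, 2 * pi)
plot(table(rpois(100, 5)), type = "h", col = "red", lwd = 10)
plot(x <- sort(rnorm(47)), type = "s")
points(x, cex = 0.5, col = "dark red")
invisible(dev.off())

# One row per page, with the size of that page's file in bytes
files <- sort(list.files(out_dir, pattern = "^page[0-9]+\\.pdf$",
                         full.names = TRUE))
page_sizes <- data.frame(file = basename(files), bytes = file.size(files))
print(page_sizes)
```

As with the burst approach, every file includes its own PDF header and metadata, so the sizes are most useful for comparing pages against each other.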
