What is the best way to make a PDF report with more than 100 plots in Python?


Problem description


    I need to produce a PDF report with a lot of plots. Most of them will be created with matplotlib inside a loop, but I also need to include pandas plots, DataFrames (the whole view), and seaborn plots. So far I have explored the following solutions:

    • PythonTeX. I have already used it for other projects, but it would take a lot of time because you have to write \pythontexprint for each plot you want to display.
    • Use the savefig command in every iteration of the loop and save all the plots as images to insert into LaTeX later. That would be very time-consuming too. Another option is to save the plots as PDFs with that command and then merge all the PDFs. That would create an ugly report, since the plots would not fill the whole page.
    • Use RStudio with reticulate to create a Markdown report. The problem here is that I would need to learn how reticulate works, which takes time.
    • As far as I know, PyPDF does not fit my needs.
    • Create a Jupyter notebook and then try to export it to PDF. Once again, I do not know how to use Jupyter notebooks, and I read that I would have to convert to HTML first and then to PDF.
    • Solutions from here: Generating Reports with Python: PDF or HTML to PDF. However, that question is from three years ago and there might be better options nowadays.

    So my question is: is there an easy and quick way to get all those plots into a decent-looking PDF (even better if it happens alongside the code that generates them)?

    Solution

    My recommendation would be to use matplotlib's savefig to write each plot to a BytesIO buffer (or, for 100 plots, to keep the buffers in a list or similar data structure). You can then insert those image buffers into a PDF using a library like reportlab (website here and docs here). I regularly use this approach to create PowerPoint documents with the python-pptx library, and I have also verified that it works for PDFs with reportlab. The reportlab library is very powerful and a bit "low level", so there may be a small learning curve at the start, but it should meet your needs. There is a simple getting-started tutorial here. reportlab is BSD-licensed and available via pip and conda.
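The "list of buffers" idea can be sketched with matplotlib alone, before any PDF library is involved. This is a minimal illustration rather than part of the original answer: `render_plots_to_buffers` is a hypothetical helper name, the random line data is a placeholder, and the `Agg` backend is chosen here only so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def render_plots_to_buffers(n_plots, dpi=150):
    """Render n_plots figures and return them as a list of PNG buffers.

    Hypothetical helper for illustration; the plotted data is random.
    """
    rng = np.random.default_rng(0)
    buffers = []
    for i in range(n_plots):
        fig, ax = plt.subplots(figsize=(7, 2.25))
        ax.plot(rng.random(50))
        ax.set_title(f"Plot {i + 1}")

        buf = io.BytesIO()
        fig.savefig(buf, format="png", dpi=dpi)
        buf.seek(0)     # rewind so a later reader starts at the beginning
        plt.close(fig)  # release the figure; this matters with 100+ plots
        buffers.append(buf)
    return buffers


buffers = render_plots_to_buffers(5)
```

pandas and seaborn plots are drawn on matplotlib figures, so saving the underlying figure with `savefig` in the same way should carry over to them.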

    Anyway, my code snippet looks like this.
    Sorry it's a bit long, but it includes some helper functions to print text and generate dummy images. You should be able to copy/paste it directly.

    The code produces a PDF (form_letter.pdf) with a title and two plots, each followed by a short paragraph of text.

    import io
    
    from reportlab.lib.pagesizes import letter
    from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.lib.units import inch
    
    import numpy as np
    import matplotlib.pyplot as plt
    
    
    def plot_hist():
        """Create a sample scatter plot and return a BytesIO buffer containing it.
    
        Returns
        -------
        BytesIO : in memory buffer with plot image, can be passed to reportlab or elsewhere
        """    
        # from https://matplotlib.org/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
        plt.figure(figsize=(7, 2.25))
    
        N = 100
        r0 = 0.6
        x = 0.9 * np.random.rand(N)
        y = 0.9 * np.random.rand(N)
        area = (20 * np.random.rand(N))**2  # 0 to 10 point radii
        c = np.sqrt(area)
        r = np.sqrt(x * x + y * y)
        area1 = np.ma.masked_where(r < r0, area)
        area2 = np.ma.masked_where(r >= r0, area)
        plt.scatter(x, y, s=area1, marker='^', c=c)
        plt.scatter(x, y, s=area2, marker='o', c=c)
        # Show the boundary between the regions:
        theta = np.arange(0, np.pi / 2, 0.01)
        plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))
    
        # create buffer and save image to buffer
        # dpi should match the dpi of your PDF; I think 300 is typical, otherwise it won't look good
        buf = io.BytesIO()
        plt.savefig(buf, format='png', dpi=300)
        buf.seek(0)
        # you'll want to close the figure once it's saved to the buffer
        plt.close()
    
        return buf
    
    def add_text(text, style="Normal", fontsize=12):
        """Add text with some spacing around it to the PDF report.
    
        Parameters
        ----------
        text : str
            The string to print to PDF
    
        style : str
            The reportlab style
    
        fontsize : int
            The fontsize for the text
        """
        Story.append(Spacer(1, 12))
        ptext = "<font size={}>{}</font>".format(fontsize, text)
        Story.append(Paragraph(ptext, styles[style]))
        Story.append(Spacer(1, 12))
    
    # Use basic styles and the SimpleDocTemplate to get started with reportlab
    styles = getSampleStyleSheet()
    doc = SimpleDocTemplate("form_letter.pdf", pagesize=letter,
                            rightMargin=inch / 2, leftMargin=inch / 2,
                            topMargin=72, bottomMargin=18)
    
    # The "story" just holds "instructions" on how to build the PDF
    Story = []
    
    add_text("My Report", style="Heading1", fontsize=24)
    
    # See plot_hist for information on how to get BytesIO object of matplotlib plot
    # This code uses the reportlab Image function to add any valid PIL input to the report
    image_buffer1 = plot_hist()
    im = Image(image_buffer1, 7*inch, 2.25*inch)
    Story.append(im)
    
    add_text("This text explains something about the chart.")
    
    image_buffer2 = plot_hist()
    im = Image(image_buffer2, 7*inch, 2.25*inch)
    Story.append(im)
    
    add_text("This text explains something else about another chart.")
    
    # This command will actually build the PDF
    doc.build(Story)
    
    # Close the open buffers once the PDF is built; a "with" statement
    # can do this for you if that works better
    image_buffer1.close()
    image_buffer2.close()
    
