Appending PDF files based on multiple values in a dictionary key (or CSV) results in too many pages


Problem description

I am trying to generate PDF files based on the county they fall in. If there is more than one PDF file per county, I need to append the files into a single file based on the county key. I can't seem to get the maps to append based on key. The final maps seem random and often have far too many files appended. I am pretty sure I am not grouping them correctly. I have read that multiple values in a key can result in values showing up multiple times. Can someone please clue me in on how to access each value per key separately, one time only? Obviously I am not understanding something crucial.

My code:

import csv, os
import shutil
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter

merged_file = PdfFileMerger()
counties = {'County4': ['C:\\maps\\map2.pdf', 'C:\\maps\\map3.pdf', 'C:\\maps\\map4.pdf'], 'County1': ['C:\\maps\\map1.pdf', 'C:\\maps\\map2.pdf'], 'County3': ['C:\\maps\\map3.pdf'], 'County2': ['C:\\maps\\map1.pdf', 'C:\\maps\\map3.pdf']}
for k, v in counties.items():
    newPdfFile = 'C:\\maps\\JoinedMaps\\' + k + '.pdf'
    if len(v) > 1:
        for filename in v:
            merged_file.append(PdfFileReader(filename,'rb'))
        merged_file.write(newPdfFile)
    else:
        for filename in v:
            shutil.copyfile(filename, newPdfFile)

I get four maps output (which is correct), but the number of "pages" (appended files) in some of these files is wildly off. As far as I can tell there is no rhyme or reason to how these pages are appended. The County4 PDF has 3 pages (correct), the County1 PDF has 8 pages instead of 2, the County3 PDF has 1 page (correct), and the County2 PDF has 15 pages instead of 2.

It turns out PyPDF2 does not like iterating through and creating files using the concept of group-by. I imagine it has something to do with how it stores data in memory. The result is an increasingly large number of pages as you iterate through the key values. I spent days thinking it was my coding. Good to know it wasn't, I guess, but I am surprised this piece of information is not easier to find on the internet.
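
The reported counts (3, 8, 1, 15) are at least consistent with a simpler explanation than a library limitation: the question's code creates one `PdfFileMerger` before the loop and reuses it for every county. The sketch below is a plain-Python simulation, not PyPDF2 itself. `FakeMerger` is a hypothetical stand-in that assumes a merger keeps both its queued pages and its already-written output between calls to `write`, and that every map has one page. Under those assumptions it reproduces the numbers above.

```python
class FakeMerger:
    """Hypothetical stand-in for a merger object; NOT the PyPDF2 API."""

    def __init__(self):
        self.pages = []    # pages queued by append()
        self.output = []   # pages already emitted by earlier write() calls

    def append(self, filename):
        self.pages.append(filename)  # assumption: one page per map

    def write(self):
        # Assumption: write() adds every queued page to the output again
        # and never clears either list.
        self.output.extend(self.pages)
        return len(self.output)


# Same grouping and order as the dictionary in the question.
counties = [
    ("County4", ["map2.pdf", "map3.pdf", "map4.pdf"]),
    ("County1", ["map1.pdf", "map2.pdf"]),
    ("County3", ["map3.pdf"]),
    ("County2", ["map1.pdf", "map3.pdf"]),
]

shared = FakeMerger()  # created once, outside the loop, as in the question
page_counts = {}
for county, files in counties:
    if len(files) > 1:
        for name in files:
            shared.append(name)
        page_counts[county] = shared.write()
    else:
        page_counts[county] = 1  # a single file is copied, not merged

print(page_counts)
# {'County4': 3, 'County1': 8, 'County3': 1, 'County2': 15}
```

If this is what happens, the arithmetic is 3, then 3 + 5 = 8, then 8 + 7 = 15: each write re-emits everything appended so far on top of what was already written.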

My solution was to use arcpy, which, sorry to say, doesn't help most users reading this.

For those looking at my solution, my CSV file looked like this:

County1   C:\maps\map1.pdf
County1   C:\maps\map2.pdf
County2   C:\maps\map1.pdf
County2   C:\maps\map3.pdf
County3   C:\maps\map3.pdf
County4   C:\maps\map2.pdf
County4   C:\maps\map3.pdf
County4   C:\maps\map4.pdf
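
Grouping rows like these by county before merging could be sketched as follows. This is an illustration, not the poster's code: it assumes the file is comma-separated with the county in the first column and the path in the second (the listing above shows whitespace, so the delimiter may need adjusting), and `group_rows` is a hypothetical helper name.

```python
import csv
import io
from collections import OrderedDict


def group_rows(rows):
    """Map each county to the list of PDF paths listed for it, in file order."""
    groups = OrderedDict()
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        county, path = row[0].strip(), row[1].strip()
        groups.setdefault(county, []).append(path)
    return groups


# Sample data standing in for Maps.csv (comma-separated by assumption).
sample = io.StringIO(
    "County1,C:\\maps\\map1.pdf\n"
    "County1,C:\\maps\\map2.pdf\n"
    "County3,C:\\maps\\map3.pdf\n"
)
groups = group_rows(csv.reader(sample))
for county, paths in groups.items():
    print(county, len(paths))
```

Each path appears once under its county, so iterating over `groups.items()` visits every value of every key exactly one time.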

and my resulting PDF files looked like this:

County-County1 (2 pages - Map1 and Map2)
County-County2 (2 pages - Map1 and Map3)
County-County3 (1 page - Map3)
County-County4 (3 pages - Map2, Map3, and Map4)

Recommended answer

My data started out as a CSV file, and the code below references it rather than the dictionaries (generated from the CSV file) that I used in the example above, but you should be able to glean what I did from the code. I basically scrapped the dictionary idea and went with reading the CSV file line by line and then appending using arcpy. PyPDF2 does NOT merge correctly when trying to output multiple files based on a key. Three days of my life I can't get back.

import csv
import arcpy
from arcpy import env
import shutil, os, glob

# clear out files from destination directory
files = glob.glob(r'C:\maps\JoinedMaps\*')
for f in files:
    os.remove(f)

# open csv file
f = open(r"C:\maps\Maps.csv", "r+")
ff = csv.reader(f)

# set variable to establish previous row of csv file (for comparison)
pre_line = next(ff)

# Iterate through csv file

for cur_line in ff:
    # new file name and location based on value in column (county name)
    newPdfFile = (r'C:\maps\JoinedMaps\County-' + cur_line[0] +'.pdf')
    # establish pdf files to be appended
    joinFile = pre_line[1]
    appendFile = cur_line[1]

    # If columns in both rows match
    if pre_line[0] == cur_line[0]: # <-- compare first column
        # If destination file already exists, append file referenced in current row
        if os.path.exists(newPdfFile):
            tempPdfDoc = arcpy.mapping.PDFDocumentOpen(newPdfFile)
            tempPdfDoc.appendPages(appendFile)
        # Otherwise create destination and append files referenced in both the previous and current row
        else:
            tempPdfDoc = arcpy.mapping.PDFDocumentCreate(newPdfFile)
            tempPdfDoc.appendPages(joinFile)
            tempPdfDoc.appendPages(appendFile)
        # save and delete temp file
        tempPdfDoc.saveAndClose()
        del tempPdfDoc
    else:
        # if no match, do not merge, just copy
        shutil.copyfile(appendFile,newPdfFile)

    # reset variable
    pre_line = cur_line
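
For readers without arcpy, one alternative worth trying is to keep the dictionary approach but create a new merger for every county, so that nothing carries over between output files. The sketch below has not been run against real PDFs and rests on assumptions: `merge_by_county` is a hypothetical helper, and `merger_factory` is expected to return an object with `append`, `write` and `close` methods that accept file paths, as `PdfFileMerger` does in the PyPDF2 version used in the question.

```python
import os
import shutil


def merge_by_county(counties, out_dir, merger_factory):
    """Write one PDF per county, using a fresh merger for each output file."""
    written = {}
    for county, files in counties.items():
        target = os.path.join(out_dir, "County-" + county + ".pdf")
        if len(files) == 1:
            shutil.copyfile(files[0], target)  # nothing to merge
        else:
            merger = merger_factory()  # new object: no pages carried over
            for name in files:
                merger.append(name)
            merger.write(target)
            merger.close()
        written[county] = target
    return written


# Usage with PyPDF2 (assumes a version that still ships PdfFileMerger):
# from PyPDF2 import PdfFileMerger
# merge_by_county(counties, r"C:\maps\JoinedMaps", PdfFileMerger)
```

Passing the merger class in as a parameter keeps the helper independent of any one PDF library and makes it easy to check with a stub before touching real files.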
