Python code for downloading and merging PDF files in a loop results in a 1 KB end file


Problem description

Hello everyone.
I have a website that I need to get about 1000 PDFs from. The PDFs differ by a four-digit number between 1101 and 2300, like
https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet?tip=9&ilKodu=3&ilceKodu=1101
Some of the numbers in that range are not assigned to a PDF, though, so I needed something that would
1-) download all the PDFs
2-) delete the PDFs that are 1 KB (these are the unassigned ones)
3-) merge all the PDF files into one PDF file
There were answers for each of these steps, but not for all of them together, so I looked at those and put something together. In the end, though, all I get is a 1 KB PDF file called merged_full.pdf.
What am I doing wrong?
Cheers

What I have tried:

import urllib.request
import os
from PyPDF2 import PdfFileReader, PdfFileMerger

os.chdir('the_directory')

mylist=(list(range(1101,2500)))

for i in mylist:
    def download_file(download_url):
        web_file = urllib.request.urlopen('https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet?tip=9&ilKodu=3&ilceKodu=%d'%(i),'%d.pdf'%(i))
        local_file = open('%d.pdf'%(i), 'wb')
        local_file.write(web_file.read())
        web_file.close()
        local_file.close()
        filesize = os.path.getsize('%d.pdf'%(i))
        if filesize<1024:
                os.remove('%d.pdf'%(i))
        del filesize

files_dir = "the_directory"
pdf_files = [f for f in os.listdir(files_dir) if f.endswith("pdf")]
merger = PdfFileMerger()

for filename in pdf_files:
    merger.append(PdfFileReader(os.path.join(files_dir, filename), "rb"))

merger.write(os.path.join(files_dir, "merged_full.pdf"))

Solution

A size of 1 KB indicates that the created file is just an empty PDF.

You should insert checks in your code to see if each step is working as expected:

Are the files downloaded?
What sizes do the existing files have?
Which files are listed in your pdf_files?
Does PdfFileReader() read the files?
Is the content appended to the merger?
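
The first few checks above could be sketched roughly as follows. This is only an illustration: the helper name report_pdf_sizes and the 1024-byte threshold are assumptions (the threshold mirrors the filesize check in the question), and the header test relies on PDF files normally starting with the bytes %PDF-.

```python
import os


def report_pdf_sizes(files_dir, min_size=1024):
    """Return a (filename, size_in_bytes, looks_valid) tuple per .pdf file.

    report_pdf_sizes is a hypothetical helper, not part of PyPDF2.
    A file "looks valid" if it is at least min_size bytes long and
    begins with the usual PDF header bytes.
    """
    report = []
    for name in sorted(os.listdir(files_dir)):
        if not name.lower().endswith(".pdf"):
            continue  # ignore anything that is not a PDF by extension
        path = os.path.join(files_dir, name)
        size = os.path.getsize(path)
        with open(path, "rb") as fh:
            has_header = fh.read(5) == b"%PDF-"
        report.append((name, size, size >= min_size and has_header))
    return report


# Example use (not run here):
# for name, size, ok in report_pdf_sizes("the_directory"):
#     print(name, size, "ok" if ok else "suspect")
```

If this prints nothing at all, no files were downloaded in the first place, and the merge step has nothing to work with.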

According to the PdfFileMerger documentation, there should be no need to use a reader. Just pass the path, or an open file object as in the line below, to the append function:

merger.append(open(os.path.join(files_dir, filename), 'rb'))
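
One thing that stands out in the code from the question: download_file is defined inside the loop but never called, so it looks like nothing is ever downloaded and the merger writes an empty PDF. Below is a minimal sketch of how the three steps might fit together. It assumes Python 3 and a PyPDF2 release older than 3.0 (where PdfFileMerger is still available). The names fetch_url, download_pdfs and merge_pdfs are hypothetical, and the range 1101 to 2300 is taken from the question text rather than from its code.

```python
import os
import urllib.request

BASE_URL = ("https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet"
            "?tip=9&ilKodu=3&ilceKodu=%d")


def fetch_url(url):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_pdfs(codes, files_dir, fetch=fetch_url, min_size=1024):
    """Save one PDF per code, skipping responses under min_size bytes.

    Returns the list of saved paths in download order. The fetch
    parameter is an assumption added so the function can be tried
    without network access.
    """
    saved = []
    for code in codes:
        data = fetch(BASE_URL % code)
        if len(data) < min_size:
            continue  # the question treats ~1 KB responses as unassigned
        path = os.path.join(files_dir, "%d.pdf" % code)
        with open(path, "wb") as local_file:
            local_file.write(data)
        saved.append(path)
    return saved


def merge_pdfs(paths, output_path):
    """Merge the given PDF files, in order, into output_path."""
    # Imported here so the download step works without PyPDF2 installed.
    from PyPDF2 import PdfFileMerger  # assumes PyPDF2 older than 3.0
    merger = PdfFileMerger()
    for path in paths:
        merger.append(path)  # a path string is accepted; no reader needed
    merger.write(output_path)
    merger.close()


# Example use (not run here):
# paths = download_pdfs(range(1101, 2301), "the_directory")
# merge_pdfs(paths, "merged_full.pdf")
```

Merging from the list of saved paths, rather than from os.listdir, keeps the files in numeric order and avoids picking up a merged_full.pdf left over from an earlier run.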

