Python code for downloading and merging pdf files in a loop results in a 1KB endfile
Hello everyone.
I have a website that I need to get ~1000 PDFs from. The PDFs differ only by a four-digit number between 1101 and 2300, for example:
https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet?tip=9&ilKodu=3&ilceKodu=1101
Some of the numbers in that range are not assigned to a PDF, though, so I needed something that would
1-) download all the PDFs
2-) delete the PDFs that are 1 KB (these are the unassigned ones)
3-) merge all the PDF files into one PDF file
There were answers for each of these steps, but not for all of them together, so I looked at those and put something together. In the end, though, all I get is a 1 KB PDF file called merged_full.pdf.
What am I doing wrong?
Cheers
What I have tried:
import urllib.request
import os
from PyPDF2 import PdfFileReader, PdfFileMerger

os.chdir('the_directory')
mylist=(list(range(1101,2500)))
for i in mylist:
    def download_file(download_url):
        web_file = urllib.request.urlopen('https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet?tip=9&ilKodu=3&ilceKodu=%d'%(i),'%d.pdf'%(i))
        local_file = open('%d.pdf'%(i), 'wb')
        local_file.write(web_file.read())
        web_file.close()
        local_file.close()
        filesize = os.path.getsize('%d.pdf'%(i))
        if filesize<1024:
            os.remove('%d.pdf'%(i))
        del filesize

files_dir = "the_directory"
pdf_files = [f for f in os.listdir(files_dir) if f.endswith("pdf")]
merger = PdfFileMerger()
for filename in pdf_files:
    merger.append(PdfFileReader(os.path.join(files_dir, filename), "rb"))
merger.write(os.path.join(files_dir, "merged_full.pdf"))
A size of 1 KB indicates that the created file is just an empty PDF.
You should insert checks in your code to see if each step is working as expected:
Are the files downloaded?
What sizes do the existing files have?
Are the files listed in your pdf_files?
Does PdfFileReader() read the files?
Is the content appended to the merger?
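The first few checks can be done with a small helper like the one below. This is only a sketch: the name pdf_sizes is made up for illustration, and it uses nothing beyond the standard os module. It reports which PDF files are actually present in the directory and how large each one is, which should show fairly quickly whether the download step produced anything.

```python
import os


def pdf_sizes(files_dir):
    """Return (filename, size_in_bytes) pairs for each .pdf file in files_dir."""
    report = []
    for name in sorted(os.listdir(files_dir)):
        if name.lower().endswith(".pdf"):
            path = os.path.join(files_dir, name)
            report.append((name, os.path.getsize(path)))
    return report
```

Printing the result of pdf_sizes("the_directory") right before the merge loop would show whether pdf_files is empty or contains only placeholder-sized files.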
According to the PdfFileMerger documentation, there should be no need to use a reader. Just pass the path to the append function (the call below already creates the stream object):
merger.append(open(os.path.join(files_dir, filename), 'rb'))
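Putting the three steps together might look roughly like the sketch below. In the code as posted, download_file is defined inside the loop but never called, so it is quite possible that nothing is downloaded at all, and the sketch calls the download step explicitly. The names BASE_URL, MIN_SIZE, fetch_url, download_pdf, merge_pdfs and download_and_merge are made up for illustration, and the fetch parameter exists only so the download step can be tried without network access. The 1024-byte threshold is taken from the question. The merge step assumes a PyPDF2 version that still provides PdfFileMerger, as used in the question.

```python
import os
import urllib.request

BASE_URL = ("https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet"
            "?tip=9&ilKodu=3&ilceKodu=%d")
MIN_SIZE = 1024  # threshold from the question: smaller responses are treated as unassigned


def fetch_url(url):
    """Return the raw bytes served at url."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_pdf(code, dest_dir, fetch=fetch_url):
    """Download one PDF; return its path, or None if the response is too small."""
    data = fetch(BASE_URL % code)
    if len(data) < MIN_SIZE:
        return None  # nothing is written, so there is nothing to delete later
    path = os.path.join(dest_dir, "%d.pdf" % code)
    with open(path, "wb") as local_file:
        local_file.write(data)
    return path


def merge_pdfs(paths, output_path):
    """Append the given PDFs in order and write one merged file."""
    # Imported here so the download helper also works without PyPDF2 installed.
    from PyPDF2 import PdfFileMerger
    merger = PdfFileMerger()
    for path in paths:
        merger.append(path)  # append() takes a path or an open binary stream
    merger.write(output_path)
    merger.close()


def download_and_merge(codes, dest_dir, output_path):
    """Download every code, keep the real PDFs, and merge them if any exist."""
    paths = []
    for code in codes:
        path = download_pdf(code, dest_dir)
        if path is not None:
            paths.append(path)
    if paths:
        merge_pdfs(paths, output_path)
    return paths
```

For the range described in the question, the call would be something like download_and_merge(range(1101, 2301), "the_directory", "merged_full.pdf"). Because only the paths collected during the download are merged, a merged_full.pdf left over from an earlier run is not picked up again.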