Python code for downloading and merging PDF files in a loop results in a 1 KB end file

Views: 281  Published: 2019/6/8 6:54:10  Tags: Python, PDF


Problem description

Hello everyone.
I have a website that I need to get ~1000 PDFs from. The PDFs differ by a four-digit number between 1101 and 2300, like
https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet?tip=9&ilKodu=3&ilceKodu=1101
Some of the numbers in that range are not assigned to a PDF, though, so I needed something that would
1-) download all the PDFs
2-) delete the PDFs that are 1 KB (these are the non-assigned ones)
3-) merge all the PDF files into one PDF file
There were answers for each of these steps, but not for all of them together, so I looked at those and made something. In the end, though, all I get is a 1 KB PDF file called merged_full.pdf
What am I doing wrong?
Cheers

What I have tried:

import urllib.request
import os
from PyPDF2 import PdfFileReader, PdfFileMerger

os.chdir('the_directory')

mylist=(list(range(1101,2500)))

for i in mylist:
    def download_file(download_url):
        web_file = urllib.request.urlopen('https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet?tip=9&ilKodu=3&ilceKodu=%d'%(i),'%d.pdf'%(i))
        local_file = open('%d.pdf'%(i), 'wb')
        local_file.write(web_file.read())
        web_file.close()
        local_file.close()
        filesize = os.path.getsize('%d.pdf'%(i))
        if filesize<1024:
                os.remove('%d.pdf'%(i))
        del filesize

files_dir = "the_directory"
pdf_files = [f for f in os.listdir(files_dir) if f.endswith("pdf")]
merger = PdfFileMerger()

for filename in pdf_files:
    merger.append(PdfFileReader(os.path.join(files_dir, filename), "rb"))

merger.write(os.path.join(files_dir, "merged_full.pdf"))

Solution

A size of 1 KB indicates that the created file is just an empty PDF.

You should insert checks in your code to see if each step is working as expected:

Are the files downloaded?
What sizes do the existing files have?
Are the files listed in your pdf_files?
Does PdfFileReader() read the files?
Is the content appended to the merger?
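The size and listing checks above could be sketched roughly as follows. This is only an illustration: the helper name list_pdf_sizes and its min_size parameter are made up for this example, and the 1024-byte threshold is taken from the question's own code.

```python
import os


def list_pdf_sizes(files_dir, min_size=1024):
    """Print each PDF in files_dir with its size; return the names that are large enough.

    Hypothetical helper for debugging, not part of PyPDF2.
    """
    usable = []
    for name in sorted(os.listdir(files_dir)):
        if not name.lower().endswith(".pdf"):
            continue  # ignore anything that is not a PDF
        size = os.path.getsize(os.path.join(files_dir, name))
        status = "ok" if size >= min_size else "too small, skipped"
        print("%s: %d bytes (%s)" % (name, size, status))
        if size >= min_size:
            usable.append(name)
    print("%d file(s) would be merged" % len(usable))
    return usable
```

If this prints "0 file(s) would be merged" right after the download loop has run, the problem is in the download step rather than in the merge.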

According to the PdfFileMerger documentation, there is no need to use a reader. Just pass the path, or a file object opened in binary mode, to the append function (file() exists only in Python 2; since you import urllib.request you are on Python 3, so use open()):
merger.append(open(os.path.join(files_dir, filename), 'rb'))
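One thing the checks are likely to reveal: in the question's code, download_file is defined inside the loop but never called, so nothing is downloaded at all. In addition, the second positional argument of urllib.request.urlopen is the request body (data), not a target file name. A possible restructuring is sketched below, under a few assumptions: the function names, the BASE_URL and MIN_SIZE constants, and the opener parameter are invented for this example, the range 1101 to 2300 follows the question's text rather than its range(1101, 2500), and the PyPDF2 import assumes a version that still ships PdfFileMerger (later PyPDF2 releases renamed it to PdfMerger).

```python
import os
import urllib.request

BASE_URL = ("https://intvd.gib.gov.tr/2014_Emlak_Arsa/EmlakServlet"
            "?tip=9&ilKodu=3&ilceKodu=%d")
MIN_SIZE = 1024  # threshold from the question: smaller responses are unassigned codes


def download_pdf(code, dest_dir, opener=urllib.request.urlopen):
    """Fetch one code and save it only if the body has at least MIN_SIZE bytes.

    Returns the saved path, or None when the response was too small.
    """
    with opener(BASE_URL % code) as response:
        data = response.read()
    if len(data) < MIN_SIZE:
        return None
    path = os.path.join(dest_dir, "%d.pdf" % code)
    with open(path, "wb") as local_file:
        local_file.write(data)
    return path


def merge_pdfs(paths, output_path):
    """Append each PDF in order and write the merged result."""
    # Assumption: an older PyPDF2 that still provides PdfFileMerger.
    from PyPDF2 import PdfFileMerger
    merger = PdfFileMerger()
    for path in paths:
        merger.append(path)  # append() accepts a path string directly
    with open(output_path, "wb") as out:
        merger.write(out)
    merger.close()


def main(dest_dir="the_directory"):
    saved = []
    for code in range(1101, 2301):
        path = download_pdf(code, dest_dir)
        if path:
            saved.append(path)
    print("%d files downloaded" % len(saved))
    merge_pdfs(saved, os.path.join(dest_dir, "merged_full.pdf"))
```

Calling main() would run the whole job. Keeping the list of saved paths, instead of listing the directory again, also avoids picking up merged_full.pdf from an earlier run as an input.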
