How to Download PDFs from Scraped Links [Python]?

Problem Description

I'm working on making a PDF Web Scraper in Python. Essentially, I'm trying to scrape all of the lecture notes from one of my courses, which are in the form of PDFs. I want to enter a url, and then get the PDFs and save them in a directory in my laptop. I've looked at several tutorials, but I'm not entirely sure how to go about doing this. None of the questions on StackOverflow seem to be helping me either.

Here is what I have so far:

import requests
from bs4 import BeautifulSoup
import shutil

bs = BeautifulSoup

url = input("Enter the URL you want to scrape from: ")
print("")

suffix = ".pdf"

link_list = []

def getPDFs():    
    # Gets URL from user to scrape
    response = requests.get(url, stream=True)
    soup = bs(response.text)

    # for link in soup.find_all('a'):  # Finds all links
    #     if suffix in str(link):  # If the link ends in .pdf
    #         link_list.append(link.get('href'))
    # print(link_list)

    with open('CS112.Lecture.09.pdf', 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)
    del response
    print("PDF Saved")

getPDFs()

Originally, I had gotten all of the links to the PDFs, but did not know how to download them; the code for that is now commented out.

Now I've gotten to the point where I'm trying to download just one PDF; and a PDF does get downloaded, but it's a 0KB file.

If it's of any use, I'm using Python 3.4.2

Recommended Answer

If this is something that does not require being logged in, you can use urlretrieve():

from urllib.request import urlretrieve

for link in link_list:
    # Pass a filename, otherwise urlretrieve() saves to a temporary file;
    # here we reuse the last segment of the URL as the local name
    urlretrieve(link, link.split('/')[-1])
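If you want to stay with requests and BeautifulSoup, the two halves of the question (collecting the links and then saving each file) can be combined. The sketch below is one way to do it, assuming the page's `<a>` tags point at the PDFs (possibly via relative links); note that each PDF needs its own request, and that `response.content` gives the raw bytes for writing:

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def pdf_name(link):
    """Derive a local filename from the last path segment of a link."""
    return os.path.basename(urlparse(link).path)

def download_pdfs(page_url, dest_dir="."):
    response = requests.get(page_url)
    soup = BeautifulSoup(response.text, "html.parser")
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.lower().endswith(".pdf"):
            pdf_url = urljoin(page_url, href)   # resolve relative links
            pdf = requests.get(pdf_url)         # a fresh request per file
            with open(os.path.join(dest_dir, pdf_name(pdf_url)), "wb") as f:
                f.write(pdf.content)            # .content is the raw bytes
```

This also explains the 0KB file in the original code: `response.text` had already consumed the response body, so `response.raw` was empty by the time `shutil.copyfileobj()` ran.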
