Download files using Python 3.4 from Google Patents


Question

I would like to download (using Python 3.4) all (.zip) files on the Google Patent Bulk Download Page http://www.google.com/googlebooks/uspto-patents-grants-text.html

(I am aware that this amounts to a large amount of data.) I would like to save all files for a given year in a directory named after that year, e.g. a directory 1976 for all the (weekly) files from 1976. I would like to create these directories in the directory that my Python script is in.

I've tried using the urllib.request package, but I only got as far as retrieving the HTML text of the page; I don't know how to "click" on a file to download it.

import urllib.request

url = 'http://www.google.com/googlebooks/uspto-patents-grants-text.html'
savename = 'google_patent_urltext'
# This saves only the HTML of the listing page, not the .zip files it links to
urllib.request.urlretrieve(url, savename)

Thank you very much for your help.

Recommended answer

As I understand it, you are looking for a command that simulates left-clicking on a file and downloads it automatically. If so, you can use Selenium. Something like:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

# Configure Firefox to save downloads without showing a dialog
profile = FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)  # 2 = use the folder set in browser.download.dir
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", 'D:\\')  # choose folder to download to
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'application/octet-stream')

# Note: firefox_profile= and find_element_by_xpath belong to the Selenium 3 API
driver = webdriver.Firefox(firefox_profile=profile)
driver.get('https://www.google.com/googlebooks/uspto-patents-grants-text.html#2015')
filename = driver.find_element_by_xpath('//a[contains(text(),"ipg150106.zip")]')  # use a loop to list all zip files
filename.click()

Update: the MIME type to use for the zip files is 'application/octet-stream', not 'application/zip'. It should work now :)
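
If driving a browser is more than you need, the same goal can probably be reached with the standard library alone: fetch the listing page, collect every link ending in .zip, and save each file into a directory named after its year. The following is only a minimal sketch under several assumptions that the question does not confirm: that the page exposes the files as plain <a href="..."> links, and that each link contains the year as a /YYYY/ path segment. The names ZipLinkParser, extract_zip_links, year_of and download_all are made up for this illustration.

```python
import os
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = 'http://www.google.com/googlebooks/uspto-patents-grants-text.html'


class ZipLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points at a .zip file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and href.lower().endswith('.zip'):
            # Resolve relative links against the page URL
            self.links.append(urljoin(self.base_url, href))


def extract_zip_links(html, base_url=PAGE_URL):
    """Return absolute URLs of all .zip links found in the HTML text."""
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    return parser.links


def year_of(link):
    """Assumption: the year appears as a /YYYY/ path segment in the link."""
    match = re.search(r'/((?:19|20)\d{2})/', link)
    return match.group(1) if match else 'unknown'


def download_all(page_url=PAGE_URL):
    """Download every linked .zip into <script directory>/<year>/."""
    with urllib.request.urlopen(page_url) as response:
        html = response.read().decode('utf-8', errors='replace')
    script_dir = os.path.dirname(os.path.abspath(__file__))
    for link in extract_zip_links(html, page_url):
        target_dir = os.path.join(script_dir, year_of(link))
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, os.path.basename(link))
        if not os.path.exists(target):  # skip files already downloaded
            urllib.request.urlretrieve(link, target)
```

Calling download_all() from a script starts the downloads. Files whose link does not match the assumed /YYYY/ pattern end up in a directory called unknown, so it is worth printing the result of extract_zip_links first to check how the links actually look.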
