Using urlopen to open a list of URLs
Question
I have a python script that fetches a webpage and mirrors it. It works fine for one specific page, but I can't get it to work for more than one. I assumed I could put multiple URLs into a list and then feed that to the function, but I get this error:
Traceback (most recent call last):
  File "autowget.py", line 46, in <module>
    getUrl()
  File "autowget.py", line 43, in getUrl
    response = urllib.request.urlopen(url)
  File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.2/urllib/request.py", line 361, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'
Here is the offending code:
url = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']

def getUrl(*url):
    response = urllib.request.urlopen(url)
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)

getUrl()
I've exhausted Google trying to find how to open a list with urlopen(). I found one way that sort of works: it takes a .txt document and goes through it line by line, feeding each line as a URL. But I'm writing this in Python 3, and for whatever reason twillcommandloop won't import. Plus, that method is unwieldy and requires (supposedly) unnecessary work.
Anyway, any help would be greatly appreciated.
Recommended answer
In your code there are some errors:
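- You define getUrl with a variable-length argument list (*url), so inside the function url is a tuple (the tuple named in the error);
- You treat the getUrl parameter as a single value, when it should be handled as a list.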
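To see why the error mentions a tuple, here is a minimal sketch of how a *url parameter behaves. The show_args function is a hypothetical helper written only for this illustration; it is not part of the original script.

```python
def show_args(*url):
    # With *url, Python packs all positional arguments into a tuple,
    # no matter how many arguments (if any) the caller passes.
    return type(url).__name__, url

urls = ['https://www.example.org/', 'http://bar.com']

print(show_args())       # no arguments, as in the question: an empty tuple
print(show_args(urls))   # the whole list becomes one element of the tuple
print(show_args(*urls))  # unpacking passes each URL as its own argument
```

In every case url is a tuple, so passing it straight to urlopen() fails the same way as in the traceback.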
You can try this code:
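import shutil
import urllib.request  # Python 3 replacement for urllib2

urls = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']

def getUrl(urls):
    for url in urls:
        # Build a file name from the URL string: drop the scheme, then
        # replace dots and slashes with underscores
        file_name = url.replace('https://', '').replace('http://', '').replace('.', '_').replace('/', '_')
        # Open each URL on its own and copy the response body to the file
        with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)

getUrl(urls)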
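Chained replace() calls are easy to get wrong once URLs have different schemes or trailing slashes. As one possible alternative, the sketch below derives the file name with urllib.parse.urlsplit from the standard library. The function name url_to_filename and the underscore naming scheme are assumptions made for this example, not something from the original question.

```python
from urllib.parse import urlsplit

def url_to_filename(url):
    # Split the URL so the scheme (http/https) is dropped cleanly.
    parts = urlsplit(url)
    # Join host and path, then trim leading/trailing slashes.
    raw = (parts.netloc + parts.path).strip('/')
    # Replace characters that are awkward in file names.
    return raw.replace('.', '_').replace('/', '_')

for url in ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']:
    print(url, '->', url_to_filename(url))
```

Inside the loop in getUrl, file_name = url_to_filename(url) could then stand in for the replace() chain. Note that this sketch ignores query strings and fragments, so two URLs that differ only in those parts would map to the same file.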