Using urlopen to open a list of URLs


Question

I have a Python script that fetches a webpage and mirrors it. It works fine for one specific page, but I can't get it to work for more than one. I assumed I could put multiple URLs into a list and feed that to the function, but I get this error:

Traceback (most recent call last):
  File "autowget.py", line 46, in <module>
    getUrl()
  File "autowget.py", line 43, in getUrl
    response = urllib.request.urlopen(url)
  File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.2/urllib/request.py", line 361, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'

Here is the offending code:

url = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']
def getUrl(*url):
    response = urllib.request.urlopen(url)
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
getUrl()

I've exhausted Google trying to find out how to open a list with urlopen(). I found one way that sort of works: it takes a .txt document and goes through it line by line, feeding each line as a URL. But I'm writing this in Python 3, and for whatever reason twillcommandloop won't import. Plus, that method is unwieldy and requires (supposedly) unnecessary work.

Anyway, any help would be greatly appreciated.

Recommended answer

There are a couple of errors in your code:

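  • You define getUrl with a variable argument list (*url), so inside the function url is always a tuple (the tuple in your error), and because you call getUrl() with no arguments, that tuple is empty;
  • You pass that argument to urlopen() as if it were a single URL, when it is a collection; loop over it and open each URL in turn.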
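To see why the traceback mentions a tuple, here is a minimal sketch of how Python packs `*args` parameters. The names `show_args` and `url_list` are made up for this illustration and do not appear in the original script; nothing here touches the network.

```python
def show_args(*url):
    # With a *url parameter, all positional arguments are packed into a tuple,
    # regardless of what was passed in.
    return url

url_list = ['https://www.example.org/', 'http://bar.com']

packed_none = show_args()           # no arguments: an empty tuple
packed_list = show_args(url_list)   # the list becomes the single element of a tuple
unpacked = show_args(*url_list)     # unpacking: each URL is its own element

print(type(packed_none).__name__, len(packed_none))
print(type(packed_list).__name__, len(packed_list))
print(type(unpacked).__name__, len(unpacked))
```

In every case the function body sees a tuple, never a string, which is why handing `url` straight to `urlopen()` fails.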

You can try this code:

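import shutil
import urllib.request  # Python 3, as in the question (urllib2 exists only in Python 2)

urls = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']

def getUrl(urls):
    for url in urls:
        # Build a file name from the URL string: drop the scheme, then
        # replace dots and slashes with underscores
        file_name = (url.replace('https://', '').replace('http://', '')
                        .replace('.', '_').replace('/', '_'))
        # Fetch one URL at a time and stream the response body to disk
        with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)

getUrl(urls)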
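The file name above is derived by plain string replacement on the URL. As an alternative, here is a rough sketch that uses `urllib.parse.urlsplit` from the standard library to separate the scheme from the rest, so any scheme is handled the same way. The helper name `url_to_filename` is hypothetical and not part of the answer, and the naming scheme is only one possible choice.

```python
from urllib.parse import urlsplit

def url_to_filename(url):
    # Split the URL into its components and keep only host and path,
    # so the scheme (http, https, ...) never ends up in the file name.
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    # Replace characters that are awkward in file names, then trim
    # any leading or trailing underscores left by a trailing slash.
    return raw.replace('.', '_').replace('/', '_').strip('_')

urls = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']
file_names = [url_to_filename(u) for u in urls]
print(file_names)
```

Note that this sketch ignores query strings, and two different URLs could still map to the same name, so a real mirroring script would probably need something more robust.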
