Unknown url type error in urllib2
Question
I have searched through many similar questions on SO, but did not find an exact match for my case.
I am trying to download a video using Python 2.7.
Here is my code for downloading the video:
import urllib2
from bs4 import BeautifulSoup as bs

with open('video.txt','r') as f:
    last_downloaded_video = f.read()

webpage = urllib2.urlopen('http://*.net/watch/**-'+last_downloaded_video)
soup = bs(webpage)
a = []
for link in soup.find_all('a'):
    if link.has_attr('data-video-id'):
        a.append(link)

#try just with first data-video-id
id = a[0]['data-video-id']
webpage2 = urllib2.urlopen('http://*/video/play/'+id)
soup = bs(webpage2)
string = str(soup.find_all('script')[2])
print string
url = string.split(': ')[1].split(',')[0]
url = url.replace('"','')
print url
print type(url)
video = urllib2.urlopen(url).read()
filename = "video.mp4"
with open(filename,'wb') as f:
    f.write(video)
This code gives an unknown url type error. The traceback is:
Traceback (most recent call last):
  File "naruto.py", line 26, in <module>
    video = urllib2.urlopen(url).read()
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 427, in _open
    'unknown_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1247, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: 'http>
However, when I store the same URL in a variable and attempt to download it from the terminal, no error is shown. I am confused as to what the problem is. I found a similar question on the Python mailing list.
Recommended answer
It's hard to tell without seeing the HTML of the page you are scraping. However, a stray ' (single quote) character at the beginning of the URL might be the cause, since it produces the same exception:
>>> import urllib2
>>> urllib2.urlopen("'http://blah.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "urllib2.py", line 404, in open
    response = self._open(req, data)
  File "urllib2.py", line 427, in _open
    'unknown_open', req)
  File "urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "urllib2.py", line 1249, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: 'http>
So, try cleaning up your URL and removing any stray quotes.
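A quick way to confirm this kind of problem is to print the repr() of the string, which makes quote characters inside it visible, and to look at the text before the first colon. The sketch below is only an illustration: apparent_scheme is a hypothetical helper (not part of urllib2), and it approximates, rather than reproduces, how urllib2 arrives at the "url type" shown in the error message.

```python
def apparent_scheme(url):
    """Return the text before the first ':' (hypothetical helper).

    This only approximates what urllib2 reports as the URL type.
    """
    return url.split(':', 1)[0]


bad_url = "'http://blah.com"        # same example URL as in the session above
print(repr(bad_url))                # repr() exposes the quote inside the string
print(apparent_scheme(bad_url))     # shows 'http rather than http
```

If the apparent scheme starts with a quote, the URL string itself contains that character and needs cleaning before it is passed to urlopen().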
Update after OP feedback:
The results of the print statement indicate that the URL has a single quote character at both the beginning and the end of the URL string. There should not be any quotes of any type surrounding the URL when it is passed to urlopen(). You can remove leading and trailing quotes (both single and double) from the URL string with this:
url = url.strip('\'"')
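To guard against the same problem elsewhere in a scraper, the stripping step can be wrapped in a small function that also checks the scheme before the URL is opened. This is a minimal sketch under assumptions: clean_url is a hypothetical name, the example URL is made up, and restricting the scheme to http/https is a choice made here, not something the original code requires. The import fallback is there so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.7, as used in the question


def clean_url(raw):
    """Strip surrounding whitespace and quotes, then validate the scheme.

    Hypothetical helper; the allowed schemes are an assumption.
    """
    url = raw.strip(' \t\r\n\'"')
    scheme = urlparse(url).scheme
    if scheme not in ('http', 'https'):
        raise ValueError('unexpected URL scheme %r in %r' % (scheme, url))
    return url


# Made-up URL wrapped in stray single quotes, like the one in the question
print(clean_url("'http://blah.com/video.mp4'"))
```

Failing early with a ValueError that shows the repr of the URL tends to be easier to debug than the URLError raised deep inside urlopen().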