urllib2 error: no host given
Problem description
(SOLVED) When I read the values in from my file, a newline character (\n) is added to the end of each one. This splits my request string at that point. I think it has to do with how I saved the values to the file in the first place. Many thanks.
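A minimal sketch of that cause, assuming the search terms were saved one per line (the file name terms.txt and its contents are made up for illustration). Reading a line from a file, whether by iterating over it or by calling next() on it, returns the line with its trailing newline still attached, so the newline ends up inside the URL unless it is stripped first. Only single-argument print calls are used, so the snippet behaves the same under Python 2 and 3.

```python
import os
import tempfile

# Hypothetical terms file with one search term per line.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "terms.txt")
with open(path, "w") as out:
    out.write("test\nexample\n")

with open(path, "r") as f:
    raw = next(f)  # the line keeps its trailing newline: 'test\n'

base = "http://www.myurl.com/"
bad_url = base + raw                # the "\n" ends up inside the URL
good_url = base + raw.rstrip("\n")  # strip it before building the URL
print(repr(bad_url))
print(repr(good_url))

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```

Using strip() instead of rstrip("\n") would also remove a stray "\r" or surrounding spaces, which is usually what you want for search terms.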
I have the following code:
results = 'http://www.myurl.com/'+str(mystring)
print str(results)
request = urllib2.Request(results)
request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
opener = urllib2.build_opener()
text = opener.open(request).read()
This is in a loop. After the loop has run a few times, str(mystring) changes to give a different set of results. I can loop the script as many times as I like while keeping the value of str(mystring) constant, but every time I change the value of str(mystring) I get a "no host given" error when the code tries to build the opener:
opener = urllib2.build_opener()
Can anyone help?
TIA,
Paul.
More code here:
import sys
import string
import httplib
import urllib2
import re
import random
import time
def StripTags(text):
    # Repeatedly remove the first <...> tag until none are left.
    finished = 0
    while not finished:
        finished = 1
        start = text.find("<")
        if start >= 0:
            stop = text[start:].find(">")
            if stop >= 0:
                text = text[:start] + text[start+stop+1:]
                finished = 0
    return text

mystring="test"
d={}
with open("myfile","r") as f:
    while True:
        page_counter=0
        print str(mystring)
        try:
            while page_counter <20:
                results = 'http://www.myurl.com/'+str(mystring)
                print str(results)
                request = urllib2.Request(results)
                request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
                opener = urllib2.build_opener()
                text = opener.open(request).read()
                finds = (re.findall('([\w\.\-]+'+mystring+')',StripTags(text)))
                for find in finds:
                    d[find]=1
                uniq_emails=d.keys()
                page_counter = page_counter +1
                print "found this " + str(finds)
                random.seed()
                n = random.random()
                i = n * 5
                print "Pausing script for " + str(i) + " Seconds" + ""
                time.sleep(i)
            # next(f) returns the next line of the file including its trailing "\n"
            mystring=next(f)
        except IOError:
            print "No result found!"+""
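One way the outer loop could be restructured, sketched under assumptions: the file holds one search term per line, and fetch is a hypothetical callable standing in for the urllib2 Request/build_opener/open code above, so the sketch runs without network access (the random pause is left out). Iterating over the file with a for loop stops cleanly at the end of the file, whereas next(f) raises StopIteration there, which the except IOError clause does not catch. re.escape keeps characters such as "." in the term from acting as regex wildcards.

```python
import re


def collect_matches(lines, fetch, pages=20):
    """Collect unique regex matches for every search term in lines.

    fetch is a hypothetical callable that takes a URL and returns page text;
    in the question it would wrap urllib2.Request / build_opener / open.
    """
    found = set()
    for line in lines:       # iteration ends cleanly at end of file
        term = line.strip()  # drop the trailing "\n" (and stray spaces)
        if not term:
            continue         # skip blank lines
        url = "http://www.myurl.com/" + term
        # Same pattern shape as the question, with the term matched literally.
        pattern = re.compile(r"([\w\.\-]+" + re.escape(term) + r")")
        for _ in range(pages):
            text = fetch(url)
            found.update(pattern.findall(text))
    return found
```

With the real file this would be called as collect_matches(f, fetch) inside the existing with open("myfile","r") as f: block.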
Recommended answer
In the while loop, you're setting results to something that is not a URL:
results = 'myurl+str(mystring)'
It should be: results = myurl+str(mystring)
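To see the difference, one can compare what urlparse makes of each string. This sketch assumes myurl holds "http://www.myurl.com/" and mystring holds "test" (both values are assumptions for illustration); the import fallback covers Python 3 (urllib.parse) and Python 2 (urlparse). The quoted version is just literal text, so it parses with neither a scheme nor a host.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

myurl = "http://www.myurl.com/"  # assumed value
mystring = "test"                # assumed value

quoted = 'myurl+str(mystring)'        # literal text, not an expression
concatenated = myurl + str(mystring)  # what was intended

for url in (quoted, concatenated):
    parts = urlparse(url)
    # The quoted string has an empty scheme and netloc (no host at all).
    print("%r scheme=%r netloc=%r" % (url, parts.scheme, parts.netloc))
```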
By the way, there appears to be no need for all the casting to string (str()) you do (expanded on request):
- print str(foo): in such a case, str() is never necessary; print already outputs foo's string representation.
- results = 'http://www.myurl.com/'+str(mystring): this is also unnecessary. mystring is already a string, so 'http://www.myurl.com/' + mystring would suffice.
- print "Pausing script for " + str(i) + " Seconds": here you would get an error without str(), since you can't concatenate a string and a number (i is a float here). However, print "foo", 1, "bar" does work, as do print "foo %i bar" % 1 and print "foo {0} bar".format(1) (see here).
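A small sketch of the points above. The value of i mirrors the question's n * 5 pause, so it is a float; only single-argument print calls are used so the snippet behaves the same under Python 2 and 3.

```python
import random

i = random.random() * 5  # a float, like the question's n * 5 pause

# Concatenating a str and a float fails without str():
try:
    message = "Pausing script for " + i + " Seconds"
except TypeError as exc:
    print("TypeError: %s" % exc)

# Explicit conversion works:
message = "Pausing script for " + str(i) + " Seconds"
print(message)

# String formatting needs no str() call at all:
percent_style = "foo %i bar" % 1
brace_style = "foo {0} bar".format(1)
print(percent_style)
print(brace_style)
print("Pausing script for %.2f Seconds" % i)
```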