urllib2 error: no host given


Problem description

(SOLVED) When I read the values in from my file, a newline character (\n) was ending up on the end of each one, and that was splitting my request string at that point. I think it has to do with how I saved the values to the file in the first place. Many thanks.

I have the following code:

results = 'http://www.myurl.com/'+str(mystring)
print str(results)
request = urllib2.Request(results)
request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
opener = urllib2.build_opener()
text = opener.open(request).read()

This is in a loop. After the loop has run a few times, str(mystring) changes to give a different set of results. I can loop the script as many times as I like while keeping the value of str(mystring) constant, but every time I change the value of str(mystring) I get an error saying "no host given" when the code tries to build the opener:

opener = urllib2.build_opener()

Can anyone help?

TIA,

Paul.

More code here:

import sys
import string
import httplib
import urllib2
import re
import random
import time


def StripTags(text):
    finished = 0
    while not finished:
        finished = 1
        start = text.find("<")
        if start >= 0:
            stop = text[start:].find(">")
            if stop >= 0:
                text = text[:start] + text[start+stop+1:]
                finished = 0
    return text
mystring="test"

d={}

with open("myfile","r") as f:
    while True:
        page_counter=0
        print str(mystring)

        try:
            while page_counter <20:
                results = 'http://www.myurl.com/'+str(mystring)
                print str(results)
                request = urllib2.Request(results)
                request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
                opener = urllib2.build_opener()
                text = opener.open(request).read()
                finds = (re.findall('([\w\.\-]+'+mystring+')',StripTags(text)))
                for find in finds:
                    d[find]=1
                    uniq_emails=d.keys()
                page_counter = page_counter +1
                print "found this " +str(finds)
                # pause for a random interval of up to 5 seconds between requests
                random.seed()
                n = random.random()
                i = n * 5
                print "Pausing script for " + str(i) + " Seconds" + ""
                time.sleep(i)
            # next(f) returns the next line of the file, trailing newline included
            mystring=next(f)
        except IOError:
            print "No result found!"+""

Recommended answer

In the while loop, you're setting results to something that is not a URL:

results = 'myurl+str(mystring)'

It should be results = myurl+str(mystring)
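
To illustrate the difference between the quoted literal and a real concatenation, here is a small sketch that checks whether a string parses with a host part. `has_host` is a hypothetical helper, not part of urllib2, and it does not reproduce urllib2's own validation. It only shows that the literal contains no host at all.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, the version urllib2 belongs to


def has_host(url):
    """Return True when the URL parses with a non-empty host part."""
    return bool(urlparse(url).netloc)


mystring = "test"
quoted = 'myurl+str(mystring)'              # a literal string; nothing is evaluated
built = 'http://www.myurl.com/' + mystring  # real concatenation

print(has_host(quoted))
print(has_host(built))
```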

By the way, there appears to be no need for all the casting to string (str()) that you do (expanded on request):

  • print str(foo): in such a case, str() is never necessary. Python will always print foo's string representation.
  • results = 'http://www.myurl.com/'+str(mystring): this is also unnecessary. mystring is already a string, so 'http://www.myurl.com/' + mystring would suffice.
  • print "Pausing script for " + str(i) + " Seconds": here you would get an error without str(), since you can't concatenate a string and a number. However, print "foo", 1, "bar" does work, as do print "foo %i bar" % 1 and print "foo {0} bar".format(1).
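
The formatting alternatives from the last bullet can be sketched as follows. The value of i and the two-decimal format are assumptions chosen for illustration. Note that %i, as written in the answer, would truncate a float such as 2.5 to 2, so the sketch uses %.2f instead.

```python
i = 2.5  # example pause length in seconds; a float, as in the question

# Each line builds the message without an explicit str() call.
with_percent = "Pausing script for %.2f seconds" % i
with_format = "Pausing script for {0:.2f} seconds".format(i)

print(with_percent)
print(with_format)
```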
