httplib.BadStatusLine: ''

Problem description

As usual, I have run into a problem. I have searched thoroughly for an answer to it but find myself at a loss. Here are some of the places I have searched:
- How to fix httplib.BadStatusLine exception?
- Python httplib2 Handling Exceptions
- python http status code

My issue is the following. I have created a spider and want to crawl several URLs. When I crawl each URL on its own, everything works fine. However, when I try to crawl both in the same run, I get the following error: httplib.BadStatusLine: ''

I followed some advice that I read (see the links mentioned above). Printing response.status works for each request, but response.url is never printed and the error is thrown. (I only print the two statements to try to identify the source of the error.)

I hope this is clear.

I am using Scrapy and Selenium.

class PeoplePage(Spider):
    name = "peopleProfile"
    allowed_domains = ["blah.com"]
    handle_httpstatus_list = [200, 404]
    start_urls = [
        "url1",
        "url2"
    ]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        print response.status
        print '???????????????????????????????????'
        if response.status == 200:
            self.driver.implicitly_wait(5)
            self.driver.get(response.url)
            print response.url
            print '!!!!!!!!!!!!!!!!!!!!'

            # DO STUFF

        self.driver.close()

Recommended answer

According to the Python documentation, httplib.BadStatusLine is raised if a server responds with an HTTP status code that the library does not understand. An empty status line ('') usually means the connection was closed before any response arrived, which would be consistent with the driver having already been closed by the first call to parse. You can try catching and ignoring this exception. Also, you should not close your driver if you are going to request more than one URL.

Try this:

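import httplib  # Python 2 module; it is named http.client in Python 3

def parse(self, response):
    try:
        print response.status
        print '???????????????????????????????????'
        if response.status == 200:
            self.driver.implicitly_wait(5)
            self.driver.get(response.url)
            print response.url
            print '!!!!!!!!!!!!!!!!!!!!'

            # DO STUFF
    except httplib.BadStatusLine:
        # Ignore the bad status line and move on to the next response.
        pass
    # No self.driver.close() here, so the driver stays open for the next URL.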
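
To see where the empty string in the error message comes from, here is a minimal sketch in Python 3, where httplib is named http.client. It assumes a throwaway local server (the hypothetical helper start_silent_server, not part of Scrapy or Selenium) that accepts one connection, reads the request, and hangs up without replying. The client then sees an empty status line, which is the same kind of failure reported in the question. The helper names and the choice to return None are illustrative assumptions only.

```python
import http.client  # Python 3 name for the Python 2 httplib module
import socket
import threading


def fetch_status(host, port):
    """Return the HTTP status code, or None if no status line was received."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    except (http.client.BadStatusLine, ConnectionResetError):
        # In Python 3 an empty status line raises RemoteDisconnected,
        # which is a subclass of both of these exceptions.
        return None
    finally:
        conn.close()


def start_silent_server():
    """Hypothetical test server: accept one client, read its request,
    then close the connection without sending any response."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)

    def serve():
        client, _ = server.accept()
        client.recv(4096)  # consume the request so the close is a clean one
        client.close()
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()[1]


port = start_silent_server()
result = fetch_status("127.0.0.1", port)
print(result)
```

Catching the exception only hides the symptom. If the connection is gone because the driver was closed, the later requests will still fail, which is why keeping the driver open matters more than the try/except.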
