DNS lookup failed: address 'your.proxy.com' not found: [Errno -5] No address associated with hostname


Problem description


This question is an extension of a previously resolved one, i.e. crawling LinkedIn while authenticating with Scrapy: Crawling LinkedIn while authenticated with Scrapy @Gates

I kept the base of the script the same, only adding my own session_key and session_password, and changed the start URL for my use case, as below.

class LinkedPySpider(InitSpider):
    name = 'Linkedin'
    allowed_domains = ['linkedin.com']
    login_page = 'https://www.linkedin.com/uas/login'
    start_urls=["http://www.linkedin.com/nhome/"]

[Also tried with this start URL]
start_urls = ["http://www.linkedin.com/profile/view?id=38210724&trk=nav_responsive_tab_profile"]

I also tried changing the start_url to the second one above, to see if I could start scraping from my own profile page, but I was unable to do so.
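
For context, the referenced answer authenticates through Scrapy's InitSpider hooks before the normal crawl starts. Below is a minimal sketch of that flow against the Scrapy 0.14-era API; the credential values, the "Sign Out" marker and the parse body are placeholders, not the exact script:

from scrapy.contrib.spiders.init import InitSpider  # 0.14-era import path
from scrapy.http import Request, FormRequest

class LinkedPySpider(InitSpider):
    name = 'Linkedin'
    allowed_domains = ['linkedin.com']
    login_page = 'https://www.linkedin.com/uas/login'
    start_urls = ["http://www.linkedin.com/nhome/"]

    def init_request(self):
        # Fetch the login page before any of the start_urls are requested
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        # Submit the login form; the field names come from the question,
        # the credential values are placeholders
        return FormRequest.from_response(response,
            formdata={'session_key': 'you@example.com',
                      'session_password': 'your-password'},
            callback=self.check_login_response)

    def check_login_response(self, response):
        if 'Sign Out' in response.body:
            self.log('Logged in.')
            self.initialized()  # hand control back to the normal crawl
        else:
            self.log('Login failed.')

    def parse(self, response):
        pass  # extraction logic goes here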

**Error that I get** - 
scrapy crawl Linkedin
**2013-07-29 11:37:10+0530 [Linkedin] DEBUG: Retrying <GET http://www.linkedin.com/nhome/> (failed 1 times): DNS lookup failed: address 'your.proxy.com' not found: [Errno -5] No address associated with hostname.**


**To see if the name resolved, I tried:**
nslookup www.linkedin.com  # works
nslookup www.linkedin.com/uas/login  # a page path within a site does not resolve, which I think is normal, right?
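
(The second lookup is indeed expected to fail: DNS resolves hostnames only, never URL paths. A quick way to confirm this from Python, written in Python 2 to match the environment described below; the second call raises socket.gaierror:)

import socket

print socket.gethostbyname('www.linkedin.com')  # a hostname: resolves to an IP
try:
    socket.gethostbyname('www.linkedin.com/uas/login')  # a URL path, not a hostname
except socket.gaierror as e:
    print 'lookup failed:', e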

Then I also tried to see if the error could have been due to the name server not resolving, and appended nameservers as below.
echo $http_proxy  # gives http://username:password@your.proxy.com:80
sudo vi /etc/resolv.conf
and appended the IP addresses of free, fast DNS nameservers to that file as follows:
nameserver 208.67.222.222
nameserver 208.67.220.220
nameserver 202.51.5.52
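
(A hypothetical sanity check, again in Python 2, that the appended entries were actually saved:)

# Print the nameserver lines the stub resolver will read
with open('/etc/resolv.conf') as f:
    for line in f:
        if line.startswith('nameserver'):
            print line.strip()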

I am not well versed in NS conflicts and DNS lookup failures, but could this be due to the fact that I am in a VM, even though other scraping projects seemed to work just fine?

My base use case is to extract connections and the list of companies they worked at, plus a bunch of other attributes. So I want to crawl/paginate through "Connections" (All) on the main profile page, which does NOT show up if I use a public profile in the start_url, i.e. scrapy shell http://www.linkedin.com/in/ektagrover. Passing a legitimate XPath via hxs.select seems to work there, but NOT when I use it with the spider, since it does not meet my base use case (as below).
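
For reference, in that Scrapy 0.14-era shell session hxs is an HtmlXPathSelector bound to the fetched page. The XPath below is purely illustrative, since the page's actual markup is not shown here:

scrapy shell http://www.linkedin.com/in/ektagrover
>>> hxs.select('//title/text()').extract()  # any legitimate XPath works like this
[u'...']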

Question: Is there something wrong with my start_url, or is it my assumption that, after authentication, the start page could be potentially ANY webpage on that site, given that unauthenticated requests get redirected to https://www.linkedin.com/uas/login?

Work environment - I am on Oracle VM VirtualBox with Ubuntu 12.04 LTS, Python 2.7.3 and Scrapy 0.14.4.

What worked / answer -- It looks like my proxy was incorrectly set: echo $http_proxy gives http://username:password@your.proxy.com:80, a placeholder host that does not exist. [Unset the environment variable $http_proxy] I just ran http_proxy= , which unsets the proxy, then echo $http_proxy , which printed nothing to confirm. After that, scrapy crawl Linkedin worked through the authentication module. Though I am getting stuck here and there with Selenium, that's for another question. Thank you, @warwaruk.
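
(If you want to preview the proxy settings a crawl will inherit from the environment, the stdlib helper below is what proxy-aware Python code commonly consults; Python 2, matching the setup above:)

import os, urllib

print os.environ.get('http_proxy')  # the raw variable, e.g. http://user:pass@host:80
print urllib.getproxies()           # per-scheme proxy map derived from the environment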

Solution

**Error that I get** - 
scrapy crawl Linkedin
**2013-07-29 11:37:10+0530 [Linkedin] DEBUG: Retrying <GET http://www.linkedin.com/nhome/> (failed 1 times): DNS lookup failed: address 'your.proxy.com' not found: [Errno -5] No address associated with hostname.**


**To see if the name resolved, I tried:**
nslookup www.linkedin.com  # works
nslookup www.linkedin.com/uas/login  # a page path within a site does not resolve, which I think is normal, right?

Then I also tried to see if the error could have been due to the name server not resolving, and appended nameservers as below.
echo $http_proxy  # gives http://username:password@your.proxy.com:80

You have a proxy set: http://username:password@your.proxy.com:80.

Obviously, it doesn't exist on the Internet:

$ nslookup your.proxy.com
Server:         127.0.1.1
Address:        127.0.1.1#53

** server can't find your.proxy.com: NXDOMAIN

Either unset the environment variable $http_proxy, or set up a working proxy and change the env variable accordingly.
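
A sketch of both options in Python 2; the per-request route relies on Scrapy's HttpProxyMiddleware honouring request.meta['proxy'], and the proxy URL itself is a placeholder:

import os

# Option 1: make sure no stale proxy leaks into the crawl
for var in ('http_proxy', 'HTTP_PROXY', 'https_proxy', 'HTTPS_PROXY'):
    os.environ.pop(var, None)

# Option 2: if you really need a proxy, point the variable at one that resolves,
# or set it per request inside the spider:
#     request.meta['proxy'] = 'http://username:password@real.proxy.example:80'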
