Not receiving headers with Scrapy and ProxyMesh


Question


I am quite new to Scrapy / ProxyMesh. My requests to the ProxyMesh server seem to be working: I can see my bandwidth consumption on the ProxyMesh website, and meta.proxy is correct in my logs. However, when I log the response headers in Scrapy, I do not receive the X-Proxymesh-IP header that I am supposed to receive. Here is my code. What am I doing wrong?

This is my middleware:

import logging


class Proxymesh(object):

    def __init__(self):
        logging.debug('Initialized Proxymesh middleware')
        self.proxy_ip = 'http://host:port'

    def process_request(self, request, spider):
        # Route every request through the ProxyMesh endpoint.
        logging.debug('Processing request through proxy IP: ' + self.proxy_ip)
        request.meta['proxy'] = self.proxy_ip


These are the settings in my spider:

custom_settings = {
    "DOWNLOADER_MIDDLEWARES": {
        "projectName.middlewares.proxymesh.Proxymesh": 1,
    },
}

This is what the response headers look like:

['Set-Cookie']:['__cfduid=d88d4e4cb7... HttpOnly']
['Vary']:['User-Agent,Accept-Encoding']
['Server']:['cloudflare-nginx']
['Date']:['Thu, 19 Oct 2017 10...38:10 GMT']
['Cf-Ray']:['3b031b30cbef1565-CDG']
['Content-Type']:['text/html; charset=UTF-8']

Thanks for your help.

Answer


I don't know if this is still relevant, but I'm going to post it here. There's an issue with ProxyMesh and Scrapy (or Python's requests library). When connecting to a proxy, a CONNECT request is sent to the proxy service to create a tunnel through which the actual request is forwarded. If the CONNECT request succeeds, ProxyMesh adds the X-Proxymesh-IP header to its confirmation response to that CONNECT request. Scrapy misses this header entirely, because it only looks at the response headers of the actual request.
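The handshake described above can be sketched at the byte level. This is a minimal, hypothetical sketch: the helpers `build_connect_request` and `parse_connect_response` are made up for illustration, the exact reply ProxyMesh sends is an assumption, and `203.0.113.7` is a documentation address rather than real data. It builds the CONNECT request a client sends and pulls the headers, including X-Proxymesh-IP, out of the proxy's reply to it.

```python
from __future__ import annotations


def build_connect_request(host: str, port: int = 443) -> bytes:
    """Build the CONNECT request a client sends to open a tunnel to host:port."""
    target = f"{host}:{port}"
    # Request line, Host header, then the blank line that ends the header block.
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def parse_connect_response(raw: bytes) -> tuple[int, dict[str, str]]:
    """Split the proxy's reply to CONNECT into a status code and a header dict.

    Header names are lower-cased because HTTP header names are case-insensitive.
    """
    head, _, _ = raw.partition(b"\r\n\r\n")  # ignore anything after the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split()[1])  # e.g. "HTTP/1.1 200 Connection established"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


if __name__ == "__main__":
    print(build_connect_request("example.com"))
    # Hypothetical proxy reply; the IP is a documentation address, not real data.
    reply = (
        b"HTTP/1.1 200 Connection established\r\n"
        b"X-Proxymesh-IP: 203.0.113.7\r\n"
        b"\r\n"
    )
    status, headers = parse_connect_response(reply)
    print(status, headers.get("x-proxymesh-ip"))  # 200 203.0.113.7
```

After a 200 status, a real client would start TLS on the same socket; from then on the proxy only relays encrypted bytes, so this reply is its only chance to hand the client the header.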


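This only happens with HTTPS requests, because the actual request and response travel encrypted through the tunnel and the proxy cannot add headers to them. For plain HTTP requests the proxy forwards the request itself, so it can add X-Proxymesh-IP to the response that Scrapy receives.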
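The HTTP/HTTPS difference comes down to what the proxy gets to see. Below is a minimal sketch, assuming a GET request and using a hypothetical `first_line_to_proxy` helper, of the request line a client sends to a forward proxy for each scheme. Real clients also send Host and other headers after this line.

```python
from urllib.parse import urlsplit


def first_line_to_proxy(url: str) -> str:
    """Return the request line a client sends to a forward proxy for this URL.

    Hypothetical helper for illustration; it assumes a GET request.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Absolute-form: the proxy reads the whole request and forwards it,
        # so it can also add headers to the response it relays back.
        return f"GET {url} HTTP/1.1"
    if parts.scheme == "https":
        # Authority-form CONNECT: once the tunnel is up, the proxy only
        # relays encrypted bytes and cannot touch the real response.
        return f"CONNECT {parts.hostname}:{parts.port or 443} HTTP/1.1"
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


if __name__ == "__main__":
    for url in ("http://example.com/page", "https://example.com/page"):
        print(first_line_to_proxy(url))
```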

References:

https://docs.proxymesh.com/article/74-proxy-server-headers-over-https

https://bugs.python.org/issue24964

https://github.com/requests/requests/issues/3061
