How to stop the NodeJS "request" module from changing the request when using a proxy

Problem description

Sorry if this comes off as confusing.

I have written a script using the NodeJS request module that performs a function on a website and then returns the data. The script works perfectly when I do not use a proxy, i.e. when the proxy option is set to false. This is not a task that can be done with Selenium/Puppeteer.

proxy: false

However, when I set a (working) proxy, it fails to perform the same task and is detected by the website's firewall/antibot software.

proxy: http://xx.xxx.xx.xx:3128

Some things to note:

  • I have tried many (20+) different proxy providers (Residential and Datacenter) and they all have this issue
  • The issue does not occur if that proxy is set globally on my system
  • The issue does not occur if that proxy is set in a Chrome extension
  • The SSL cipher suites do not match Chrome but they still don't match when not using a proxy so I assume that isn't the issue
  • It is very important to keep consistency in the header order

The question is basically: does the request module change anything, such as the header order, when using a proxy?

Here is an image of what happens when it passes/fails.

The only difference between the two is the proxy setting, and that is what causes the failure: one request is made with a proxy, the other without.

url    : url,
simple : false,                 // request-promise: do not reject on non-2xx responses
forever: true,                  // reuse connections (keep-alive)
resolveWithFullResponse: true,  // request-promise: resolve with the full response, not just the body
gzip: true,
headers: {
    'Host'             : 'www.sitename.com',
    'Connection'       : 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent'       : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
    'Accept'           : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-encoding'  : 'gzip, deflate, br',
    'Accept-Language'  : 'en-GB,en-US;q=0.9,en;q=0.8',
},
method : 'GET',
jar: globalJar,                 // shared cookie jar
followRedirect: false,
followAllRedirects: false,
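
The header-order part of the question can be checked empirically. The following is a minimal sketch, not part of the original post: it starts a throwaway local HTTP server and records the header names in the order they arrive, using Node's `req.rawHeaders`. The helper names `headerNames` and `captureHeaderOrder` are hypothetical. It only covers plain HTTP to a local server, so it says nothing about what happens at the TLS layer or behind a remote proxy.

```javascript
const http = require('http');

// Extract header names, in wire order, from Node's flat rawHeaders array
// ([name1, value1, name2, value2, ...]).
function headerNames(rawHeaders) {
    const names = [];
    for (let i = 0; i < rawHeaders.length; i += 2) {
        names.push(rawHeaders[i]);
    }
    return names;
}

// Start a throwaway local server, let `sendRequest(url)` hit it once,
// and resolve with the header names exactly as they arrived.
function captureHeaderOrder(sendRequest) {
    return new Promise((resolve, reject) => {
        const server = http.createServer((req, res) => {
            res.setHeader('Connection', 'close'); // let the server shut down promptly
            res.end('ok');
            server.close();
            resolve(headerNames(req.rawHeaders));
        });
        server.on('error', reject);
        server.listen(0, '127.0.0.1', () => {
            const { port } = server.address();
            sendRequest(`http://127.0.0.1:${port}/`);
        });
    });
}
```

To use it, pass a function that issues the request with the same options object as above (for example through request-promise) and compare the resolved list across runs with different settings.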

Recommended answer

After deactivating my old account, I wanted to come back and give an actual answer to this question now that I fully understand it. What I was asking a year ago was not possible: the antibot was fingerprinting me through the TLS ClientHello (and even slightly at the TCP/frame level).
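
To illustrate what a ClientHello fingerprint can look like, here is a minimal sketch of the publicly documented JA3 scheme. The answer does not say which scheme the antibot actually used, so JA3 is an assumption chosen for illustration. The `hello` object shape and the function names are hypothetical, and the numeric values used with them are examples rather than a real browser's handshake.

```javascript
const crypto = require('crypto');

// GREASE values (0x0a0a, 0x1a1a, ... 0xfafa) are placeholder values that
// some clients insert; JA3 ignores them so the fingerprint stays stable.
function isGrease(value) {
    return (value & 0x0f0f) === 0x0a0a && (value >> 8) === (value & 0xff);
}

// Build the JA3 string: five comma-separated fields, each a dash-separated
// list of decimal values in the order they appear in the ClientHello.
function ja3String(hello) {
    const join = (list) => list.filter((v) => !isGrease(v)).join('-');
    return [
        hello.version,            // TLS version, e.g. 771 for 0x0303
        join(hello.ciphers),      // cipher suite IDs
        join(hello.extensions),   // extension IDs
        join(hello.curves),       // supported groups (elliptic curves)
        join(hello.pointFormats), // EC point formats
    ].join(',');
}

// The JA3 fingerprint is the MD5 hex digest of that string.
function ja3Hash(hello) {
    return crypto.createHash('md5').update(ja3String(hello)).digest('hex');
}
```

Because the order of ciphers and extensions feeds directly into the string, two clients that send identical HTTP headers can still produce different fingerprints, which is consistent with header order alone not being enough here.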

To start, I wrote a wrapper called request-curl, which wraps the libcurl/curl binaries in a single library with the same interface as request-promise. This gave me much more control over the request (preventing encoding, HTTP/2 and proxy support, and further session/TLS control), but it still only got me to a mediocre rank: the 687th most popular ClientHello (https://client.tlsfingerprint.io:8443/). It wasn't good enough.

I had to switch languages. NodeJS is too high-level to allow really deep control (I had to modify packets being sent from Layer 3). So, as the answer to my question:

This is not yet possible in NodeJS, let alone with the now-unmaintained request.js library.

For anyone reading this: if you want to forge perfect requests to bypass antibot security, you must move to a different language. I recommend utls in Golang or BouncyCastle in C#. Godspeed to you, as it took me a year to really learn how to do this. Even then, these languages have more internal issues and features they do not yet support (Go doesn't support 'basic' header ordering, so you need to monkey-patch or modify internals; utls doesn't easily support proxies). The list goes on and on.

If you're not already too deep into it, it's one hell of a rabbit hole and I recommend you do not enter it.
