How to stop the NodeJS "request" module from changing the request when using a proxy
Problem description
Sorry if this comes off as confusing.
I have written a script using the NodeJS request module that runs a function on a website and then returns the data. The script works perfectly when I don't use a proxy, i.e. when the proxy is set to false. This task is NOT allowed to be done with Selenium/puppeteer.
proxy: false
However, when I set a (working) proxy, it fails to perform the same task and is detected by the website's firewall/antibot software.
proxy: 'http://xx.xxx.xx.xx:3128'
Some notes:
- I have tried many (20+) different proxy providers (Residential and Datacenter) and they all have this issue
- The issue does not occur if that proxy is set globally on my system
- The issue does not occur if that proxy is set in a Chrome extension
- The SSL cipher suites do not match Chrome but they still don't match when not using a proxy so I assume that isn't the issue
- It is very important to keep consistency in the header order
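Some context for the notes above: when an HTTPS request goes through an HTTP proxy such as the one on port 3128, the client normally asks the proxy to open a raw tunnel with a `CONNECT` request and then performs the TLS handshake itself, through that tunnel (the request module's README documents its `tunnel` option as defaulting to true for https destinations). The proxy only relays bytes, so the ClientHello the website sees comes from whichever client does the handshake: presumably Chrome in the system-proxy and extension cases, Node in the script. The sketch below uses only Node's built-in `http` and `tls` modules and assumes a plain HTTP proxy without authentication; `connectOptions` and `openTunnel` are hypothetical helper names, not part of the request module.

```javascript
const http = require('http');
const tls = require('tls');

// Options for the CONNECT request sent to the proxy. The proxy only sees
// the target host:port; the real request headers travel later, encrypted.
function connectOptions(proxyUrl, targetHost, targetPort = 443) {
  const proxy = new URL(proxyUrl);
  return {
    host: proxy.hostname,
    port: Number(proxy.port) || 80,
    method: 'CONNECT',
    path: `${targetHost}:${targetPort}`,
    headers: { Host: `${targetHost}:${targetPort}` },
  };
}

// Open the tunnel, then run the TLS handshake over it. The ClientHello on
// this socket is produced by Node's TLS stack, not by the proxy.
function openTunnel(proxyUrl, targetHost, targetPort = 443) {
  return new Promise((resolve, reject) => {
    const req = http.request(connectOptions(proxyUrl, targetHost, targetPort));
    req.on('connect', (res, socket) => {
      if (res.statusCode !== 200) {
        socket.destroy();
        reject(new Error(`proxy answered CONNECT with ${res.statusCode}`));
        return;
      }
      const secure = tls.connect({ socket, servername: targetHost }, () => resolve(secure));
      secure.on('error', reject);
    });
    req.on('error', reject);
    req.end();
  });
}
```

Usage, with the placeholder proxy from the question: `openTunnel('http://xx.xxx.xx.xx:3128', 'www.sitename.com').then((sock) => sock.write('GET / HTTP/1.1\r\nHost: www.sitename.com\r\n\r\n'))`.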
Basically, the question is: does the request module change anything when using a proxy, such as the header order?
Here is an image of what happens when it passes/fails.
The only difference between the two runs is the proxy setting, and changing it is what causes the failure: one request is made with the proxy, one without.
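```javascript
const rp = require('request-promise');

const url = 'https://www.sitename.com/'; // placeholder, matches the Host header below
const globalJar = rp.jar();              // cookie jar shared across requests

const options = {
  url: url,
  method: 'GET',
  proxy: false,                  // the failing run uses 'http://xx.xxx.xx.xx:3128'
  simple: false,                 // don't reject on non-2xx status codes
  resolveWithFullResponse: true, // resolve with the full response, not just the body
  forever: true,                 // keep-alive agent, reuses connections
  gzip: true,                    // note: request decodes gzip/deflate only, not br
  jar: globalJar,
  followRedirect: false,
  followAllRedirects: false,
  headers: {
    'Host' : 'www.sitename.com',
    'Connection' : 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-encoding' : 'gzip, deflate, br',
    'Accept-Language' : 'en-GB,en-US;q=0.9,en;q=0.8',
  },
};

rp(options).then((res) => console.log(res.statusCode));
```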
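The header-order part of the question can be checked empirically: point the script at a server you control (reachable from the proxy) and compare the order in which headers actually arrive, with and without the proxy. Node's `http.IncomingMessage` exposes `req.rawHeaders`, a flat `[name, value, name, value, ...]` array that keeps the received order and casing. This is a sketch under that setup; `headerNames`, `orderDiff` and `startHeaderLogger` are hypothetical helpers. It only inspects HTTP headers and says nothing about the TLS handshake.

```javascript
const http = require('http');

// rawHeaders alternates name, value; keep only the names, in arrival order.
function headerNames(rawHeaders) {
  const names = [];
  for (let i = 0; i < rawHeaders.length; i += 2) names.push(rawHeaders[i]);
  return names;
}

// Index of the first expected header that is out of place (case-insensitive),
// or -1 if all expected headers arrived in the expected order. Headers not in
// the expected list (e.g. ones a proxy might add) are ignored.
function orderDiff(received, expected) {
  const want = expected.map((h) => h.toLowerCase());
  const got = received.map((h) => h.toLowerCase()).filter((h) => want.includes(h));
  for (let i = 0; i < want.length; i++) {
    if (got[i] !== want[i]) return i;
  }
  return -1;
}

// Logs the header order of every incoming request and where it first diverges.
function startHeaderLogger(port, expected) {
  return http.createServer((req, res) => {
    const names = headerNames(req.rawHeaders);
    console.log(names.join(', '), '| first mismatch at:', orderDiff(names, expected));
    res.end('ok');
  }).listen(port);
}
```

Usage: `startHeaderLogger(8080, ['Host', 'Connection', 'Upgrade-Insecure-Requests', 'User-Agent', 'Accept', 'Accept-encoding', 'Accept-Language'])`, then send the same request to that server once with `proxy: false` and once through the proxy.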
Recommended answer
After deactivating my old account, I wanted to come back and give an actual answer to this question now that I fully understand it. What I was asking a year ago was not possible: the antibot was fingerprinting me through the TLS ClientHello (and even slightly at the TCP/frame level).
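To make "fingerprinting through the TLS ClientHello" concrete, here is one widely used scheme, JA3. The answer does not say which method the antibot used, so treat JA3 as an illustrative assumption. JA3 takes five ClientHello fields as decimal values (TLS version, cipher suites, extensions, elliptic curves, EC point formats), joins the values inside a field with `-` and the fields with `,`, skips GREASE values, and hashes the result with MD5. Because the lists are hashed in the order sent, a client whose TLS stack emits different lists or a different order gets a different fingerprint, whatever its HTTP headers claim. The field values in the test are made up.

```javascript
const crypto = require('crypto');

// GREASE values (RFC 8701: 0x0a0a, 0x1a1a, ..., 0xfafa) are placeholders
// some browsers insert; JA3 leaves them out.
function isGrease(v) {
  return (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);
}

// JA3 string: version,ciphers,extensions,curves,pointFormats
function ja3String(hello) {
  const field = (list) => list.filter((v) => !isGrease(v)).join('-');
  return [
    String(hello.version),
    field(hello.ciphers),
    field(hello.extensions),
    field(hello.curves),
    field(hello.pointFormats),
  ].join(',');
}

// The fingerprint is the MD5 hex digest of the JA3 string.
function ja3Hash(hello) {
  return crypto.createHash('md5').update(ja3String(hello)).digest('hex');
}
```

Swapping just two cipher suites in an otherwise identical ClientHello yields a different hash, which is why matching a browser's headers alone is not enough.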
To start, I wrote a wrapper called request-curl, which wraps the libcurl/curl binaries into a single library with the same interface as request-promise. This gave me much more control over the request (preventing encoding, HTTP/2 and proxy support, and finer session/TLS control), but it still only got me to a mediocre rank: the 687th most popular ClientHello (https://client.tlsfingerprint.io:8443/). That wasn't good enough.
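request-curl's own API is not shown here, so the following is not it. It is a minimal sketch of the general idea, assuming the `curl` command-line tool is installed: translate request-style options into curl arguments and run the binary with Node's `child_process.execFile`, so that curl (and its TLS library), not Node, builds the request and performs the handshake. `buildCurlArgs` and `curlRequest` are hypothetical names; the flags used (`-s`, `-i`, `--proxy`, `--compressed`, `--http2`, `-H`, `-X`) are standard curl options.

```javascript
const { execFile } = require('child_process');

// Translate a subset of request-promise-style options into curl arguments.
function buildCurlArgs(opts) {
  const args = ['-s', '-i']; // silent; include response headers in the output
  if (opts.proxy) args.push('--proxy', opts.proxy);
  if (opts.gzip) args.push('--compressed'); // ask curl to decode compressed bodies
  if (opts.http2) args.push('--http2');
  // String keys keep insertion order, so headers go to curl in the order written.
  for (const [name, value] of Object.entries(opts.headers || {})) {
    args.push('-H', `${name}: ${value}`);
  }
  if (opts.method && opts.method !== 'GET') args.push('-X', opts.method);
  args.push(opts.url);
  return args;
}

// Run curl and resolve with its raw output (status line, headers, body).
function curlRequest(opts) {
  return new Promise((resolve, reject) => {
    execFile('curl', buildCurlArgs(opts), { maxBuffer: 10 * 1024 * 1024 },
      (err, stdout) => (err ? reject(err) : resolve(stdout)));
  });
}
```

Using `execFile` with an argument array rather than a shell string avoids quoting problems with header values that contain spaces or semicolons.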
I had to switch languages. NodeJS is too high-level to allow really deep control (I had to modify packets sent at Layer 3). So, as the answer to my question:
This is not yet possible in NodeJS, let alone with the now-unmaintained request.js library.
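To show where the limit lies: Node does expose a few ClientHello-related settings as `tls.connect()` options, which an `https.Agent` passes through, such as the cipher list (`ciphers`), supported groups (`ecdhCurve`), signature algorithms (`sigalgs`) and the protocol range (`minVersion`/`maxVersion`). It does not expose the extension list, extension order or GREASE, so the fingerprint can be nudged but not made to match a browser, consistent with the answer. The specific values below are illustrative assumptions, not a captured Chrome profile; `makeAgent` is a hypothetical helper.

```javascript
const https = require('https');

// Illustrative preference order (an assumption, not a real browser profile).
const CIPHERS = [
  'TLS_AES_128_GCM_SHA256',
  'TLS_AES_256_GCM_SHA384',
  'TLS_CHACHA20_POLY1305_SHA256',
  'ECDHE-ECDSA-AES128-GCM-SHA256',
  'ECDHE-RSA-AES128-GCM-SHA256',
  'ECDHE-ECDSA-AES256-GCM-SHA384',
  'ECDHE-RSA-AES256-GCM-SHA384',
].join(':');

// An agent whose TLS options change what Node puts in its ClientHello.
// Extension order and GREASE remain under OpenSSL's control.
function makeAgent() {
  return new https.Agent({
    keepAlive: true,
    ciphers: CIPHERS,
    ecdhCurve: 'X25519:P-256:P-384',
    sigalgs: 'ecdsa_secp256r1_sha256:rsa_pss_rsae_sha256:rsa_pkcs1_sha256',
    minVersion: 'TLSv1.2',
    maxVersion: 'TLSv1.3',
  });
}
```

Usage: `https.get('https://client.tlsfingerprint.io:8443/', { agent: makeAgent() }, (res) => res.pipe(process.stdout))` lets you compare the resulting fingerprint against Node's default one on the site the answer mentions.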
For anyone reading this: if you want to forge perfect requests to bypass antibot security, you must move to a different language. I recommend utls in Golang or BouncyCastle in C#. Godspeed, as it took me a year to really learn how to do this. Even then, these languages have more internal issues and features they do not yet support (Go doesn't support 'basic' header ordering, so you need to monkey-patch/modify internals, etc.; utls doesn't easily support proxies). The list goes on and on.
If you're not already too deep into it, it's one hell of a rabbit hole, and I recommend you do not enter it.