Node.js mirror website proxy

Question

How would you write a server that simply mirrored a website when a request was received? For example, hitting http://localhost:5000 which is running NodeJS would render cnn.com with images and everything. Is this called a passthrough proxy?

I'm not looking for something that requires configuring an actual proxy within your browser settings, but instead just serves up essentially a mirror of another site by passing the requests through.

Recommended answer

First, let me make sure I understand your question.

You want to have your users browse to http://mynodeproxy.example.com and have that page in their browser render as if it was http://cnn.com. Right?

The answer is: You can't do it the way you think you can. This is possible with 2 approaches:

  1. Users configure a real proxy server in their browser settings (this is why all browsers support configuring a proxy server). You could use an existing proxy server or try to write your own with Node and some specialized application logic. But the point is that the users don't type your proxy address into the browser's address bar. They type your proxy address into the "proxy server" field of their browser settings and still type "http://cnn.com" into their browser's address bar.

  2. If you control all outgoing traffic from your network, you can do hotel-style tricks like DNS hijacking or routing all traffic through your proxy.
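
As a rough illustration of the first approach (a proxy configured in the browser settings), here is a minimal sketch of a plain-HTTP forward proxy using only Node's built-in `http` module. The names `upstreamOptions` and `startForwardProxy` are hypothetical and not part of the original answer. It relies on the fact that a browser talking to a configured HTTP proxy sends the absolute URL in the request line, so `req.url` carries the full target. HTTPS traffic arrives as `CONNECT` requests instead and is not handled here.

```javascript
const http = require('node:http');

// Build upstream request options from a proxy-style request, where
// req.url is an absolute URL such as "http://cnn.com/world".
// Hypothetical helper; throws if req.url is not an absolute URL.
function upstreamOptions(req) {
  const target = new URL(req.url);
  return {
    hostname: target.hostname,
    port: target.port || 80, // plain HTTP only in this sketch
    path: target.pathname + target.search,
    method: req.method,
    headers: req.headers,
  };
}

// Hypothetical helper: start a plain-HTTP forward proxy on the given port.
function startForwardProxy(port) {
  const server = http.createServer((req, res) => {
    let options;
    try {
      options = upstreamOptions(req);
    } catch (err) {
      // The request did not come from a browser using us as a proxy.
      res.writeHead(400);
      res.end('Expected an absolute URL\n');
      return;
    }
    const upstream = http.request(options, (upstreamRes) => {
      // Relay status, headers, and body back to the browser unchanged.
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res);
    });
    upstream.on('error', () => {
      if (!res.headersSent) res.writeHead(502);
      res.end('Bad gateway\n');
    });
    req.pipe(upstream);
  });
  return server.listen(port);
}
```

Calling `startForwardProxy(8080)` and entering `localhost:8080` as the HTTP proxy in the browser settings would route plain-HTTP page loads through it, while the address bar still shows the original site.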

But having your users put your passthrough proxy server's address in their browser's address bar won't work, because the HTML your proxy gets from cnn.com is going to have hyperlinks back to other cnn.com resources (other pages on the site, images, fonts, CSS, JS, etc.). If those links include the hostname instead of being relative to the containing HTML document, the browser will connect directly to cnn.com to load them, bypassing your proxy.
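
For reference, the naive "mirror" the question describes might look roughly like the sketch below. It assumes the target is `www.cnn.com` served over HTTPS, and `mirrorOptions` and `startMirror` are hypothetical names. It forwards every request it receives and relays the response unchanged, which is exactly why absolute links in the returned HTML lead the browser away from it.

```javascript
const http = require('node:http');
const https = require('node:https');

// Assumption for illustration: the site being mirrored.
const TARGET_HOST = 'www.cnn.com';

// Map an incoming request onto the same path at the target site.
function mirrorOptions(req) {
  return {
    hostname: TARGET_HOST,
    path: req.url, // e.g. "/world"
    method: req.method,
    // Replace the Host header so the target sees its own hostname.
    headers: { ...req.headers, host: TARGET_HOST },
  };
}

// Hypothetical helper: serve the target site's responses on a local port.
function startMirror(port) {
  const server = http.createServer((req, res) => {
    const upstream = https.request(mirrorOptions(req), (upstreamRes) => {
      // The body is relayed as-is, so any absolute cnn.com URLs inside
      // the HTML still point at cnn.com, not at this server.
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res);
    });
    upstream.on('error', () => {
      if (!res.headersSent) res.writeHead(502);
      res.end('Bad gateway\n');
    });
    req.pipe(upstream);
  });
  return server.listen(port);
}
```

With `startMirror(5000)` running, http://localhost:5000 would return whatever the target's home page returns, but only requests that actually reach this server are mirrored.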

Now imagine the CNN HTML has a link like <a href="http://cnn.com">View the CNN Home Page</a>. What happens when the user clicks that? That's right, your proxy is entirely out of the picture and bypassed. This is why proxy servers work with explicit browser support.
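
The difference between relative and absolute links can be checked with the standard `URL` class, which resolves a link against the page's address much as a browser does. This is a small sketch; `hostContacted` is a hypothetical helper and the page address is the http://localhost:5000 from the question.

```javascript
// Return the host a browser would contact for a link found on a page.
function hostContacted(href, pageUrl) {
  return new URL(href, pageUrl).host;
}

// The page was loaded from the mirror.
const pageUrl = 'http://localhost:5000/';

// Relative links stay on the mirror.
console.log(hostContacted('/world', pageUrl));       // localhost:5000
console.log(hostContacted('img/logo.png', pageUrl)); // localhost:5000

// Links that name a host go straight to that host.
console.log(hostContacted('http://cnn.com', pageUrl));         // cnn.com
console.log(hostContacted('//cdn.cnn.com/a.css', pageUrl));    // cdn.cnn.com
```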

Once CNN.com's JavaScript starts doing things like making Ajax requests, dynamically adding stuff to the DOM, etc., you will see that this is not possible by simply proxying and modifying the initial cnn.com home page HTML. Yes, you could do this for an extremely trivial, contrived example web page, but realistically, for a modern popular site like cnn.com, it's not feasible.
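
To make the limitation concrete, here is a deliberately naive sketch of rewriting the fetched HTML so that absolute cnn.com links point back at the mirror. `PROXY_ORIGIN` and `rewriteHtml` are hypothetical names, and the regular expression is an illustration rather than a robust HTML rewriter. It handles a static link, but a URL that the page's own script assembles at runtime passes through untouched.

```javascript
// Assumption for illustration: where the mirror is reachable.
const PROXY_ORIGIN = 'http://localhost:5000';

// Point absolute cnn.com URLs in href/src attributes at the mirror.
function rewriteHtml(html) {
  return html.replace(
    /\b(href|src)=(["'])https?:\/\/(?:www\.)?cnn\.com/gi,
    (_match, attr, quote) => `${attr}=${quote}${PROXY_ORIGIN}`
  );
}

// A static link in the markup is rewritten.
const staticLink = '<a href="http://cnn.com/world">World</a>';
console.log(rewriteHtml(staticLink));
// <a href="http://localhost:5000/world">World</a>

// A URL built by script at runtime never appears as an href/src
// attribute, so the rewrite leaves it alone.
const dynamic = '<script>fetch("http://" + "cnn.com" + "/api/news")</script>';
console.log(rewriteHtml(dynamic) === dynamic); // true
```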
