Modify HTML Response (Not Headers)

Problem Description

Hoping someone can help me out or point me in the right direction.

I've been asked to find out how to make Akamai (or any other CDN, or NGINX) modify the actual response body.

Why?

I'm supposed to make the CDN rewrite every "http://" URL to "https://", instead of modifying the app code to use protocol-relative "//" URLs for external resources.

Is this possible?

Does anyone know?

Recommended Answer

This appears to be possible through a number of different approaches, but that's not to say it's actually advisable.

It seems potentially problematic (example: what if you rewrite something that shouldn't have been rewritten?) and machine-resource-intensive (a lot of CPU cycles to parse and munge response bodies, repeatedly).

Here's what I found:

Nginx has the http_sub_module, whose sub_filter directive appears to accomplish this in a fairly straightforward way, assuming what you want to replace is simple and you only need to match one pattern per page, like replacing <a href="http://example.com/... with <a href="https://example.com/..., one or more times. This kind of content-mungery seems sketchy, but depending on the situation you're in (which may be one of limited control over the application) it might get you there.
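
To show what that kind of literal replacement amounts to, here is a rough Python model of it. It is only a sketch, assuming sub_filter's documented behaviour of case-insensitive literal matching with first-match-only replacement by default (sub_filter_once on); the function name and parameters are hypothetical, and this is not how nginx implements it internally.

```python
import re


def sub_filter(body: str, search: str, replacement: str, once: bool = True) -> str:
    """Rough model of a literal, case-insensitive sub_filter-style replacement.

    once=True replaces only the first match (like nginx's default sub_filter_once on);
    once=False replaces every match (like sub_filter_once off).
    """
    # re.escape() turns the search string into a literal pattern.
    pattern = re.compile(re.escape(search), re.IGNORECASE)
    # A function replacement stops re from interpreting backslashes in `replacement`.
    return pattern.sub(lambda _match: replacement, body, count=1 if once else 0)


page = '<a href="http://example.com/a">A</a> <a HREF="http://example.com/b">B</a>'
print(sub_filter(page, 'href="http://example.com/', 'href="https://example.com/'))
print(sub_filter(page, 'href="http://example.com/', 'href="https://example.com/', once=False))
```

Note that a single fixed string only covers one host and one quoting style; anything outside that exact pattern passes through unchanged.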

It looks like there's also something called http_substitutions_filter, a third-party module (not part of the core Nginx distribution) that can do more powerful, filter-based rewriting of response bodies.
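
The extra power of a regex-based substitution filter is that a single pass can apply several patterns. A minimal Python sketch of that idea follows; the rule list is a hypothetical example (upgrading `http://` at the start of quoted `href`/`src` values and inside CSS `url(...)` references), not the module's own configuration syntax.

```python
import re

# Hypothetical rule set: (compiled pattern, replacement template) pairs, applied in order.
RULES = [
    # http:// at the start of a quoted href or src attribute value.
    (re.compile(r"""(\b(?:href|src)\s*=\s*["'])http://""", re.IGNORECASE), r"\1https://"),
    # http:// inside a CSS url(...) reference, optionally quoted.
    (re.compile(r"""(url\(\s*["']?)http://""", re.IGNORECASE), r"\1https://"),
]


def substitute(body: str) -> str:
    """Apply every rule to the whole body, one rule after another."""
    for pattern, template in RULES:
        body = pattern.sub(template, body)
    return body


print(substitute('<img src="http://cdn.example.com/logo.png">'))
print(substitute("<div style=\"background: url('http://cdn.example.com/bg.png')\"></div>"))
```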

Varnish seems to have a similar capability (possibly via a plugin), but HAProxy doesn't, since it only deals in headers and leaves bodies alone except when doing gzip offloading. Other reverse-proxy-capable software, like Apache or Squid, might also offer something useful that you could place in front of your application server.

My initial impression, in any event, is that simple string replacement may not quite get you there, and even regex-based replacement isn't really sufficient without significant sophistication in the regexes, because you always run the risk of rewriting something that you shouldn't.
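
To make that risk concrete, here is a small Python illustration on a made-up page. A blanket `http://` to `https://` replacement also rewrites the XHTML namespace URI, which is an identifier that must match exactly rather than a link. An attribute-aware regex leaves the namespace alone but still edits a code sample that a documentation page merely displays as escaped text, because a regex cannot tell text from markup. The page content and variable names are hypothetical.

```python
import re

# A made-up page: an XHTML root element, one real link, and a code sample
# that only *displays* an anchor tag as escaped text inside <pre>.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<a href="http://example.com/docs">Docs</a>
<pre>&lt;a href="http://example.com/"&gt;example&lt;/a&gt;</pre>
</html>"""

# Naive approach: change every "http://" in the body.
blanket = PAGE.replace("http://", "https://")

# Slightly smarter approach: only change http:// directly after href="
ATTR = re.compile(r'(\bhref\s*=\s*")http://', re.IGNORECASE)
attr_only = ATTR.sub(r"\1https://", PAGE)

# The namespace URI is an identifier, not a link, yet the blanket version changed it.
print('xmlns="https://www.w3.org/1999/xhtml"' in blanket)
# The regex spares the namespace but still edits the displayed code sample.
print('&lt;a href="https://example.com/"&gt;' in attr_only)
```

Both checks print `True`: each approach rewrites something the page author never meant as a live link.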

What I would suggest "really needs to happen" to accomplish this in the most correct way is to actually interpret the generated HTML with a DOM parsing library, traverse the tree, and modify the relevant elements in place before handing the revised document to the requester. That way, the document gets modified based on a contextual understanding of its contents.
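
In Python, a real DOM library would typically be a third-party package such as lxml or html5lib. To stay within the standard library, the sketch below swaps in `html.parser.HTMLParser`, which is a tokenizer rather than a tree builder but still reports start tags, attributes, text and comments as separate events, which is exactly the context a regex lacks. It copies every token back out unchanged except `http://` values in a hypothetical whitelist of URL attributes. Assumptions: the body is already decoded to `str`, end tags come back lowercased, and rebuilt tags use double-quoted attributes.

```python
import html
from html.parser import HTMLParser
from typing import Optional

# Hypothetical policy: the (tag, attribute) pairs whose http:// values get upgraded.
URL_ATTRS = {("a", "href"), ("link", "href"), ("img", "src"), ("script", "src")}


class HttpsRewriter(HTMLParser):
    """Copies an HTML document token by token, upgrading http:// only in URL_ATTRS."""

    def __init__(self) -> None:
        # convert_charrefs=False reports &name; and &#N; as separate events, so text
        # is written back out as-is instead of in unescaped form.
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []

    def _emit_tag(self, tag: str, attrs: list[tuple[str, Optional[str]]], self_closing: bool) -> None:
        changed = False
        new_attrs = []
        for name, value in attrs:
            if (tag, name) in URL_ATTRS and value and value.lower().startswith("http://"):
                value = "https://" + value[len("http://"):]
                changed = True
            new_attrs.append((name, value))
        if not changed:
            # Untouched start tags are copied verbatim from the input.
            self.out.append(self.get_starttag_text())
            return
        # Rebuild the tag; the parser has already unescaped values, so re-escape them.
        parts = [tag] + [
            name if value is None else f'{name}="{html.escape(value, quote=True)}"'
            for name, value in new_attrs
        ]
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")  # emitted in lowercase

    def handle_data(self, data):
        self.out.append(data)  # text, including raw <script>/<style> contents

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")  # comments are never rewritten

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def rewrite_html(body: str) -> str:
    parser = HttpsRewriter()
    parser.feed(body)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

On a page like the one in the previous illustration, this upgrades the real `<a href>` and `<img src>` values while leaving the namespace URI, commented-out markup and escaped sample code alone. It inherits html.parser's limits (it is not a full HTML5 tree builder), so treat it as a starting point rather than a finished filter.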

It sounds complicated, in my opinion, because it is -- so I would again suggest you reconsider your planned approach unless this is outside your control.

Final thought: curiosity got the best of me, so I took this question and retrofitted the HTTP reverse proxy I wrote (for a different purpose) so that, based on the Content-Type, it could actually parse and walk the HTML structure as a proper entity, modifying it in place (as described above) before returning the response body to the requester.
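
The proxy itself isn't shown here. As a hedged illustration of the "based on the Content-Type" part only, this is a Python helper a proxy could call on each upstream response before relaying it. The function name, the pass-through policy for compressed bodies, and the Content-Length recalculation are assumptions for the sketch; the `rewrite` callable is where a parser-based rewriter would plug in.

```python
from email.message import Message
from typing import Callable


def rewrite_if_html(
    headers: dict[str, str], body: bytes, rewrite: Callable[[str], str]
) -> tuple[dict[str, str], bytes]:
    """Apply `rewrite` to an upstream response body only when it is uncompressed HTML.

    Returns possibly-updated headers and body; anything else passes through untouched.
    """
    lower = {name.lower(): value for name, value in headers.items()}

    # Parse the Content-Type header (media type plus optional charset parameter).
    ctype = Message()
    ctype["Content-Type"] = lower.get("content-type", "application/octet-stream")
    if ctype.get_content_type() != "text/html":
        return headers, body

    # Assumption: compressed bodies pass through; a real proxy would either ask
    # upstream for identity encoding or decompress, rewrite, and recompress.
    if lower.get("content-encoding", "identity").lower() != "identity":
        return headers, body

    charset = ctype.get_content_charset("utf-8")
    # surrogateescape lets undecodable bytes survive the decode/encode round trip.
    text = body.decode(charset, errors="surrogateescape")
    new_body = rewrite(text).encode(charset, errors="surrogateescape")

    # The body length usually changes, so Content-Length must be recomputed.
    new_headers = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    new_headers["Content-Length"] = str(len(new_body))
    return new_headers, new_body
```

A usage example with a trivial stand-in rewrite:

```python
raw = '<a href="http://e.com/">café</a>'.encode("utf-8")
headers, body = rewrite_if_html(
    {"Content-Type": "text/html; charset=UTF-8", "Content-Length": str(len(raw))},
    raw,
    lambda s: s.replace('href="http://', 'href="https://'),
)
print(headers["Content-Length"], body.decode("utf-8"))
```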

This turns out, as I expected, to be fairly processor-intensive. My test content was 29 KB of real-world HTML from a live site, containing 56 <a href ...> and 6 <link rel ...> elements, and the rewrite operation required 128 ms on a 1 GHz Opteron 1218 and 43 ms on a 2.4 GHz Xeon E5620. These benchmarks are strictly for the additional operations, excluding the (smaller) amount of time required for the actual "proxy" functionality itself. This time cost is not insurmountable, but it could add up to a lot of CPU time. It is far longer than a regular-expression-based content rewrite would take, but it is far more precise and unlikely to break the pages it touches.
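
To check the cost on your own pages, a small Python harness like the one below times only the rewrite step, mirroring the choice above to exclude the proxy work. It uses the standard-library `timeit` module; the synthetic page and the regex rewrite are placeholders, and no timings are claimed here since they depend entirely on the machine and the content.

```python
import re
import timeit
from typing import Callable


def ms_per_call(rewrite: Callable[[str], str], body: str, number: int = 50, repeat: int = 5) -> float:
    """Best-of-`repeat` average time, in milliseconds, for one call of rewrite(body).

    Only the rewrite step is timed, not any network or proxy work.
    """
    runs = timeit.repeat(lambda: rewrite(body), number=number, repeat=repeat)
    return min(runs) / number * 1000.0


# Hypothetical stand-in rewrite; pass a parser-based rewriter instead to compare the two.
ATTR = re.compile(r'(\b(?:href|src)\s*=\s*")http://', re.IGNORECASE)


def regex_rewrite(text: str) -> str:
    return ATTR.sub(r"\1https://", text)


# Synthetic page of roughly 29 KB; real pages from your site give more meaningful numbers.
sample = '<p><a href="http://example.com/page">link</a> filler text</p>\n' * 480
print(f"{len(sample)} chars, {ms_per_call(regex_rewrite, sample):.3f} ms per rewrite")
```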
