Weird characters in URL


Question

On my web server, when a user requests a URL containing weird characters, I strip those characters and the system logs the request. When I reviewed the sanitized requests, I found the ones below. What could be the purpose of these URLs?

I checked the IPs: they belong to real people who use the website normally. But in roughly 1 out of 20 of their requests, the URL ends with these weird characters.

http://example.com/@%EF%BF%BD%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0,
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/p%EF%BF%BD%1D%01?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDC%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDR%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD`%EF%BF%BD%EF%BF%BD%7F, agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
http://example.com/%EF%BF%BDe%EF%BF%BDv8%01%EF%BF%BD?o=3&g=P%01%EF%BF%BD&s=&z=%EF%BF%BD%EF%BF%BD%15%01%EF%BF%BD%EF%BF%BD, agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36

http://en.wikipedia.org/wiki/Specials_(Unicode_block)

Recommended answer

They are essentially malformed URLs. They can be generated by malware that is trying to exploit website vulnerabilities, by a malfunctioning browser plugin or extension, or by a bug in a JS file (e.g. tracking with Google Analytics) in combination with a specific browser version or operating system. In any case, you can't control what requests a client sends, and there's nothing you can do to stop them. If your generated HTML/JS code is correct, you have done your work.

If you want to correct those URLs for any reason, you can enable URL rewriting and set a rule with a regular expression filter to transform them into valid URLs. I don't recommend doing that, though: the web server should respond with a 404 Not Found error, because that is the standard behavior (it's a client error, after all), and in my opinion it is faster and safer than applying URL rewriting. (The rewriting procedure may contain bugs that someone could try to exploit.)
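As a rough illustration of the "reject instead of rewrite" approach, here is a minimal Python sketch. The names `looks_malformed` and `status_for` are hypothetical, and the rule itself (reject anything that decodes to a replacement character or a control character) is an assumption, not something the answer prescribes. A real server would apply a check like this in its own request-handling layer.

```python
from urllib.parse import urlsplit, unquote

REPLACEMENT_CHAR = "\ufffd"  # Unicode replacement character


def looks_malformed(raw_url: str) -> bool:
    """Hypothetical check: True if the decoded path or query contains
    U+FFFD or an ASCII control character."""
    parts = urlsplit(raw_url)
    # unquote() decodes percent-escapes as UTF-8 and substitutes U+FFFD
    # for byte sequences that are not valid UTF-8.
    decoded = unquote(parts.path) + unquote(parts.query)
    if REPLACEMENT_CHAR in decoded:
        return True
    # C0 control characters (0x00-0x1F) and DEL (0x7F)
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in decoded)


def status_for(raw_url: str) -> int:
    """Hypothetical policy: answer 404 for malformed URLs, 200 otherwise."""
    return 404 if looks_malformed(raw_url) else 200
```

The check only inspects the request; it never tries to repair it, which keeps the logic small and avoids the rewriting bugs mentioned above.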

Out of curiosity, you can easily decode those URLs with an online URL decoder of your choice, but you will mostly discover what you already know: those URLs contain a lot of Unicode replacement characters (U+FFFD) encoded as UTF-8.
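The same decoding can be done locally instead of with an online tool. The sketch below uses Python's standard `urllib.parse.unquote` on a URL modeled on one of the logged requests; the variable names are illustrative only.

```python
from urllib.parse import unquote

# URL modeled on one of the logged requests in the question
sample = (
    "http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z="
    + "%EF%BF%BD" * 10
    + "%3E?"
)

decoded = unquote(sample)                    # percent-decoding, interpreted as UTF-8
replacement_count = decoded.count("\ufffd")  # occurrences of U+FFFD

print(repr(decoded))
print(replacement_count)
```

Printing with `repr` makes the control character (`\x02`) and the replacement characters visible instead of relying on how the terminal renders them.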

In fact, %EF%BF%BD is the percent-encoded form of the three UTF-8 bytes (EF BF BD) of the Unicode replacement character. Depending on the representation you choose, the same character can also appear as �, as the bytes EF BF BD, as the code point FFFD, or as ï ¿ ½ (the three bytes misread as Latin-1).
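These representations can be checked with a few lines of Python using only the standard library. The last line is an added illustration, not part of the original answer: it shows one common way the character arises, namely when bytes that are not valid UTF-8 are decoded leniently.

```python
from urllib.parse import quote, unquote

char = unquote("%EF%BF%BD")               # the replacement character itself
utf8_bytes = char.encode("utf-8")         # its three UTF-8 bytes
hex_bytes = " ".join(f"{b:02X}" for b in utf8_bytes)
code_point = f"U+{ord(char):04X}"
mojibake = utf8_bytes.decode("latin-1")   # same bytes misread as Latin-1
reencoded = quote(char)                   # back to the percent-encoded form

# Invalid UTF-8 input decoded with errors="replace" yields U+FFFD
from_invalid = b"\xff\xfe".decode("utf-8", errors="replace")

print(hex_bytes, code_point, mojibake, reencoded)
```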

You can also check for yourself how the client handles that character. Go here:

http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EF%BF%BD&mode=char

Press the GO button and use your browser's developer tools to check what really happens: the browser encodes the unknown character as %EF%BF%BD before sending it to the web server.
