Problem using unicode in URLs with cgi.PATH_INFO in ColdFusion

Problem description

My ColdFusion (MX7 on IIS 6) site has search functionality which appends the search term to the URL e.g. http://www.example.com/search.cfm/searchterm.

The problem I'm running into is this is a multilingual site, so the search term may be in another language e.g. القاهرة leading to a search URL such as http://www.example.com/search.cfm/القاهرة

The problem comes when I retrieve the search term from the URL. I'm using cgi.PATH_INFO to get the path of the search page plus the search term, e.g. /search.cfm/searchterm, and extracting the search term from that. However, when unicode characters are used in the search, they are converted to question marks, e.g. /search.cfm/??????.

These appear to be actual question marks, rather than the browser being unable to render the unicode characters or the characters being mangled on output.

I can't find any information about whether ColdFusion supports unicode in the URL, or how I can go about resolving this and getting hold of the complete URL in some way - does anyone have any ideas?

Cheers,

Tom

Edit: Further research has led me to believe the issue may be related to IIS rather than ColdFusion, but my original query still stands.

Further edit

The result of GetPageContext().GetRequest().GetRequestUrl().ToString() is http://www.example.com/search.cfm/searchterm/????? so it appears the issue goes fairly deep.

Recommended answer

Yeah, it's not really ColdFusion's fault. It's a common problem.

It's mostly the fault of the original CGI specification, which specifies that PATH_INFO has to be %-decoded, thus losing the original %xx byte sequences that would have allowed you to work out which real characters were meant.
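To make the point about %xx sequences concrete, here is a minimal sketch. Python is used purely for illustration; it is not what IIS or ColdFusion runs, and the variable names are made up for this example. It assumes the browser percent-encodes the term as UTF-8, which is typical but not guaranteed.

```python
from urllib.parse import quote, unquote_to_bytes

term = "القاهرة"

# Assumed wire form: the term's UTF-8 bytes, each written as a %xx escape.
raw_path = "/search.cfm/" + quote(term, safe="")

# The escapes only describe bytes; turning them into characters needs a charset.
raw_bytes = unquote_to_bytes(raw_path)
as_utf8 = raw_bytes.decode("utf-8")                        # the intended text
as_cp1252 = raw_bytes.decode("cp1252", errors="replace")   # a wrong guess: mojibake

print(raw_path)
print(as_utf8)
print(as_cp1252)
```

As long as the raw escapes are available, the application can choose the charset itself. Once a lower layer has decoded them, that choice has already been made on its behalf.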

And it's partly IIS's fault, because it always tries to read submitted %xx bytes in the path part as UTF-8-encoded Unicode (unless the path isn't a valid UTF-8 byte sequence in which case it plumps for the Windows default code page, but gives you no way to find out this has happened). Having done so, it puts it in environment variables as a Unicode string (as envvars are Unicode under Windows).

However most byte-based tools using the C stdio (and I'm assuming this applies to ColdFusion, as it does under Perl, Python 2, PHP etc.) then try to read the environment variables as bytes, and the MS C runtime encodes the Unicode contents again using the Windows default code page. So any characters that don't fit in the default code page are lost for good. This would include your Arabic characters when running on a Western Windows install.
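The lossy step can be imitated in a few lines. This is only an analogy, under two assumptions: cp1252 stands in for the default code page of a Western Windows install, and Python's errors="replace" stands in for the substitution the C runtime performs. The function name is hypothetical.

```python
def read_env_as_bytes(unicode_value, code_page="cp1252"):
    """Imitate a byte-oriented runtime reading a Unicode environment variable."""
    # Characters with no slot in the code page come out as a literal '?'.
    return unicode_value.encode(code_page, errors="replace")

term = "القاهرة"
path_info = "/search.cfm/" + term      # what IIS holds, as proper Unicode

seen_by_app = read_env_as_bytes(path_info)
print(seen_by_app)

# Every Arabic letter collapsed to the same byte, so nothing can be recovered.
assert seen_by_app == b"/search.cfm/" + b"?" * len(term)
```

A term made of characters that do exist in the code page, such as "café" under cp1252, would survive this step, which is why the problem can go unnoticed on sites that only see Western European input.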

A clever script that has direct access to the Win32 GetEnvironmentVariableW API could call that to retrieve a native-Unicode environment variable, which it could then encode to UTF-8 or whatever else it wanted, assuming that the input was also UTF-8 (which is what you'd generally want today). However, I don't think ColdFusion gives you this access, and in any case it only works from IIS6 onwards; IIS5.x will throw away any non-default-codepage characters before they even reach the environment variables.

Otherwise, your best bet is URL-rewriting. If a layer above CF can convert that search.cfm/القاهرة to search.cfm/?q=القاهرة then you don't face the same problem, as the QUERY_STRING variable, unlike PATH_INFO, is not specified to be %-decoded, so the %xx bytes remain where a tool at CF's level can see them.
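As a rough sketch of that idea, again in Python for illustration only: the rewrite function below is hypothetical and stands in for whatever rewriting layer sits in front of CF, and it assumes that layer sees the path while it is still percent-encoded. The second half shows the application side decoding the query string as UTF-8.

```python
import re
from urllib.parse import parse_qs, quote

def rewrite_to_query(raw_path):
    """Move the trailing path segment into ?q=, leaving %xx escapes untouched."""
    match = re.fullmatch(r"(/search\.cfm)/([^/?]+)", raw_path)
    if match is None:
        return raw_path  # not a search URL of the expected shape
    return f"{match.group(1)}/?q={match.group(2)}"

term = "القاهرة"
raw_path = "/search.cfm/" + quote(term, safe="")   # assumed wire form (UTF-8)
rewritten = rewrite_to_query(raw_path)

# The application receives QUERY_STRING with its escapes intact
# and picks the charset itself.
query_string = rewritten.split("?", 1)[1]
recovered = parse_qs(query_string, encoding="utf-8")["q"][0]

print(rewritten)
print(recovered)
```

One caveat with this sketch: a literal + in the original path segment would be read as a space once it is moved into the query string, so a real rewrite rule may need to escape it.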
