Problem using unicode in URLs with cgi.PATH_INFO in ColdFusion

Question

My ColdFusion (MX7 on IIS 6) site has search functionality which appends the search term to the URL e.g. http://www.example.com/search.cfm/searchterm.

The problem I'm running into is that this is a multilingual site, so the search term may be in another language, e.g. القاهرة, leading to a search URL such as http://www.example.com/search.cfm/القاهرة

The problem arises when I come to retrieve the search term from the URL. I'm using cgi.PATH_INFO to retrieve the path of the search page plus the search term, e.g. /search.cfm/searchterm, and extracting the search term from that. However, when unicode characters are used in the search, they are converted to question marks, e.g. /search.cfm/??????.

These appear to be actual question marks, rather than the browser being unable to render unicode characters, or the characters being mangled on output.

I can't find any information about whether ColdFusion supports unicode in the URL, or how I can go about resolving this and getting hold of the complete URL in some way - does anyone have any ideas?

Cheers,

Tom

Edit: Further research has led me to believe the issue may be related to IIS rather than ColdFusion, but my original query still stands.

Further edit:

The result of GetPageContext().GetRequest().GetRequestUrl().ToString() is http://www.example.com/search.cfm/searchterm/????? so it appears the issue goes fairly deep.

Recommended answer

Yeah, it's not really ColdFusion's fault. It's a common problem.

It's mostly the fault of the original CGI specification, which specifies that PATH_INFO has to be %-decoded, thus losing the original %xx byte sequences that would have allowed you to work out which real characters were meant.
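
To make the %-decoding step concrete, here is a minimal Python sketch (Python is used purely for illustration and is not part of the original setup; the variable names are hypothetical). It percent-encodes the search term as UTF-8, as a browser typically would, then turns the %xx sequences back into raw bytes, which only become the right text again if decoded with the right encoding.

```python
from urllib.parse import quote, unquote_to_bytes

term = "القاهرة"

# What travels in the URL: the UTF-8 bytes of the term, each written as %xx.
encoded = quote(term, encoding="utf-8")
print(encoded)

# %-decoding turns the %xx sequences back into raw bytes.
raw_bytes = unquote_to_bytes(encoded)

# The bytes only yield the original text when decoded as UTF-8.
as_utf8 = raw_bytes.decode("utf-8")
# Assumption: cp1252 stands in for a wrong guess at the encoding.
as_cp1252 = raw_bytes.decode("cp1252", errors="replace")

print(as_utf8 == term)
print(as_cp1252 == term)
```

A script that still sees the %xx sequences can choose the encoding itself; a script that is handed already-decoded text has to live with whatever choice was made upstream.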

And it's partly IIS's fault, because it always tries to read submitted %xx bytes in the path part as UTF-8-encoded Unicode (unless the path isn't a valid UTF-8 byte sequence in which case it plumps for the Windows default code page, but gives you no way to find out this has happened). Having done so, it puts it in environment variables as a Unicode string (as envvars are Unicode under Windows).

However most byte-based tools using the C stdio (and I'm assuming this applies to ColdFusion, as it does under Perl, Python 2, PHP etc.) then try to read the environment variables as bytes, and the MS C runtime encodes the Unicode contents again using the Windows default code page. So any characters that don't fit in the default code page are lost for good. This would include your Arabic characters when running on a Western Windows install.
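
The lossy re-encoding can be modelled in a few lines of Python. This is only a sketch: it assumes cp1252 as the default code page of a Western Windows install, and uses Python's errors="replace" as a stand-in for what the C runtime does, rather than calling the runtime itself.

```python
term = "القاهرة"

# Assumption: cp1252 stands in for the Windows default code page.
default_code_page = "cp1252"

# Encoding Unicode text into a code page that lacks the characters
# substitutes "?" for each one.
as_bytes = term.encode(default_code_page, errors="replace")
print(as_bytes)

# Decoding again cannot bring the original characters back.
round_tripped = as_bytes.decode(default_code_page)
print(round_tripped == term)
```

Every Arabic character ends up as a literal question mark byte, which matches the symptom described in the question.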

A clever script that has direct access to the Win32 GetEnvironmentVariableW API could call it to retrieve a native-Unicode environment variable, which the script could then encode to UTF-8 or whatever else it wanted, assuming that the input was also UTF-8 (which is what you'd generally want today). However, I don't think ColdFusion gives you this access, and in any case it only works from IIS6 onwards; IIS5.x will throw away any non-default-codepage characters before they even reach the environment variables.

Otherwise, your best bet is URL-rewriting. If a layer above CF can convert that search.cfm/القاهرة to search.cfm/?q=القاهرة then you don't face the same problem, as the QUERY_STRING variable, unlike PATH_INFO, is not specified to be %-decoded, so the %xx bytes remain where a tool at CF's level can see them.
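
As a rough sketch of why the query-string route works, the following Python snippet (illustrative only; the rewritten URL shape and variable names are assumptions based on the example above) builds the QUERY_STRING value as it would arrive, still percent-encoded, and decodes the %xx bytes as UTF-8 in the script itself.

```python
from urllib.parse import parse_qs, quote

# Hypothetical rewritten request: /search.cfm/<term> becomes /search.cfm/?q=<term>
query_string = "q=" + quote("القاهرة", encoding="utf-8")

# The value is plain ASCII at this point, so a code-page conversion
# on the way to the script has nothing to destroy.
print(query_string)

# The script decodes the %xx bytes itself, choosing UTF-8 explicitly.
params = parse_qs(query_string, encoding="utf-8")
search_term = params["q"][0]
print(search_term)
```

Because the decoding happens at the script level, the choice of encoding is under your control instead of being made by the web server or the C runtime.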
