Non-ASCII characters in URL


Problem description

I ran into a problem I've never seen before: my client is adding files to a project we built, and some of the filenames contain special characters because some of the words are Spanish.

For example, a file I'm testing has an á in its name. I reference that image in a CSS file as a background image, but it doesn't show up in Safari. It does show up in Firefox and Chrome.

As a test, I pasted the link directly into the browser and got the same result: it works in Firefox and Chrome, but Safari throws an error. So I guess the accented characters are what is breaking it?

Firefox converts the following URL, changing the á to a%CC%81, and loads the image:

http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Clássico_foto-Henrique-Peron-470x120-1371827671.jpg


You can see that the link above breaks, but Firefox and Chrome convert it to: http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Cla%CC%81ssico_foto-Henrique-Peron-470x120-1371827671.jpg

You can also see this in action here: http://jsfiddle.net/Md4gZ/2/

.testbox {
    width: 340px;
    height: 100px;
    background: url('http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Clássico_foto-Henrique-Peron-470x120-1371827671.jpg') no-repeat top left;
}

So what's the right way to handle this? I'm developing in PHP and WordPress. I'd rather not have to tell the client to go back and rename every file that has special characters.

Any help is appreciated. Thanks!

Recommended answer

I believe what is becoming the standard is to convert non-ASCII characters to UTF-8 byte sequences and include those bytes in the URL as %HH hex codes. The á character is U+00E1 in Unicode, which UTF-8 encodes as the two bytes 0xC3 0xA1. Hence, Clássico would become Cl%C3%A1ssico.
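To make the byte arithmetic concrete, here is a minimal sketch of that UTF-8 percent-encoding. It is written in Python purely for illustration (the question's stack is PHP, where rawurlencode() plays a similar role on a UTF-8 string); the helper name percent_encode_utf8 is hypothetical, not something from the question.

```python
from urllib.parse import quote


def percent_encode_utf8(segment: str) -> str:
    """Percent-encode one path segment using its UTF-8 bytes."""
    # quote() encodes to UTF-8 by default; safe="" also escapes "/",
    # which is what we want for a single filename.
    return quote(segment, safe="")


# "\u00e1" is the precomposed á (U+00E1). Writing it as an escape keeps
# the example unambiguous no matter how this file is saved.
encoded = percent_encode_utf8("Cl\u00e1ssico")
print(encoded)  # Cl%C3%A1ssico
```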

The conversion you report from Firefox, Cla%CC%81ssico, is slightly different: it represents the á as a plain a followed by U+0301, the COMBINING ACUTE ACCENT character (the decomposed form, NFD, as opposed to the precomposed form, NFC). In UTF-8, U+0301 is encoded as 0xCC 0x81.
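The decomposed variant can be reproduced the same way. This is a sketch only, again in Python for illustration: it decomposes the string with Unicode normalization form NFD before percent-encoding. The helper name percent_encode_decomposed is an assumption made for this example.

```python
import unicodedata
from urllib.parse import quote


def percent_encode_decomposed(segment: str) -> str:
    """Decompose to NFD (base letter + combining marks), then percent-encode."""
    decomposed = unicodedata.normalize("NFD", segment)
    return quote(decomposed, safe="")


# Start from the precomposed á (U+00E1); NFD splits it into
# "a" followed by U+0301 COMBINING ACUTE ACCENT.
encoded_nfd = percent_encode_decomposed("Cl\u00e1ssico")
print(encoded_nfd)  # Cla%CC%81ssico
```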

Which representation you should choose, the precomposed "á" (U+00E1) or "a followed by a combining accent", depends on what the web server needs in order to match the right file. In your case, maybe the filename itself actually contains the combining accent, and that's why it worked (hard to tell).
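One way to take the guesswork out of "hard to tell" is to inspect the filename as it is actually stored and to generate both candidate URLs. The sketch below assumes you can read the real filename from disk as a string; normalization_form and candidate_encodings are hypothetical helper names, and it is Python rather than PHP only for the sake of a short runnable example.

```python
import unicodedata
from typing import Dict
from urllib.parse import quote


def normalization_form(filename: str) -> str:
    """Report whether a filename is stored precomposed (NFC) or decomposed (NFD)."""
    is_nfc = unicodedata.normalize("NFC", filename) == filename
    is_nfd = unicodedata.normalize("NFD", filename) == filename
    if is_nfc and is_nfd:
        return "both"   # e.g. pure ASCII names, where the forms coincide
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"      # some characters composed, others decomposed


def candidate_encodings(filename: str) -> Dict[str, str]:
    """Percent-encoded path segment for each normalization form."""
    return {
        form: quote(unicodedata.normalize(form, filename), safe="")
        for form in ("NFC", "NFD")
    }


# A name stored with "a" + U+0301, as the Firefox URL suggests.
stored_name = "Cla\u0301ssico"
print(normalization_form(stored_name))
print(candidate_encodings(stored_name))
```

If the stored name turns out to be decomposed, links have to use the %CC%81 form (or the files need to be renamed to a single agreed form) for the server to find them.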

Another, older way to handle non-ASCII Latin characters is to use an 8-bit Latin charset (ISO-8859-1 or something similar, such as Windows-1252) and percent-encode each character as a single byte. That would turn Clássico into Cl%E1ssico. But since this only works for Latin charsets, and is ambiguous for some of their characters, it is hopefully, and probably, disappearing.
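For comparison, the legacy single-byte encoding and its drawbacks can be sketched as follows. This is an illustration under assumptions, not a recommendation: latin1_encodable is a hypothetical helper, and the "misread" case simply shows what a decoder that assumes UTF-8 does with the legacy form.

```python
from urllib.parse import quote, unquote

# Legacy form: encode á (U+00E1) as the single ISO-8859-1 byte 0xE1.
legacy = quote("Cl\u00e1ssico", safe="", encoding="latin-1")
print(legacy)  # Cl%E1ssico

# A decoder that assumes UTF-8 cannot interpret the lone 0xE1 byte and
# substitutes U+FFFD (unquote defaults to utf-8 with errors="replace").
misread = unquote(legacy)

# Only a decoder that knows the charset in advance gets the á back.
recovered = unquote(legacy, encoding="latin-1")


def latin1_encodable(text: str) -> bool:
    """True if every character fits in ISO-8859-1."""
    try:
        quote(text, encoding="latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(latin1_encodable("Cl\u00e1ssico"))  # True
print(latin1_encodable("日本語"))          # False
```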
