Wordpress/Apache - 404 error with unicode characters in image filenames

Question

We've recently moved a website to a new server, and are running into an odd issue where some uploaded images with unicode characters in the filename are giving us a 404 error.

Via ssh/FTP, we can see that the files are definitely there.

For example:

http://sjofasting.no/project/adnoy

None of the images work:

Code:

<img class='image-display' title='' src='http://sjofasting.no/wp/wp-content/uploads/2012/03/ådnøy_1_2.jpg' width='685' height='484'/>

SSH:

-rw-r--r-- 1 xxxxxxxx xxxxxxxx 836813 Aug 3 16:12 ådnøy_1_2.jpg

What is also strange is that if you navigate to the directory you can even click on the image and it works:

http://sjofasting.no/wp/wp-content/uploads/2012/03/

click on 'ådnøy_1_2.jpg' and it works.

Somehow wordpress is generating

http://sjofasting.no/wp/wp-content/uploads/2012/03/ådnøy_1_2.jpg

and copying from the direct folder browse is generating

http://sjofasting.no/wp/wp-content/uploads/2012/03/a%CC%8Adn%C3%B8y_1_2.jpg

What is going on??

If I copy the image url from the wordpress source I get:

http://sjofasting.no/wp/wp-content/uploads/2011/11/Bore-Strand-Hotellg%C3%A5rd-12.jpg

When copied from the apache browser I get:

http://sjofasting.no/wp/wp-content/uploads/2011/11/Bore-Strand-Hotellga%cc%8ard-12.jpg

What could account for this discrepancy between: %C3%A5 and %cc%8

??

Recommended answer

Unicode normalisation.

0xC3 0xA5 is the UTF-8 encoding for U+00E5 a-with-ring.

0xCC 0x8A is the UTF-8 encoding for U+030A combining ring.

U+00E5 is the composed (Normal Form C, i.e. NFC) way of writing an a-ring; the letter a followed by U+030A is the decomposed (Normal Form D, i.e. NFD) way of writing it. å vs å - they should look the same, though they may differ slightly depending on font rendering.
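The two spellings can be checked from Python's standard library. This is only an illustrative sketch; the variable names `composed` and `decomposed` are made up for the example. It shows that the two forms encode to the byte sequences seen in the two URLs, and that normalisation maps one onto the other.

```python
import unicodedata
from urllib.parse import quote

composed = "\u00e5"      # U+00E5, a-with-ring as one code point (Normal Form C)
decomposed = "a\u030a"   # 'a' followed by U+030A combining ring (Normal Form D)

# UTF-8 bytes of each spelling
print(composed.encode("utf-8"))    # b'\xc3\xa5'
print(decomposed.encode("utf-8"))  # b'a\xcc\x8a'

# Percent-encoded, as they appear in a URL
print(quote(composed))    # %C3%A5
print(quote(decomposed))  # a%CC%8A

# The strings are different sequences of code points...
print(composed == decomposed)  # False

# ...but normalisation converts one form into the other
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```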

Now normally it doesn't really matter which one you've got because sensible filesystems leave them untouched. If you save a file called [char U+00E5].txt (å.txt), it stays called that under Windows and Linux.

Macs, on the other hand, are insane. The filesystem prefers Normal Form D, to the extent that any composed characters you pass into it get converted into decomposed ones. If you put a file in called [char U+00E5].txt and immediately list the directory, you'll find you've actually got a file called a[char U+030A].txt. You can still access the file as [char U+00E5].txt on a Mac because it'll convert that input into Normal Form D too before looking it up, but you cannot recover the same filename in character sequence terms as you put in: it's a lossy conversion.
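To make the "lossy" point concrete, here is a rough simulation. The function `store_name_like_mac` is a hypothetical stand-in that uses plain NFD; the real filesystem's decomposition rules may differ in detail, so treat this as an approximation of the behaviour described above rather than a model of the actual implementation.

```python
import unicodedata

def store_name_like_mac(name: str) -> str:
    """Hypothetical stand-in: approximate the filesystem's decomposition with plain NFD."""
    return unicodedata.normalize("NFD", name)

saved_as = "\u00e5.txt"                # the name you asked for (composed)
listed = store_name_like_mac(saved_as) # the name you get back when listing

print([hex(ord(c)) for c in listed])
# ['0x61', '0x30a', '0x2e', '0x74', '0x78', '0x74']

# Lookups still succeed, because the lookup key is decomposed the same way
print(store_name_like_mac("\u00e5.txt") == listed)   # True
print(store_name_like_mac("a\u030a.txt") == listed)  # True

# But the stored name no longer equals what was typed: the original form is gone
print(listed == saved_as)  # False
```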

So if you save your files on a Mac and then transfer to a filesystem where [char U+00E5].txt and a[char U+030A].txt refer to different files, you will get broken links.

Update the pages to point to the Normal Form D versions of the URLs, or re-upload the files from a filesystem that doesn't egregiously mangle Unicode characters.
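For the first option, a minimal sketch of rewriting a URL's path into Normal Form D might look like the following. `normalize_url_path` is a hypothetical helper, not part of WordPress or Apache, and it assumes the path is UTF-8 and that only the path (not the host or query) needs changing.

```python
import unicodedata
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def normalize_url_path(url: str, form: str = "NFD") -> str:
    """Hypothetical helper: decode the path, normalise it, and percent-encode it again."""
    parts = urlsplit(url)
    path = unicodedata.normalize(form, unquote(parts.path))
    return urlunsplit(parts._replace(path=quote(path)))

url = "http://sjofasting.no/wp/wp-content/uploads/2011/11/Bore-Strand-Hotellg%C3%A5rd-12.jpg"
print(normalize_url_path(url))
# http://sjofasting.no/wp/wp-content/uploads/2011/11/Bore-Strand-Hotellga%CC%8Ard-12.jpg
```

Going the other way (passing `form="NFC"`) would give the composed URLs, which would only help if the files on the server were also renamed to their composed names.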

Think Different, Cause Bizarre Interoperability Problems.
