Unicode filenames on FAT-32?

Question

As far as I understand, NTFS supports Unicode filenames (UTF-16, as Microsoft claims?).

But the official MSDN documentation is very vague about which code page(s) are used to store filenames (file paths) on FAT-32.

Here it says that the OEM code page (CP437, I assume) is used to store filenames: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317748.aspx

But here it turns out that there can be different OEM code pages, with CP437 being just one of them: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317752.aspx

And we all know that utilities like mount support many more code pages for FAT than just the set of OEM code pages.

So what is the actual code page for FAT-32 filenames? Does it depend on the system code page at the time the FAT volume was created? Can FAT support true Double Byte Character Set code pages like UTF-16? Or are Multi Byte Character Set code pages like UTF-8 the limit?

And a more specific question: what happens when I use the CreateFileW function (which, as MSDN states, uses UTF-16 as the filename encoding) to create a file on a FAT-32 volume?

Recommended answer

You might have to experiment here. This is a great question, and I'm not 100% confident, but:

So what is the actual code page for FAT-32 filenames? Does it depend on the system code page at the time the FAT volume was created?

The "OEM code page", whatever that happens to be on the system.

Can FAT support true Double Byte Character Set code pages like UTF-16? Or are Multi Byte Character Set code pages like UTF-8 the limit?

No, I don't believe FAT is directly capable of either UTF-16 or UTF-8. That said, Microsoft stores the Unicode filename out of band, so a file has two filenames. (This is also how you can have filenames longer than the 8.3 format allows.)

And a more specific question: what happens when I use the CreateFileW function (which, as MSDN states, uses UTF-16 as the filename encoding) to create a file on a FAT-32 volume?

The Unicode filename passed to CreateFileW is stored directly in the out-of-band (long) filename. It is also re-encoded into the OEM code page (whatever that happens to be on the system) and stored as the short filename. If it cannot be converted into the OEM code page, or does not fit the 8.3 format, Windows gives the file a short name like FILENA~1.TXT.
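To make the "two filenames" idea concrete, here is a minimal Python sketch of how a long name could be laid out as UTF-16 data. It assumes the VFAT convention that each long-name directory entry holds 13 UTF-16 code units, with a 0x0000 terminator and 0xFFFF padding when the last entry is not full. The function name lfn_chunks is hypothetical, and the sketch does not model the real on-disk details (reverse entry order, the three separate character fields, checksums).

```python
LFN_UNITS_PER_ENTRY = 13  # UTF-16 code units stored in one long-name directory entry


def lfn_chunks(name: str) -> list[bytes]:
    """Split a long filename into 26-byte UTF-16LE chunks, one per long-name entry.

    Hypothetical helper for illustration only; it is not a FAT driver.
    """
    data = name.encode("utf-16-le")
    units = [data[i:i + 2] for i in range(0, len(data), 2)]
    # A name that does not fill its last entry is terminated with 0x0000
    # and the remaining slots are padded with 0xFFFF.
    if len(units) % LFN_UNITS_PER_ENTRY != 0:
        units.append(b"\x00\x00")
        while len(units) % LFN_UNITS_PER_ENTRY != 0:
            units.append(b"\xff\xff")
    return [
        b"".join(units[i:i + LFN_UNITS_PER_ENTRY])
        for i in range(0, len(units), LFN_UNITS_PER_ENTRY)
    ]


long_name = "asdfΣ.txt"
chunks = lfn_chunks(long_name)
# The short name, by contrast, is a byte string in the OEM code page
# (CP437 is assumed here, as on a typical US system).
short_name = long_name.upper().encode("cp437")
print(len(chunks), short_name)
```

The long name keeps every character as UTF-16, while the short name only works for characters the OEM code page can represent.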

First, this page tells us that the OEM code page != the Windows code page:

Non-Unicode applications that create FAT files sometimes have to use the standard C runtime library conversion functions to translate between the Windows code page character set and the OEM code page character set. With Unicode implementations of the file system functions, it is not necessary to perform such translations.

On a typical American system, the OEM code page is CP437, but the Windows code page is Windows-1252. (The FooA calls, I believe, use the Windows code page, which is typically Windows-1252 on an American machine but depends on the locale.)
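The difference between the two code pages is easy to see outside Windows. The following Python sketch decodes the same byte with both code pages, assuming CP437 as the OEM code page and Windows-1252 as the Windows code page (the typical US pairing; other locales use other pairs). The constant names are made up for the example.

```python
OEM_CODE_PAGE = "cp437"       # assumed OEM code page (typical US system)
WINDOWS_CODE_PAGE = "cp1252"  # assumed Windows code page (typical US system)

raw = bytes([0xE4])
as_oem = raw.decode(OEM_CODE_PAGE)
as_windows = raw.decode(WINDOWS_CODE_PAGE)

# The same stored byte is a different character depending on the code page.
print(f"0x{raw[0]:02X} -> OEM: {as_oem!r}, Windows: {as_windows!r}")
```

This is why a byte-oriented filename only makes sense together with the code page it was written in.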

If you have a FAT volume available, you can see this in action. The character "Σ" (U+03A3) is not present in Windows-1252, but it is in CP437. You can see both the short and long filenames with dir /X. With a file named asdfΣ.txt, you'll see:

ASDFΣ.TXT    asdfΣ.txt

However, with a file named asdfΛ.txt (Λ is not present in either CP437 or Windows-1252), you'll see:

ASDF~1.TXT   asdf?.txt

(You'll likely see ?, because cmd.exe's font cannot display a Λ.)
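Which names fit the OEM code page can be checked without a FAT volume. This is a small Python sketch, assuming CP437 and Windows-1252 as the two code pages; can_encode is a hypothetical helper. Python's codecs are strict, so this only tests exact representability and does not reproduce any substitutions Windows itself may apply when converting.

```python
def can_encode(text: str, code_page: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


for name in ("asdfΣ.txt", "asdfΛ.txt"):
    for code_page in ("cp437", "cp1252"):
        print(name, code_page, can_encode(name, code_page))
```

Only the Σ name is representable in CP437, which matches the case above where the short name needed no generated ~1 alias.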

For information about long filenames, see this Wikipedia article.

Also, interestingly, if you name a file asdf©.txt, you might get:

ASDFC.TXT    asdfc.txt

I'm not 100% sure here, but I think Windows cleverly substituted "c" for © in the short name, and did the same when displaying the long name. If you change the console font to something that is not raster-based, like Consolas, you'll see:

ASDFC.TXT    asdf©.txt

And this is why you should use the FooW functions.
