What character sequence should I not allow in a filename?

Question

After some testing I found that Linux allows any character in a filename except / and the null byte (\0). So which sequences should I disallow in a filename? I've heard that a leading - may confuse some command-line programs. That doesn't matter to me, but it may bother other people if they collect a bunch of files and filter them with GNU tools.

It was suggested to me that I remove leading and trailing spaces. I plan to do that, if only because users typically don't mean to include them.

Which problematic sequences might there be, and which should I consider disallowing? For convenience, I am also considering disallowing characters that are illegal on Windows. I think I may also disallow a dash at the beginning (a dash is a legal Windows character).
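
A minimal Python sketch of the checks described in the question. It assumes the goal is to validate a user-supplied name. It first trims leading and trailing spaces. It then rejects / and NUL, which are the only characters Linux forbids, along with the characters Windows forbids and a leading dash. The function name check_filename and the choice to raise ValueError are illustrative assumptions. The sketch does not cover Windows reserved device names such as CON or NUL, nor trailing dots.

```python
# Characters Windows forbids in file names, plus ASCII control characters 0-31.
# This set also contains '/' and NUL, the only two characters Linux forbids.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*') | {chr(c) for c in range(32)}


def check_filename(name: str) -> str:
    """Hypothetical validator; returns the trimmed name or raises ValueError."""
    name = name.strip(" ")  # users rarely mean leading/trailing spaces
    if not name:
        raise ValueError("empty filename")
    bad = sorted(set(name) & WINDOWS_FORBIDDEN)
    if bad:
        raise ValueError(f"forbidden characters: {bad!r}")
    if name.startswith("-"):
        raise ValueError("a leading dash may be parsed as a command-line option")
    return name


print(check_filename("  holiday photos.zip "))  # holiday photos.zip
```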

Recommended answer

Your question is somewhat confusing since you talk at length about Linux, but then in a comment to another answer you say that you are generating filenames for people to download, which presumably means that you have absolutely no control whatsoever over the filesystem and operating system that the files will be stored on, making Linux completely irrelevant.

For the purpose of this answer I'm going to assume that your question is wrong and your comment is correct.

The vast majority of operating systems and filesystems in use today fall roughly into three categories: POSIX, Windows and MacOS.

The POSIX specification is very clear on what a filename that is guaranteed to be portable across all POSIX systems looks like. The characters that you can use are defined in Section 3.276 (Portable Filename Character Set) of the Open Group Base Specification as:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789._-

The maximum filename length that you can rely on is defined in Section 13.23.3.5 (<limits.h> Minimum Values) as 14. (The relevant constant is _POSIX_NAME_MAX.)
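
A minimal Python sketch of this portability test. PORTABLE_CHARS and is_posix_portable are illustrative names. The check also rejects . and .., which POSIX reserves for the current and parent directory. Note that 14 is only the guaranteed minimum. Most systems allow much longer names; NAME_MAX is 255 on Linux, for example.

```python
import string

# POSIX Portable Filename Character Set: A-Z, a-z, 0-9, '.', '_', '-' (65 characters)
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")
POSIX_NAME_MAX = 14  # _POSIX_NAME_MAX: the longest name every POSIX system must accept


def is_posix_portable(name: str) -> bool:
    """True if name is 1-14 portable characters and not '.' or '..'."""
    return (
        0 < len(name) <= POSIX_NAME_MAX
        and set(name) <= PORTABLE_CHARS
        and name not in (".", "..")
    )


print(is_posix_portable("report-v2.txt"))  # True
print(is_posix_portable("résumé.txt"))     # False (non-ASCII letters)
```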

So a filename that is at most 14 characters long and contains only the 65 characters listed above is safe to use on all POSIX-compliant systems. That gives you 24407335764928225040435790 combinations (roughly 84 bits).
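
The count is a geometric series. There are 65 choices for each of the k characters, summed over every length k from 1 to 14. The short Python check below should reproduce the figure above; count_names is an illustrative name.

```python
import math


def count_names(alphabet_size: int, max_len: int) -> int:
    """Distinct names of length 1..max_len drawn from alphabet_size characters."""
    return sum(alphabet_size ** k for k in range(1, max_len + 1))


n = count_names(65, 14)
print(n, f"(about {math.log2(n):.1f} bits)")
```

Python integers are arbitrary-precision, so the sum is exact. Its closed form is (65^15 - 65) / 64.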

If you don't want to annoy your users, you should add two more restrictions: don't start the filename with a dash or a dot. Filenames starting with a dot are customarily interpreted as "hidden" files and are not displayed in directory listings unless explicitly requested. And filenames starting with a dash may be interpreted as an option by many commands. (Sidenote: it is amazing how many users don't know about the rm ./-rf or rm -- -rf tricks.)
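
Adding these two restrictions to the portability check takes one extra condition. The sketch below rejects a leading dash or dot, which also rules out . and ..; is_friendly_name is an illustrative name.

```python
import string

PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_friendly_name(name: str) -> bool:
    """Portable, at most 14 chars, and neither hidden nor option-like."""
    return (
        0 < len(name) <= 14
        and set(name) <= PORTABLE_CHARS
        and name[0] not in "-."  # '.' hides the file, '-' looks like an option
    )
```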

This leaves you with 23656340818315048885345458 combinations (still 84 bits).
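
With the first position limited to 63 characters (65 minus - and .), each length k contributes 63 × 65^(k-1) names. Here is a sketch of that count; count_first_restricted is an illustrative name.

```python
def count_first_restricted(first: int, rest: int, max_len: int) -> int:
    """Names of length 1..max_len: first char from `first` choices, the rest from `rest`."""
    return sum(first * rest ** (k - 1) for k in range(1, max_len + 1))


n = count_first_restricted(63, 65, 14)
print(n)
```

Each length contributes exactly 63/65 of the unrestricted count, so the total is 63/65 of the previous figure.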

Windows adds a couple of new restrictions: filenames cannot end with a dot, and filenames are case-insensitive. This reduces the character set from 65 to 39 characters (37 for the first character, 38 for the last). It doesn't add any length restriction; Windows can deal with 14 characters just fine.

This reduces the possible combinations to 17866587696996781449603 (73 bits).
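
Because the first and last positions follow different rules, the count needs three character sets. FIRST, LAST, and count_windows in the Python sketch below are illustrative names. A one-character name must satisfy both rules. Longer names take their first character from 37 options, the middle ones from 39, and the last from 38. The sum telescopes to 37 × 39^13.

```python
import string

# Case-insensitive portable set: one case of letters + digits + "._-" = 39 characters
CHARS = set(string.ascii_uppercase + string.digits + "._-")
FIRST = CHARS - {"-", "."}  # no leading dash or dot: 37
LAST = CHARS - {"."}        # Windows: no trailing dot: 38


def count_windows(max_len: int) -> int:
    """Names of length 1..max_len obeying the first- and last-character rules."""
    total = len(FIRST & LAST)  # a 1-char name is both first and last
    for k in range(2, max_len + 1):
        total += len(FIRST) * len(CHARS) ** (k - 2) * len(LAST)
    return total


print(count_windows(14))
```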

Another restriction is that Windows treats everything after the last dot as a filename extension which denotes the type of the file. If you want to avoid potential confusion (say, if you generate a filename like abc.mp3 for a text file), you should avoid dots altogether.
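
Python's os.path.splitext follows the same last-dot convention, so it is a handy way to see how a dotted name will be read. It only illustrates the convention; it is not how Windows itself parses names.

```python
import os.path

# Everything after the last dot is the extension, whatever the file really contains.
print(os.path.splitext("abc.mp3"))  # ('abc', '.mp3')
print(os.path.splitext("a.b.c"))    # ('a.b', '.c')
# A name without dots has no extension to be misread.
print(os.path.splitext("abc"))      # ('abc', '')
```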

You still have 13090925539866773438463 combinations (73 bits).
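
Without dots, every position draws from the same 38 characters, except the first, which has 37 (no dash). The geometric sum then collapses neatly: 37 × (38^n - 1) / 37 = 38^n - 1. Here is a sketch; count_no_dots is an illustrative name.

```python
def count_no_dots(max_len: int) -> int:
    """Case-insensitive names without dots: 37 choices first (no dash), then 38."""
    return sum(37 * 38 ** (k - 1) for k in range(1, max_len + 1))


n = count_no_dots(14)
assert n == 38 ** 14 - 1  # the closed form
print(n)
```

The same function with max_len=8 gives the DOS figure discussed next.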

If you have to worry about DOS, additional restrictions apply: a filename consists of one or two parts separated by a dot, and neither part can contain a dot. The first part can be at most 8 characters long, the second at most 3. Again, the second part is usually reserved to indicate the file type, which leaves you only 8 characters.

Now you have 4347792138495 possible filenames or 41 bits.
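
Here is a sketch of an 8.3 check restricted to the portable characters discussed here. Real DOS accepts a few more punctuation characters, which this sketch deliberately ignores. DOS_83 and is_dos_83 are illustrative names. Names are upper-cased first because DOS is case-insensitive.

```python
import re

# 1-8 char base (no leading dash), optional '.' plus a 1-3 char extension,
# letters/digits/'_'/'-' only, and no other dots.
DOS_83 = re.compile(r"[A-Z0-9_][A-Z0-9_-]{0,7}(?:\.[A-Z0-9_-]{1,3})?")


def is_dos_83(name: str) -> bool:
    """True if name fits the portable 8.3 pattern (case-insensitively)."""
    return DOS_83.fullmatch(name.upper()) is not None


print(is_dos_83("readme.txt"))        # True
print(is_dos_83("longfilename.txt"))  # False: base longer than 8
```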

The good news is that you can use the 3 character extension to actually correctly indicate the file type, without breaking the POSIX filename limit (8+3+1 = 12 < 14).
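
The asker is generating names for people to download, so here is a sketch of a generator that stays inside all the limits so far. It produces a random 8-character base (the first character is not a dash), a dot, and a 1-3 character type extension, for 12 characters at most. random_download_name is an illustrative name. The secrets module is used only to make names hard to guess; plain random would do if that does not matter.

```python
import secrets
import string

FIRST = string.ascii_uppercase + string.digits + "_"  # no leading dash
REST = FIRST + "-"


def random_download_name(ext: str) -> str:
    """Random 8-char base plus '.' and a 1-3 char extension, e.g. 'Q7-K2ZP_.TXT'."""
    ext = ext.upper()
    if not (1 <= len(ext) <= 3 and set(ext) <= set(REST)):
        raise ValueError("extension must be 1-3 portable characters")
    base = secrets.choice(FIRST) + "".join(secrets.choice(REST) for _ in range(7))
    return f"{base}.{ext}"  # at most 8 + 1 + 3 = 12 characters, under the POSIX 14


print(random_download_name("txt"))
```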

If you want your users to be able to burn the files onto a CD-R formatted with ISO 9660 Level 1, you have to disallow hyphens anywhere, not just as the first character. The remaining character set now looks like

ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789_

which gives you 3610048327640 combinations for names of 1 to 8 characters (41 bits).
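
A quick check of this last figure follows. It counts names of 1 to 8 characters over the 37 ISO 9660 Level 1 characters, using the same convention as the DOS figure. Summing over every length gives 3610048327640. On its own, 37^8 = 3512479453921 counts only names of exactly 8 characters. Both come to about 41 bits. count_names is an illustrative name.

```python
import string

ISO_L1 = string.ascii_uppercase + string.digits + "_"  # 37 characters, no hyphen


def count_names(alphabet_size: int, max_len: int) -> int:
    """Distinct names of length 1..max_len over alphabet_size characters."""
    return sum(alphabet_size ** k for k in range(1, max_len + 1))


print(count_names(len(ISO_L1), 8))  # 3610048327640
print(len(ISO_L1) ** 8)             # 3512479453921 (exactly 8 characters)
```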
