What character sequence should I not allow in a filename?


Question

I found out after testing that Linux allows any character in a file name except / and null (\0). So which sequences should I not allow in a filename? I heard that a leading - may confuse some command-line programs. That doesn't matter to me, but it may bother other people if they decide to collect a bunch of files and filter them with some GNU programs.

It was suggested to me to remove leading and trailing spaces, and I plan to, only because the user typically doesn't mean to have leading or trailing spaces.

What problematic sequences might there be, and which should I consider not allowing? I am also considering not allowing characters that are illegal in Windows, just for convenience. I think I may not allow dashes at the beginning (the dash is a legal Windows character).

Answer

Your question is somewhat confusing since you talk at length about Linux, but then in a comment to another answer you say that you are generating filenames for people to download, which presumably means that you have absolutely no control whatsoever over the filesystem and operating system that the files will be stored on, making Linux completely irrelevant.

For the purpose of this answer I'm going to assume that your question is wrong and your comment is correct.

The vast majority of operating systems and filesystems in use today fall roughly into three categories: POSIX, Windows and MacOS.

The POSIX specification is very clear on what a filename that is guaranteed to be portable across all POSIX systems looks like. The characters that you can use are defined in Section 3.276 (Portable Filename Character Set) of the Open Group Base Specification as:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789._-

The maximum filename length that you can rely on is defined in Section 13.23.3.5 (<limits.h> Minimum Values) as 14. (The relevant constant is _POSIX_NAME_MAX.)

So a filename that is up to 14 characters long and contains only the 65 characters listed above is safe to use on all POSIX-compliant systems, which gives you 24407335764928225040435790 combinations (or roughly 84 bits).
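A minimal sketch of a check for this rule, assuming Python. The name `is_posix_portable` is made up for illustration and is not part of any library. Rejecting `.` and `..` is an extra assumption, since those two names are reserved directory entries.

```python
import re

# Portable Filename Character Set: A-Z, a-z, 0-9, period, underscore, hyphen.
# The length cap of 14 is the _POSIX_NAME_MAX value quoted above.
_PORTABLE = re.compile(r"[A-Za-z0-9._-]{1,14}")


def is_posix_portable(name: str) -> bool:
    """Return True if name is 1-14 characters from the portable set."""
    if name in (".", ".."):
        # Reserved directory entries, never usable as ordinary file names.
        return False
    return _PORTABLE.fullmatch(name) is not None
```

For example, `is_posix_portable("report-1.txt")` is true, while a name containing a space or a 15-character name is rejected.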

If you don't want to annoy your users, you should add two more restrictions: don't start the filename with a dash or a dot. Filenames starting with a dot are customarily interpreted as "hidden" files and are not displayed in directory listings unless explicitly requested. And filenames starting with a dash may be interpreted as an option by many commands. (Sidenote: it is amazing how many users don't know about the rm ./-rf or rm -- -rf tricks.)

This leaves you at 23656340818315048885345458 combinations (still 84 bits).
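The two tricks from the sidenote can be sketched in Python for the case where a program has to pass an existing file name to an external command. The helper names below are hypothetical, and the snippet only builds argument lists; it does not run anything.

```python
def dash_safe_path(name: str) -> str:
    """Prefix './' to a name starting with '-' so it cannot parse as an option."""
    return "./" + name if name.startswith("-") else name


def rm_argv(name: str) -> list:
    """Build an rm argument list; '--' marks the end of options."""
    return ["rm", "--", name]
```

Either form keeps a file literally named `-rf` from being read as the options `-r` and `-f`. Not generating such names in the first place avoids the problem entirely.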

Windows adds a couple of new restrictions to this: filenames cannot end with a dot, and filenames are case-insensitive. This reduces the character set from 65 to 39 characters (37 for the first, 38 for the last character). It doesn't add any length restrictions; Windows can deal with 14 characters just fine.
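To illustrate the case-insensitivity point, here is a small Python sketch that finds generated names which would collide on a case-insensitive filesystem. The function name `case_collisions` is invented for this example, and it assumes names drawn from the ASCII character set above, so plain `str.upper()` is an adequate stand-in for the filesystem's case folding.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # Names that share an upper-cased form refer to the same file
        # on a case-insensitive filesystem.
        groups[name.upper()].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

Generating names in a single case from the start makes this check unnecessary, which is why the character count drops from 65 to 39.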

This reduces the possible combinations to 17866587696996781449603 (73 bits).

Another restriction is that Windows treats everything after the last dot as a filename extension which denotes the type of the file. If you want to avoid potential confusion (say, if you generate a filename like abc.mp3 for a text file), you should avoid dots altogether.

You still have 13090925539866773438463 combinations (73 bits).

If you have to worry about DOS, then additional restrictions apply: the filename consists of one or two parts (separated by a dot), where neither of the two parts can contain a dot. The first part has a maximum length of 8 characters, the second of 3. Again, the second part is usually reserved to indicate the file type, which leaves you only 8 characters.

Now you have 4347792138495 possible filenames or 41 bits.

The good news is that you can use the 3 character extension to actually correctly indicate the file type, without breaking the POSIX filename limit (8+3+1 = 12 < 14).
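A rough Python sketch of an 8.3 check under the restrictions accumulated so far. The function name and the exact character classes are assumptions for illustration: letters in either case, digits, underscore and hyphen, with no leading hyphen in the first part. It is not a full model of DOS naming rules.

```python
import re

# Base part: 1-8 characters, the first one not a hyphen.
# Optional extension: a dot followed by 1-3 characters.
_DOS_83 = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]{0,7}(\.[A-Za-z0-9_-]{1,3})?")


def is_dos_83(name: str) -> bool:
    """Return True if name fits the 8.3 pattern described above."""
    return _DOS_83.fullmatch(name) is not None
```

Any name this accepts is at most 12 characters long, so it also stays within the 14-character POSIX limit.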

If you want your users to be able to burn the files onto a CD-R formatted with ISO9660 Level 1, then you have to disallow the hyphen anywhere, not just as the first character. Now, the remaining character set looks like

ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789_

which gives you 3512479453921 combinations for names of exactly 8 characters (41 bits).
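The figures quoted in this answer can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: `count_names` is a made-up helper that sums, over every length from 1 to the maximum, the number of strings whose first and last characters avoid the given sets.

```python
import string


def count_names(alphabet, bad_first, bad_last, max_len):
    """Count names of length 1..max_len under first/last character rules."""
    first = len(alphabet - bad_first)
    last = len(alphabet - bad_last)
    single = len(alphabet - bad_first - bad_last)
    total = single  # names of length 1
    for length in range(2, max_len + 1):
        total += first * len(alphabet) ** (length - 2) * last
    return total


posix = set(string.ascii_letters + string.digits + "._-")    # 65 characters
windows = set(string.ascii_uppercase + string.digits + "._-")  # 39 characters
no_dot = windows - {"."}                                       # 38 characters
iso = no_dot - {"-"}                                           # 37 characters

figures = {
    "posix": count_names(posix, set(), set(), 14),
    "no leading dash or dot": count_names(posix, set("-."), set(), 14),
    "windows": count_names(windows, set("-."), {"."}, 14),
    "no dots": count_names(no_dot, {"-"}, set(), 14),
    "dos base name": count_names(no_dot, {"-"}, set(), 8),
    "iso, exactly 8 chars": len(iso) ** 8,
    "iso, 1 to 8 chars": count_names(iso, set(), set(), 8),
}

for label, n in figures.items():
    # bit_length() - 1 is the whole number of bits, rounded down.
    print(f"{label}: {n} ({n.bit_length() - 1} bits)")
```

The first five values match the numbers in the text. The ISO9660 figure of 3512479453921 equals 37**8, which counts only names that are exactly eight characters long; summing over lengths 1 to 8, as the other figures do, gives 3610048327640.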
