ToUpperInvariant() – is MSDN wrong in its recommendation?

Question

In Best Practices for Using Strings in the .NET Framework, StringComparison.OrdinalIgnoreCase is recommended for case-insensitive file paths. (Let's call this Statement A.)

I agree with that, because I can create two files in the same directory:

é.txt
é.txt

Their filenames are not the same: the second one is composed of e plus a combining modifier, so it actually contains two characters. (You can try it yourself using copy-paste.)

If an invariant-culture comparison (rather than an ordinal comparison) were in effect, NTFS would not allow both files, because the same article explains that in the invariant culture a + ̊ = å.
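The difference between the two comparison kinds can be checked directly. The following is a minimal sketch, not taken from the MSDN article; the class and member names are made up for illustration, and the two constants stand in for the two filenames above (precomposed é versus e followed by U+0301).

```csharp
using System;

public static class ComparisonDemo
{
    // Hypothetical stand-ins for the two filenames in the question.
    public const string Precomposed = "\u00E9.txt";  // é as one code point
    public const string Decomposed = "e\u0301.txt";  // e + combining acute accent

    // Ordinal comparison looks at the raw UTF-16 code units only.
    public static bool SameOrdinal() =>
        string.Equals(Precomposed, Decomposed, StringComparison.Ordinal);

    // Ordinal comparison after per-character case folding, still no normalization.
    public static bool SameOrdinalIgnoreCase() =>
        string.Equals(Precomposed, Decomposed, StringComparison.OrdinalIgnoreCase);

    // Linguistic comparison using the invariant culture's rules.
    public static bool SameInvariantCulture() =>
        string.Equals(Precomposed, Decomposed, StringComparison.InvariantCulture);
}
```

Both ordinal variants report the names as different. The invariant-culture variant should report them as equal on an installation with full culture data, but that result depends on the runtime's globalization setup (in globalization-invariant mode it falls back to ordinal behaviour), so treat it as something to verify on your own platform.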

But the article on String.ToUpperInvariant() gives a different recommendation (Statement B):

If you need the lowercase or uppercase version of an operating system identifier, such as a file name, named pipe, or registry key, use the ToLowerInvariant or ToUpperInvariant methods.

I need to create a collection of file paths (in fact a HashSet) to detect duplicates. If I follow Statement B when building the set, I could end up with false positives, because the filenames é.txt and é.txt mentioned above would be treated as one. Am I right that Statement B in MSDN is misleading? Or am I missing something?

I'm about to build a library, preferably without known bugs from the start, so I don't want to neglect this.

Update:

Statement B seems to have one more issue: ToLowerInvariant() cannot actually be used. The reason (quoting the Best Practices article): "DO: Use ToUpperInvariant rather than ToLowerInvariant when normalizing strings for comparison." The actual reason given: there is a small range of characters that do not round-trip, and going to lowercase makes these characters unavailable. (source)
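To see which characters are affected on a given runtime, one can scan the UTF-16 code units. This is a rough sketch under my own reading of "do not round-trip" (a character that changes when lowercased and is not recovered by uppercasing the result); the class and method names are invented for illustration, and the set of characters found may differ between .NET versions and platforms.

```csharp
using System;
using System.Collections.Generic;

public static class RoundTripScan
{
    // True when lowercasing changes c and uppercasing the result
    // does not give c back (assumed meaning of "does not round-trip").
    public static bool LosesIdentityViaLower(char c)
    {
        char lower = char.ToLowerInvariant(c);
        return lower != c && char.ToUpperInvariant(lower) != c;
    }

    // Collects every UTF-16 code unit that fails the round trip.
    public static List<char> FindNonRoundTripping()
    {
        var result = new List<char>();
        for (int i = 0; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (LosesIdentityViaLower(c))
            {
                result.Add(c);
            }
        }
        return result;
    }
}
```

Calling RoundTripScan.FindNonRoundTripping() and printing each entry as a hex code point gives the list for the runtime at hand; ordinary ASCII letters are never in it.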

Recommended answer

Neither uppercasing nor lowercasing is correct when you want to compare strings for equality case-insensitively. In both variants there are characters that break the comparison.

The correct way to compare strings case-insensitively is to use one of the case-insensitive StringComparison options (you know that already).

The right way to make a data structure case-insensitive is to give it one of the StringComparer.*IgnoreCase comparers. For example:

new HashSet<string>(StringComparer.InvariantCultureIgnoreCase)

Do not uppercase strings before adding them to a data structure. I would fail that in any code review.
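Which *IgnoreCase comparer is passed in matters for the two filenames from the question. Here is a small sketch to compare the options; PathSetDemo and its members are hypothetical names, and the two constants again stand in for precomposed é.txt and e + U+0301 .txt.

```csharp
using System;
using System.Collections.Generic;

public static class PathSetDemo
{
    // Hypothetical stand-ins for the two filenames in the question.
    public const string Precomposed = "\u00E9.txt";  // é as one code point
    public const string Decomposed = "e\u0301.txt";  // e + combining acute accent

    // Adds both spellings plus an uppercase variant and reports
    // how many entries the given comparer considers distinct.
    public static int CountDistinct(IEqualityComparer<string> comparer)
    {
        var set = new HashSet<string>(comparer);
        set.Add(Precomposed);
        set.Add(Decomposed);
        set.Add(Precomposed.ToUpperInvariant()); // differs from the first only in case
        return set.Count;
    }

    // The pattern the answer advises against: uppercase first,
    // then store in a set with the default (ordinal) comparer.
    public static int CountDistinctUppercased()
    {
        var set = new HashSet<string>();
        set.Add(Precomposed.ToUpperInvariant());
        set.Add(Decomposed.ToUpperInvariant());
        return set.Count;
    }
}
```

With StringComparer.OrdinalIgnoreCase, CountDistinct returns 2: the case variant is folded, but the two spellings stay separate, which matches Statement A and what NTFS allows. With StringComparer.InvariantCultureIgnoreCase the two spellings would normally be merged as well, depending on the runtime's culture data, so for file paths the ordinal comparer looks like the safer choice. CountDistinctUppercased also returns 2, since case mapping does not normalize combining sequences.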

If you need the lowercase or uppercase version of an operating system identifier

You do not need such a thing. That statement does not apply to your case.
