Is it better to compare strings using toLowerCase or toUpperCase in JavaScript?

Question

I'm going through a code review and I'm curious if it's better to convert strings to upper or lower case in JavaScript when attempting to compare them while ignoring case.

Trivial example:

var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();

Or should I do this:

var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();

It seems like either "should" or would work with limited character sets such as English letters only, so is one more robust than the other?

As a note, MSDN recommends normalizing strings to uppercase, but that is for managed code (presumably C# & F# but they have fancy StringComparers and base libraries): http://msdn.microsoft.com/en-us/library/bb386042.aspx

Recommended answer

Revised answer

It has been quite a while since I answered this question. While the cultural issues still hold true (and I don't think they will ever go away), the development of the ECMA-402 standard has made my original answer... outdated (or obsolete?).

The best solution for comparing localized strings seems to be the localeCompare() function with appropriate locales and options:

var locale = 'en'; // that should be somehow detected and passed on to JS
var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
if (firstString.localeCompare(secondString, locale, {sensitivity: 'accent'}) === 0) {
    // do something when equal
}

This compares the two strings case-insensitively but accent-sensitively (for example, ą != a).
If this is not sufficient for performance reasons, you may want to use either toLocaleUpperCase() or toLocaleLowerCase(), passing the locale as a parameter:

if (firstString.toLocaleUpperCase(locale) === secondString.toLocaleUpperCase(locale)) {
    // do something when equal
}

In theory there should be no differences. In practice, subtle implementation details (or lack of implementation in the given browser) may yield different results...
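As a rough sketch of the two approaches above: an Intl.Collator (also part of ECMA-402) can be built once and reused, which is generally the recommended route when many strings are compared. The helper names equalsIgnoreCase and equalsViaUpperCase are made up for this illustration, the 'en' locale is an assumption, and the results assume a runtime with ECMA-402 support.

```javascript
// Hypothetical helpers for illustration; not part of the original answer.
const locale = 'en'; // assumption: detected elsewhere and passed in

// Build the collator once. Sensitivity 'accent' ignores case differences
// but still treats accented and unaccented letters as different.
const collator = new Intl.Collator(locale, { sensitivity: 'accent' });

function equalsIgnoreCase(a, b) {
  return collator.compare(a, b) === 0;
}

// Alternative based on locale-aware case conversion.
function equalsViaUpperCase(a, b) {
  return a.toLocaleUpperCase(locale) === b.toLocaleUpperCase(locale);
}

const firstString = 'I might be A different CASE';
const secondString = 'i might be a different case';

console.log(equalsIgnoreCase(firstString, secondString));   // true
console.log(equalsViaUpperCase(firstString, secondString)); // true
console.log(equalsIgnoreCase('a', '\u0105'));               // false ("ą" differs by accent)
```

The collator version keeps the comparison rules in one place, so switching the locale or the sensitivity does not require touching the call sites.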

I am not sure whether you really meant to ask this question under the Internationalization (i18n) tag, but since you did...
Probably the most unexpected answer is: neither.

There are tons of problems with case conversion, and they inevitably lead to functional issues if you convert character case without indicating the language (as is the case with toLowerCase and toUpperCase in JavaScript). For instance:


  1. Many natural languages have no concept of upper- and lowercase characters. There is no point in trying to convert them (although doing so will work).
  2. There are language-specific rules for converting a string. The German sharp S character (ß) is bound to be converted into two uppercase S letters (SS).
  3. Turkish and Azerbaijani (or Azeri, if you prefer) have a "very strange" concept of two i characters: dotless ı (which converts to uppercase I) and dotted i (which converts to uppercase İ; your font may not render it correctly, but it really is a different glyph).
  4. Greek has many "strange" conversion rules. One particular rule concerns the uppercase letter sigma (Σ), which has two lowercase counterparts depending on its position in a word: regular sigma (σ) and final sigma (ς). There are also other conversion rules for "accented" characters, but they are commonly omitted from implementations of conversion functions.
  5. Some languages have title-case letters, e.g. ǈ (U+01C8), which should be converted to something like Ǉ (U+01C7) or, less appropriately, to the two letters LJ. The same may apply to ligatures.
  6. Finally, there are many compatibility characters that may mean the same as what you are comparing against but are composed of completely different characters. To make it worse, something like "ae" may be the equivalent of "ä" in German and Finnish, but the equivalent of "æ" in Danish.
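A few of the points above can be observed directly in JavaScript. The snippet below is only an illustrative sketch: the sample strings and the results object are made up for the demonstration, and the Turkish lines assume a runtime whose toLocaleLowerCase and toLocaleUpperCase honor the locale argument (ECMA-402 support).

```javascript
// Illustrative checks only; the sample strings are not from the original answer.
const results = {
  // German sharp s: upper-casing expands it to "SS", lower-casing keeps it.
  sharpSEqualViaUpper: 'stra\u00DFe'.toUpperCase() === 'STRASSE'.toUpperCase(), // true
  sharpSEqualViaLower: 'stra\u00DFe'.toLowerCase() === 'STRASSE'.toLowerCase(), // false

  // Turkish: the result depends on whether a locale is supplied.
  plainLowerI: 'I'.toLowerCase() === 'i',                   // true
  turkishLowerI: 'I'.toLocaleLowerCase('tr') === '\u0131',  // true (dotless ı)
  turkishUpperI: 'i'.toLocaleUpperCase('tr') === '\u0130',  // true (dotted İ)

  // Greek: both lowercase sigma forms map to the same uppercase letter.
  sigmaEqualViaUpper: '\u03C3'.toUpperCase() === '\u03C2'.toUpperCase(), // true
  sigmaEqualViaLower: '\u03C3'.toLowerCase() === '\u03C2'.toLowerCase(), // false

  // Compatibility characters: the "fi" ligature (U+FB01) versus plain "fi".
  ligatureEqualRaw: '\uFB01' === 'fi',                        // false
  ligatureEqualNfkc: '\uFB01'.normalize('NFKC') === 'fi',     // true
};

console.log(results);
```

Note that the sharp S and sigma cases give different answers depending on whether the strings are upper-cased or lower-cased, so neither conversion is a safe general-purpose choice.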

I am trying to convince you that it is really better to compare user input literally, rather than converting it. If it is not user-related, it probably doesn't matter, but case conversion will always take time. Why bother?
