How do I check equality of Unicode strings in JavaScript?
Question
I have two strings in JavaScript: "_strange_chars_µö¬é@zendesk.com.eml" (f1) and "_strange_chars_µö¬é@zendesk.com.eml" (f2). At first glance they look identical (and indeed, on StackOverflow they may be; I'm not sure what happens when they are pasted into a form like this). In my application, however,
f1[16] // ö
f2[16] // o
f1[17] // ¬
f2[17] // ̈
That is, where f1 uses the single character ö, f2 uses an o followed by a diacritic ¨ as a separate character. What comparison can I do that will show these two strings to be "equal"?
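One way to see the difference described above is to print the code points of each string. The following is a minimal sketch rather than part of the original question; the helper name codePoints is hypothetical, and the two short strings stand in for the relevant character in f1 and f2.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function codePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const composed = "\u00F6";    // ö as one precomposed character
const decomposed = "o\u0308"; // o followed by a combining diaeresis

console.log(codePoints(composed).join(" "));   // U+00F6
console.log(codePoints(decomposed).join(" ")); // U+006F U+0308
console.log(composed === decomposed);          // false
```

The two strings render the same but contain different code points, which is why a plain === comparison fails.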
Recommended answer
f1 uses the ö character, f2 uses an o and a diacritic ¨ as a separate character.
f1 is in Normal Form C (NFC, composed) and f2 is in Normal Form D (NFD, decomposed). In general, Normal Form C is the most common on Windows and the web, with the Unicode FAQ describing it as "the best form for general text". Unfortunately, the Apple world plumped for Normal Form D in order to be gratuitously different.
The strings are canonically equivalent by the rules of Unicode equivalence.
What comparison can I do that will show these two strings to be "equal"?
In general, you convert both strings to one Normal Form of your choosing and then compare them. For example, in Python:
>>> import unicodedata
>>> a = u'\u00F6'   # ö composed
>>> b = u'o\u0308'  # o then combining umlaut
>>> unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)
True
Similarly, Java has the Normalizer class, .NET has String.Normalize, and many languages have bindings available to the ICU library, which also offers this feature.
Unfortunately, JavaScript had no native Unicode normalisation ability when this answer was written (ECMAScript 2015 has since added String.prototype.normalize). Without that method, this means either:
- doing it yourself, using large Unicode data tables to cover it all in JavaScript (the answer links to an example implementation); or
- sending it back to the server side (e.g. via XMLHttpRequest), where you've got a better-equipped language to do it.
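In engines that implement ECMAScript 2015 or later, strings have a built-in normalize method, so the normalise-then-compare approach from the Python example can be sketched directly in JavaScript. The helper name unicodeEquals is hypothetical, and defaulting to NFC is an assumption; any single form works as long as both sides use it.

```javascript
// Hypothetical helper: compare two strings after normalising both to one form.
// Assumes String.prototype.normalize is available (ECMAScript 2015+).
function unicodeEquals(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

const f1 = "\u00F6";  // ö as one precomposed character
const f2 = "o\u0308"; // o followed by a combining diaeresis

console.log(f1 === f2);                    // false
console.log(unicodeEquals(f1, f2));        // true
console.log(unicodeEquals(f1, f2, "NFD")); // true
```

NFC and NFD cover canonical equivalence only. The compatibility forms NFKC and NFKD fold more aggressively (for example, ligature characters), which may or may not be what a given comparison wants.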