How do I check equality of Unicode strings in JavaScript?


Question

I have two strings in JavaScript: "_strange_chars_µö¬é@zendesk.com.eml" (f1) and "_strange_chars_µö¬é@zendesk.com.eml" (f2). At first glance they look identical (and, indeed, on StackOverflow they may be; I'm not sure what happens when they are pasted into a form like this). In my application, however,

f1[16] // ö
f2[16] // o
f1[17] // ¬
f2[17] // ̈

That is, where f1 uses the ö character, f2 uses an o and a diacritic ¨ as a separate character. What comparison can I do that will show these two strings to be "equal"?

Recommended answer


f1 uses the ö character, f2 uses an o and a diacritic ¨ as a separate character.

f1 is in Normal Form C (NFC, composed) and f2 is in Normal Form D (NFD, decomposed). In general, Normal Form C is the most common on Windows and the web, with the Unicode FAQ describing it as "the best form for general text". Unfortunately the Apple world plumped for Normal Form D in order to be gratuitously different.

The strings are canonically equivalent by the rules of Unicode equivalence.
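
To see the difference concretely, one can list the code points of each form. This is only a minimal sketch: the helper name codePoints is made up for illustration, the two literals stand in for the relevant part of f1 and f2, and it assumes an engine recent enough to have codePointAt and padStart.

```javascript
// Hypothetical helper (not from the original post): list the code points
// of a string as U+XXXX labels.
function codePoints(s) {
  return Array.from(s, function (ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  });
}

const composed = '\u00F6';    // ö as one precomposed character (as in f1)
const decomposed = 'o\u0308'; // o followed by COMBINING DIAERESIS (as in f2)

console.log(codePoints(composed));    // one entry: U+00F6
console.log(codePoints(decomposed));  // two entries: U+006F, U+0308
console.log(composed === decomposed); // false: === compares code units
```

Both forms render as the same glyph, but a plain === comparison looks only at the underlying code units, which is why the two strings come out unequal.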


What comparison can I do that will show these two strings to be "equal"?

In general, you convert both strings to one Normal Form of your choosing and then compare them. For example in Python:

>>> import unicodedata
>>> a= u'\u00F6'  # ö composed
>>> b= u'o\u0308' # o then combining umlaut
>>> unicodedata.normalize('NFC', a)==unicodedata.normalize('NFC', b)
True

Similarly, Java has the Normalizer class, .NET has String.Normalize, and many languages have bindings available to the ICU library, which also offers this feature.

Unfortunately, at the time this answer was written, JavaScript had no native Unicode normalisation ability (ECMAScript 2015 has since added String.prototype.normalize()). In an engine without it, this means either:


  • doing it yourself, lugging around large Unicode data tables to cover it all in JavaScript (example implementations exist); or

  • sending it back to the server-side (e.g. via XMLHttpRequest), where you've got a better-equipped language to do it.
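
Engines that implement ECMAScript 2015 or later expose String.prototype.normalize(), so where that is available the normalise-then-compare approach can be done client-side. A minimal sketch, assuming such an engine: the function name canonicallyEqual is invented here for illustration, and the two literals are stand-ins for the question's strings, written with escapes so that the forms are unambiguous (that the é in f2 is also decomposed is an assumption).

```javascript
// Hypothetical helper: compare two strings for canonical equivalence by
// normalising both to the same form (NFC here) before comparing.
function canonicallyEqual(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

// Stand-ins for the question's strings (assumed contents, see above).
const f1 = '_strange_chars_\u00B5\u00F6\u00AC\u00E9@zendesk.com.eml';   // ö, é composed
const f2 = '_strange_chars_\u00B5o\u0308\u00ACe\u0301@zendesk.com.eml'; // ö, é decomposed

console.log(f1 === f2);                // false
console.log(canonicallyEqual(f1, f2)); // true
```

NFD would work just as well as NFC here; what matters is that both sides are brought to the same form before the comparison.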
