What's the point of String.normalize()?


Question

While reviewing JavaScript concepts, I found String.normalize(). It does not show up in the W3Schools JavaScript String Reference (www(dot)w3schools.com/jsref/jsref_obj_string.asp), which is probably why I missed it before.

I found more information about it on HackerRank, which states:

Returns a string containing the Unicode Normalization Form of the calling string's value.

It gives this example:

var s = "HackerRank";
console.log(s.normalize());
console.log(s.normalize("NFKC"));

with the output:

HackerRank
HackerRank

Also, on GeeksForGeeks:

The string.normalize() is an inbuilt function in JavaScript which is used to return a Unicode normalisation form of a given input string.

It gives this example:

<script> 
  
  // Taking a string as input. 
  var a = "GeeksForGeeks"; 
    
  // calling normalize function. 
  b = a.normalize('NFC') 
  c = a.normalize('NFD') 
  d = a.normalize('NFKC') 
  e = a.normalize('NFKD') 
    
  // Printing normalised form. 
  document.write(b +"<br>"); 
  document.write(c +"<br>"); 
  document.write(d +"<br>"); 
  document.write(e); 
    
</script> 

with the output:

GeeksForGeeks
GeeksForGeeks
GeeksForGeeks
GeeksForGeeks

Maybe the examples given are just really bad, as they don't show any change in the string.

I wonder... what's the point of this method?

Recommended answer

It depends on what you will do with the strings: often you do not need it (if you are just getting input from the user and showing it back to the user). But to compare, search, or use such strings as keys, you may want a unique way to identify strings that are semantically the same.

The main problem is that you may have two strings which are semantically the same but have two different representations: e.g. one with an accented character [one code point], and one with a base character combined with an accent [one code point for the character, one for the combining accent]. The user may not be in control of how the input text is sent, so you may end up with two different user names or two different passwords. Also, if you process the data, you may get different results depending on the initial string. Users do not like that.
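A minimal sketch of the accented-character case, to make the difference visible (the variable names are only for illustration; "é" is used as the example character):

```javascript
// Two ways to write "é": precomposed vs. base letter + combining accent.
const precomposed = "\u00e9"; // é as a single code point (U+00E9)
const combined = "e\u0301";   // "e" followed by U+0301 COMBINING ACUTE ACCENT

// The two strings look the same on screen but are not equal as code unit sequences.
const rawEqual = precomposed === combined;
console.log(rawEqual);                            // false
console.log(precomposed.length, combined.length); // 1 2

// After normalizing both to the same form, they compare equal.
const nfcEqual = precomposed.normalize("NFC") === combined.normalize("NFC");
const nfdEqual = precomposed.normalize("NFD") === combined.normalize("NFD");
console.log(nfcEqual, nfdEqual);                  // true true
```

NFC prefers the composed form and NFD the decomposed one; either works for comparison as long as both sides use the same form.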

Another problem is the order of combining characters, which needs to be unique. You may have an accent and a lower tail (e.g. a cedilla), and you can express this with several combinations: "pure char, tail, accent", "pure char, accent, tail", "char+tail, accent", "char+accent, tail".
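The four combinations can be sketched with "c", a cedilla, and an acute accent (my choice of characters for illustration; any base letter with one mark below and one above would do):

```javascript
// Four spellings of the same letter: c with cedilla and acute accent.
const variants = [
  "c\u0327\u0301", // pure char, tail (cedilla), accent (acute)
  "c\u0301\u0327", // pure char, accent, tail
  "\u00e7\u0301",  // char+tail (ç), accent
  "\u0107\u0327",  // char+accent (ć), tail
];

// Count how many distinct strings remain before and after normalization.
const distinctRaw = new Set(variants).size;
const distinctNfc = new Set(variants.map((s) => s.normalize("NFC"))).size;
const distinctNfd = new Set(variants.map((s) => s.normalize("NFD"))).size;

console.log(distinctRaw, distinctNfc, distinctNfd); // 4 1 1

// NFD puts the combining marks into one canonical order after the base letter.
const nfdLength = variants[1].normalize("NFD").length;
console.log(nfdLength); // 3
```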

And you may have degenerate cases (especially if you type from a keyboard): you may get code points which should be removed (an arbitrarily long string could be equivalent to just a few bytes).

In any case, for sorting strings, you (or your library) need a normalized form: if you already provide the right form, the library will not need to transform it again.

So: you want the same string (semantically speaking) to always have the same sequence of Unicode code points.
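As a rough sketch of the "use as key" case, here is a tiny user registry that normalizes names before using them as Map keys. The names `canonicalKey`, `users`, and `register` are hypothetical, and picking NFC as the stored form is an assumption; any single form would work.

```javascript
// Hypothetical helper: one canonical form for every key (NFC chosen here).
const canonicalKey = (s) => s.normalize("NFC");

// Hypothetical registry keyed by the normalized user name.
const users = new Map();

function register(name) {
  const key = canonicalKey(name);
  if (users.has(key)) return false; // same name, possibly a different representation
  users.set(key, name);             // keep the original input as the value
  return true;
}

const first = register("Jos\u00e9");   // "José" with precomposed é
const second = register("Jose\u0301"); // "José" with e + combining acute

console.log(first, second, users.size); // true false 1
```

Without the `canonicalKey` step, both registrations would succeed and the two visually identical names would count as different users.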

Note: if you work directly on UTF-8 bytes, you should also take care of the special cases of UTF-8: the same code point could be written in different ways [using more bytes than necessary, i.e. overlong encodings, which are invalid but may be accepted by lenient decoders]. This can also be a security problem.

The K forms (NFKC and NFKD) are often used for "searches" and similar tasks: CO2 and CO₂ will be interpreted in the same way. But this can change the meaning of the text, so it should usually be used only internally, for temporary tasks, while keeping the original text.
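A small sketch of that search use case, assuming a hypothetical `searchKey` helper that is applied only for matching while the original text is stored unchanged:

```javascript
// The original text is kept as-is; only the search key is folded.
const original = "CO\u2082 emissions"; // "CO₂ emissions" with U+2082 SUBSCRIPT TWO

// Hypothetical helper: compatibility normalization plus lowercasing, for matching only.
const searchKey = (s) => s.normalize("NFKC").toLowerCase();

// A query typed with a plain digit still finds the text.
const found = searchKey(original).includes(searchKey("co2"));
console.log(found); // true

// The canonical form NFC leaves the subscript alone; only the K forms fold it.
const nfcKeepsSubscript = original.normalize("NFC") === original;
const nfkcFolded = "CO\u2082".normalize("NFKC") === "CO2";
console.log(nfcKeepsSubscript, nfkcFolded); // true true
```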
