If I use Java 8's String.codePoints to get an array of int codePoints, is it true that the length of the array is the count of characters?
Question
Given a String string in Java, does string.codePoints().toArray().length reflect the length of the String in terms of the actual characters that a human would find meaningful? In other words, does it smooth over escape characters and other artifacts of encoding?
Edit: By "human" I kind of meant "programmer", as I would imagine most programmers would see \r\n as two characters, ESC as one character, etc. But now I see that even the accent marks get atomized, so it doesn't matter.
Recommended answer
No.
For example:
- Control characters (such as ESC, CR, NL, etc.) are not removed. Each has its own distinct code point in Unicode.
- Sequences of spaces, tabs, etc. are not combined.
- Discretionary (soft) hyphen characters (http://www.fileformat.info/info/unicode/char/00AD/index.htm) are not removed.
- Unicode combining characters (https://en.wikipedia.org/wiki/Combining_character) are not combined with their base characters.
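The points in the list above can be checked directly. The following is a minimal sketch, not part of the original answer; the class name CodePointCountDemo and the helper codePointLength are made up for illustration, and the sample strings are arbitrary.

```java
// Illustrative sketch: class and method names are hypothetical.
class CodePointCountDemo {

    // The expression from the question: number of Unicode code points in s.
    static int codePointLength(String s) {
        return s.codePoints().toArray().length;
    }

    public static void main(String[] args) {
        // CR and LF are two distinct code points; they are not merged.
        System.out.println(codePointLength("\r\n"));        // 2

        // ESC (U+001B) is an ordinary code point and is not removed.
        System.out.println(codePointLength("\u001B"));      // 1

        // Runs of whitespace are not collapsed: a, space, tab, space, b.
        System.out.println(codePointLength("a \t b"));      // 5

        // The soft hyphen (U+00AD) is counted even though it is usually invisible.
        System.out.println(codePointLength("co\u00ADop"));  // 5

        // 'e' followed by a combining acute accent (U+0301) stays two code points.
        System.out.println(codePointLength("e\u0301"));     // 2

        // What codePoints() does smooth over is UTF-16 surrogate pairs:
        // U+1F600 occupies two chars but is a single code point.
        String grin = new String(Character.toChars(0x1F600));
        System.out.println(grin.length());                  // 2
        System.out.println(codePointLength(grin));          // 1
    }
}
```

So the only encoding artifact that disappears is the UTF-16 surrogate pair; everything else in the list is counted as-is.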
Now it is debatable whether some of these might be "actual characters that a human would find meaningful" ... but the overall answer is still No.
You clarified as follows:
By "human" I kind of meant "programmer" as I would imagine most programmers would see \r\n as two characters ...
It is more complicated than that. I am a programmer, and for me whether \r\n is meaningful depends on the context. If I am reading a README file, my brain treats differences in white space as having no semantic importance. But if I am writing a parser, my code would take whitespace into account ... depending on the language it is intended to parse.
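Because "meaningful" depends on context, any count closer to what a reader perceives has to be chosen and computed explicitly. The sketch below is an assumption about what such counts might look like, not something the original answer proposes: graphemeCount uses java.text.BreakIterator to count user-perceived characters, and collapsedWhitespaceLength treats every run of whitespace as a single space before counting code points. Both method names and the class name are hypothetical, and BreakIterator results for complex sequences (such as multi-part emoji) can vary between JDK versions.

```java
import java.text.BreakIterator;

// Illustrative sketch: class and method names are hypothetical.
class PerceivedLengthSketch {

    // Counts character boundaries as seen by BreakIterator, so a base
    // letter plus its combining marks is counted once.
    static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    // Counts code points after replacing each run of whitespace with one
    // space, which is one possible "README reader" view of the text.
    static int collapsedWhitespaceLength(String s) {
        String collapsed = s.replaceAll("\\s+", " ");
        return collapsed.codePointCount(0, collapsed.length());
    }

    public static void main(String[] args) {
        // 'e' + combining acute accent: two code points, one perceived character.
        System.out.println(graphemeCount("e\u0301"));                 // 1

        // Space, CR, LF, tab, space between 'a' and 'b' collapse to one space.
        System.out.println(collapsedWhitespaceLength("a \r\n\t b"));  // 3
    }
}
```

Neither method is "the" right answer; a parser would want the raw count, while a UI cursor would want something like graphemeCount.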