If I use Java 8's String.codePoints to get an array of int codePoints, is it true that the length of the array is the count of characters?


Question


Given a String string in Java, does string.codePoints().toArray().length reflect the length of the String in terms of the actual characters that a human would find meaningful? In other words, does it smooth over escape characters and other artifacts of encoding?


Edit: By "human" I kind of meant "programmer", as I would imagine most programmers would see \r\n as two characters, ESC as one character, etc. But now I see that even accent marks get split into separate code points, so it doesn't matter.

Answer

No.

For example:



  • Control characters (such as ESC, CR, NL, etc.) are not removed. These have distinct code points in Unicode.


  • Sequences of spaces, tabs, etc. are not combined; each one is a separate code point.


  • Discretionary hyphen characters (U+00AD SOFT HYPHEN, http://www.fileformat.info/info/unicode/char/00AD/index.htm) are not removed.


  • Unicode combining characters (https://en.wikipedia.org/wiki/Combining_character) are not combined with the base character they modify; each is counted as its own code point.
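To make the bullets concrete, here is a minimal sketch that runs the exact expression from the question over one string per case. The class name CodePointCounts and the helper codePointLength are hypothetical, made up for this illustration.

```java
// Applies the question's expression to strings that exercise each bullet above.
public class CodePointCounts {
    // Same value as string.codePoints().toArray().length: one int per Unicode code point.
    static int codePointLength(String s) {
        return s.codePoints().toArray().length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "a\u001B\r\nb", // ESC, CR and LF are each their own code point
            "a  \t b",      // runs of spaces and tabs are not collapsed
            "co\u00ADop",   // U+00AD SOFT HYPHEN is still counted
            "e\u0301",      // 'e' + U+0301 COMBINING ACUTE ACCENT, usually rendered as one glyph
        };
        for (String s : samples) {
            System.out.println(codePointLength(s));
        }
    }
}
```

It prints 5, 6, 5 and 2: every control character, space, tab, soft hyphen and combining accent contributes its own code point.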


Now it is debatable whether some of these might be "actual characters that a human would find meaningful" ... but the overall answer is still No.
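The one encoding artifact codePoints() does smooth over is the UTF-16 surrogate pair: a supplementary character stored as two chars is reported as a single code point. If you want something closer to user-perceived characters, one option (a swapped-in technique, not part of the original answer) is java.text.BreakIterator.getCharacterInstance(), which keeps a base character together with its combining marks. A minimal sketch, with the class name CharacterCounts and the helper graphemeCount invented for illustration:

```java
import java.text.BreakIterator;

// Compares three ways of "counting characters" in a Java String.
public class CharacterCounts {
    // Counts character boundaries as reported by the JDK's character BreakIterator.
    static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        // Each successful next() moves to the following character boundary.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF, stored as a surrogate pair
        String accented = "e\u0301";  // 'e' + U+0301 COMBINING ACUTE ACCENT
        for (String s : new String[] {clef, accented}) {
            System.out.println(s.length() + " chars, "
                + s.codePointCount(0, s.length()) + " code points, "
                + graphemeCount(s) + " user-perceived character(s)");
        }
    }
}
```

It reports 2 chars, 1 code point and 1 user-perceived character for the G clef, and 2, 2 and 1 for the accented e. The exact boundary rules vary by JDK version (newer JDKs follow Unicode extended grapheme clusters), so treat this as an approximation rather than a definitive character count.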

You clarified as follows:



By "human" I kind of meant "programmer" as I would imagine most programmers would see \r\n as two characters ...


It is more complicated than that. I am a programmer, and for me it depends on the context whether \r\n are meaningful or not. If I am reading a README file, my brain will treat differences in white space as having no semantic importance. But if I am writing a parser, my code would take whitespace into account ... depending on the language it is intended to parse.
