How to correctly compute the length of a String in Java?
Question
I know there is String#length and the various methods in Character, which more or less work on code units/code points.
What is the suggested way in Java to actually return the result as specified by Unicode standards (UAX#29), taking things like language/locale, normalization and grapheme clusters into account?
Answer
java.text.BreakIterator is able to iterate over text and can report on "character", word, sentence and line boundaries.
Consider this code:
def length(text: String, locale: java.util.Locale = java.util.Locale.ENGLISH) = {
  // A character instance reports boundaries between user-perceived characters.
  val charIterator = java.text.BreakIterator.getCharacterInstance(locale)
  charIterator.setText(text)
  var result = 0
  // next() returns each following boundary, then DONE once the end of the text is reached.
  while (charIterator.next() != java.text.BreakIterator.DONE) result += 1
  result
}
Running it:
scala> val text = "Thîs lóo̰ks we̐ird!"
text: java.lang.String = Thîs lóo̰ks we̐ird!
scala> val len = length(text)
len: Int = 17
scala> val codepoints = text.codePointCount(0, text.length)
codepoints: Int = 21
With surrogate pairs:
scala> val parens = "\uDBFF\uDFFCsurpi\u0301se!\uDBFF\uDFFD"
parens: java.lang.String =
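Since the question asks about Java, the same idea can be sketched in Java itself. This is a minimal illustration and not part of the original answer: the class name GraphemeLength and the method graphemeLength are hypothetical names chosen here. It prints the three counts side by side (UTF-16 code units, code points, and BreakIterator "character" boundaries). For complex sequences such as emoji with modifiers, the boundaries reported may differ between JDK versions, so the sketch does not state an expected grapheme count for the sample string.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, named for illustration only.
class GraphemeLength {

    /** Counts the "character" boundaries that BreakIterator reports for the given locale. */
    static int graphemeLength(String text, Locale locale) {
        BreakIterator it = BreakIterator.getCharacterInstance(locale);
        it.setText(text);
        int count = 0;
        // next() returns the following boundary, or DONE once the end of the text is reached.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT, then U+1F600 written as a surrogate pair.
        String text = "e\u0301\uD83D\uDE00";
        // Four UTF-16 code units: 'e', U+0301, and the two surrogates.
        System.out.println("code units:  " + text.length());
        // Three code points: 'e', U+0301, U+1F600.
        System.out.println("code points: " + text.codePointCount(0, text.length()));
        // Boundaries as seen by BreakIterator; no expected value is claimed here.
        System.out.println("graphemes:   " + graphemeLength(text, Locale.ENGLISH));
    }
}
```

The locale is passed explicitly, as in the Scala version, because BreakIterator instances are locale-specific; Locale.ENGLISH is only a placeholder for whatever locale the text actually uses.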