How to correctly compute the length of a String in Java?


Question

I know there is String#length and the various methods in Character which more or less work on code units/code points.

What is the suggested way in Java to actually return the result as specified by Unicode standards (UAX#29), taking things like language/locale, normalization and grapheme clusters into account?
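
To make the distinction concrete, here is a minimal Java sketch of what the existing methods measure. The class name `StringLengths` and its helper methods are illustrative assumptions, not part of any library; only `String#length`, `String#codePointCount` and `java.text.Normalizer` are standard API.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class StringLengths {
    // Number of UTF-16 code units (what String#length reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Code points after NFC normalization, which composes sequences
    // such as 'e' + U+0301 into one code point where one exists.
    static int nfcCodePoints(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    public static void main(String[] args) {
        String accented = "e\u0301";   // 'e' + COMBINING ACUTE ACCENT
        String emoji = "\uD83D\uDE00"; // U+1F600 encoded as a surrogate pair

        System.out.println(codeUnits(accented));     // 2
        System.out.println(codePoints(accented));    // 2
        System.out.println(nfcCodePoints(accented)); // 1
        System.out.println(codeUnits(emoji));        // 2
        System.out.println(codePoints(emoji));       // 1
    }
}
```

None of these numbers is a grapheme-cluster count. Normalization only helps where a precomposed code point exists, so it cannot stand in for the segmentation rules of UAX#29.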

Solution

java.text.BreakIterator is able to iterate over text and can report on "character", word, sentence and line boundaries.
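
As a rough illustration of what "reporting boundaries" means, the following Java sketch collects the boundary offsets an iterator produces. The class `BoundaryLister` and its `boundaries` method are hypothetical names chosen for this example; the `BreakIterator` calls themselves (`getWordInstance`, `getCharacterInstance`, `setText`, `first`, `next`, `DONE`) are standard.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class BoundaryLister {
    // Returns every boundary offset (in UTF-16 code units), including
    // the start (0) and the end (text.length()) of the text.
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        System.out.println(boundaries(words, "Hello world")); // [0, 5, 6, 11]

        // 'e' + U+0301 forms one "character", so there is no boundary at offset 1.
        BreakIterator chars = BreakIterator.getCharacterInstance(Locale.ENGLISH);
        System.out.println(boundaries(chars, "e\u0301!"));    // [0, 2, 3]
    }
}
```

The number of "characters" is then the number of boundaries minus one, or equivalently the number of successful `next()` calls.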

Consider this code:

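import java.text.BreakIterator
import java.util.Locale

// Counts user-perceived characters by counting character boundaries.
def length(text: String, locale: Locale = Locale.ENGLISH): Int = {
  val charIterator = BreakIterator.getCharacterInstance(locale)
  charIterator.setText(text)

  var result = 0
  // next() returns BreakIterator.DONE once the end of the text has been reached
  while (charIterator.next() != BreakIterator.DONE) result += 1
  result
}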

Running it:

scala> val text = "Thîs lóo̰ks we̐ird!"
text: java.lang.String = Thîs lóo̰ks we̐ird!

scala> val graphemes = length(text)
graphemes: Int = 17

scala> val codepoints = text.codePointCount(0, text.length)
codepoints: Int = 21 

With surrogate pairs:

scala> val parens = "\uDBFF\uDFFCsurpi\u0301se!\uDBFF\uDFFD"
parens: java.lang.String = 
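
For callers working in plain Java, the same approach can be sketched as below. The class name `GraphemeCounter` and the method `count` are assumptions made for this example, and the first sample string is written with explicit combining-mark escapes so that its form does not depend on how the source file is encoded.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class GraphemeCounter {
    // Counts the "characters" that BreakIterator reports for the given locale.
    static int count(String text, Locale locale) {
        BreakIterator it = BreakIterator.getCharacterInstance(locale);
        it.setText(text);
        int result = 0;
        // Each successful next() moves past one character boundary.
        while (it.next() != BreakIterator.DONE) {
            result++;
        }
        return result;
    }

    public static void main(String[] args) {
        // Decomposed form: base letters followed by combining marks.
        String text = "Thi\u0302s lo\u0301o\u0330ks we\u0310ird!";
        // Two supplementary code points (surrogate pairs) around the text.
        String parens = "\uDBFF\uDFFCsurpi\u0301se!\uDBFF\uDFFD";

        for (String s : new String[] {text, parens}) {
            System.out.println(s.length() + " code units, "
                    + s.codePointCount(0, s.length()) + " code points, "
                    + count(s, Locale.ENGLISH) + " characters");
        }
    }
}
```

For the first string this should agree with the run above (21 code points, 17 characters). For the second, each surrogate pair is expected to count as one character and the combining accent to attach to the preceding `i`.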
                        
