Unicode-correct title case in Java


Question

I've been looking through the bazillion questions on StackOverflow about capitalizing a word in Java. None of them seem to care in the least about internationalization, and as a matter of fact none really seem to work in an international context. So here is my question.

I have a String in Java that represents a word: all isLetter() characters, no whitespace. I want to make the first character upper case and the rest lower case. I do have the locale of my word handy.

It's easy enough to call .substring(1).toLowerCase(Locale) for the last part of my string. I have no idea how to get the correct first character, though.

The first problem I have is with Dutch, where "ij", being a digraph, should be capitalized together. I could special-case this by hand, because I know about it. There may be other languages with this kind of thing that I don't know about, though, and I'm sure Unicode will tell me if I ask nicely. But I don't know how to ask.

Even if the above problem is solved, I'm still stuck with no proper way to handle English, Turkish and Greek, because Character supports title case but not locales, and String supports locales but not title case.

If I take the code point and pass it to Character.toTitleCase(), this will fail because there is no way to pass the locale to that method. So if the system locale is English but the word is Turkish, and the first char of the word is "i", I'll get "I" instead of "İ", which is wrong.

If instead I take a substring and use .toUpperCase(Locale), this will fail because it produces upper case, not title case. So if the word is Greek, I'll still get the wrong character.
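To make the mismatch concrete, here is a small sketch using only core Java. The class and helper names (`CaseApiGap`, `charTitle`, `stringUpper`) are made up for illustration. Instead of a Greek letter, the second half uses U+01C6 (the digraph letter "ǆ"), which I'm confident has a title-case form different from its upper-case form; that substitution is mine, not part of the question.

```java
import java.util.Locale;

public class CaseApiGap {

    // Title-cases the first code point with Character: no locale can be passed.
    static String charTitle(String s) {
        int titled = Character.toTitleCase(s.codePointAt(0));
        return new String(Character.toChars(titled));
    }

    // Upper-cases with String: locale-aware, but upper case rather than title case.
    static String stringUpper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Turkish "i": Character ignores the locale, String honours it.
        System.out.println(charTitle("i"));
        System.out.println(stringUpper("i", turkish));

        // U+01C6: Character knows the title-case form, String only upper-cases.
        System.out.println(charTitle("\u01C6"));
        System.out.println(stringUpper("\u01C6", Locale.ROOT));
    }
}
```

Neither call is right in both situations, which is the gap described above.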

If anyone has useful pointers, I'd be happy to hear them.

Recommended answer

Like you, I was unable to find a suitable method in the core Java API.

However, there does seem to be a locale-sensitive string title-casing method (UCharacter#toTitleCase) in the ICU library.

Looking at the source for the relevant ICU methods (UCharacter#toTitleCase and UCaseProps#toUpperOrTitle), there don't seem to be many locale-specific special cases for title-casing, so you might be able to get away with the following:


  1. Find the first cased character in the string.
  2. If it has a title-case form distinct from its upper-case form, use that.
  3. Otherwise, perform a locale-sensitive upper-case on that first character and its combining characters.
  4. Perform a locale-sensitive lower-case on the rest of the string.
  5. If the locale is Dutch and the first cased character is an "I" followed by a "j", upper-case the "j".
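
The steps above could be sketched in core Java roughly as follows. This is an illustration under assumptions, not ICU's implementation: the class and method names (`TitleCaseSketch`, `titleCaseWord`, `isCased`, `isCombiningMark`) are hypothetical, "cased" is approximated by Character's upper/lower/title checks, and "combining characters" is approximated by the Unicode mark categories. It has not been checked against every script, and characters whose full upper-case mapping expands to several letters may still come out wrong.

```java
import java.util.Locale;

public class TitleCaseSketch {

    // Approximation of "cased": upper-, lower-, or title-case letters.
    static boolean isCased(int cp) {
        return Character.isUpperCase(cp) || Character.isLowerCase(cp)
                || Character.isTitleCase(cp);
    }

    // Approximation of "combining character": general categories Mn, Me, Mc.
    static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    static String titleCaseWord(String word, Locale locale) {
        // Step 1: find the first cased character.
        int start = 0;
        while (start < word.length() && !isCased(word.codePointAt(start))) {
            start += Character.charCount(word.codePointAt(start));
        }
        if (start == word.length()) {
            return word; // nothing to title-case
        }
        int first = word.codePointAt(start);
        int afterFirst = start + Character.charCount(first);

        // Extend over any combining marks attached to the first cased character.
        int end = afterFirst;
        while (end < word.length() && isCombiningMark(word.codePointAt(end))) {
            end += Character.charCount(word.codePointAt(end));
        }

        String head;
        int title = Character.toTitleCase(first);
        if (title != Character.toUpperCase(first)) {
            // Step 2: a distinct title-case form exists, so use it.
            head = new String(Character.toChars(title)) + word.substring(afterFirst, end);
        } else {
            // Step 3: locale-sensitive upper-case of the character plus its marks.
            head = word.substring(start, end).toUpperCase(locale);
        }

        // Step 4: locale-sensitive lower-case of the rest.
        String rest = word.substring(end).toLowerCase(locale);

        // Step 5: Dutch "ij" digraph, capitalized together.
        if ("nl".equals(locale.getLanguage()) && head.equals("I") && rest.startsWith("j")) {
            rest = "J" + rest.substring(1);
        }
        return word.substring(0, start) + head + rest;
    }

    public static void main(String[] args) {
        System.out.println(titleCaseWord("ijsselmeer", Locale.forLanguageTag("nl")));
        System.out.println(titleCaseWord("istanbul", Locale.forLanguageTag("tr")));
        System.out.println(titleCaseWord("hELLO", Locale.ENGLISH));
    }
}
```

If pulling in ICU is an option, calling its UCharacter#toTitleCase is probably the safer route; the sketch is only meant for cases where a core-Java approximation is acceptable.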
