如何检测Java字符串中的日语文本? [英] How can I detect japanese text in a Java string?

查看:145
本文介绍了如何检测Java字符串中的日语文本?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要能够检测Java字符串中的日语字符。

I need to be able to detect Japanese characters in a Java string.

目前,我正在获取UnicodeBlock并检查其是否等于Character.UnicodeBlock。 .KATAKANA或Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS,但我不是100%会涵盖所有内容。

Currently I'm getting the UnicodeBlock and checking to see if it's equal to Character.UnicodeBlock.KATAKANA or Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS, but I'm not 100% that's going to cover everything.

有任何建议吗?

推荐答案

我使用以下java方法。

I use the following java method. Might not completely address your requirement though.

<!-- language: lang-java -->
/**
 * Returns if a character is one of Chinese-Japanese-Korean characters.
 * 
 * @param c
 *            the character to be tested
 * @return true if CJK, false otherwise
 */
private boolean isCharCJK(final char c) {
    if ((Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS)
            || (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A)
            || (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B)
            || (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_COMPATIBILITY_FORMS)
            || (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS)
            || (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_RADICALS_SUPPLEMENT)
            || (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION)
            || (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.ENCLOSED_CJK_LETTERS_AND_MONTHS)) {
        return true;
    }
    return false;
}

Futhermore,这些似乎适用于平假名和片假名字符:

Futhermore, these seem they should work for Hiragana and Katakana characters:

private boolean isHiragana(final char c)
{
     return (Character.UnicodeBlock.of(c)==Character.UnicodeBlock.HIRAGANA);
}

private boolean isKatakana(final char c)
{
     return (Character.UnicodeBlock.of(c)==Character.UnicodeBlock.KATAKANA);
}

这篇关于如何检测Java字符串中的日语文本?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆