How to parse kanji numeric characters using ICU?

Question

I'm writing a function using ICU to parse a Unicode string that consists of kanji numeric characters, and I want it to return the integer value of the string.

"五" => 5
"三十一" => 31
"五千九百七十二" => 5972

I'm setting the locale to Locale::getJapan() and using NumberFormat::parse() to parse the string. However, whenever I pass it any kanji characters, parse() sets the status to U_INVALID_FORMAT_ERROR.

Does anyone know whether ICU supports kanji character strings in NumberFormat::parse()? I was hoping that, since I'm setting the locale to Japanese, it would be able to parse kanji numeric values.

Thanks!

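#include <iostream>
#include <unicode/fmtable.h>
#include <unicode/locid.h>
#include <unicode/numfmt.h>
#include <unicode/unistr.h>

using namespace std;
using namespace icu;

int main(int argc, char **argv) {
    const Locale &jaLocale = Locale::getJapan();
    UErrorCode status = U_ZERO_ERROR;
    NumberFormat *nf = NumberFormat::createInstance(jaLocale, status);
    if (U_FAILURE(status)) {
        cout << "error creating formatter: " << u_errorName(status) << endl;
        return 1;
    }

    // U+4E94 is the kanji for 5 ('五'). The array is NUL-terminated because
    // UnicodeString(const UChar *) reads up to the terminator.
    UChar number[] = {0x4E94, 0};
    UnicodeString numStr(number);
    Formattable formattable;
    nf->parse(numStr, formattable, status);
    if (U_FAILURE(status)) {
        // For kanji input, this is where U_INVALID_FORMAT_ERROR is reported.
        cout << "error parsing as number: " << u_errorName(status) << endl;
        delete nf;
        return 1;
    }
    cout << "long value: " << formattable.getLong() << endl;
    delete nf;
    return 0;
}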

Recommended answer

You can use ICU's Rule Based Number Format (RBNF) module: rbnf.h in C++ or, in C, unum.h with the UNUM_SPELLOUT option. In both cases, use the "ja" locale for Japanese. Atryom provides a correction to your C++ code: create the formatter with new RuleBasedNumberFormat(URBNF_SPELLOUT, jaLocale, status); instead of NumberFormat::createInstance().
