ICU custom transliteration


Question

I want to use the ICU library for transliteration, and I would like to supply a custom transliteration file for a specific set of custom transliterations. The file should be incorporated into the ICU core at compile time so that it can be used in binary form elsewhere. I am working with the ICU 4.2 source for compatibility reasons.

As I understand it from the ICU Data page of their website, one way of going about this is to create the file trnslocal.mk in ICUHOME/source/data/translit/, containing the single line TRANSLIT_SOURCE_LOCAL=custom.txt.

For the custom.txt file itself, I used the following format, based on the master file root.txt:

custom{
    RuleBasedTransliteratorIDs {
        Kanji-Romaji {
            file {
                resource:process(transliterator){"custom/Kanji_Romaji.txt"}
                direction{"FORWARD"}
            }
        }
    }
    TransliteratorNamePattern {
        // Format for the display name of a Transliterator.
        // This is the language-neutral form of this resource.
        "{0,choice,0#|1#{1}|2#{1}-{2}}" // Display name
    }
    // Transliterator display names
    // This is the English form of this resource.
    "%Translit%Hex"         { "%Translit%Hex" }
    "%Translit%UnicodeName" { "%Translit%UnicodeName" }
    "%Translit%UnicodeChar" { "%Translit%UnicodeChar" }
    TransliterateLATIN {
        "",
        ""
    }
}

I then stored the file Kanji_Romaji.txt, as found here, in the directory custom. Because it uses > instead of the → I have seen in other files, I converted each entry accordingly, so they now look like:

丁 → Tei ;
七 → Shichi ;

When I compile the ICU project, no errors are reported.

However, when I try to use this custom transliterator in a test file (one that works fine with the built-in transliterators), I get the error error: 65569:U_INVALID_ID.

I am using the following code to construct the transliterator and output the error:

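// Requires <iostream>, <unicode/translit.h>, and <unicode/utypes.h>.
UErrorCode status = U_ZERO_ERROR;
Transliterator *K_R = Transliterator::createInstance("Kanji-Romaji", UTRANS_FORWARD, status);
if (U_FAILURE(status))
{
    // Print the numeric error code followed by its symbolic name.
    std::cout << "error: " << status << ":" << u_errorName(status) << std::endl;
    return 0;
}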

Additionally, looping from 0 to Transliterator::countAvailableIDs() and calling Transliterator::getAvailableID(i) does not list my custom transliteration. I remember reading that custom converters must be registered in /source/data/mappings/convrtrs.txt. Is there a similar file for transliterators?

It seems that my custom transliterator is either not being built into the appropriate packages (though there are no compile errors), is improperly formatted, or is somehow not being registered for use. Incidentally, I am aware of the runtime RuleBasedTransliterator route, but I would prefer to compile the custom transliterations in so that they are available in any produced binary.

Let me know if any additional clarification is needed. I know there is at least one ICU programmer on here, who has been quite helpful in other posts I have written and seen elsewhere. I would appreciate any help. Thank you in advance!

Recommended answer

Transliterators are sourced from CLDR. You could add your transliterator to CLDR (the crosswire directory contains it in XML format in the cldr/ directory) and rebuild the ICU data. ICU does not have a simple mechanism for adding transliterators the way you are trying to. What I would do is forget about trnslocal.mk and custom.txt, since you do not need to add any files, and simply modify root.txt. If you have a suggested improvement, you might file a bug.
