Breaking down a Hangul syllable into letters (jamo)

Question

I'm working on a program that deals with Korean sentences, and I need a way to break down a syllable, or block, into its letters. For those who don't know Hangul, a syllable is composed of 2-4 letters (jamo), which gives thousands of different combinations. What I'd like to do is break those syllables down into the letters that form them.

I was able to get the first letter by comparing the syllable's Unicode value against the range associated with each letter, i.e. a syllable that starts with letter x falls in range y. However, I'm at a loss for finding the rest of the letters.

This is a table containing the Unicode values for Hangul syllables: http://jrgraphix.net/r/Unicode/AC00-D7AF

Recommended answer

Hangul syllable decomposition (e.g. 퓛 (U+D4DB) → ᄑ + ᅱ + ᆶ) is done in Java through the java.text.Normalizer class:

import java.text.Normalizer;
String s = Normalizer.normalize("\uD4DB", Normalizer.Form.NFD); // s now holds the jamo U+1111 U+1171 U+11B6
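To see what the call above produces, here is a minimal, self-contained sketch. The class name JamoViaNormalizer and the helper methods decompose and describe are made up for illustration; only java.text.Normalizer comes from the answer itself.

```java
import java.text.Normalizer;

public class JamoViaNormalizer {
    /** Returns the NFD form of the input; precomposed Hangul syllables become conjoining jamo. */
    public static String decompose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    /** Formats each UTF-16 unit as U+XXXX so the individual jamo are easy to inspect. */
    public static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String jamo = decompose("\uD4DB");
        System.out.println(describe(jamo)); // U+1111 U+1171 U+11B6
    }
}
```

Note that the result consists of conjoining jamo from the U+1100 block, not the compatibility jamo in the U+3130 block, so a further mapping step may be needed if the program expects the latter.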

The algorithm for Hangul decomposition is also given in Section 3.12 of the Unicode Standard (from page 142). Since normalisation also affects other, non-Hangul characters, you should familiarise yourself with the general principles and forms of Unicode normalisation in UAX #15.
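If you only want to split Hangul syllables and leave every other character untouched, the arithmetic from the Unicode Standard can be applied directly. The following is a rough sketch of that arithmetic rather than a reference implementation. The class name HangulArithmetic and its method are hypothetical. The constants follow the standard's published values: syllables start at U+AC00, and there are 19 leading consonants, 21 vowels, and 28 trailing slots, where slot 0 means "no trailing consonant".

```java
public class HangulArithmetic {
    // Base code points and counts used by the Hangul syllable decomposition arithmetic.
    static final int S_BASE = 0xAC00; // first precomposed syllable
    static final int L_BASE = 0x1100; // first leading consonant
    static final int V_BASE = 0x1161; // first vowel
    static final int T_BASE = 0x11A7; // one before the first trailing consonant
    static final int L_COUNT = 19;
    static final int V_COUNT = 21;
    static final int T_COUNT = 28;
    static final int N_COUNT = V_COUNT * T_COUNT; // 588 syllables per leading consonant
    static final int S_COUNT = L_COUNT * N_COUNT; // 11172 syllables in total

    /** Splits one precomposed syllable into 2 or 3 conjoining jamo; other chars pass through. */
    public static String decompose(char syllable) {
        int sIndex = syllable - S_BASE;
        if (sIndex < 0 || sIndex >= S_COUNT) {
            return String.valueOf(syllable); // not a precomposed Hangul syllable
        }
        char leading = (char) (L_BASE + sIndex / N_COUNT);
        char vowel = (char) (V_BASE + (sIndex % N_COUNT) / T_COUNT);
        int tIndex = sIndex % T_COUNT;

        StringBuilder sb = new StringBuilder().append(leading).append(vowel);
        if (tIndex != 0) {
            sb.append((char) (T_BASE + tIndex)); // trailing consonant is optional
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+D55C splits into U+1112, U+1161, U+11AB.
        String jamo = decompose('\uD55C');
        for (char c : jamo.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The integer division by 588 is what the range comparison described in the question amounts to: each leading consonant owns a contiguous run of 588 syllables, and the remainder within that run encodes the vowel and the optional trailing consonant.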
