Detecting syllables in a word

Question

I need to find a fairly efficient way to detect syllables in a word. For example:

Invisible -> in-vi-sib-le

There are some syllabification rules that could be used:

V CV VC CVC CCV CCCV CVCC

*where V is a vowel and C is a consonant. For example:

Pronunciation (5 syllables: Pro-nun-ci-a-tion; CV-CVC-CV-V-CVC)
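The V/C notation can be derived mechanically from an already syllabified spelling. The following is a minimal sketch, not part of the original question: the helper name `cv_pattern` is hypothetical, and it assumes that only the letters a, e, i, o, u count as vowels (so "y" and digraphs are not handled).

```python
# Assumption: only these five letters are treated as vowels.
VOWELS = set("aeiou")

def cv_pattern(syllabified):
    """Map a hyphen-separated word to its V/C shape, syllable by syllable."""
    def shape(syllable):
        return "".join("V" if ch in VOWELS else "C" for ch in syllable.lower())
    return "-".join(shape(s) for s in syllabified.split("-"))

print(cv_pattern("in-vi-sib-le"))  # VC-CV-CVC-CV
```

Because this works on letters rather than sounds, its output will not always agree with a pattern derived from pronunciation.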

I've tried a few methods: regular expressions (which help only if you want to count syllables), hard-coded rule definitions (a brute-force approach that proved very inefficient), and finally a finite-state automaton (which did not produce anything useful).
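For context, the regex approach usually amounts to counting runs of vowel letters. The sketch below is an assumption about what such an attempt looks like, not the asker's actual code; `count_syllables` is a hypothetical name, and treating "y" as a vowel is a simplification.

```python
import re

def count_syllables(word):
    """Estimate the syllable count as the number of vowel-letter runs."""
    # Assumption: a, e, i, o, u, y are vowels; each run counts as one syllable.
    return len(re.findall(r"[aeiouy]+", word.lower()))

print(count_syllables("invisible"))      # 4
print(count_syllables("pronunciation"))  # 4, although the word has 5 syllables
```

The second result shows the limits of the idea: "ia" is counted as one run, and in any case the count says nothing about where the boundaries fall.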

The purpose of my application is to create a dictionary of all syllables in a given language. This dictionary will later be used for spell checking applications (using Bayesian classifiers) and text to speech synthesis.

I would appreciate tips on alternative ways to solve this problem, beyond the approaches I have already tried.

I work in Java, but any tip in C/C++, C#, Python, Perl... would work for me.

Recommended answer

Read about the TeX approach to this problem for the purposes of hyphenation. In particular, see Frank Liang's thesis, Word Hy-phen-a-tion by Com-put-er. His algorithm is very accurate, and it includes a small exceptions dictionary for cases where the algorithm does not work.
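The core of a Liang-style hyphenator is small enough to sketch. In this scheme each pattern is a letter string with digits between letters; every matching pattern contributes its digits to the gaps of the word, the highest digit wins per gap, and an odd result marks an allowed break. The code below is a minimal illustration under stated assumptions: the nine patterns are a tiny hand-picked set that covers one word only (a real system loads thousands of patterns plus the exceptions dictionary), and the names `PATTERNS`, `parse_pattern`, and `hyphenate` are hypothetical.

```python
import re

# Assumption: a tiny illustrative pattern set, sufficient only for "hyphenation".
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

def parse_pattern(pattern):
    """Split 'hy3ph' into letters 'hyph' and gap weights [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight of the gap before letters[pos]
        else:
            pos += 1
    return letters, weights

def hyphenate(word, patterns=PATTERNS, left_min=2, right_min=3):
    """Insert hyphens at gaps whose highest matching weight is odd."""
    padded = "." + word.lower() + "."   # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)    # scores[k] is the gap before padded[k]
    for letters, weights in map(parse_pattern, patterns):
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    # Keep at least left_min letters before and right_min letters after a break.
    for cut in range(left_min, len(word) - right_min + 1):
        if scores[cut + 1] % 2 == 1:    # +1 accounts for the leading dot
            pieces.append(word[last:cut])
            last = cut
    pieces.append(word[last:])
    return "-".join(pieces)

print(hyphenate("hyphenation"))  # hy-phen-ation
```

Hyphenation points are not always identical to syllable boundaries, so for a syllable dictionary the output would serve as a starting point rather than a final answer.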
