Counting Syllables in a Word

Question

I'm looking for a fully accurate description of an algorithm to count the syllables in a word. What I find when I research this is either inconsistent or something I know produces incorrect results. Does anyone have suggestions on how to accomplish this? Thanks.

The algorithm I'm currently using:

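  1. Count the number of vowels in the word.
  2. Do not count double vowels separately ("rain" has 2 vowels but only 1 syllable).
  3. If the last letter of the word is a vowel, do not count it ("side" has 1 syllable).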

Are there any more rules I'm missing? I'm trying to determine, from the incorrect results in my tests, whether the algorithm itself is wrong or my implementation of it is.
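For reference, the three rules above could be sketched roughly as follows. This is only one literal reading of them, not the asker's actual code: the function name `count_syllables_heuristic`, the vowel set `aeiou` (no "y"), and the floor of one syllable per word are all assumptions made for illustration.

```python
VOWELS = set("aeiou")  # assumption: 'y' is not treated as a vowel


def count_syllables_heuristic(word: str) -> int:
    """Apply the three rules from the question, read literally."""
    word = word.lower()

    # Rule 3: a vowel in final position is not counted, so drop it up front.
    if word and word[-1] in VOWELS:
        word = word[:-1]

    # Rules 1 and 2: count vowels, treating a run of adjacent vowels as one.
    count = 0
    previous_was_vowel = False
    for ch in word:
        is_vowel = ch in VOWELS
        if is_vowel and not previous_was_vowel:
            count += 1
        previous_was_vowel = is_vowel

    # Assumption: every word has at least one syllable ("a", "the").
    return max(count, 1)


for w in ("rain", "side", "undulate", "banana"):
    print(w, count_syllables_heuristic(w))
```

With this reading, "rain", "side" and "undulate" come out right, but "banana" is counted as 2 syllables because its final "a" is dropped by rule 3, which suggests that at least some wrong results come from the rules themselves rather than from the implementation.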

Recommended answer

Ambiguity is a huge issue in natural language processing, but some tasks can be handled with good accuracy despite it. It turns out syllabification is one of them, so don't listen to the other answers. :)

You could come up with an algorithm that syllabifies virtually the whole English vocabulary correctly, but it looks complicated to program correctly.

As always, when hand-made algorithms don't help much, natural language processing researchers turn to hand-tagged corpora containing the correct answers for given words. Learning algorithms are then trained on them and often provide great accuracy. You can use LingPipe's syllabification (see "English syllabification"), which follows this approach.

English only has so many words, which is how we came up with dictionaries. Such dictionaries often contain the correct syllabification. You could scrape reference.com. For example, the entry for undulate contains « un·du·late », which is enough to know that the word has three syllables.
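Once such an entry has been obtained, turning it into a syllable count is straightforward. The sketch below assumes the entry arrives as a plain string that uses the middle dot (U+00B7) as separator, as in « un·du·late »; the function name `syllables_from_entry` is hypothetical, and real dictionary pages may mark syllables differently.

```python
MIDDLE_DOT = "\u00b7"  # the separator seen in « un·du·late »


def syllables_from_entry(entry: str, sep: str = MIDDLE_DOT) -> int:
    """Count syllables in a dictionary-style syllabified headword."""
    # Split on the separator and ignore empty pieces or stray whitespace.
    parts = [part for part in entry.strip().split(sep) if part.strip()]
    return len(parts)


print(syllables_from_entry("un\u00b7du\u00b7late"))
```

A word with no separator at all is counted as one syllable, which matches how monosyllabic headwords are usually printed.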

Other such dictionaries include Answers.com, The Free Dictionary, Merriam-Webster, and so on. Do read their terms and conditions: automated retrieval may not be allowed. Also, different dictionaries don't always agree with each other.

It won't help with new words or proper nouns, but I'd say it's going to be the most accurate method.

Another, related problem has received a lot more exposure: hyphenation. But don't use that! It is used in typesetting programs such as LaTeX, but it only aims to provide some of the correct hyphens, without ever providing an incorrect one (high precision, low recall). It's interesting to note that there are only 14 exceptions, e.g. "project", which is hyphenated differently depending on its part of speech (verb or noun).
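To make "high precision, low recall" concrete, here is a small worked example. The break positions are invented for illustration and are not the output of TeX or any real hyphenator: suppose the true syllable breaks of "undulate" fall after letters 2 and 4 (un|du|late) and a cautious hyphenator proposes only the first one.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[int], gold: Set[int]) -> Tuple[float, float]:
    """Precision and recall of predicted break positions against the true ones."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


gold_breaks = {2, 4}    # un|du|late: true syllable boundaries
predicted_breaks = {2}  # hypothetical hyphenator output: un-dulate

precision, recall = precision_recall(predicted_breaks, gold_breaks)
print(precision, recall)          # 1.0 0.5
print(len(predicted_breaks) + 1)  # 2 pieces, although the word has 3 syllables
```

Every proposed hyphen is correct, yet counting the resulting pieces undercounts the syllables, which is why hyphenation is a poor stand-in for syllabification.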

If you decide that it's enough for your needs, note that a few implementations of the TeX hyphenation algorithm exist in other languages, such as Python, Perl or Ruby.
