How to calculate syllables in text with regex and Java


Problem description


I have text as a String and need to calculate the number of syllables in each word. I tried to split the whole text into an array of words and then process each word separately, using regular expressions for that. But the pattern for syllables doesn't work as it should. Please advise how to change it so that it calculates the correct number of syllables. My initial code:

public int getNumSyllables()
{
    String[] words = getText().toLowerCase().split("[a-zA-Z]+");
    int count = 0;
    List<String> tokens = new ArrayList<String>();
    for (String word : words) {
        tokens = Arrays.asList(word.split("[bcdfghjklmnpqrstvwxyz]*[aeiou]+[bcdfghjklmnpqrstvwxyz]*"));
        count += tokens.size();
    }
    return count;
}

Solution

This question is from a UCSD Java course, am I right?

I think you should provide enough information in the question so that it won't confuse people who want to offer some help. Here is my own solution, which has already been tested against the test cases from the local program and also against the OJ from UCSD.

You left out some important information about how a syllable is defined in this assignment. Actually, I think the key point of this problem is how you should deal with the e. For example, take the combination te. If te is in the middle of a word, of course it should be counted as a syllable; however, if it's at the end of a word, the e should be treated as a silent e in English, so it should not be counted as a syllable.

That's it. Here is my idea written down as pseudocode:

  if (last character is e) {
        if (it is a silent e at the end of this word) {
           remove the silent e;
           count the rest as regular;
        } else {
           count++;
        }
  } else {
        count it as regular;
  }

You may notice that I am not using regex alone to deal with this problem. Actually I have thought about it: can this question really be solved using only regex? My answer is: nope, I don't think so. At least for now, with the knowledge UCSD gives us, it's too difficult to do that. Regex is a powerful tool and can match the desired characters very fast. However, a simple pattern applies the same rule to every match. Take te as an example again: the pattern can't think twice when it faces a word like teate (I made up this word just as an example). If our regex pattern counts the first te as a syllable, then why not the last te?
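To make the te point concrete, here is a minimal sketch that counts vowel groups with the same pattern as countit further down, but without any silent-e handling. The class name RegexOnlyCount and the method name countVowelGroups are made up for this illustration and are not part of the assignment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: regex-only counting, no special case for a final e.
class RegexOnlyCount {
    // Optional consonants followed by one or more vowels (y counts as a vowel).
    private static final Pattern VOWEL_GROUP = Pattern.compile("[^aeiouy]*[aeiouy]+");

    // Counts every vowel group, including a trailing silent e.
    static int countVowelGroups(String word) {
        Matcher m = VOWEL_GROUP.matcher(word.toLowerCase());
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"ate", "teate", "the"}) {
            System.out.println(w + " -> " + countVowelGroups(w));
        }
    }
}
```

For "ate" this pattern finds two groups ("a" and "te"), although the word should count as one syllable under the silent-e rule; for "the" it finds one group, which is already correct. The pattern alone cannot tell these two final e's apart, which is why the extra check is needed.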

Meanwhile, UCSD actually talks about this in the assignment handout:

If you find yourself doing mental gymnastics to come up with a single regex to count syllables directly, that's usually an indication that there's a simpler solution (hint: consider a loop over characters--see the next hint below). Just because a piece of code (e.g. a regex) is shorter does not mean it is always better.

The hint here is that you should approach this problem with some kind of loop, combined with regex.
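For comparison, the loop over characters that the hint mentions could look roughly like the sketch below. It assumes the same rule as the rest of this answer (a, e, i, o, u and y are vowels, each run of adjacent vowels is one syllable, and a lone e at the end of the word is silent unless it would be the only syllable). The class name LoopSyllableCounter is made up for this illustration; it is not the code I submitted.

```java
// Hypothetical loop-based variant; no regex involved.
class LoopSyllableCounter {
    static boolean isVowel(char c) {
        return "aeiouy".indexOf(c) >= 0;
    }

    static int countSyllables(String word) {
        String w = word.toLowerCase();
        int count = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean vowel = isVowel(w.charAt(i));
            // A new syllable starts where a vowel follows a non-vowel (or the word start).
            if (vowel && !prevVowel) {
                boolean loneFinalE = w.charAt(i) == 'e' && i == w.length() - 1;
                // Skip a lone final e unless nothing has been counted yet (e.g. "the").
                if (!(loneFinalE && count > 0)) {
                    count++;
                }
            }
            prevVowel = vowel;
        }
        return count;
    }
}
```

The loop keeps only one piece of state (whether the previous character was a vowel), so the special case for the final e sits in a single place.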

OK, here is my code:

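// Imports needed at the top of the file:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

protected int countSyllables(String word)
{
    // TODO: Implement this method so that you can call it from the 
    // getNumSyllables method in BasicDocument (module 1) and 
    // EfficientDocument (module 2).
    if (word.isEmpty()) {
        // Guard: charAt below would throw on an empty string.
        return 0;
    }
    int count = 0;
    word = word.toLowerCase();

    if (word.charAt(word.length()-1) == 'e') {
        if (silente(word)) {
            // The final e is silent: drop it and count the rest as usual.
            String newword = word.substring(0, word.length()-1);
            count = count + countit(newword);
        } else {
            // The final e is the only vowel (e.g. "the"), so it is one syllable.
            count++;
        }
    } else {
        count = count + countit(word);
    }
    return count;
}

// Counts vowel groups: optional consonants followed by one or more vowels.
private int countit(String word) {
    int count = 0;
    Pattern splitter = Pattern.compile("[^aeiouy]*[aeiouy]+");
    Matcher m = splitter.matcher(word);

    while (m.find()) {
        count++;
    }
    return count;
}

// Returns true if the word (which ends in e) has another vowel before that e,
// in which case the final e is treated as silent.
private boolean silente(String word) {
    word = word.substring(0, word.length()-1);

    Pattern yup = Pattern.compile("[aeiouy]");
    Matcher m = yup.matcher(word);

    return m.find();
}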

You may notice that besides the given method countSyllables, I also created two additional methods, countit and silente. countit counts the syllables inside the word, and silente figures out whether the word ends with a silent e. Also note which final e is not silent: for example, the e in "the" is not considered silent, while the e in "ate" is considered silent.
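The "the" versus "ate" distinction can be checked in isolation with a small sketch like the one below. The helper name endsWithSilentE is made up for this illustration; it follows the same rule as silente (a final e is silent only when an earlier vowel exists), with the ends-with-e test folded in.

```java
// Hypothetical standalone check mirroring the silente rule.
class SilentECheck {
    static boolean endsWithSilentE(String word) {
        String w = word.toLowerCase();
        if (!w.endsWith("e")) {
            return false;
        }
        // Silent only if some vowel appears before the final e.
        String rest = w.substring(0, w.length() - 1);
        return rest.matches(".*[aeiouy].*");
    }
}
```

With this rule, "the" and "be" keep their final e as a syllable, while "ate" and "tree" have it treated as silent.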

My code has passed the tests, both the local test cases and the OJ from UCSD.

P.S.: It should be fine to use something like [^aeiouy] directly, because the text has already been parsed into words before we call this method. Converting to lowercase is also necessary; it saves a lot of work dealing with uppercase letters, since all we want is the number of syllables. Speaking of the number, one arguably elegant option is to define count as static, so the private methods could use count++ directly. But it's fine as it is.
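Since the method relies on the text being split into words first, here is a hedged sketch of that step. Note that String.split("[a-zA-Z]+"), as used in the question, treats the words themselves as delimiters and returns the pieces between them (spaces and punctuation). Collecting the matches with a Matcher returns the words instead. The class name WordTokens and the method name getWords are made up for this illustration; the course code may provide its own tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: extracts words instead of splitting on them.
class WordTokens {
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    static List<String> getWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // Lowercase here so later vowel checks only need lowercase letters.
            words.add(m.group().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "I ate the cake.";
        // split() yields the separators between words, not the words.
        System.out.println(Arrays.toString(text.split("[a-zA-Z]+")));
        // The matcher loop yields the words themselves.
        System.out.println(getWords(text));
    }
}
```

Each word returned by getWords can then be passed to countSyllables and the results summed.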

Feel free to contact me if you still don't get the approach to this question :)
