RegEx: Understanding Syllable Counter Code

Question

I have used Dylan's question on here about syllable counting in JavaScript, and more specifically artfulhacker's answer to it, in my own code. Regardless of which single- or multi-word string I feed it, the function always counts the number of syllables correctly.

I have limited experience with RegEx and not enough prior knowledge to decipher exactly what is happening in the following code without some help. I'm never happy having code I pulled from somewhere just work without knowing how it works. Could someone please explain what is happening in the new_count(word) function below, help me decipher its use of RegEx, and explain how the function manages to count syllables correctly? Many thanks.

function new_count(word) {
  word = word.toLowerCase();                                     //word.downcase!
  if(word.length <= 3) { return 1; }                             //return 1 if word.length <= 3
  word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');   //word.sub!(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  word = word.replace(/^y/, '');                                 //word.sub!(/^y/, '')
  return word.match(/[aeiouy]{1,2}/g).length;                    //word.scan(/[aeiouy]{1,2}/).size
}


Answer

As far as I can see, we basically want to count the vowels, or vowel pairs, with some special cases. Let's start with the last line, which does exactly that, i.e. counts vowels and vowel pairs:

return word.match(/[aeiouy]{1,2}/g).length;

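This matches any vowel, or pair of vowels. [...] denotes a character class: going through the string character by character, we have a match if the current character is one of those listed. {1,2} is the number of repetitions, i.e. we match one or two such characters in a row. The quantifier is greedy, so two adjacent vowels are consumed together as a single match. The g flag makes match() return every match in the string, so .length is the number of vowel groups found.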
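A quick sketch of what the last line sees before .length is taken; the example words are arbitrary picks for illustration. Note how three vowels in a row ('eau' in 'beautiful') are split into a pair plus a single vowel, which is one way this heuristic can overcount.

```javascript
// Global match of runs of one or two vowels (y counts as a vowel here).
// The {1,2} quantifier is greedy, so two adjacent vowels form one match.
const vowelGroups = /[aeiouy]{1,2}/g;

for (const w of ['boat', 'banana', 'beautiful']) {
  // match() with the g flag returns an array of all matched substrings
  console.log(w, w.match(vowelGroups));
}
// boat:      ['oa']
// banana:    ['a', 'a', 'a']
// beautiful: ['ea', 'u', 'i', 'u']
```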

The other two lines handle special cases.

word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');

This line removes one of the following 'syllables' from the end of the word:


  • Xes (where X is any character other than l, a, e, i, o, u or y, e.g. 'zes')
  • ed
  • Xe (where X is any character other than l, a, e, i, o, u or y, e.g. 'xe')
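To see this step on its own, here is a small sketch; stripSuffix is a hypothetical helper name that applies only the replace() call from new_count. The character matched by X is removed along with the ending, which does not change the vowel count. Because 'l' is excluded from X, a '-le' ending such as in 'table' is left intact (presumably so it still counts as a syllable).

```javascript
// Hypothetical helper: only the suffix-stripping replace() from new_count.
function stripSuffix(word) {
  return word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');
}

for (const w of ['boxes', 'jumped', 'make', 'table']) {
  console.log(w, '->', stripSuffix(w));
}
// boxes  -> bo     ('xes' removed: X = 'x')
// jumped -> jump   ('ed' removed)
// make   -> ma     ('ke' removed: X = 'k')
// table  -> table  ('le' is not removed because 'l' is excluded)
```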

(I'm not really sure what the grammatical reasoning behind this is, but I guess that endings like '-ed', '-ded', '-xed' etc. at the end of a word don't really count as syllables.)

As for the regexp part: (?:...) is a non-capturing group. Whether the group captures doesn't really matter in this case; non-capturing just means that we want to group the whole expression but never need to refer back to it, so a capturing group, i.e. (...), would have worked too. The grouping itself does matter, though: it makes the $ anchor apply to all three alternatives. Without it, $ would bind only to the last alternative, so an 'es' or 'ed' in the middle of the word could be removed instead.
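One way to check what the group does is to run the same alternation with a non-capturing group, a capturing group and no group at all on a word that contains 'ed' in the middle. The word 'bedroom' is just an illustrative choice.

```javascript
const word = 'bedroom';

// Non-capturing group: $ applies to all three alternatives.
const grouped = /(?:[^laeiouy]es|ed|[^laeiouy]e)$/;
// Capturing group: same matching behaviour; the match is also stored as group 1.
const capturing = /([^laeiouy]es|ed|[^laeiouy]e)$/;
// No group: $ only belongs to the last alternative, [^laeiouy]e$.
const ungrouped = /[^laeiouy]es|ed|[^laeiouy]e$/;

console.log(word.replace(grouped, ''));   // bedroom (no listed ending at the end)
console.log(word.replace(capturing, '')); // bedroom (same result)
console.log(word.replace(ungrouped, '')); // broom   (the 'ed' in the middle was removed)
```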

The [^...] is a negated character class: it matches any single character that is not listed inside the brackets. (Compare it to the non-negated character class mentioned above.) The pipe symbol, |, is the alternation operator, which means that any one of the alternatives can match. Finally, the $ anchor matches the end of the string, or the end of each line when the regex has the m (multiline) flag.
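A short sketch of both pieces in isolation; the test strings are arbitrary examples. The negated class accepts 'm' before the final 'e' of 'home', but rejects 'e' in 'free' and 'l' in 'hole'. The second part shows the difference the m flag makes for $.

```javascript
// [^laeiouy] matches any one character except l, a, e, i, o, u, y.
const consonantE = /[^laeiouy]e$/;
console.log(consonantE.test('home')); // true  ('m' is not in the list)
console.log(consonantE.test('free')); // false ('e' is in the list)
console.log(consonantE.test('hole')); // false ('l' is in the list)

// Without m, $ only matches at the very end of the string;
// with m, it also matches just before each line break.
const text = 'jumped\nup';
console.log(/ed$/.test(text));  // false
console.log(/ed$/m.test(text)); // true
```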

word = word.replace(/^y/, '');

This line removes a single 'y' from the beginning of the word (probably because a 'y' at the beginning does not count as a syllable, which makes sense in my opinion). ^ is the anchor matching the beginning of the string, or of each line when the m flag is set (cf. $ mentioned above).
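Without this step, a leading 'y' would be picked up by the vowel-group count, because y is in the [aeiouy] class. A small sketch comparing the groups with and without the leading 'y'; 'young' and 'yeah' are illustrative words that each have one syllable.

```javascript
const vowelGroups = /[aeiouy]{1,2}/g;

for (const w of ['young', 'yeah']) {
  const withY = w.match(vowelGroups);                      // leading y pairs with the next vowel
  const withoutY = w.replace(/^y/, '').match(vowelGroups); // what new_count actually counts
  console.log(w, withY, withoutY);
}
// young: ['yo', 'u'] vs ['ou']
// yeah:  ['ye', 'a'] vs ['ea']
```

With the 'y' kept, each word would yield two groups; after stripping it, each yields one.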

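Note: the algorithm only works if word really contains a single word. The ^ and $ anchors and the length check all apply to the whole string, so in a multi-word string only the start of the first word and the end of the last word get the special-case treatment.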
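If you need to count syllables in a phrase, one option is to split it into words first and count each word separately. This is a minimal sketch assuming whitespace-separated words. countPhrase and safeNewCount are hypothetical names; safeNewCount is new_count with one added guard, because match() returns null (and .length then throws a TypeError) when no vowel group is left.

```javascript
// new_count with a null guard: match() returns null when nothing matches.
function safeNewCount(word) {
  word = word.toLowerCase();
  if (word.length <= 3) { return 1; }
  word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');
  word = word.replace(/^y/, '');
  const groups = word.match(/[aeiouy]{1,2}/g);
  return groups ? groups.length : 0;
}

// Hypothetical helper: count each whitespace-separated word on its own.
function countPhrase(text) {
  return text
    .trim()
    .split(/\s+/)
    .filter(w => w.length > 0)
    .reduce((sum, w) => sum + safeNewCount(w), 0);
}

console.log(safeNewCount('make cake')); // 3: only the final 'ke' of the whole string is stripped
console.log(countPhrase('make cake'));  // 2: 'make' -> 'ma', 'cake' -> 'ca'
console.log(safeNewCount('pfft'));      // 0 instead of a TypeError
```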
