Snowball Stemming: Defining Regions

Question

I'm trying to understand the Snowball stemming algorithm. The algorithm uses two regions, R1 and R2, which are defined as follows:

R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.

R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.

http://snowball.tartarus.org/texts/r1r2.html

The examples are:

    b   e   a   u   t   i   f   u   l
                      |<------------->|    R1
                              |<----->|    R2

   b   e   a   u   t   y
                     |<->|    R1
                       ->|<-  R2

   a   n   i   m   a   d   v   e   r   s   i   o   n
        |<----------------------------------------->|    R1
                |<--------------------------------->|    R2

   s   p   r   i   n   k   l   e   d
                     |<------------->|    R1
                                   ->|<-  R2

    e   u   c   h   a   r   i   s   t
            |<--------------------->|    R1
                        |<--------->|    R2

My question is: why are "kled" in sprinkled and "harist" in eucharist defined as R1? I thought the correct results would be "inkled" and "arist".

Recommended answer

You should read the definition again. It says:

R1 is the region after the first non-vowel following a vowel.

Note that it says "following a vowel", not "followed by a vowel".

In sprinkled, the first non-vowel following a vowel is n, so the region after is kled.

The same goes for eucharist: the first non-vowel following a vowel is c, so the region after it is harist.
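If it helps, the definition can be checked mechanically. Below is a minimal Python sketch, not the actual Snowball implementation: the names `VOWELS`, `region_start`, and `r1_r2` are made up for illustration, the vowel set `aeiouy` is an assumption for English, and any language-specific special cases are ignored. It scans for the first non-vowel whose preceding letter is a vowel and returns everything after that non-vowel.

```python
# Assumption: treat a, e, i, o, u, y as vowels (English); other languages differ.
VOWELS = set("aeiouy")


def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel.

    Only letters at positions >= start are considered. If no such
    non-vowel exists, return len(word), i.e. the null region at the end.
    """
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return (R1, R2) as substrings of word."""
    r1 = region_start(word)      # search the whole word
    r2 = region_start(word, r1)  # same rule, restricted to R1
    return word[r1:], word[r2:]


if __name__ == "__main__":
    for w in ["beautiful", "beauty", "animadversion", "sprinkled", "eucharist"]:
        print(w, r1_r2(w))
```

For sprinkled the scan stops at n (it follows the vowel i), so R1 is "kled"; for eucharist it stops at c (it follows u), so R1 is "harist". The letter after the non-vowel plays no part in the test.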
