How to use suffixes in smoothing for part-of-speech tagging

Problem description

I am building a part-of-speech tagger, and I handle unknown words by looking at their suffixes.

The main issue is how to decide on the number of suffixes. Should the suffix set be predefined (as in the Weischedel approach), or should I simply take the last few letters of each word (as in the Samuelsson approach)?

Which approach is better?
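The two options differ mainly in how an unknown word is mapped to suffix features. A minimal sketch of both strategies follows; the suffix list `PREDEFINED_SUFFIXES` and the helper names are hypothetical and chosen only for illustration, not taken from Weischedel's or Samuelsson's work.

```python
from typing import List, Optional

# Hypothetical short list for illustration; Weischedel's actual list is not reproduced here.
PREDEFINED_SUFFIXES = ["ing", "ed", "ly", "tion", "ness", "able", "s"]


def predefined_suffix(word: str, suffixes: List[str] = PREDEFINED_SUFFIXES) -> Optional[str]:
    """Fixed-list strategy: longest listed suffix the word ends with (None if none match)."""
    w = word.lower()
    # len(s) < len(w) keeps a bare "ing" from counting as its own suffix.
    matches = [s for s in suffixes if w.endswith(s) and len(s) < len(w)]
    return max(matches, key=len) if matches else None


def last_letters(word: str, max_len: int = 4) -> List[str]:
    """Letter-sequence strategy: every word-final sequence of 1..max_len letters."""
    w = word.lower()
    return [w[-i:] for i in range(1, min(max_len, len(w)) + 1)]


for word in ["running", "blorfed", "xyz"]:
    print(word, predefined_suffix(word), last_letters(word))
# running ing ['g', 'ng', 'ing', 'ning']
# blorfed ed ['d', 'ed', 'fed', 'rfed']
# xyz None ['z', 'yz', 'xyz']
```

With a fixed list, each unknown word falls into at most one suffix class, so the classes are few and well populated. The letter-sequence version yields several overlapping candidates per word, which is why it needs some way of combining evidence across suffix lengths.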

Solution

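Quick googling suggests that the Weischedel approach is sufficient for English, which has only rudimentary inflectional morphology. For inflecting languages the Samuelsson approach seems to work better, which makes sense intuitively: when words carry many different endings, a hand-picked suffix list is hard to make complete, whereas word-final letter sequences can be learned from the training data.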
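For the predefined-list side, the Weischedel approach is usually described as a factored emission estimate for unknown words, roughly P(w | t) ≈ P(unknown | t) · P(capitalized | t) · P(ending | t). The sketch below follows that shape under stated assumptions: the ending list is hypothetical, P(unknown | t) is approximated from words seen once in training, hyphenation is ignored, and every factor uses add-alpha smoothing.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple

# Hypothetical, deliberately short ending list (not Weischedel's actual list).
ENDINGS = ["ing", "ed", "ly", "s", "tion"]


def ending_class(word: str) -> str:
    """Longest predefined ending the word carries, or 'NONE'."""
    w = word.lower()
    hits = [e for e in ENDINGS if w.endswith(e) and len(e) < len(w)]
    return max(hits, key=len) if hits else "NONE"


class WeischedelStyleModel:
    """Factored unknown-word emission: P(w|t) ~ P(unknown|t) * P(cap|t) * P(ending|t).

    Assumptions: P(unknown|t) is estimated from words seen once (hapaxes),
    hyphenation is ignored, and every factor uses add-alpha smoothing.
    """

    def __init__(self, tagged: Iterable[Tuple[str, str]], alpha: float = 1.0):
        pairs = list(tagged)
        freq = Counter(w for w, _ in pairs)
        self.alpha = alpha
        self.tag_count = Counter(t for _, t in pairs)
        self.hapax = Counter(t for w, t in pairs if freq[w] == 1)
        self.capital = Counter(t for w, t in pairs if w[:1].isupper())
        self.ending = defaultdict(Counter)
        for w, t in pairs:
            self.ending[t][ending_class(w)] += 1

    def _binary(self, hits: int, n: int) -> float:
        # Add-alpha estimate of a yes/no feature given the tag.
        return (hits + self.alpha) / (n + 2 * self.alpha)

    def emission(self, word: str, tag: str) -> float:
        n = self.tag_count[tag]
        p_unknown = self._binary(self.hapax[tag], n)
        p_cap = self._binary(self.capital[tag], n)
        if not word[:1].isupper():
            p_cap = 1.0 - p_cap  # probability of being lowercase instead
        k = len(ENDINGS) + 1  # number of ending classes, including "NONE"
        p_end = (self.ending[tag][ending_class(word)] + self.alpha) / (n + k * self.alpha)
        return p_unknown * p_cap * p_end


corpus = [("The", "DT"), ("dog", "NN"), ("was", "VBD"), ("running", "VBG"),
          ("quickly", "RB"), ("Dogs", "NNS"), ("walked", "VBD"), ("slowly", "RB")]
model = WeischedelStyleModel(corpus)
print(model.emission("jumping", "VBG") > model.emission("jumping", "RB"))  # True
```

The returned value plays the role of an emission probability, so in a full HMM tagger it would still be combined with tag transition probabilities before choosing a tag.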

Quote from "A Resource-light Approach to Morpho-syntactic Tagging" (Google Books, p. 9):

To handle unknown words Brants (2000) uses Samuelsson's (1993) suffix analysis, which seems to work best for inflected languages.

(This is not in a direct comparison to Weischedel's approach, though.)
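Brants' suffix analysis combines the letter sequences by successive abstraction: starting from the unconditioned tag probability P(t) and adding one letter at a time, each step mixes the maximum-likelihood estimate for the longer suffix with the estimate for the next-shorter one:

P(t | l_{n-i+1} ... l_n) = [ P̂(t | l_{n-i+1} ... l_n) + θ · P(t | l_{n-i+2} ... l_n) ] / (1 + θ)

A minimal sketch of that recursion follows. The class name and toy corpus are hypothetical. Brants derives θ from the spread of the unconditioned tag probabilities; here it is the sample variance of those probabilities, and that exact choice should be treated as an assumption. Refinements in the original (such as drawing suffix statistics only from infrequent words and handling capitalized words separately) are left out.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


class SuffixTagGuesser:
    """Sketch of Samuelsson-style suffix analysis with successive abstraction.

    Assumptions: suffix statistics come from all training words, one theta is
    shared by every suffix length, and capitalization is not modeled.
    """

    def __init__(self, tagged: Iterable[Tuple[str, str]], max_suffix: int = 4,
                 theta: Optional[float] = None):
        pairs = list(tagged)
        self.max_suffix = max_suffix
        tag_freq = Counter(t for _, t in pairs)
        total = sum(tag_freq.values())
        self.prior: Dict[str, float] = {t: c / total for t, c in tag_freq.items()}
        # suffix_counts[suffix][tag] = how often a word with that suffix had that tag
        self.suffix_counts: Dict[str, Counter] = defaultdict(Counter)
        for word, tag in pairs:
            w = word.lower()
            for i in range(1, min(max_suffix, len(w)) + 1):
                self.suffix_counts[w[-i:]][tag] += 1
        if theta is None:
            # Assumption: theta = sample variance of the prior tag probabilities.
            probs = list(self.prior.values())
            mean = sum(probs) / len(probs)
            theta = sum((p - mean) ** 2 for p in probs) / max(len(probs) - 1, 1)
        self.theta = theta

    def tag_distribution(self, word: str) -> Dict[str, float]:
        """P(t | longest usable suffix), built up from P(t) one letter at a time."""
        w = word.lower()
        dist = dict(self.prior)  # empty suffix: unconditioned tag probabilities
        for i in range(1, min(self.max_suffix, len(w)) + 1):
            counts = self.suffix_counts.get(w[-i:])
            if not counts:
                break  # unseen suffix, so every longer one is unseen too
            n = sum(counts.values())
            dist = {t: (counts[t] / n + self.theta * dist[t]) / (1 + self.theta)
                    for t in dist}
        return dist


corpus = [("running", "VBG"), ("eating", "VBG"), ("quickly", "RB"), ("slowly", "RB"),
          ("dog", "NN"), ("cat", "NN"), ("walked", "VBD")]
guesser = SuffixTagGuesser(corpus)
dist = guesser.tag_distribution("jumping")
print(max(dist, key=dist.get))  # VBG
```

To use this inside an HMM, P(t | suffix) has to be turned into an emission term with Bayes' rule, P(suffix | t) = P(t | suffix) · P(suffix) / P(t). Since P(suffix) is the same for every candidate tag of a given word, P(t | suffix) / P(t) is enough to rank the tags.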
