Can you programmatically detect pluralizations of English words, and derive the singular form?
Given some (English) word that we shall assume is a plural, is it possible to derive the singular form? I'd like to avoid lookup/dictionary tables if possible.
Some examples:
Examples -> Example ('s' suffix, the simple case)
Glitches -> Glitch ('es' suffix, as opposed to above)
Countries -> Country ('ies' suffix)
Sheep -> Sheep (no change: possible fallback for indeterminate values)
Or, this seems to be a fairly exhaustive list.
Suggestions of libraries in language x are fine, as long as they are open-source (i.e., so that someone can examine them to determine how to do it in language y).
It really depends on what you mean by 'programmatically'. Part of English works on easy-to-understand rules, and part doesn't. Which part a word falls into has to do mainly with frequency. For a brief overview, you can read Pinker's "Words and Rules", but do yourself a favor and don't take the whole generative theory of linguistics entirely to heart. There's a lot more empiricism in this pursuit than that school of thought really allows for.
A lot of English can be statistically lemmatized. By the way, stemming or lemmatization is the term you're looking for. One of the most effective lemmatizers that works off of statistical rules bootstrapped with frequency-based exceptions is the Morpha Lemmatizer. You can give this a shot if you have a project that requires this type of simplification of strings that represent specific terms in English.
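To make the "rules plus exceptions" idea concrete, here is a minimal sketch in Python. It is not how the Morpha Lemmatizer is implemented; the exception table, the rule list, and the function name `singularize` are all made up for illustration, and a real tool would use far larger, frequency-derived lists.

```python
import re

# Hypothetical exception table: irregular or invariant plurals that no
# suffix rule gets right. A real lemmatizer would have many more entries.
EXCEPTIONS = {
    "sheep": "sheep",
    "children": "child",
    "mice": "mouse",
    "men": "man",
    "series": "series",
}

# Ordered suffix rules (illustrative only); the first matching rule wins.
RULES = [
    (re.compile(r"ies$"), "y"),                  # countries -> country
    (re.compile(r"(ch|sh|ss|x|z)es$"), r"\1"),   # glitches -> glitch
    (re.compile(r"(?<!s)s$"), ""),               # examples -> example
]


def singularize(word):
    """Return a guessed singular form of `word`, lowercased."""
    lower = word.lower()
    # Exceptions are checked first so they override the general rules.
    if lower in EXCEPTIONS:
        return EXCEPTIONS[lower]
    for pattern, replacement in RULES:
        if pattern.search(lower):
            return pattern.sub(replacement, lower)
    # Fallback: no rule applied, so return the word unchanged.
    return lower


for plural in ["Examples", "Glitches", "Countries", "Sheep"]:
    print(plural, "->", singularize(plural))
```

The rules alone get the regular cases, but they also turn "movies" into "movy", which is exactly the kind of word that ends up in the exception list. How large that list has to be depends on how much of the vocabulary you need to cover.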
There are even more naive approaches that accomplish much with respect to normalizing related terms. Take a look at the Porter Stemmer, which is effective enough to cluster together most terms in English.
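For a feel of how naive this can be, the sketch below applies only step 1a of the Porter algorithm, the step that deals with plural suffixes. The function name `porter_step_1a` is my own, and the full stemmer has several further steps that are omitted here, so treat this as an illustration rather than a usable stemmer.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter stemmer (plural suffixes).

    Assumes `word` is already lowercase. Later steps of the full
    algorithm are deliberately left out of this sketch.
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


for w in ["caresses", "ponies", "caress", "cats", "countries", "sheep"]:
    print(w, "->", porter_step_1a(w))
```

Note that "countries" becomes "countri", not "country". A stemmer only needs related forms to collapse to the same string, so it is good for clustering terms but does not give you a dictionary singular; for that you want a lemmatizer.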