How can I correctly prefix a word with "a" and "an"?

Question

I have a .NET application where, given a noun, I want it to correctly prefix that word with "a" or "an". How would I do that?

Before you think the answer is to simply check if the first letter is a vowel, consider phrases like:

  • an honest mistake
  • a used car
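
To make the pitfall concrete, here is a deliberately naive sketch in Python (`naive_article` is a hypothetical helper, not part of any library). It picks "an" whenever the word is written with a leading vowel letter, and it gets both "honest" and "used" wrong, because the choice depends on the first sound rather than the first letter.

```python
VOWEL_LETTERS = set("aeiouAEIOU")


def naive_article(word):
    """Naive rule: 'an' if the word starts with a vowel letter, else 'a'."""
    # word[:1] is "" for an empty string, which is not a vowel, so we get "a".
    return "an" if word[:1] in VOWEL_LETTERS else "a"


for word in ["honest", "used"]:
    print(naive_article(word), word)
# prints:
# a honest
# an used
```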

Recommended answer

  1. Download Wikipedia
  2. Unzip it and write a quick filter program that spits out only article text (the download is generally in XML format, along with non-article metadata too).
  3. Find every instance of "a" or "an" followed by a word, and index that word and all of its prefixes (a simple trie works for this). The index should be case-sensitive, and you'll want a maximum prefix length, perhaps 15 letters.
  4. (Optional) Discard all prefixes that occur fewer than 5 times, or where neither "a" nor "an" reaches a 2/3 majority (or some other thresholds; tweak here). Preferably keep the empty prefix to avoid corner cases.
  5. You can optimize your prefix database by discarding all those prefixes whose parent shares the same "a" or "an" annotation.
  6. To decide between "a" and "an", find the longest matching prefix and follow its lead. If you didn't discard the empty prefix in step 4, there will always be a matching prefix (namely the empty prefix); otherwise you may need a special case for a completely non-matching string (such input should be very rare).
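
As a rough illustration of steps 3, 4 and 6, here is a minimal Python sketch. It assumes the corpus has already been reduced to plain-text strings (steps 1 and 2), uses hypothetical names (`TrieNode`, `build_index`, `choose_article`) that are not the AvsAn API, and skips the step 5 pruning. Instead of physically discarding prefixes, it treats a prefix as usable only when it passes the step 4 thresholds, which has the same effect on the longest-match lookup. The tiny corpus at the end is made up purely for demonstration.

```python
import re

# Assumed thresholds, following the answer's suggestions; tune as needed.
MAX_PREFIX_LEN = 15   # longest prefix worth indexing
MIN_COUNT = 5         # step 4: ignore prefixes seen fewer times than this
MIN_MAJORITY = 2 / 3  # step 4: share needed for "a" or "an" to win

# "a" or "an" (any case) followed by whitespace and the next token.
# Tokens may carry trailing punctuation; a real filter would strip it.
PAIR_RE = re.compile(r"\b(a|an)\s+(\S+)", re.IGNORECASE)


class TrieNode:
    """One prefix, with how often 'a' and 'an' preceded words starting with it."""

    def __init__(self):
        self.children = {}
        self.a_count = 0
        self.an_count = 0


def build_index(texts):
    """Step 3: count, for every prefix of every word after a/an, which article was used."""
    root = TrieNode()  # the empty prefix
    for text in texts:
        for article, word in PAIR_RE.findall(text):
            is_an = article.lower() == "an"
            node = root
            path = [root]
            # Case-sensitive walk over the word's prefixes, up to the length cap.
            for ch in word[:MAX_PREFIX_LEN]:
                node = node.children.setdefault(ch, TrieNode())
                path.append(node)
            for n in path:
                if is_an:
                    n.an_count += 1
                else:
                    n.a_count += 1
    return root


def verdict(node):
    """Step 4: 'a' or 'an' if this prefix passes the thresholds, else None."""
    total = node.a_count + node.an_count
    if total < MIN_COUNT:
        return None
    if node.an_count / total >= MIN_MAJORITY:
        return "an"
    if node.a_count / total >= MIN_MAJORITY:
        return "a"
    return None


def choose_article(root, word):
    """Step 6: follow the longest prefix of `word` that passes the thresholds."""
    # The empty prefix is always kept: fall back on its overall majority.
    best = "an" if root.an_count > root.a_count else "a"
    node = root
    for ch in word[:MAX_PREFIX_LEN]:
        node = node.children.get(ch)
        if node is None:
            break  # no longer prefix was ever seen in the corpus
        v = verdict(node)
        if v is not None:
            best = v
    return best


corpus = (["an honest mistake"] * 5 + ["a house"] * 5
          + ["a used car"] * 5 + ["an umbrella stand"] * 5)
index = build_index(corpus)
for w in ["honesty", "house", "usual", "umpire"]:
    print(choose_article(index, w), w)
# prints:
# an honesty
# a house
# a usual
# an umpire
```

Here "h" and "u" are each split 50/50, so they fail the 2/3 majority and the lookup keeps walking until a longer prefix ("hon", "hou", "us", "um") settles the question. Because the index is case-sensitive, "Honesty" would not reuse the counts collected under lowercase "h"; with a real corpus, capitalized words and acronyms get prefixes of their own.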

You probably can't get much better than this, and it will certainly beat most rule-based systems.

I've implemented this in JavaScript and C#. You can try it in your browser, or download the small, reusable JavaScript implementation it uses. The .NET implementation is the AvsAn package on NuGet. The implementations are trivial, so they should be easy to port to other languages if necessary.

Turns out the "rules" are quite a bit more complex than I thought:

  • It's an unanticipated result, but it's a unanimous vote.
  • It's an honest decision, but a honeysuckle shrub.
  • Symbols: It's an 0800 number, or an ∞ of oregano.
  • Acronyms: It's a NASA scientist, but an NSA analyst; a FIAT car but an FAA policy.

...which just goes to underline that a rule-based system would be tricky to build!
