How can I correctly prefix a word with "a" and "an"?


Question

I have a .NET application where, given a noun, I want it to correctly prefix that word with "a" or "an". How would I do that?

Before you think the answer is to simply check if the first letter is a vowel, consider phrases like:

  • an honest mistake
  • a used car

Solution

  1. Download Wikipedia
  2. Unzip it and write a quick filter program that spits out only article text (the download is generally in XML format, along with non-article metadata too).
  3. Find all instances of "a" or "an" followed by a word, and build an index on that following word and all of its prefixes (a simple trie works for this). This should be case-sensitive, and you'll need a maximum word length (15 letters?).
  4. (optional) Discard all prefixes that occur fewer than 5 times or where "a" vs. "an" achieves less than a 2/3 majority (or some other thresholds; tweak here). Preferably keep the empty prefix to avoid corner cases.
  5. You can optimize your prefix database by discarding all those prefixes whose parent shares the same "a" or "an" annotation.
  6. When determining whether to use "a" or "an", find the longest matching prefix and follow its lead. If you didn't discard the empty prefix in step 4, there will always be a matching prefix (namely the empty prefix); otherwise you may need a special case for a completely non-matching string (such input should be very rare).
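
Steps 3 to 6 can be sketched roughly as follows. This is a minimal illustration, not the AvsAn code: the function names (`buildCounts`, `prune`, `lookup`), the regular expression, and the one-paragraph sample text standing in for the Wikipedia dump are all assumptions made for the example, and the minimum count is lowered to 1 so the toy corpus produces something.

```javascript
// Hypothetical sketch of steps 3-6; all names here are illustrative.
const MAX_PREFIX = 15; // maximum word length considered (step 3)

// Step 3: for each word following "a"/"an", count the article for every prefix.
function buildCounts(text) {
  const counts = new Map(); // prefix -> { a: number, an: number }
  const re = /\b(an?) +(\S+)/gi;
  for (const m of text.matchAll(re)) {
    const article = m[1].toLowerCase();
    const word = m[2].slice(0, MAX_PREFIX); // case of the word is preserved
    for (let len = 0; len <= word.length; len++) {
      const prefix = word.slice(0, len);
      const entry = counts.get(prefix) || { a: 0, an: 0 };
      entry[article] += 1;
      counts.set(prefix, entry);
    }
  }
  return counts;
}

// Step 6: follow the longest prefix present in the database.
function lookup(db, word) {
  for (let len = Math.min(word.length, MAX_PREFIX); len >= 0; len--) {
    const hit = db.get(word.slice(0, len));
    if (hit !== undefined) return hit;
  }
  return "a"; // only reached if the empty prefix was discarded
}

// Steps 4 and 5: apply thresholds, then drop prefixes that add no information.
function prune(counts, minCount, minRatio) {
  const db = new Map();
  const prefixes = [...counts.keys()].sort((x, y) => x.length - y.length);
  for (const prefix of prefixes) {
    const { a, an } = counts.get(prefix);
    const article = an > a ? "an" : "a";
    if (prefix === "") { db.set("", article); continue; } // always keep the empty prefix
    const total = a + an;
    if (total < minCount || Math.max(a, an) / total < minRatio) continue; // step 4
    // Step 5: skip if the nearest retained ancestor already gives the same answer.
    if (lookup(db, prefix.slice(0, -1)) === article) continue;
    db.set(prefix, article);
  }
  return db;
}

const sample =
  "It was an honest mistake. He bought a used car. She made an honest effort " +
  "and took a hotel room. An hour later a horse arrived, then a unanimous vote " +
  "and an unusual result.";
const db = prune(buildCounts(sample), 1, 2 / 3);

console.log(lookup(db, "honest"), lookup(db, "used"), lookup(db, "hour"));
```

With this toy text, the database ends up holding only the empty prefix plus a handful of short exceptions such as `hon` and `hou`. That is the point of step 5: most of the index collapses into a few prefixes that disagree with their parent. A sample this small says nothing useful about words it has not seen, so real accuracy depends on the corpus size and the thresholds chosen in step 4.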

You probably can't get much better than this - and it'll certainly beat most rule-based systems.

Edit: I've implemented this in JS/C#. You can try it in your browser, or download the small, reusable JavaScript implementation it uses. The .NET implementation is the package AvsAn on NuGet. The implementations are trivial, so it should be easy to port to any other language if necessary.

Turns out the "rules" are quite a bit more complex than I thought:

  • it's an unanticipated result but it's a unanimous vote
  • it's an honest decision but a honeysuckle shrub
  • Symbols: It's an 0800 number, or an ∞ of oregano.
  • Acronyms: It's a NASA scientist, but an NSA analyst; a FIAT car but an FAA policy.

...which just goes to underline that a rule-based system would be tricky to build!
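
To make that concrete, here is a small check of the simplest possible rule (first letter is a vowel) against the examples listed above. `naiveArticle` is a hypothetical helper written only for this comparison; the expected articles are taken from the bullet points.

```javascript
// Hypothetical naive rule: "an" before a, e, i, o, u; otherwise "a".
function naiveArticle(word) {
  return /^[aeiou]/i.test(word) ? "an" : "a";
}

// [word, article used in the examples above]
const examples = [
  ["unanticipated", "an"], ["unanimous", "a"],
  ["honest", "an"], ["honeysuckle", "a"],
  ["0800", "an"],
  ["NASA", "a"], ["NSA", "an"], ["FIAT", "a"], ["FAA", "an"],
];

// Collect the words where the naive rule disagrees with the expected article.
const misses = examples
  .filter(([word, expected]) => naiveArticle(word) !== expected)
  .map(([word]) => word);

console.log(misses);
```

The naive rule gets "unanimous", "honest", "0800", "NSA" and "FAA" wrong, which is five of these nine cases.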
