Objective-C: NSLinguisticTagger "new york" vs "New York"

Question

I just started playing around with NSLinguisticTagger, basing my code on this blog post: NSLinguisticTagger @ NSHipster.com

NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
    initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"] options:options];
tagger.string = question;
[tagger enumerateTagsInRange:NSMakeRange(0, [question length])
                      scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                     options:options
                  usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
    // Log each token next to the tag assigned to it.
    NSString *token = [question substringWithRange:tokenRange];
    NSLog(@"%@: %@", token, tag);
}];

When I run this with question = @"Weekend in New York", "New York" gets tagged as PlaceName which is great. But when I run this with question = @"Weekend in new york", "new" gets tagged as "Adjective" and "york" gets tagged as PlaceName. Is there any way to get around this such that "New York" and "new york" both get tagged as PlaceName?

I'm totally new to this linguistics thing.

Recommended Answer

Taking this topic a little further. Correct capitalization of first name and last name is a requirement for the NSLinguisticTagger to identify names.

After several hours of frustration, I decided to create various tests with uppercase, lowercase, and capitalized words.

The NSLinguisticTagger produced different results in almost all of the tests.

When the NSLinguisticTagger parsed a string written entirely in uppercase, almost every noun was tagged as PersonalName. WTF?

Very frustrating.
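
The kind of casing test described above could be sketched roughly as follows. This is only an illustration, not the code the answer's author ran: the helper names CasingVariants and LogTags and the sample sentence are assumptions made for the example, and no particular tags are promised, since the tagger's output can differ between OS versions.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: the casing variants of one sentence to compare.
static NSDictionary<NSString *, NSString *> *CasingVariants(NSString *text) {
    return @{
        @"original":    text,
        @"lowercase":   text.lowercaseString,
        @"uppercase":   text.uppercaseString,
        @"capitalized": text.capitalizedString,
    };
}

// Hypothetical helper: logs every token of `text` next to its tag.
static void LogTags(NSString *text) {
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace |
                                        NSLinguisticTaggerOmitPunctuation |
                                        NSLinguisticTaggerJoinNames;
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeNameTypeOrLexicalClass]
                   options:options];
    tagger.string = text;
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        NSLog(@"  %@: %@", [text substringWithRange:tokenRange], tag);
    }];
}

int main(void) {
    @autoreleasepool {
        // Sample sentence borrowed from the question.
        NSDictionary<NSString *, NSString *> *variants = CasingVariants(@"Weekend in New York");
        for (NSString *name in variants) {
            NSLog(@"%@ -> \"%@\"", name, variants[name]);
            LogTags(variants[name]);
        }
    }
    return 0;
}
```

Comparing the four logs side by side shows which tags change when only the casing changes.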

The lesson I want to share is that the NSLinguisticTagger can guess at the tags it places on words, but in the end it is just a grammatical evaluation of those words. The evaluation depends on proper language constructs such as word placement and whether a word is capitalized.

I am still finding it a useful class, but the moral of this post is to "Be Proper".

When parsing text, we programmers sometimes have a tendency to play with uppercasing and lowercasing to simplify our work. We can still do this, but keep in mind that word casing does change the NSLinguisticTagger results.
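
Since casing changes the results, one thing that could be tried for the "new york" case is to tag a capitalized copy of the input and read the tokens back from the original string. The sketch below is an assumption-laden experiment, not a confirmed fix: NormalizedForTagging and TagsAfterNormalizing are hypothetical helper names, and capitalizing every word may cause ordinary words to be tagged as names, much like the all-uppercase case described above.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: the casing normalization applied before tagging.
static NSString *NormalizedForTagging(NSString *text) {
    NSString *capitalized = text.capitalizedString;
    // Fall back to the original if capitalizing changed the length,
    // because the token ranges would then no longer line up.
    return capitalized.length == text.length ? capitalized : text;
}

// Hypothetical helper: tags the normalized copy, but reports each token
// as it appears in the original text, formatted as "token: tag".
static NSArray<NSString *> *TagsAfterNormalizing(NSString *text) {
    NSString *normalized = NormalizedForTagging(text);
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace |
                                        NSLinguisticTaggerOmitPunctuation |
                                        NSLinguisticTaggerJoinNames;
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
        initWithTagSchemes:@[NSLinguisticTagSchemeNameTypeOrLexicalClass]
                   options:options];
    tagger.string = normalized;
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    [tagger enumerateTagsInRange:NSMakeRange(0, normalized.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        NSString *token = [text substringWithRange:tokenRange];
        [results addObject:[NSString stringWithFormat:@"%@: %@", token, tag]];
    }];
    return results;
}

int main(void) {
    @autoreleasepool {
        for (NSString *line in TagsAfterNormalizing(@"Weekend in new york")) {
            NSLog(@"%@", line);
        }
    }
    return 0;
}
```

Whether this helps depends on the input, so it is worth comparing the output against the untouched string before relying on it.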
