Objective-C: NSLinguisticTagger "new york" vs "New York"
Question
I just started playing around with NSLinguisticTagger, basing my code on this blog: NSLinguisticTagger @ NSHipster.com
// `question` is the NSString to analyze.
NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace |
                                    NSLinguisticTaggerOmitPunctuation |
                                    NSLinguisticTaggerJoinNames;
NSLinguisticTagger *tagger =
    [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"]
                                           options:options];
tagger.string = question;
[tagger enumerateTagsInRange:NSMakeRange(0, [question length])
                      scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                     options:options
                  usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
    // Log each token together with the tag assigned to it.
    NSString *token = [question substringWithRange:tokenRange];
    NSLog(@"%@: %@", token, tag);
}];
When I run this with question = @"Weekend in New York", "New York" gets tagged as PlaceName, which is great. But when I run this with question = @"Weekend in new york", "new" gets tagged as "Adjective" and "york" gets tagged as PlaceName. Is there any way to get around this such that "New York" and "new york" both get tagged as PlaceName?
I'm totally new to this linguistics thing.
Recommended answer
Taking this topic a little further. Correct capitalization of first name and last name is a requirement for the NSLinguisticTagger to identify names.
After several hours of frustration, I decided to create various tests with uppercase, lowercase and capitalized-case words.
The NSLinguisticTagger had different results in almost all tests.
When the NSLinguisticTagger parsed a string written entirely in uppercase letters, almost all of the nouns were tagged as PersonalName. WTF?
Very frustrating.
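The casing tests described above could be reproduced with something like the following. This is only a minimal sketch: the helper name LogTags and the sample sentence are assumptions chosen for illustration, not the answerer's actual test code, and no particular tags are promised since the output depends on the OS version and language model.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: logs every token of `text` with the tag assigned to it.
static void LogTags(NSString *text) {
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace |
                                        NSLinguisticTaggerOmitPunctuation |
                                        NSLinguisticTaggerJoinNames;
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"]
                                               options:options];
    tagger.string = text;
    NSLog(@"--- %@", text);
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        NSLog(@"%@: %@", [text substringWithRange:tokenRange], tag);
    }];
}

int main(void) {
    @autoreleasepool {
        // Sample sentence taken from the question; any text can be substituted.
        NSString *sample = @"Weekend in New York";
        // Same sentence as written, all lowercase, all uppercase, and with every word capitalized.
        NSArray<NSString *> *variants = @[ sample,
                                           sample.lowercaseString,
                                           sample.uppercaseString,
                                           sample.capitalizedString ];
        for (NSString *variant in variants) {
            LogTags(variant);
        }
    }
    return 0;
}
```

Comparing the logged tags across the four variants shows how much the tagger's output for a given input depends on casing.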
The lesson I want to share is that the NSLinguisticTagger can guess at the tags it places on words, but in the end it is just a grammatical evaluation of words. The evaluation depends on proper language constructs such as word placement and whether the word is capitalized or not.
I am still finding it a useful class, but the moral of this post is to "Be Proper".
When parsing text sometimes we programmers have a tendency to play with uppercasing and lowercasing to simplify our work. We can still do this, but just keep in mind that word casing does change the NSLinguisticTagger results.
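One way to act on this, when the input casing cannot be trusted, might be to tag a capitalized copy of the text while reading the tokens from the original string. The sketch below is an assumption-laden illustration, not something the answer proposes: the helper name LogTagsOnCapitalizedCopy is hypothetical, the range mapping only works when capitalizing does not change the string length, and capitalizing every word may itself cause ordinary words to be tagged as names, so results should be checked against real input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tags a capitalized copy of `text` but logs tokens from the original.
static void LogTagsOnCapitalizedCopy(NSString *text) {
    NSString *proper = text.capitalizedString;
    if (proper.length != text.length) {
        // Assumption violated: ranges would no longer line up, so tag the original instead.
        proper = text;
    }

    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace |
                                        NSLinguisticTaggerOmitPunctuation |
                                        NSLinguisticTaggerJoinNames;
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"]
                                               options:options];
    tagger.string = proper;
    [tagger enumerateTagsInRange:NSMakeRange(0, proper.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        // The range comes from the capitalized copy; the token text comes from the original.
        NSLog(@"%@: %@", [text substringWithRange:tokenRange], tag);
    }];
}

int main(void) {
    @autoreleasepool {
        LogTagsOnCapitalizedCopy(@"Weekend in new york");
    }
    return 0;
}
```

Keeping the original string untouched means the rest of the program still sees the user's own casing; only the tagger sees the normalized copy.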