Stanford NLP Tokens Regex -- doesn't recognize NER
Problem description
I'm just getting started with TokensRegex, and I haven't yet found an introduction or tutorial that covers what I need. (If I've missed one, I'd appreciate a link!)
The basic idea is that I want to use a pattern such as
pattern: ( ( [ { ner:PERSON } ]) /was/ /born/ /on/ ([ { ner:DATE } ]) )
(from https://nlp.stanford.edu/software/tokensregex.html)
to match "John Smith was born on March 1, 1999", and then be able to extract "John Smith" as the person and "March 1, 1999" as the date.
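A minimal sketch of that intended flow is below. It assumes the standard CoreNLP models are on the classpath; the class name BornOnExample and the helper extractBirth are illustrative, not from CoreNLP. One likely culprit in code like the test below is passing the list of *sentences* to getMatcher: TokensRegex matches over whatever list it is given as the token sequence, so the pattern should be run over each sentence's token list instead.

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.List;
import java.util.Properties;

public class BornOnExample {

    // Returns {person, date} for the first "X was born on Y" match, or null.
    static String[] extractBirth(String text) {
        Properties props = new Properties();
        // tokenize/ssplit/pos/lemma are the prerequisites for ner
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        // Note: no "pattern:" prefix inside the compiled string, and
        // [{...}]+ lets multi-token names and dates be captured whole.
        TokenSequencePattern pattern = TokenSequencePattern.compile(
                "([{ner:PERSON}]+) /was/ /born/ /on/ ([{ner:DATE}]+)");

        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            // Match over this sentence's tokens, not over the list of sentences.
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
            if (matcher.find()) {
                return new String[] { matcher.group(1), matcher.group(2) };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] result = extractBirth("John Smith was born on March 1, 1999.");
        if (result != null) {
            System.out.println("person: " + result[0]);
            System.out.println("date: " + result[1]);
        }
    }
}
```

find() is used rather than matches() here because matches() only succeeds when the pattern covers the entire token sequence.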
I pieced the following together from several web searches. I can get a simple Java regex such as /John/ to match, but when I use NER, nothing I've tried (all copied from web searches, with some tweaking) matches.
Edited for clarity: in the code below, matcher2.matches() currently returns true for the plain-regex case and false for the NER case.
I don't know whether I need to explicitly reference a model, an annotator, or something else; whether I'm missing something; or whether I'm simply going about this entirely the wrong way.
Any insight would be greatly appreciated! Thanks!
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.Test;

public class StanfordSandboxTest {

    private static final Log log = LogFactory.getLog(StanfordSandboxTest.class);

    @Test
    public void testFirstAttempt() {
        Properties props2;
        StanfordCoreNLP pipeline2;
        TokenSequencePattern pattern2;
        Annotation document2;
        List<CoreMap> sentences2;
        TokenSequenceMatcher matcher2;
        String text2;

        // NER attempt -- does not match
        props2 = new Properties();
        props2.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner, parse, dcoref");
        pipeline2 = new StanfordCoreNLP(props2);
        text2 = "March 1, 1999";
        pattern2 = TokenSequencePattern.compile("pattern: (([{ner:DATE}])");
        document2 = new Annotation(text2);
        pipeline2.annotate(document2);
        sentences2 = document2.get(CoreAnnotations.SentencesAnnotation.class);
        matcher2 = pattern2.getMatcher(sentences2);
        log.info("testFirstAttempt: Matches2: " + matcher2.matches());

        // Plain-regex attempt -- matches
        props2 = new Properties();
        props2.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner, parse, dcoref");
        pipeline2 = new StanfordCoreNLP(props2);
        text2 = "John";
        pattern2 = TokenSequencePattern.compile("/John/");
        document2 = new Annotation(text2);
        pipeline2.annotate(document2);
        sentences2 = document2.get(CoreAnnotations.SentencesAnnotation.class);
        matcher2 = pattern2.getMatcher(sentences2);
        log.info("testFirstAttempt: Matches2: " + matcher2.matches());
    }
}
Answer
Sample code:
package edu.stanford.nlp.examples;

import edu.stanford.nlp.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class TokensRegexExampleTwo {

    public static void main(String[] args) {
        // set up properties
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex");
        props.setProperty("tokensregex.rules", "multi-step-per-org.rules");
        props.setProperty("tokensregex.caseInsensitive", "true");
        // set up pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // set up text to annotate
        Annotation annotation = new Annotation("Joe Smith works for Apple Inc.");
        // annotate text
        pipeline.annotate(annotation);
        // print out found entities
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.println(token.word() + " " + token.ner());
            }
        }
    }
}
Example rules file:
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
$ORGANIZATION_TITLES = "/inc.|corp./"
$COMPANY_INDICATOR_WORDS = "/company|corporation/"
ENV.defaults["stage"] = 1
{ pattern: (/works/ /for/ ([{pos: NNP}]+ $ORGANIZATION_TITLES)), action: (Annotate($1, ner, "RULE_FOUND_ORG") ) }
ENV.defaults["stage"] = 2
{ pattern: (([{pos: NNP}]+) /works/ /for/ [{ner: "RULE_FOUND_ORG"}]), action: (Annotate($1, ner, "RULE_FOUND_PERS") ) }
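For comparison, the "was born on" pattern from the original question could be expressed in the same rules-file format. This is an untested sketch: it assumes the built-in ner annotator has already tagged the PERSON and DATE spans, and the label BORN_PERSON is just an illustrative name.

ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
{ pattern: (([{ner:"PERSON"}]+) /was/ /born/ /on/ [{ner:"DATE"}]+), action: (Annotate($1, ner, "BORN_PERSON")) }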
This will apply NER tags to "Joe Smith" and "Apple Inc." You can adapt it for your specific case. Let me know if you'd like to do something more advanced than just applying NER tags. Note: make sure to put these rules in a file named "multi-step-per-org.rules", matching the tokensregex.rules property above.