Stanford NER customization to classify software programming keywords
Question
I am new to NLP, and I used the Stanford NER tool to classify some random text in order to extract special keywords used in software programming.
The problem is that I don't know how to change the classifiers and text annotators in Stanford NER so that they recognize software programming keywords. For example, given the text:
today Java used in different operating systems (Windows, Linux, ..)
the classification result should look like:
Java "Programming_Language"
Windows "Operating_System"
Linux "Operating_System"
Could you please help me customize the Stanford NER classifiers to meet my needs?
Recommended answer
I think this is quite well documented in the Stanford NER FAQ section: http://nlp.stanford.edu/software/crf-faq.shtml#a.
The steps are as follows:
- In your properties file, change the map to specify how your training data is annotated (or structured):
map = word=0,myfeature=1,answer=2
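To make the role of the map concrete, here is a minimal standalone sketch (it is not Stanford NER code) of what the setting expresses: each tab-separated column of a training line is given a name. The sample rows and the feature values LANG, OS and NONE are made up for illustration, and the background label O is assumed to follow the convention described in the FAQ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch: NOT Stanford NER code. It only illustrates what the
// "map" property expresses, namely which column holds which field.
class ColumnMapSketch {

    // Parses "word=0,myfeature=1,answer=2" into {0=word, 1=myfeature, 2=answer}.
    static Map<Integer, String> parseMap(String spec) {
        Map<Integer, String> columns = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] kv = entry.trim().split("=");
            columns.put(Integer.parseInt(kv[1].trim()), kv[0].trim());
        }
        return columns;
    }

    // Labels the cells of one tab-separated training line by column name.
    static Map<String, String> readLine(String line, Map<Integer, String> columns) {
        String[] cells = line.split("\t");
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : columns.entrySet()) {
            fields.put(e.getValue(), cells[e.getKey()]);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<Integer, String> columns = parseMap("word=0,myfeature=1,answer=2");
        // Hypothetical training rows; LANG, OS and NONE are made-up feature values.
        String[] rows = {
            "Java\tLANG\tProgramming_Language",
            "Windows\tOS\tOperating_System",
            "used\tNONE\tO"
        };
        for (String row : rows) {
            System.out.println(readLine(row, columns));
        }
    }
}
```

The point is only that the extra middle column must appear in every training line, and that the name given to it in the map (myfeature here) is the name the later steps refer to.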
In src/edu/stanford/nlp/sequences/SeqClassifierFlags.java, add a flag stating that you want to use your new feature; let's call it useMyFeature. Below
public boolean useLabelSource = false;
add
public boolean useMyFeature = true;
In the same file, in the setProperties(Properties props, boolean printProps) method, after
else if (key.equalsIgnoreCase("useTrainLexicon")) { ..}
tell the tool whether this flag is switched on or off:
else if (key.equalsIgnoreCase("useMyFeature")) {
  useMyFeature = Boolean.parseBoolean(val);
}
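The pattern behind this step can be tried in isolation. The sketch below is a standalone mimic, not the real SeqClassifierFlags class (which has far more fields and branches); the class name FlagsSketch and the helper fromText are hypothetical. It shows a public boolean field with a default that is overridden when the matching key appears in the properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Standalone mimic of the flag-parsing pattern; NOT the real SeqClassifierFlags.
class FlagsSketch {
    boolean useLabelSource = false;
    boolean useMyFeature = true; // default value of the new flag

    // Walks over all property keys and sets the matching field.
    void setProperties(Properties props) {
        for (String key : props.stringPropertyNames()) {
            String val = props.getProperty(key);
            if (key.equalsIgnoreCase("useLabelSource")) {
                useLabelSource = Boolean.parseBoolean(val);
            } else if (key.equalsIgnoreCase("useMyFeature")) {
                useMyFeature = Boolean.parseBoolean(val);
            }
            // Unknown keys (such as "map") are simply ignored in this sketch.
        }
    }

    // Hypothetical helper: builds flags from the text of a properties file.
    static FlagsSketch fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        FlagsSketch flags = new FlagsSketch();
        flags.setProperties(props);
        return flags;
    }

    public static void main(String[] args) {
        String propsText = "map = word=0,myfeature=1,answer=2\nuseMyFeature = false\n";
        System.out.println(fromText(propsText).useMyFeature);
    }
}
```

Because the field defaults to true, the feature stays active unless the properties file explicitly sets useMyFeature to false.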
In src/edu/stanford/nlp/ling/CoreAnnotations.java, add the following section:
public static class myfeature implements CoreAnnotation<String> {
  public Class<String> getType() {
    return String.class;
  }
}
In src/edu/stanford/nlp/ling/AnnotationLookup.java, inside public enum KeyLookup {..}, add at the bottom:
MY_TAG(CoreAnnotations.myfeature.class,"myfeature")
In src/edu/stanford/nlp/ie/NERFeatureFactory.java, depending on the "type" of feature it is, add the following in
protected Collection<String> featuresC(PaddedList<IN> cInfo, int loc)
if (flags.useMyFeature) {
  featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag");
}
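To see what this last step contributes, the following standalone sketch imitates the feature-generation logic without any Stanford classes. A Map<String, String> stands in for a token (a CoreLabel in the real code), a plain boolean stands in for the flag, and the "-WORD" baseline feature is only illustrative; all of these are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Standalone sketch of the feature-generation step; NOT NERFeatureFactory itself.
class FeatureSketch {

    // Builds the feature strings for the token at position loc.
    static Collection<String> featuresC(List<Map<String, String>> tokens, int loc,
                                        boolean useMyFeature) {
        Map<String, String> c = tokens.get(loc); // current token
        Collection<String> features = new ArrayList<>();
        features.add(c.get("word") + "-WORD");   // illustrative baseline feature
        if (useMyFeature) {
            // Same shape as the snippet above: column value plus a suffix.
            features.add(c.get("myfeature") + "-my_tag");
        }
        return features;
    }

    public static void main(String[] args) {
        // Hypothetical tokens; LANG and OS are made-up values of the extra column.
        List<Map<String, String>> tokens = List.of(
            Map.of("word", "Java", "myfeature", "LANG"),
            Map.of("word", "Windows", "myfeature", "OS"));
        System.out.println(featuresC(tokens, 0, true));
        System.out.println(featuresC(tokens, 1, false));
    }
}
```

The suffix keeps the new feature's strings distinct from those produced by other features, so the classifier can learn a separate weight for each value of the extra column.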
Debugging: in addition to this, there are methods that dump the features to a file; use them to see how things are done under the hood. I think you will also have to spend some time with the debugger :P