Stanford-NER customization to classify software programming keywords
Problem description
I am new to NLP, and I used the Stanford NER tool to classify some random text in order to extract special keywords used in software programming.
The problem is that I don't know how to change the classifiers and text annotators in Stanford NER so that they recognize software programming keywords. For example:
today Java used in different operating systems (Windows, Linux, ..)
The classification result would be as follows:
Java "Programming_Language"
Windows "Operating_System"
Linux "Operating_System"
Could you please help me customize the Stanford NER classifiers to meet my needs?
Recommended answer
I think this is quite well documented in the Stanford NER FAQ section (http://nlp.stanford.edu/software/crf-faq.shtml#a).
Here are the steps:
In the properties file, change the map to specify how your training data is annotated (or structured):
map = word=0,myfeature=1,answer=2
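To make the column mapping concrete, here is a small self-contained Java sketch of what such a map string expresses: each name is bound to a column index of a tab-separated training line. This is not Stanford NER code. The class name ColumnMapDemo, its helper methods, and the sample myfeature value KEYWORD are all invented for illustration; the real parsing happens inside the Stanford reader classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: shows what a "map" property means,
// not how Stanford NER implements it internally.
class ColumnMapDemo {

    // Parse "word=0,myfeature=1,answer=2" into {word=0, myfeature=1, answer=2}.
    static Map<String, Integer> parseMap(String spec) {
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] kv = entry.trim().split("=");
            columns.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
        }
        return columns;
    }

    // Label the tab-separated fields of one training line using the parsed map.
    static Map<String, String> readLine(String line, Map<String, Integer> columns) {
        String[] fields = line.split("\t");
        Map<String, String> token = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : columns.entrySet()) {
            token.put(e.getKey(), fields[e.getValue()]);
        }
        return token;
    }

    public static void main(String[] args) {
        Map<String, Integer> columns = parseMap("word=0,myfeature=1,answer=2");
        // "KEYWORD" is a hypothetical value for the extra feature column.
        Map<String, String> token = readLine("Java\tKEYWORD\tProgramming_Language", columns);
        System.out.println(token);
    }
}
```

With this layout, every line of the training file carries the token in column 0, the custom feature in column 1, and the gold label (for example Programming_Language or Operating_System) in column 2.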
In src\edu\stanford\nlp\sequences\SeqClassifierFlags.java, add a flag stating that you want to use your new feature; let's call it useMyFeature. Below
public boolean useLabelSource = false;
add
public boolean useMyFeature = true;
In the same file, in the setProperties(Properties props, boolean printProps) method, after
else if (key.equalsIgnoreCase("useTrainLexicon")) { .. }
tell the tool whether this flag is switched on or off:
else if (key.equalsIgnoreCase("useMyFeature")) {
  useMyFeature = Boolean.parseBoolean(val);
}
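The flag-handling pattern in this step can be tried in isolation. The sketch below is a simplified stand-in, not the real SeqClassifierFlags class: MiniFlags and its single-argument setProperties are hypothetical, and it only mirrors the idea that each property key is matched case-insensitively and parsed into a boolean field.

```java
import java.util.Properties;

// Simplified stand-in for the flag-handling pattern described above;
// it is not the real SeqClassifierFlags class.
class MiniFlags {
    boolean useLabelSource = false;
    boolean useMyFeature = true; // same default as in the step above

    // Walk over every property and set the matching flag.
    void setProperties(Properties props) {
        for (String key : props.stringPropertyNames()) {
            String val = props.getProperty(key);
            if (key.equalsIgnoreCase("useLabelSource")) {
                useLabelSource = Boolean.parseBoolean(val);
            } else if (key.equalsIgnoreCase("useMyFeature")) {
                useMyFeature = Boolean.parseBoolean(val);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Equivalent to a line "useMyFeature = false" in the properties file.
        props.setProperty("useMyFeature", "false");
        MiniFlags flags = new MiniFlags();
        flags.setProperties(props);
        System.out.println("useMyFeature = " + flags.useMyFeature);
    }
}
```

Because the default is true, the feature stays on unless the properties file explicitly sets it to false.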
In src/edu/stanford/nlp/ling/CoreAnnotations.java, add the following section:
public static class myfeature implements CoreAnnotation<String> {
  public Class<String> getType() {
    return String.class;
  }
}
In src/edu/stanford/nlp/ling/AnnotationLookup.java, at the bottom of
public enum KeyLookup {..}
add
MY_TAG(CoreAnnotations.myfeature.class, "myfeature")
In src\edu\stanford\nlp\ie\NERFeatureFactory.java, depending on the "type" of feature it is, add the following in
protected Collection<String> featuresC(PaddedList<IN> cInfo, int loc)
if (flags.useMyFeature) {
  featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag");
}
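As a rough picture of what this last step produces, the sketch below imitates the feature-generation logic without the Stanford NER jar. A plain Map stands in for the token object, the boolean parameter stands in for the flag from the earlier step, and MyFeatureDemo and the sample value KEYWORD are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;

// Stand-in for the feature-generation step; a Map replaces the token
// object so the example runs without the Stanford NER jar.
class MyFeatureDemo {

    // Emit "<value>-my_tag" for the current token when the flag is on.
    static Collection<String> featuresC(Map<String, String> token, boolean useMyFeature) {
        Collection<String> features = new ArrayList<>();
        if (useMyFeature) {
            features.add(token.get("myfeature") + "-my_tag");
        }
        return features;
    }

    public static void main(String[] args) {
        // "KEYWORD" is a hypothetical value read from the custom training column.
        Map<String, String> token = Map.of("word", "Java", "myfeature", "KEYWORD");
        System.out.println(featuresC(token, true));
        System.out.println(featuresC(token, false));
    }
}
```

The suffix keeps features from the custom column distinct from features that other parts of the feature factory generate from the same string.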
Debugging: In addition to this, there are methods that dump the features to a file; use them to see how things are done under the hood. You will probably need to spend some time in the debugger too :P