Stanford-NER customization to classify software programming keywords


Question

I am new to NLP, and I used the Stanford NER tool to classify some random text in order to extract special keywords used in software programming.

The problem is that I don't know how to change the classifiers and text annotators in Stanford NER so that they recognize software programming keywords. For example, given the text:

today Java used in different operating systems (Windows, Linux, ..)

The classification results should be:

Java "Programming_Language"
Windows "Operating_System"
Linux "Operating_System"

How can I customize the Stanford NER classifiers to meet my needs?

Recommended answer

I think this is quite well documented in the Stanford NER FAQ: http://nlp.stanford.edu/software/crf-faq.shtml#a.

Here are the steps:


  • In the properties file, change the map to specify how your training data is annotated (or
    structured):

    map = word=0,myfeature=1,answer=2


  • In src/edu/stanford/nlp/sequences/SeqClassifierFlags.java, add a flag stating that you want to use your new feature; let's call it useMyFeature. Below public boolean useLabelSource = false; add:

    public boolean useMyFeature = true;

    In the same file, in the setProperties(Properties props, boolean printProps) method, after else if (key.equalsIgnoreCase("useTrainLexicon")) { ..}, tell the tool whether this flag is on or off:

    else if (key.equalsIgnoreCase("useMyFeature")) {
      useMyFeature = Boolean.parseBoolean(val);
    }


  • In src/edu/stanford/nlp/ling/CoreAnnotations.java, add the following section:

    public static class myfeature implements CoreAnnotation<String> {
      public Class<String> getType() {
        return String.class;
      }
    }
    


  • In src/edu/stanford/nlp/ling/AnnotationLookup.java, at the bottom of public enum KeyLookup {..}, add:

    MY_TAG(CoreAnnotations.myfeature.class, "myfeature")

  • In src/edu/stanford/nlp/ie/NERFeatureFactory.java, add the feature to the method that matches the "type" of feature it is. For example, in

    protected Collection<String> featuresC(PaddedList<IN> cInfo, int loc)

    add a check on the flag defined in SeqClassifierFlags:

    if (flags.useMyFeature) {
        featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag");
    }
    


  • Debugging: In addition to this, there are methods that dump the features to a file; use them to see how things are done under the hood. Also, I think you will have to spend some time with the debugger too :P
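To make the data flow in the steps above a bit more concrete, here is a minimal standalone sketch in plain Java. It does not use or reproduce Stanford NER's own code; it only imitates the idea: the map property assigns a name to each column of the training file, the useMyFeature flag is read from the properties, and the extra column is turned into a feature string with the -my_tag suffix. The class MyFeatureSketch, its helper methods, the "-WORD" suffix, and the sample training line (with a POS tag as the custom feature value) are all hypothetical names and data chosen for illustration, and tab-separated columns are an assumption about the training file layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Illustration only: a toy imitation of the column map and feature flag,
// not the actual Stanford NER implementation.
public class MyFeatureSketch {

    /** Parses a column map such as "word=0,myfeature=1,answer=2" into name -> column index. */
    static Map<String, Integer> parseMap(String spec) {
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] parts = entry.trim().split("=");
            columns.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        }
        return columns;
    }

    /** Reads one tab-separated training line into name -> value, following the column map. */
    static Map<String, String> readToken(String line, Map<String, Integer> columns) {
        String[] cells = line.split("\t");
        Map<String, String> token = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : columns.entrySet()) {
            token.put(e.getKey(), cells[e.getValue()]);
        }
        return token;
    }

    /** Builds feature strings for one token; the custom feature is added only when the flag is on. */
    static List<String> features(Map<String, String> token, boolean useMyFeature) {
        List<String> out = new ArrayList<>();
        out.add(token.get("word") + "-WORD");
        if (useMyFeature) {
            out.add(token.get("myfeature") + "-my_tag");
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("map", "word=0,myfeature=1,answer=2");
        props.setProperty("useMyFeature", "true");

        Map<String, Integer> columns = parseMap(props.getProperty("map"));
        boolean useMyFeature = Boolean.parseBoolean(props.getProperty("useMyFeature"));

        // Hypothetical training line: token, custom feature value, gold label.
        Map<String, String> token = readToken("Java\tNNP\tProgramming_Language", columns);
        System.out.println(features(token, useMyFeature)); // prints [Java-WORD, NNP-my_tag]
    }
}
```

With useMyFeature set to false, only the word feature is emitted, which is the on/off behaviour the flag in SeqClassifierFlags is meant to control.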
