Stanford-NER customization to classify software programming keywords


Problem description

I am new to NLP, and I used the Stanford NER tool to classify some random text in order to extract special keywords used in software programming.

The problem is that I don't know how to change the classifiers and text annotators in Stanford NER so that they recognize software programming keywords. For example:

today Java used in different operating systems (Windows, Linux, ..)

The classification result should look like:

Java "Programming_Language"
Windows "Operating_System"
Linux "Operating_System"

Could you please help me customize the Stanford NER classifiers to satisfy my needs?

Recommended answer

I think this is quite well documented in the Stanford NER FAQ section: http://nlp.stanford.edu/software/crf-faq.shtml#a

Here are the steps:

  • In your properties file, change the map to specify how your training data is annotated (or structured):

map = word=0,myfeature=1,answer=2
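
To make the column mapping concrete: with map = word=0,myfeature=1,answer=2, each training line carries three tab-separated columns in that order. The sketch below builds such lines for part of the example sentence from the question. The toy lexicon, the IN_LEXICON / NOT_IN_LEXICON feature values, and the use of O as the background label are assumptions for illustration only; the answer does not say what myfeature should contain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds tab-separated training lines with the column
// order word=0, myfeature=1, answer=2.
class TrainingColumnsSketch {

    // Assumed toy lexicon mapping a keyword to its label (illustration only).
    static final Map<String, String> LEXICON = new LinkedHashMap<>();
    static {
        LEXICON.put("Java", "Programming_Language");
        LEXICON.put("Windows", "Operating_System");
        LEXICON.put("Linux", "Operating_System");
    }

    static List<String> toTrainingLines(String[] tokens) {
        List<String> lines = new ArrayList<>();
        for (String token : tokens) {
            // Column 1: the custom feature (assumed here to be a lexicon-membership marker).
            String myFeature = LEXICON.containsKey(token) ? "IN_LEXICON" : "NOT_IN_LEXICON";
            // Column 2: the gold label; "O" is assumed as the background label.
            String answer = LEXICON.getOrDefault(token, "O");
            lines.add(token + "\t" + myFeature + "\t" + answer);
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] tokens = {"today", "Java", "used", "in", "different", "operating", "systems"};
        for (String line : toTrainingLines(tokens)) {
            System.out.println(line);
        }
    }
}
```

Whatever values end up in the middle column, the column order in the training file has to match the indices given in map.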

  • In src/edu/stanford/nlp/sequences/SeqClassifierFlags.java, add a flag stating that you want to use your new feature; let's call it useMyFeature. Below public boolean useLabelSource = false; add public boolean useMyFeature = true;

In the same file, in the setProperties(Properties props, boolean printProps) method, after the else if (key.equalsIgnoreCase("useTrainLexicon")) { ..} branch, tell the tool whether this flag is switched on or off:

else if (key.equalsIgnoreCase("useMyFeature")) {
    useMyFeature = Boolean.parseBoolean(val);
}
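
The flag-parsing pattern described above can be tried in isolation. The following is a stand-alone imitation, not the real SeqClassifierFlags: the class FlagsSketch and its one-argument setProperties are hypothetical, and only the equalsIgnoreCase / Boolean.parseBoolean branch mirrors the snippet from the answer.

```java
import java.util.Properties;

// Stand-alone imitation of the flag-parsing pattern; NOT the real SeqClassifierFlags.
class FlagsSketch {
    boolean useLabelSource = false;
    boolean useMyFeature = true; // default value, as in the step above

    // Simplified, hypothetical version of setProperties: walks over all keys
    // and updates the matching flag, ignoring the case of the key.
    void setProperties(Properties props) {
        for (String key : props.stringPropertyNames()) {
            String val = props.getProperty(key);
            if (key.equalsIgnoreCase("useLabelSource")) {
                useLabelSource = Boolean.parseBoolean(val);
            } else if (key.equalsIgnoreCase("useMyFeature")) {
                useMyFeature = Boolean.parseBoolean(val);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("useMyFeature", "false");
        FlagsSketch flags = new FlagsSketch();
        flags.setProperties(props);
        System.out.println("useMyFeature = " + flags.useMyFeature);
    }
}
```

Because the flag defaults to true in this walkthrough, a properties file only needs to mention useMyFeature when the feature should be switched off.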

  • In src/edu/stanford/nlp/ling/CoreAnnotations.java, add the following section:

    public static class myfeature implements CoreAnnotation<String> {
      public Class<String> getType() {
        return String.class;
      }
    }
    

  • In src/edu/stanford/nlp/ling/AnnotationLookup.java, inside public enum KeyLookup {..}, add the following at the bottom of the entries (separated from the preceding entry by a comma):

    MY_TAG(CoreAnnotations.myfeature.class,"myfeature")

  • In src/edu/stanford/nlp/ie/NERFeatureFactory.java, depending on the "type" of feature it is, add the following in

    protected Collection<String> featuresC(PaddedList<IN> cInfo, int loc)
    
    // The flag name must match the one added to SeqClassifierFlags above.
    if (flags.useMyFeature) {
        featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag");
    }
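
To see what this step contributes, here is a stand-alone imitation of the feature-extraction logic. It is not the real NERFeatureFactory: tokens are modelled as plain maps instead of CoreLabel objects, the flag is passed as a boolean parameter, and the "-WORD" baseline feature is included only as an assumed illustration of an existing feature.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Stand-alone imitation of the feature-extraction step; NOT the real NERFeatureFactory.
class MyFeatureSketch {

    // Each token is a map from column name to value (hypothetical stand-in for CoreLabel).
    static Collection<String> featuresC(List<Map<String, String>> tokens, int loc, boolean useMyFeature) {
        Map<String, String> c = tokens.get(loc);
        Collection<String> featuresC = new ArrayList<>();
        // Assumed baseline feature, present regardless of the new flag.
        featuresC.add(c.get("word") + "-WORD");
        if (useMyFeature) {
            // The new feature: the value of the custom column plus a distinguishing suffix.
            featuresC.add(c.get("myfeature") + "-my_tag");
        }
        return featuresC;
    }

    public static void main(String[] args) {
        List<Map<String, String>> tokens = List.of(
            Map.of("word", "today", "myfeature", "NOT_IN_LEXICON"),
            Map.of("word", "Java", "myfeature", "IN_LEXICON"));
        System.out.println(featuresC(tokens, 1, true));
        System.out.println(featuresC(tokens, 1, false));
    }
}
```

The suffix keeps the new feature strings from colliding with feature strings produced from other columns that happen to hold the same value.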
    

  • Debugging: In addition to this, there are methods that dump the features to a file; use them to see how things are done under the hood. I also think you will have to spend some time with the debugger :P
