How to make use of SharpNlp in my C# application


Question

I need POS tagging for the files in my corpus. I have successfully followed the SharpNlp installation instructions, and I am using the binary version.

I created a new c# project in:       E:\sharp\sharpapp
location of Models Folder is:        E:\sharp\sharpapp\bin\Models
location of my SharpNlp Binary is:   E:\sharp\SharpNLP-1.0.2529-Bin

I have also followed the instructions to modify both .config files, "ParseTree.Exe" and "ToolsExamples.Exe".

Now, in my C# project, I have a class called tagging.cs in which I have to access my corpus text files and do POS tagging on them. Can anybody help me with how to use SharpNlp to do so?

Please provide the steps to do so.

I will be really grateful to you all. Thank you.
Amey.

Recommended answer

In a nutshell, SharpNLP is:

  • a port to C# of OpenNLP Tools and OpenNLP MaxEnt
  • a connector to WordNet
  • a set of pre-computed models, mostly for the English language
  • utility modules such as integration with SQLite

It should be noted that the port of the OpenNLP libraries is relatively informal, with various class and property name changes, possibly loose preservation of features and semantics and no apparent connection with the original Java projects' lifecycle. This situation will likely ensure that in time the OpenNLP portion of SharpNLP will be more akin to distant cousins than twin sisters...

Nevertheless, it is possible to use examples and documentation from OpenNLP to complement the relatively thin support material available for SharpNLP. Between the source code of SharpNLP and resources like the OpenNLP API reference and the OpenNLP wiki, one can generally map things across and adapt accordingly.

A loose guide could be this particular source file, which uses OpenNLP in a way that seems close to what you may need. Note the name changes between OpenNLP and SharpNLP: for example, the POSTaggerME class becomes MaximumEntropyPosTagger, the Parse() method and its overloads turn into TagSentence(), and so on.

A more general hint is to understand the sequence of steps typically necessary to perform POS tagging. This is a very high-level, approximate description but, I think, a useful one.


  • get the text to be tagged = string(s) of text
  • initialize a text parser
  • parse the text = an "array" (or other container) of individual tokens, i.e. words and punctuation characters
  • initialize the POS tagger, in particular telling it which model it should use
  • feed the [ordered] sequence of tokens to the POS tagger
  • Ta dah! Use the POS tags for the eventual purpose of your NLP application.
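The steps above can be sketched in C# roughly as follows. This is only a minimal sketch under stated assumptions: `Tagging.TagCorpus` and `Program` are hypothetical names, each line of a `.txt` file is treated as one sentence (no sentence detection), and the tokenizer and tagger are passed in as delegates with trivial stand-ins so the example runs without the library. The SharpNLP class names, model file names, and method names in the comment are recalled from the example project that ships with the library and should be checked against your version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Tagging
{
    // Reads every .txt file in corpusDir and returns one "word/TAG word/TAG ..."
    // string per non-empty line. Each line is treated as a sentence (assumption).
    public static IEnumerable<string> TagCorpus(
        string corpusDir,
        Func<string, string[]> tokenize,   // text -> ordered tokens
        Func<string[], string[]> tag)      // ordered tokens -> one tag per token
    {
        foreach (string file in Directory.EnumerateFiles(corpusDir, "*.txt"))
        {
            foreach (string line in File.ReadLines(file))
            {
                if (string.IsNullOrWhiteSpace(line)) continue;

                string[] tokens = tokenize(line);
                string[] tags = tag(tokens);

                // Pair each token with its tag, preserving order.
                yield return string.Join(" ", tokens.Zip(tags, (w, t) => w + "/" + t));
            }
        }
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Corpus folder: first command-line argument, or the current directory.
        string corpusDir = args.Length > 0 ? args[0] : ".";

        // Stand-ins so the sketch runs on its own: whitespace tokenizer and a
        // placeholder tagger that labels every token "?".
        Func<string, string[]> tokenize =
            s => s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        Func<string[], string[]> tag =
            tokens => tokens.Select(t => "?").ToArray();

        // ASSUMED SharpNLP wiring (verify class, method and model file names):
        //   string models = @"E:\sharp\sharpapp\bin\Models\";
        //   var tokenizer = new OpenNLP.Tools.Tokenize.EnglishMaximumEntropyTokenizer(
        //       models + "EnglishTok.nbin");
        //   var tagger = new OpenNLP.Tools.PosTagger.EnglishMaximumEntropyPosTagger(
        //       models + "EnglishPOS.nbin", models + @"Parser\tagdict");
        //   tokenize = s => tokenizer.Tokenize(s);
        //   tag = t => tagger.Tag(t);

        foreach (string taggedLine in Tagging.TagCorpus(corpusDir, tokenize, tag))
        {
            Console.WriteLine(taggedLine);
        }
    }
}
```

Passing the tokenizer and tagger as delegates keeps the file-handling code independent of the library, so the SharpNLP-specific names only need to be correct in one place.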

Note how the above sequence assumes that the model is readily available.
The model is a representation of the statistical "profile" of text in general, obtained by training the tagger on a set of text that has already been tagged.
SharpNLP comes with a model for generic English, but in order to tag other languages, or if the specific corpus to be tagged belongs to a particular domain (say medical reports or Tweets or...), it may be preferable to re-train the tagger to improve its precision.
OpenNLP/SharpNLP, like most POS taggers, whether stand-alone tools or APIs, typically includes features to train the tagger (= to produce a model from a sample set of already-tagged text) and to verify the quality of the resulting model/tagger (= to compare the tags produced on a test set with the tags expected for that set).
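To make the verification idea concrete, here is a minimal sketch of per-token accuracy: the fraction of tokens in a test set whose produced tag equals the expected tag. `TaggerEvaluation.Accuracy` is a hypothetical helper written for illustration, not part of SharpNLP; the library's own training and evaluation entry points are not shown here.

```csharp
using System;

public static class TaggerEvaluation
{
    // expected[s][i] is the gold tag of token i in sentence s;
    // produced[s][i] is the tag the tagger assigned to that same token.
    public static double Accuracy(string[][] expected, string[][] produced)
    {
        if (expected.Length != produced.Length)
            throw new ArgumentException("Sentence counts differ.");

        int total = 0;
        int correct = 0;

        for (int s = 0; s < expected.Length; s++)
        {
            if (expected[s].Length != produced[s].Length)
                throw new ArgumentException("Token counts differ in sentence " + s + ".");

            for (int i = 0; i < expected[s].Length; i++)
            {
                total++;
                if (expected[s][i] == produced[s][i]) correct++;
            }
        }

        // An empty test set has no meaningful accuracy; report 0.
        return total == 0 ? 0.0 : (double)correct / total;
    }
}
```

Running this once with the stock English model and once with a re-trained model on the same held-out, hand-tagged sample gives a simple way to judge whether re-training actually helped for your domain.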
