OpenNLP POSTagger output from command line


Question

I want to use OpenNLP to tokenize Thai words. I downloaded OpenNLP and the Thai tokenizer model and ran the following command:

./bin/opennlp POSTagger -lang th -model thai.tok.bin < sentence.txt > output.txt

I put the downloaded thai.tok.bin in the directory I run the command from. sentence.txt contains the text กินอะไรยังนาย. However, the only output I got was:

Usage: opennlp POSTagger model < sentences
Execution time: 0.000 seconds

I'm pretty new to OpenNLP. Please let me know if anyone knows how to get output from it.

Recommended answer

The models from your link are outdated. First, you need a few manual steps to convert the model.

  1. Download the file thai.tok.bin.gz and extract it to an empty folder. Rename the extracted file thai.tok.bin to token.model
  2. In the same folder, create a file named manifest.properties with the following contents:

Manifest-Version=1.0
Language=th
OpenNLP-Version=1.5.0
Component-Name=TokenizerME
useAlphaNumericOptimization=false

  • Now you can zip the files. On Linux, you can use the following command: zip thai.tok.bin token.model manifest.properties

    Try your model:

    sh bin/opennlp TokenizerME ~/Downloads/thai-token.bin/thai.tok.bin <  thai_sentence.txt
    
    
    
    Loading Tokenizer model ... done (0,097s)     
    กินอะไร ยังนาย     
    
    
    Average: 333,3 sent/s      
    Total: 1 sent     
    Runtime: 0.003s     
    Execution time: 0,108 seconds 
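
The conversion steps above (extract the .gz file, rename the model, write manifest.properties, zip both files) can also be scripted. The following is a minimal sketch, not part of the original answer: the function name build_model_package and the constant TOKENIZER_MANIFEST are hypothetical, and it assumes the downloaded file is a plain gzip of the legacy model. It only reproduces the packaging described here; whether a given OpenNLP release loads the result still has to be checked with the command shown above.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def build_model_package(gz_path, out_path, entry_name, manifest):
    """Repackage a gzipped legacy model as a zip holding the model and a manifest.

    gz_path:    the downloaded .gz file, e.g. thai.tok.bin.gz
    out_path:   the zip archive to create, e.g. thai.tok.bin
    entry_name: name of the model inside the archive, e.g. token.model
    manifest:   dict of manifest.properties keys and values
    """
    # Render the manifest as key=value lines, one per entry, in dict order.
    manifest_text = "".join(f"{key}={value}\n" for key, value in manifest.items())

    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Decompress the legacy model straight into the archive entry.
        with gzip.open(gz_path, "rb") as src, archive.open(entry_name, "w") as dst:
            shutil.copyfileobj(src, dst)
        archive.writestr("manifest.properties", manifest_text)
    return Path(out_path)


# Manifest values copied from the tokenizer step in this answer.
TOKENIZER_MANIFEST = {
    "Manifest-Version": "1.0",
    "Language": "th",
    "OpenNLP-Version": "1.5.0",
    "Component-Name": "TokenizerME",
    "useAlphaNumericOptimization": "false",
}

# Example call (assumes thai.tok.bin.gz is in the current directory):
# build_model_package("thai.tok.bin.gz", "thai.tok.bin", "token.model", TOKENIZER_MANIFEST)
```

The same function should cover any other model that needs the same repackaging, by passing a different entry name and manifest dictionary.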
    

  • Now that you have the updated tokenizer, you can do the same with the POS Tagger model.

    1. Download the file thai.tag.bin.gz and extract it to an empty folder. Rename the extracted file thai.tag.bin to pos.model

    2. In the same folder, create a file named manifest.properties with the following contents:

    Manifest-Version=1.0
    Language=th
    OpenNLP-Version=1.5.0
    Component-Name=POSTaggerME
    

  • Now you can zip the files. On Linux, you can use the following command: zip thai.pos.bin pos.model manifest.properties

    Finally, we can try the two models combined:

    sh bin/opennlp TokenizerME ~/Downloads/thai-token.bin/thai.tok.bin < thai_sentence.txt > thai_tokens.txt
    sh bin/opennlp POSTagger ~/Downloads/pt-pos-maxent/thai.pos.bin < thai_tokens.txt
    

    The result is:

    กินอะไร_VACT ยังนาย_NCMN
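
To use this output in a program, each whitespace-separated item can be split into a token and its tag. The following is a minimal sketch, assuming the token_TAG format shown in the result line; parse_tagged_line is a hypothetical helper name, not part of OpenNLP.

```python
def parse_tagged_line(line, separator="_"):
    """Split one line of tagger output into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        # Split on the last separator so tokens that contain "_" stay intact.
        token, sep, tag = item.rpartition(separator)
        if not sep:
            # No separator found: keep the text as the token, with no tag.
            token, tag = item, None
        pairs.append((token, tag))
    return pairs


# parse_tagged_line("กินอะไร_VACT ยังนาย_NCMN")
# returns [("กินอะไร", "VACT"), ("ยังนาย", "NCMN")]
```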
    

    Please let me know if this is the expected result.
