Annotating a Corpus (SyntaxNet)

Question

I downloaded and installed SyntaxNet following the official SyntaxNet documentation on GitHub. Following the documentation (Annotating a Corpus), I tried to have SyntaxNet read a .conll file named wj.conll and write the results to wj-tagged.conll, but I could not get it to work. My questions are:

  1. Does SyntaxNet always read .conll files (rather than .txt files)? I am a bit confused: I know SyntaxNet reads .conll files for training and testing, but I suspect it may be necessary to convert a .txt file to a .conll file in order to get part-of-speech tagging and dependency parsing.

  2. How can I make SyntaxNet read from files? (I tried all the ways explained in the GitHub documentation for SyntaxNet, and none of them worked for me.)

Recommended answer

Add these declarations to the end of "context.pbtxt" (the file you pass to --task_context). Here "inp" and "out" are text files in the root directory of SyntaxNet.

   input {
     name: 'inp_file'
     record_format: 'english-text'
     Part {
       file_pattern: 'inp'
     }
   }
   input {
     name: 'out_file'
     record_format: 'english-text'
     Part {
       file_pattern: 'out'
     }
   }
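If you prefer to script this step, here is a minimal sketch that appends the two declarations to the task context only if they are not already there. The function name `add_file_inputs` is hypothetical (it is not part of SyntaxNet), and the default path assumes you run it from the SyntaxNet root directory with the Parsey McParseface context.pbtxt used in the example command.

```shell
# Hypothetical helper (not part of SyntaxNet): append the 'inp_file' and
# 'out_file' input declarations to a task context file, at most once.
# The default path assumes you run it from the SyntaxNet root directory.
add_file_inputs() {
  context="${1:-syntaxnet/models/parsey_mcparseface/context.pbtxt}"

  # Refuse to create a new context file by accident.
  if [ ! -f "$context" ]; then
    echo "context file not found: $context" >&2
    return 1
  fi

  # Skip if a previous run already declared 'inp_file'.
  if grep -q "name: 'inp_file'" "$context"; then
    return 0
  fi

  # Start on a fresh line in case the file has no trailing newline,
  # then append the declarations at the end of the file.
  printf '\n' >> "$context"
  cat >> "$context" <<'EOF'
input {
  name: 'inp_file'
  record_format: 'english-text'
  Part {
    file_pattern: 'inp'
  }
}
input {
  name: 'out_file'
  record_format: 'english-text'
  Part {
    file_pattern: 'out'
  }
}
EOF
}
```

Run it once as `add_file_inputs`, or pass another path, e.g. `add_file_inputs path/to/context.pbtxt`. The grep check keeps repeated runs from declaring the same input names twice.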

Put the sentences you want tagged into the "inp" file, then refer to the declared inputs by name with the --input and --output flags (--input inp_file, --output out_file) the next time you run SyntaxNet from the shell.
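A minimal sketch for filling the input file, assuming that the 'english-text' record format treats each line of "inp" as one sentence of raw, untokenized text. The helper name `write_sentences` and the sample sentences are illustrative only.

```shell
# Hypothetical helper: write one sentence per line to the given file,
# assuming the 'english-text' input reads each line as one raw sentence.
write_sentences() {
  target="$1"
  shift
  # printf reuses its format for every remaining argument,
  # so each sentence ends up on its own line.
  printf '%s\n' "$@" > "$target"
}

# Example: fill the 'inp' file declared in context.pbtxt.
# write_sentences inp "Bob brought the pizza to Alice." "I saw the man with the telescope."
```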

To help a bit more, here is an example shell command:

bazel-bin/syntaxnet/parser_eval \
--input inp_file \
--output stdout-conll \
--model_path syntaxnet/models/parsey_mcparseface/tagger-params \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--hidden_layer_sizes 64 \
--arg_prefix brain_tagger \
--graph_builder structured \
--slim_model \
--batch_size 1024 | bazel-bin/syntaxnet/parser_eval \
--input stdin-conll \
--output out_file \
--hidden_layer_sizes 512,512 \
--arg_prefix brain_parser \
--graph_builder structured \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--model_path syntaxnet/models/parsey_mcparseface/parser-params \
--slim_model \
--batch_size 1024

In the script above, the output of the first command (POS tagging) is piped into the second command (dependency parsing); the two commands are separated by the pipe character "|".
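The text passed between the two commands is the `stdout-conll` stream, which, assuming it uses the standard CoNLL-X layout (10 tab-separated columns per token, a blank line between sentences), can be inspected with a small awk filter that keeps only the word, POS tag, head and dependency label columns. The function name `summarize_conll` is hypothetical.

```shell
# Hypothetical helper: print FORM, POSTAG, HEAD and DEPREL from CoNLL-X text.
# Reads the named files, or stdin if no file is given.
summarize_conll() {
  # CoNLL-X columns: 1 ID, 2 FORM, 3 LEMMA, 4 CPOSTAG, 5 POSTAG,
  # 6 FEATS, 7 HEAD, 8 DEPREL, 9 PHEAD, 10 PDEPREL.
  awk -F '\t' '
    NF == 0 { print ""; next }   # keep the blank line between sentences
    { printf "%s\t%s\t%s\t%s\n", $2, $5, $7, $8 }
  ' "$@"
}
```

For example, to keep a copy of the intermediate tagger output, you could insert `tee tagged.conll` into the pipe (`... --batch_size 1024 | tee tagged.conll | bazel-bin/syntaxnet/parser_eval ...`) and then run `summarize_conll tagged.conll`.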
