Stanford NLP core 4.0.0 no longer splitting verbs and pronouns in Spanish


Problem description

Very helpfully, Stanford NLP core 3.9.2 used to split rolled-together Spanish verbs and pronouns:

This is the 4.0.0 output:

The previous version had more .tagger files. These have not been included with the 4.0.0 distribution.

Is that the cause? Will they be added back?

Recommended answer

There are some documentation updates that still need to be made for Stanford CoreNLP 4.0.0.

A major change is that a new multi-word-token annotator has been added, which makes tokenization conform to the UD standard. So the new default Spanish pipeline should run tokenize,ssplit,mwt,pos,depparse,ner. It may not be possible to run such a pipeline from the server demo at this time, as some modifications will need to be made. I can try to send you what such modifications would be soon. We will try to make a new release in early summer to handle issues like this that we missed.
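
For reference, here is a minimal sketch of what such a pipeline could look like in Java. It assumes the 4.0.0 Spanish models jar is on the classpath and loads the bundled StanfordCoreNLP-spanish.properties; the class name, example sentence, and printed fields are illustrative assumptions, not something taken from the CoreNLP documentation:

    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.CoreDocument;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    import java.io.InputStream;
    import java.util.Properties;

    public class SpanishMwtSketch {
      public static void main(String[] args) throws Exception {
        // Load the default Spanish settings shipped in the Spanish models jar.
        Properties props = new Properties();
        try (InputStream in = SpanishMwtSketch.class.getClassLoader()
            .getResourceAsStream("StanfordCoreNLP-spanish.properties")) {
          props.load(in);
        }
        // The annotator order suggested above, including the new mwt step.
        props.setProperty("annotators", "tokenize,ssplit,mwt,pos,depparse,ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // "Dámelo" (imperative "da" + clitics "me" + "lo") is an illustrative
        // verb-pronoun cluster; whether the dictionary-based mwt model splits
        // any particular word depends on its lexicon.
        CoreDocument doc = new CoreDocument("Dámelo ahora.");
        pipeline.annotate(doc);
        for (CoreLabel tok : doc.tokens()) {
          System.out.println(tok.word() + "\t" + tok.tag());
        }
      }
    }

If mwt splits the cluster, the token loop prints one line per split-out word rather than a single line for the original surface form.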

It won't split the word in your example, unfortunately, but I think in many cases it will do the correct thing. The Spanish mwt model is just based on a large dictionary of terms, and was tuned to optimize performance on the Spanish training data.
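
To check whether the dictionary covers a particular word, the same pipeline can also be run from the command line; the jar directory name below is an assumption based on the standard distribution layout:

    java -Xmx4g -cp "stanford-corenlp-4.0.0/*" \
      edu.stanford.nlp.pipeline.StanfordCoreNLP \
      -props StanfordCoreNLP-spanish.properties \
      -annotators tokenize,ssplit,mwt,pos,depparse,ner \
      -file input.txt -outputFormat conll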
