Named entity recognition with a small data set (corpus)

Question

I want to develop a named entity recognition (NER) system for Persian, but we only have a small NER-tagged corpus for training and testing. We may have a bigger and better corpus in the future. I therefore need a solution whose performance improves incrementally whenever new data is added, without merging the new data with the old data and training from scratch. Is there any such solution?

Recommended answer

Yes, with your help: it is a work in progress. It is written in JS and involves "No training ..."

See https://github.com/redaktor/nlp_compromise/

It is a fork where I worked on NER during the last few days, and it will be optimized for use with different languages.

It is a combination of a dictionary of words, a dictionary of rules, and a build tool. It would be awesome to work on Persian support (I am working on German). It is planned to support NER of:


  • 'CARDINAL' -> [ready]
  • 'DATE' -> calendar based [Gregorian calendar is ready]
  • 'DURATION' -> see above [date ranges are ready]
  • 'MEASURE' -> systems based [metric system and SI units ready, 80+ categories]
  • 'MONEY' -> currencies based [ready in a few days]
  • 'PERSON' -> word/rules based [English/European names are ready]
  • 'ORGANIZATION'
  • 'LOCATION'
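
To make the "dictionary of words plus dictionary of rules" idea concrete, here is a minimal sketch in Python. It is not the nlp_compromise API (that project is JavaScript), and every name in it (`TinyTagger`, `add_words`, `add_rule`, `tag`) is hypothetical. It only illustrates why such a design needs no training step: adding new entries extends the tagger in place, so nothing has to be merged with old data or rebuilt from scratch.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TinyTagger:
    """Toy dictionary-plus-rules tagger; all names here are hypothetical."""

    # Word dictionary: lowercased surface form -> entity label.
    lexicon: dict = field(default_factory=dict)
    # Rule dictionary: list of (compiled regex, entity label) pairs.
    rules: list = field(default_factory=list)

    def add_words(self, label, words):
        """Extend the word dictionary in place; nothing is retrained."""
        for word in words:
            self.lexicon[word.lower()] = label

    def add_rule(self, label, pattern):
        """Register a regex rule; existing rules are left untouched."""
        self.rules.append((re.compile(pattern), label))

    def tag(self, text):
        """Return (token, label) pairs for every token that matches."""
        found = []
        for token in re.findall(r"\w+", text):
            # Dictionary lookup first, then fall back to the rules.
            label = self.lexicon.get(token.lower())
            if label is None:
                for regex, rule_label in self.rules:
                    if regex.fullmatch(token):
                        label = rule_label
                        break
            if label is not None:
                found.append((token, label))
        return found


tagger = TinyTagger()
tagger.add_rule("CARDINAL", r"\d+")
tagger.add_words("LOCATION", ["Tehran"])
first = tagger.tag("Sara opened 3 offices in Tehran")

# Later, new data arrives: only the dictionary grows.
tagger.add_words("PERSON", ["Sara"])
second = tagger.tag("Sara opened 3 offices in Tehran")
```

In this sketch `first` contains only the cardinal and the location, while `second` also tags the person name once it has been added. A real system would need multi-word entries, context-sensitive rules, and language-specific tokenization, none of which are shown here.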

I think it could be a starting point. I have not found the time to document the new features, so feel free to open issues on GitHub.
