使用多语种软件包进行希伯来语中的命名实体识别 [英] use polyglot package for Named Entity Recognition in hebrew
本文介绍了使用多语种软件包进行希伯来语中的命名实体识别的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在尝试使用多语种软件包进行希伯来语中的命名实体识别.
这是我的代码:
I am trying to use the polyglot package for Named Entity Recognition in hebrew.
this is my code:
# -*- coding: utf8 -*-
import polyglot
from polyglot.text import Text, Word
from polyglot.downloader import downloader
downloader.download("embeddings2.iw")
text = Text(u"in france and in germany")
print(type(text))
text2 = Text(u"נסעתי מירושלים לתל אביב")
print(type(text2))
print(text.entities)
print(text2.entities)
这是输出:
<class 'polyglot.text.Text'>
<class 'polyglot.text.Text'>
[I-LOC([u'france']), I-LOC([u'germany'])]
Traceback (most recent call last):
File "C:/Python27/Lib/site-packages/IPython/core/pyglot.py", line 15, in <module>
print(text2.entities)
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Python27\lib\site-packages\polyglot\text.py", line 132, in entities
for i, (w, tag) in enumerate(self.ne_chunker.annotate(self.words)):
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Python27\lib\site-packages\polyglot\text.py", line 100, in ne_chunker
return get_ner_tagger(lang=self.language.code)
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
cache[key] = obj(*args, **kwargs)
File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 191, in get_ner_tagger
return NEChunker(lang=lang)
File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 104, in __init__
super(NEChunker, self).__init__(lang=lang)
File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 40, in __init__
self.predictor = self._load_network()
File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 109, in _load_network
self.embeddings = load_embeddings(self.lang, type='cw', normalize=True)
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
cache[key] = obj(*args, **kwargs)
File "C:\Python27\lib\site-packages\polyglot\load.py", line 61, in load_embeddings
p = locate_resource(src_dir, lang)
File "C:\Python27\lib\site-packages\polyglot\load.py", line 43, in locate_resource
if downloader.status(package_id) != downloader.INSTALLED:
File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 738, in status
info = self._info_or_id(info_or_id)
File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 508, in _info_or_id
return self.info(info_or_id)
File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 934, in info
raise ValueError('Package %r not found in index' % id)
ValueError: Package u'embeddings2.iw' not found in index
英语有效,但希伯来语无效.
无论我是否尝试下载软件包u'embeddings2.iw'
,我都会得到:
The english worked but not the hebrew.
Whether I try to download the package u'embeddings2.iw'
or not I get:
ValueError: Package u'embeddings2.iw' not found in index
推荐答案
我明白了!
对我来说似乎是个错误.
语言检测将语言定义为'iw'
,这是希伯来语的先前ISO 639语言代码,并已更改为'he'
.
text.entities
无法识别iw
代码,因此我将其更改为:
I got it!
It seems like a bug to me.
The language detection defined the language as 'iw'
which is the The former ISO 639 language code for Hebrew, and was changed to 'he'
.
The text.entities
did not recognize the iw
code, so i changes it like so:
text2.hint_language_code = 'he'
这篇关于使用多语种软件包进行希伯来语中的命名实体识别的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文