Unable to download the pipeline provided by spark-nlp library


Question

I am unable to use the predefined pipeline "recognize_entities_dl" provided by the spark-nlp library.

I tried installing different versions of pyspark and the spark-nlp library.

import sparknlp
from sparknlp.pretrained import PretrainedPipeline

#create or get Spark Session

spark = sparknlp.start()

sparknlp.version()
spark.version

#download, load, and annotate a text by pre-trained pipeline

pipeline = PretrainedPipeline('recognize_entities_dl', lang='en')
result = pipeline.annotate('Harry Potter is a great movie')

2.1.0
recognize_entities_dl download started this may take some time.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-b71a0f77e93a> in <module>
     11 #download, load, and annotate a text by pre-trained pipeline
     12 
---> 13 pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
     14 result = pipeline.annotate('Harry Potter is a great movie')

d:\python36\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc)
     89 
     90     def __init__(self, name, lang='en', remote_loc=None):
---> 91         self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
     92         self.light_model = LightPipeline(self.model)
     93 

d:\python36\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
     50     def downloadPipeline(name, language, remote_loc=None):
     51         print(name + " download started this may take some time.")
---> 52         file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
     53         if file_size == "-1":
     54             print("Can not find the model to download please check the name!")

AttributeError: module 'sparknlp.internal' has no attribute '_GetResourceSize'

Answer

Thanks for confirming your Apache Spark version. The pre-trained pipelines and models depend on both the Apache Spark and Spark NLP versions. The Apache Spark version must be at least 2.4.x to be able to download the pre-trained models/pipelines; otherwise, you need to train your own models/pipelines for any earlier version.
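As a quick sanity check (a minimal sketch, not part of the original answer), you can print both versions from the session you already started and compare against the 2.4 requirement stated above:

import sparknlp

# Create or get the Spark session the same way as in the question.
spark = sparknlp.start()

print("Spark NLP version:", sparknlp.version())
print("Apache Spark version:", spark.version)

# Pre-trained downloads require Apache Spark 2.4.x (see above), so warn otherwise.
major, minor = (int(x) for x in spark.version.split(".")[:2])
if (major, minor) < (2, 4):
    print("Apache Spark is older than 2.4.x; pretrained() downloads will not work. "
          "Upgrade pyspark or train your own pipeline instead.")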

This is the list of all pipelines, and they are all built for Apache Spark 2.4.x: https://nlp.johnsnowlabs.com/docs/en/pipelines

If you look at the URL of any model or pipeline, you can see this information:

recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip

  • Name: recognize_entities_dl
  • Lang: en
  • Spark NLP: must be equal to 2.1.0 or greater
  • Apache Spark: equal to 2.4.x or greater
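The file name follows the pattern name_lang_sparknlpversion_sparkversion_timestamp.zip, so the fields above can be read directly off it (a small illustrative sketch, not from the original answer):

# Split a pre-trained package file name into the fields listed above.
file_name = "recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip"
parts = file_name[:-len(".zip")].split("_")

timestamp = parts[-1]                  # 1562946909722
spark_version = parts[-2]              # 2.4 (Apache Spark)
sparknlp_version = parts[-3]           # 2.1.0 (Spark NLP)
lang = parts[-4]                       # en
pipeline_name = "_".join(parts[:-4])   # recognize_entities_dl

print(pipeline_name, lang, sparknlp_version, spark_version, timestamp)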

NOTE: The Spark NLP library is built and compiled against Apache Spark 2.4.x. That is why models and pipelines are only available for the 2.4.x version.

NOTE 2: Since you are using Windows, you need to use the _noncontrib models and pipelines, which are compatible with Windows: Do Spark-NLP pretrained pipelines only work on linux systems?
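For example, on Windows the download call would use the _noncontrib variant of the same pipeline (a sketch based on the naming convention in the note above; check the pipelines list linked earlier for the exact name available for your Spark NLP version):

from sparknlp.pretrained import PretrainedPipeline

# "_noncontrib" suffix per the note above; the exact name is an assumption,
# verify it against the published pipelines list for your version.
pipeline = PretrainedPipeline('recognize_entities_dl_noncontrib', lang='en')
result = pipeline.annotate('Harry Potter is a great movie')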

I hope this answer helps and solves your issue.

UPDATE April 2020: Apparently, the models and pipelines trained and uploaded on Apache Spark 2.4.x are also compatible with Apache Spark 2.3.x. So if you are on Apache Spark 2.3.x, even though you cannot use pretrained() for auto-download, you can download a model/pipeline manually and just use .load() instead.
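A minimal sketch of that manual route (the local path is only a placeholder; the pipeline archive is downloaded and unzipped by hand from the repository linked below):

from pyspark.ml import PipelineModel
from sparknlp.base import LightPipeline

# Point .load() at the folder extracted from the manually downloaded zip.
loaded = PipelineModel.load("/path/to/recognize_entities_dl_en_2.1.0_2.4_1562946909722")

# LightPipeline gives the same annotate() convenience as PretrainedPipeline.
result = LightPipeline(loaded).annotate('Harry Potter is a great movie')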

Full list of all models and pipelines with links to download: https://github.com/JohnSnowLabs/spark-nlp-models

Update: After the 2.4.0 release, all models and pipelines are cross-platform and there is no need to choose a different model/pipeline for any specific OS: https://github.com/JohnSnowLabs/spark-nlp/releases/tag/2.4.0

For newer releases: https://github.com/JohnSnowLabs/spark-nlp/releases
