Python - Pickle Spacy for PySpark


Question

The documentation for Spacy 2.0 mentions that the developers have added functionality to allow Spacy to be pickled so that it can be used by a Spark cluster interfaced through PySpark; however, they don't give instructions on how to do this.

Can someone explain how I can pickle Spacy's English-language NE parser to be used inside my udf functions?

This doesn't work:

from pyspark import cloudpickle
from spacy.lang.en import English  # spaCy 2.x import path for the bare English pipeline

nlp = English()
pickled_nlp = cloudpickle.dumps(nlp)

Answer

Not really an answer, but the best workaround I've discovered:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, ArrayType
import spacy

def get_entities_udf():
    def get_entities(text):
        global nlp
        try:
            doc = nlp(unicode(text))  # Python 2; use str(text) on Python 3
        except NameError:
            # First call on this executor: nlp is not defined yet, so load
            # the model once and cache it in the global for later rows.
            nlp = spacy.load('en')
            doc = nlp(unicode(text))
        return [t.label_ for t in doc.ents]
    # The UDF returns a list of entity labels, i.e. an array of strings.
    res_udf = udf(get_entities, ArrayType(StringType()))
    return res_udf

documents_df = documents_df.withColumn('entities', get_entities_udf()('text'))
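
For context, a minimal end-to-end sketch of how the workaround above might be used. The SparkSession setup, the sample sentence, and the column name here are illustrative additions, not part of the original answer, and they assume spaCy 2.x with the 'en' model installed on every worker node:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('spacy-ner-example').getOrCreate()

# Hypothetical sample data; any DataFrame with a string 'text' column works.
documents_df = spark.createDataFrame(
    [('Apple is looking at buying a U.K. startup.',)],
    ['text'],
)

documents_df = documents_df.withColumn('entities', get_entities_udf()('text'))
documents_df.show(truncate=False)
# Depending on the model, the entities column holds labels such as ['ORG', 'GPE'].

The point of the workaround is that the model is never pickled at all: each executor loads spacy.load('en') the first time the UDF runs on it, caches it in the global nlp, and reuses it for every subsequent row.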

