pickle.PicklingError: args[0] from __newobj__ args has the wrong class with hadoop python
Question
I am trying to delete stop words via Spark; the code is as follows:
from nltk.corpus import stopwords
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext('local')
spark = SparkSession(sc)
word_list=["ourselves","out","over", "own", "same" ,"shan't" ,"she", "she'd", "what", "the", "fuck", "is", "this","world","too","who","who's","whom","yours","yourself","yourselves"]
wordlist=spark.createDataFrame([word_list]).rdd

def stopwords_delete(word_list):
    filtered_words=[]
    print word_list
    for word in word_list:
        print word
        if word not in stopwords.words('english'):
            filtered_words.append(word)

filtered_words=wordlist.map(stopwords_delete)
print(filtered_words)
I get the following error:

pickle.PicklingError: args[0] from __newobj__ args has the wrong class
I don't know why; can somebody help me? Thanks in advance.
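For reference, the filtering the code above is trying to do is, outside Spark, plain Python. A minimal sketch, using a small hard-coded subset of stop words (so it runs without the NLTK corpus download; the set below is illustrative, not nltk's full list):

```python
# Illustrative subset of English stop words, not nltk's full list.
STOPWORDS = {"the", "is", "this", "who", "whom", "too", "out", "over"}

def stopwords_delete(word_list):
    # Keep only words that are not stop words.
    return [w for w in word_list if w not in STOPWORDS]

print(stopwords_delete(["what", "the", "is", "this", "world"]))  # ['what', 'world']
```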
Answer
It's to do with how the stopwords module is uploaded. As a workaround, import the stopwords library inside the function itself. I had the same issue and this workaround fixed the problem.
def stopwords_delete(word_list):
    from nltk.corpus import stopwords   # import on the worker, not the driver
    filtered_words=[]
    print word_list
    for word in word_list:
        if word not in stopwords.words('english'):
            filtered_words.append(word)
    return filtered_words
I would recommend from pyspark.ml.feature import StopWordsRemover as a permanent fix.