Pyspark 'NoneType' object has no attribute '_jvm' error


Problem Description

I was trying to print the total number of elements in each partition of a DataFrame using Spark 2.2:

from pyspark.sql.functions import *
from pyspark.sql import SparkSession

def count_elements(splitIndex, iterator):
    n = sum(1 for _ in iterator)
    yield (splitIndex, n)

spark = SparkSession.builder.appName("tmp").getOrCreate()
num_parts = 3
df = spark.read.json("/tmp/tmp/gon_s.json").repartition(num_parts)
print("df has partitions."+ str(df.rdd.getNumPartitions()))
print("Elements across partitions is:" + str(df.rdd.mapPartitionsWithIndex(lambda ind, x: count_elements(ind, x)).take(3)))
 

The code above kept failing with the following error:

   n = sum(1 for _ in iterator)
  File "/home/dev/wk/pyenv/py3/lib/python3.5/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/functions.py", line 40, in _
    jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
AttributeError: 'NoneType' object has no attribute '_jvm'
 

After removing the import below:

from pyspark.sql.functions import *
 

the code works fine:

skewed_large_df has partitions.3
The distribution of elements across partitions is:[(0, 1), (1, 2), (2, 2)]
 

What is causing this error, and how can I fix it?

Solution

This is a great example of why you shouldn't use import *.

The line

from pyspark.sql.functions import *
 

will bring all of the functions in the pyspark.sql.functions module into your namespace, including some that will shadow your builtins.

The specific issue is in the count_elements function, on the line:

n = sum(1 for _ in iterator)
#   ^^^ - this is now pyspark.sql.functions.sum
 

You intended to call __builtin__.sum, but import * shadowed the builtin with pyspark.sql.functions.sum. That function expects a Column and needs a live SparkContext to reach the JVM; inside an executor's Python worker the SparkContext (sc) is None, which is why the failure surfaces as 'NoneType' object has no attribute '_jvm'.
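The shadowing can be reproduced without Spark at all. In the sketch below, a locally defined sum is a hypothetical stand-in for pyspark.sql.functions.sum: it raises the same AttributeError the real function produces when no SparkContext is available, and builtins.sum shows the explicit way to reach the builtin even while it is shadowed.

```python
import builtins

# Hypothetical stand-in for pyspark.sql.functions.sum: simplified to fail the
# same way the real one does when called outside a live SparkContext.
def sum(col):
    raise AttributeError("'NoneType' object has no attribute '_jvm'")

try:
    n = sum(1 for _ in range(5))  # name lookup finds the shadowing function
except AttributeError as exc:
    print("shadowed call failed:", exc)

n = builtins.sum(1 for _ in range(5))  # the builtin, referenced explicitly
print(n)  # 5
```

Referencing builtins.sum works, but removing the wildcard import (as below) is the cleaner fix.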

Instead, do one of the following:

import pyspark.sql.functions as f

Or

from pyspark.sql.functions import sum as sum_
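Either fix restores the builtin sum inside count_elements. A minimal Spark-free sketch of the repaired function (the three partitions are simulated with plain lists, matching the distribution shown in the question's output):

```python
def count_elements(splitIndex, iterator):
    n = sum(1 for _ in iterator)  # builtin sum again: counts rows in this partition
    yield (splitIndex, n)

# Simulate three partitions of a small, skewed dataset.
partitions = [[10], [20, 30], [40, 50]]
result = [next(count_elements(i, iter(p))) for i, p in enumerate(partitions)]
print(result)  # [(0, 1), (1, 2), (2, 2)]
```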
 

