Pyspark throws IllegalArgumentException: 'Unsupported class file major version 55' when trying to use udf


Problem description

I have the following problem when using UDFs in PySpark.

As long as I don't use any UDFs, my code works well. There are no problems with simple operations such as selecting columns or using SQL functions like concat. As soon as I perform an action on a DataFrame that uses a UDF, the program crashes with the following exception:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.3.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/06/05 09:24:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "/Users/szymonk/Desktop/Projects/SparkTest/Application.py", line 59, in <module>
    transformations.select(udf_example(col("gender")).alias("udf_example")).show()
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 378, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'Unsupported class file major version 55'

I've tried changing JAVA_HOME as proposed in "Pyspark error - Unsupported class file major version 55", but it didn't help.

There is nothing fancy in my code. I am only defining a simple UDF that should return the length of the values in the "Gender" column.

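from pprint import pprint
from pyspark.sql import SparkSession, Column
from pyspark.sql.functions import col, lit, struct, array, udf, concat, trim, when
from pyspark.sql.types import IntegerType

# Create (or reuse) a session; the original snippet used `spark` without defining it
spark = SparkSession.builder.getOrCreate()

# Read the CSV, treating the first row as column names
transformations = spark.read.csv("Resources/PersonalData.csv", header=True)

# UDF that returns the length of each value in the column
udf_example = udf(lambda x: len(x))
transformations.select(udf_example(col("gender")).alias("udf_example")).show()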

I'm not sure if it is significant, but I'm using PyCharm on a Mac.

Recommended answer

I found the solution: I had to switch PyCharm's boot JDK (2x Shift -> jdk -> select JDK 1.8).
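
For context, the number in the message is the JVM class file format version. Major version 55 corresponds to Java 11, while JDK 1.8 corresponds to 52. The traceback shows Spark 2.4.3, which targets Java 8, so this is presumably why selecting JDK 1.8 makes the error go away. The sketch below decodes the number from an error message like the one above. The helper names are hypothetical and are not part of PySpark.

```python
import re
from typing import Optional


def major_from_message(message: str) -> Optional[int]:
    """Extract the class file major version from an error message, if present."""
    match = re.search(r"class file major version (\d+)", message)
    return int(match.group(1)) if match else None


def java_release_for_major(major: int) -> int:
    """Map a class file major version to a Java release number.

    From Java 5 (major version 49) onward the release is major - 44,
    so 52 is Java 8 and 55 is Java 11.
    """
    if major < 49:
        raise ValueError("major versions below 49 predate Java 5")
    return major - 44


message = "Unsupported class file major version 55"
major = major_from_message(message)
if major is not None:
    print("Class files were built for Java", java_release_for_major(major))
```

If the decoded release is newer than the Java version your Spark release supports, the JVM that launches Spark is the thing to change, whether it is picked up from JAVA_HOME or from the IDE's settings.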
