Pyspark throws IllegalArgumentException: 'Unsupported class file major version 55' when trying to use udf


Problem description

I have the following problem when using UDFs in PySpark.

As long as I don't use any UDFs, my code works well. There are no problems performing simple operations like selecting columns or using SQL functions like concat. But as soon as I perform an action on a DataFrame that uses a UDF, the program crashes with the following exception:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.3.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/06/05 09:24:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "/Users/szymonk/Desktop/Projects/SparkTest/Application.py", line 59, in <module>
    transformations.select(udf_example(col("gender")).alias("udf_example")).show()
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 378, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'Unsupported class file major version 55'

I've tried changing JAVA_HOME as proposed in: Pyspark error - Unsupported class file major version 55, but it didn't help.
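For what it's worth, when changing JAVA_HOME, it has to take effect before PySpark launches its JVM gateway, so one common approach is to set it at the very top of the script, before any Spark object is created. A minimal sketch follows; the JDK path shown is only an example for macOS and must be adjusted to your own Java 8 installation:

```python
import os

# Point PySpark at a Java 8 installation *before* the Spark JVM is launched.
# NOTE: this path is a hypothetical example for macOS; adjust it to where
# your own JDK 8 lives.
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home"

# Only then import Spark and create the session:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("SparkTest").getOrCreate()
```

Setting the variable from Python keeps the fix local to the script, which can be handy when the IDE (PyCharm in this case) launches with a different JDK than the shell.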

There is nothing fancy in my code. I am only defining a simple UDF that should return the length of the values in the "gender" column:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

# Create the SparkSession that the rest of the script relies on
spark = SparkSession.builder.appName("SparkTest").getOrCreate()

transformations = spark.read.csv("Resources/PersonalData.csv", header=True)

# Simple UDF returning the length of each value in the column
udf_example = udf(lambda x: len(x), IntegerType())
transformations.select(udf_example(col("gender")).alias("udf_example")).show()

I'm not sure whether it is significant, but I'm using PyCharm on a Mac.

Recommended answer

I found a solution: I had to switch PyCharm's boot JDK (press Shift twice -> type "jdk" -> select JDK 1.8).
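This fix works because the "major version" in the error message encodes which Java release compiled the offending class files: the release number is simply the major version minus 44. So version 55 means Java 11 class files, which Spark 2.4 (built for Java 8, major version 52) cannot parse. A tiny sketch of the mapping:

```python
def class_file_java_release(major_version):
    """Map a class-file major version to its Java release (release = major - 44)."""
    return major_version - 44

print(class_file_java_release(55))  # 11 -> the JVM PyCharm launched was Java 11
print(class_file_java_release(52))  # 8  -> the Java release Spark 2.4 expects
```

Switching the boot JDK to 1.8 makes PyCharm launch Spark's JVM with Java 8, so no version-55 class files are ever produced.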
