pyspark py4j.Py4JException: Method and([class java.lang.Integer]) does not exist

Question

Can someone help me understand the error below? I'm new to PySpark and have just started learning it.

When I searched for it, I found that this error occurs when values of different data types are compared. But my salary column is an integer column, so why am I still getting this error?

>>> df.printSchema()
root
 |-- Firstname: string (nullable = true)
 |-- middlename: string (nullable = true)
 |-- lastname: string (nullable = true)
 |-- dob: string (nullable = true)
 |-- sex: string (nullable = true)
 |-- salary: integer (nullable = true)
 |-- CopiedColumn: integer (nullable = true)
 |-- Country: string (nullable = false)
 |-- anotherColumn: string (nullable = false)

>>> df.show()
+---------+----------+--------+----------+---+------+------------+-------+-------------+
|Firstname|middlename|lastname|       dob|sex|salary|CopiedColumn|Country|anotherColumn|
+---------+----------+--------+----------+---+------+------------+-------+-------------+
|    James|          |   Smith|1991-04-01|  M|300000|     -300000|  India|Another value|
|  Michael|      Rose|        |2000-05-19|  M|400000|     -400000|  India|Another value|
|   Robert|          |Williams|1978-09-05|  M|400000|     -400000|  India|Another value|
|    Maria|      Anne|   Jones|1967-12-01|  F|400000|     -400000|  India|Another value|
|      Jen|      Mary|   Brown|1980-02-17|  F|  -100|         100|  India|Another value|
+---------+----------+--------+----------+---+------+------------+-------+-------------+


>>> df.withColumn("lit_value2", when(col("salary") >=400000 & col("salary") <= 500000,lit("100")).otherwise(lit("200"))).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/mapr/spark/spark/python/pyspark/sql/column.py", line 115, in _
    njc = getattr(self._jc, name)(jc)
  File "/opt/mapr/spark/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/mapr/spark/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/mapr/spark/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o138.and. Trace:
py4j.Py4JException: Method and([class java.lang.Integer]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

Recommended answer

You need to wrap each condition in parentheses:

when((col("salary") >= 400000) & (col("salary") <= 500000), lit("100"))

Otherwise, because & has higher precedence than >= in Python, your condition is interpreted as follows:

col("salary") >= (400000 & col("salary")) <= 500000

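This does not make sense and gives the error you got: Python first evaluates 400000 & col("salary"), which hands a plain integer to the Column's and method on the JVM side. That method only accepts another Column, so py4j reports that and([class java.lang.Integer]) does not exist.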
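The grouping described above can be checked without Spark by asking Python's standard ast module how it parses each expression. This is only a minimal sketch: the helper name top_level is hypothetical, col is parsed but never called, and ast.unparse assumes Python 3.9 or later.

```python
import ast

UNPARENTHESIZED = 'col("salary") >= 400000 & col("salary") <= 500000'
PARENTHESIZED = '(col("salary") >= 400000) & (col("salary") <= 500000)'


def top_level(expr: str) -> ast.expr:
    """Parse an expression without evaluating it and return its root node.

    Hypothetical helper for illustration only.
    """
    return ast.parse(expr, mode="eval").body


bad = top_level(UNPARENTHESIZED)
good = top_level(PARENTHESIZED)

# Without parentheses the root is a chained comparison, and its middle
# operand is the bitwise-and of the integer literal and the column.
print(type(bad).__name__, "| middle operand:", ast.unparse(bad.comparators[0]))

# With parentheses the root is the & operator itself, joining two comparisons.
print(type(good).__name__, "| left:", ast.unparse(good.left),
      "| right:", ast.unparse(good.right))
```

The first line of output should show a Compare node whose middle operand contains the & operation, while the second should show a BinOp node with a comparison on each side, which is the shape that when() needs.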
