withColumn with UDF yields AttributeError: 'NoneType' object has no attribute '_jvm'


Problem description

I am trying to replace some values in a Spark DataFrame using a UDF, but I keep getting the same error.

While debugging I found out that it doesn't really depend on the DataFrame I am using, nor on the function I write. Here is an MWE featuring a simple lambda function that I can't get to execute properly. It should simply modify all the values in the first column by concatenating each value with itself.

from pyspark.sql.functions import udf, lit
from pyspark.sql.types import StringType

l = [('Alice', 1)]
df = sqlContext.createDataFrame(l)
df.show()

#+-----+---+
#|   _1| _2|
#+-----+---+
#|Alice|  1|
#+-----+---+

df = df.withColumn("_1", udf(lambda x : lit(x+x), StringType())(df["_1"]))
df.show()
#Alice should now become AliceAlice

This is the error that I get, mentioning a rather cryptic AttributeError: 'NoneType' object has no attribute '_jvm'.

 File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/worker.py", line 111, in main
    process()
  File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/sql/functions.py", line 1566, in <lambda>
    func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)), it)
  File "<stdin>", line 1, in <lambda>
  File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/sql/functions.py", line 39, in _
    jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
AttributeError: 'NoneType' object has no attribute '_jvm'

I am sure I am getting confused with the syntax and can't get the types right (thanks, duck typing!), but every example of withColumn with a lambda function that I have found looks similar to this one.

Recommended answer

You are very close :) It is complaining because you cannot use lit within a UDF: lit works at the column level, not at the row level. Inside a UDF your lambda receives plain Python values, while lit (like the other helpers in pyspark.sql.functions) is a thin wrapper that calls into the JVM through the driver's SparkContext; on the executors that context is None, which is exactly the "'NoneType' object has no attribute '_jvm'" in the traceback.

from pyspark.sql.functions import udf, lit
from pyspark.sql.types import StringType

l = [('Alice', 1)]
df = spark.createDataFrame(l)
df.show()

+-----+---+
|   _1| _2|
+-----+---+
|Alice|  1|
+-----+---+

df = df.withColumn("_1", udf(lambda x: x + x, StringType())("_1"))
# this would produce the same result, but lit is not necessary here:
# df = df.withColumn("_1", udf(lambda x: x + x, StringType())(lit(df["_1"])))
df.show()

+----------+---+
|        _1| _2|
+----------+---+
|AliceAlice|  1|
+----------+---+
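
As an aside, for a simple concatenation like this a UDF is not needed at all: since the built-in functions work at the column level, concat produces the same result without the Python round trip. A minimal sketch, assuming the same spark session as above:

from pyspark.sql.functions import concat

df = spark.createDataFrame([('Alice', 1)])
df = df.withColumn("_1", concat(df["_1"], df["_1"]))  # column-level, no UDF needed
df.show()

#+----------+---+
#|        _1| _2|
#+----------+---+
#|AliceAlice|  1|
#+----------+---+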

