error in labelled point object pyspark
Question
I am writing a function which

- takes an RDD as input
- splits the comma-separated values
- then converts each row into a LabeledPoint object
- finally fetches the output as a DataFrame
code:
def parse_points(raw_rdd):
    cleaned_rdd = raw_rdd.map(lambda line: line.split(","))
    new_df = cleaned_rdd.map(lambda line: LabeledPoint(line[0], [line[1:]])).toDF()
    return new_df
output = parse_points(input_rdd)
Up to this point, if I run the code there is no error and it works fine.
But when I add the line
output.take(5)
I get the error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 129.0 failed 1 times, most recent failure: Lost task 0.0 in stage 129.0 (TID 152, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
Py4JJavaError Traceback (most recent call last)
<ipython-input-100-a68c448b64b0> in <module>()
20
21 output = parse_points(raw_rdd)
---> 22 print output.show()
Please suggest what the mistake is.
Answer
The reason you had no errors until you executed the action:
output.take(5)
is due to the lazy nature of Spark: nothing is executed in Spark until you run the action take(5).
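As a minimal illustration of that laziness (assuming a running SparkContext named sc; the sample data is made up for the example):

rdd = sc.parallelize(["1,2,3", "4,5,6"])
# Transformations only build the lineage; nothing runs yet, so this line never fails.
broken = rdd.map(lambda line: int(line))   # int() cannot parse "1,2,3"

# The error surfaces only when an action forces execution, e.g.:
# broken.take(5)   # raises ValueError inside the executor at this point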
You have a few issues in your code, and I think you are failing due to the extra "[" and "]" in [line[1:]].
So you need to remove the extra "[" and "]" in [line[1:]] (and keep only line[1:]).
Another issue you might need to solve is the lack of a DataFrame schema.
i.e. replace "toDF()" with "toDF(["features","label"])". This will give the DataFrame a schema.
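Putting both fixes together, a minimal corrected sketch of parse_points might look like this (assuming LabeledPoint comes from pyspark.mllib.regression, that the first comma-separated field is the label, and that the remaining fields are numeric strings):

from pyspark.mllib.regression import LabeledPoint

def parse_points(raw_rdd):
    # Split each comma-separated line into a list of string fields.
    cleaned_rdd = raw_rdd.map(lambda line: line.split(","))
    # Use line[1:] without the extra brackets, and pass an explicit schema to toDF().
    new_df = cleaned_rdd.map(lambda line: LabeledPoint(line[0], line[1:])).toDF(["features", "label"])
    return new_df

output = parse_points(input_rdd)
output.take(5)   # the action should now run without the original error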