Error in labelled point object in PySpark


Problem description

I am writing a function that:

  1. takes an RDD as input,
  2. splits the comma-separated values,
  3. converts each row into a LabeledPoint object, and
  4. finally returns the output as a dataframe.

Code:

def parse_points(raw_rdd):

    cleaned_rdd = raw_rdd.map(lambda line: line.split(","))
    new_df = cleaned_rdd.map(lambda line:LabeledPoint(line[0],[line[1:]])).toDF()
    return new_df


output = parse_points(input_rdd)

Up to this point, if I run the code there is no error; it works fine.

But when I add the line,

 output.take(5)

I get the error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 129.0 failed 1 times, most recent failure: Lost task 0.0 in stage 129.0 (TID 152, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):

Py4JJavaError       Traceback (most recent call last)
<ipython-input-100-a68c448b64b0> in <module>()
 20 
 21 output = parse_points(raw_rdd)
 ---> 22 print output.show()

Please tell me what the mistake is.

Answer

The reason you had no errors until you executed the action:

 output.take(5)

is due to the lazy nature of Spark: nothing is actually executed in Spark until you run the action take(5).
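To illustrate this laziness, here is a minimal, self-contained sketch (the SparkSession setup and the sample data are illustrative, not taken from the question):

from pyspark.sql import SparkSession

# Illustrative setup; in the question a Spark context already exists.
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(["1,2.0,3.0", "0,4.0,5.0"])

# map() is a transformation: Spark only records it in the lineage,
# so a bug inside the lambda does not surface at this point.
mapped = rdd.map(lambda line: line.split(","))

# take() is an action: it triggers the actual computation, and only
# now would an error in the lambda be raised.
print(mapped.take(2))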

You have a few issues in your code, and I think you are failing because of the extra "[" and "]" in [line[1:]].

So you need to remove the extra "[" and "]" in [line[1:]] and keep only line[1:].
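To see the difference in plain Python (the values here are just an example):

line = "1,2.0,3.0".split(",")   # ['1', '2.0', '3.0']
features = line[1:]             # ['2.0', '3.0']   -> a flat list of feature values
wrapped = [line[1:]]            # [['2.0', '3.0']] -> a nested list, not a flat feature vector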

Another issue you may need to address is the missing dataframe schema.

即将"toDF()"替换为"toDF(["features","label"]) 这将为数据框提供一个架构.

i.e. replace "toDF()" with "toDF(["features","label"])" This will give the dataframe a schema.
