(null) entry in command string exception in saveAsTextFile() on PySpark


Question

I am working in PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When I execute idSums.saveAsTextFile("Output"), I receive the following error:

Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001

I don't think the RDD object itself is the problem, because other actions run without error; for example, idSums.collect() produces the correct output.

Furthermore, the Output directory is created (with all subdirectories) and the file part-00001 is created, but it is 0 bytes.

Recommended answer

You are missing winutils.exe, a Hadoop binary. Download the winutils.exe build that matches your system (64-bit or 32-bit) and set your Hadoop home to point to it.

First way:

  1. Download the file.
  2. Create a hadoop folder on your system, e.g. on C:
  3. Create a bin folder inside the hadoop directory, e.g. C:\hadoop\bin
  4. Paste winutils.exe into bin, e.g. C:\hadoop\bin\winutils.exe
  5. Go to User Variables under System Properties -> Advanced System Settings.

Create a new variable with Name: HADOOP_HOME and Path: C:\hadoop\
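Before restarting the notebook, it may help to confirm that the folder layout from the steps above is really in place. The following is only a minimal sketch: the helper names winutils_path and check_hadoop_home are hypothetical and not part of PySpark or Hadoop, and C:\hadoop is simply the example location used in this answer.

```python
import os


def winutils_path(hadoop_home):
    """Return where winutils.exe is expected to live under hadoop_home."""
    return os.path.join(hadoop_home, "bin", "winutils.exe")


def check_hadoop_home(hadoop_home):
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    if not os.path.isdir(hadoop_home):
        problems.append("missing folder: " + hadoop_home)
    elif not os.path.isfile(winutils_path(hadoop_home)):
        problems.append("missing file: " + winutils_path(hadoop_home))
    return problems


if __name__ == "__main__":
    # Assumption: winutils.exe was placed under C:\hadoop\bin as in the steps above.
    for problem in check_hadoop_home(r"C:\hadoop"):
        print(problem)
```

An empty result only says the file is where Hadoop will look for it; it does not check that the binary matches your system's architecture.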

Second way:

You can also set the Hadoop home directly in your Java program:

System.setProperty("hadoop.home.dir", "C:\\hadoop");
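The System.setProperty call is Java, while the question uses a PySpark notebook. A rough Python counterpart is to set the HADOOP_HOME environment variable from the notebook, since Hadoop falls back to HADOOP_HOME when hadoop.home.dir is not set. This is a sketch under assumptions: use_hadoop_home is a hypothetical helper, and it relies on the JVM that PySpark launches inheriting the notebook's environment, so it has to run before the SparkContext is created.

```python
import os


def use_hadoop_home(hadoop_home, env=None):
    """Point HADOOP_HOME at hadoop_home after checking that winutils.exe is there.

    env defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    if not os.path.isfile(winutils):
        raise IOError("winutils.exe not found at " + winutils)
    env["HADOOP_HOME"] = hadoop_home
    return winutils
```

In a notebook this would be called as use_hadoop_home(r"C:\hadoop") in a cell that runs before the SparkContext is created. If a context is already running, the setting will most likely not reach the existing JVM, so restart the kernel first.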
