"(null) entry in command string" exception in saveAsTextFile() on PySpark


Problem description

I am working in PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When I execute idSums.saveAsTextFile("Output"), I get the following error:

Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001

I don't think the RDD object itself is the problem, because other actions run without error; for example, idSums.collect() produces the correct output.

Furthermore, the Output directory is created (with all of its subdirectories) and the file part-00001 is created, but it is 0 bytes in size.

Recommended answer

You are missing winutils.exe, a Hadoop binary. Download the winutils.exe build that matches your system (32-bit or 64-bit) and point your Hadoop home at the folder that holds it.

Option 1:

  1. Download the file.
  2. Create a hadoop folder on your system, e.g. C:\hadoop
  3. Create a bin folder inside the hadoop directory, e.g. C:\hadoop\bin
  4. Paste winutils.exe into bin, e.g. C:\hadoop\bin\winutils.exe
  5. Open System Properties -> Advanced System Settings and go to User Variables.

Create a new variable there. Name: HADOOP_HOME, Path: C:\hadoop\
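To confirm that the steps above took effect, a small check can be run from the notebook. This is only a sketch: the helper names winutils_path and check_hadoop_home are made up for illustration, and it assumes the layout described above (winutils.exe inside a bin folder under HADOOP_HOME).

```python
import os


def winutils_path(hadoop_home):
    """Build the path where winutils.exe is expected under hadoop_home."""
    return os.path.join(hadoop_home, "bin", "winutils.exe")


def check_hadoop_home(env=None):
    """Report whether HADOOP_HOME is set and winutils.exe exists under it.

    env defaults to the current process environment; passing a dict
    makes the check easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return "HADOOP_HOME is not set"
    candidate = winutils_path(hadoop_home)
    if not os.path.isfile(candidate):
        return "winutils.exe not found at " + candidate
    return "ok: " + candidate


if __name__ == "__main__":
    print(check_hadoop_home())
```

If this reports that HADOOP_HOME is not set right after creating the variable, the notebook was probably started before the variable existed; restarting Jupyter from a fresh console usually lets it pick up the new value.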

Option 2:

You can also set the Hadoop home directly in a Java program, before the Spark context is created, like this:

System.setProperty("hadoop.home.dir", "C:\\hadoop");
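The call above applies to a JVM program. Since the question uses PySpark, a rough Python counterpart is to set the HADOOP_HOME environment variable from the notebook before the SparkContext is created. This is a sketch under assumptions not stated in the answer: that PySpark launches the JVM as a child process which inherits the environment, and that Hadoop falls back to HADOOP_HOME when hadoop.home.dir is not set. The helper name configure_hadoop_home is hypothetical.

```python
import os


def configure_hadoop_home(hadoop_home, env=None):
    """Set HADOOP_HOME in env (defaults to this process's environment).

    Must run before the SparkContext is created, because a JVM that is
    already running will not see later changes to the environment.
    """
    env = os.environ if env is None else env
    env["HADOOP_HOME"] = hadoop_home
    return env


# Intended use in the notebook (paths taken from the steps above):
# configure_hadoop_home(r"C:\hadoop")
# from pyspark import SparkContext
# sc = SparkContext("local[*]", "example")  # app name is a placeholder
```

If a SparkContext already exists in the notebook session, restarting the kernel before running this is the safer choice.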
