(null) entry in command string exception in saveAsTextFile() on PySpark
Question
I am working in PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When I try to execute idSums.saveAsTextFile("Output"), I receive the following error:
Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001
I don't think there is any problem with the RDD object itself, because I am able to execute other actions without error; for example, idSums.collect() produces the correct output.
Furthermore, the Output directory is created (with all subdirectories) and the file part-00001 is created, but it is 0 bytes.
Recommended answer
You are missing winutils.exe, a Hadoop binary. Download the winutils.exe file that matches your system (64-bit or 32-bit) and set your Hadoop home to point to its location.
First way:
- Download the file.
- Create a hadoop folder on your system, e.g. on C:, giving C:\hadoop
- Create a bin folder in the hadoop directory, e.g. C:\hadoop\bin
- Paste winutils.exe into bin, e.g. C:\hadoop\bin\winutils.exe
- In System Properties -> Advanced System Settings, under User Variables, create a new variable:
Name: HADOOP_HOME
Path: C:\hadoop\
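The folder layout above can also be checked from the notebook itself. The following is a minimal sketch, not part of the original answer: the helper name configure_hadoop_home is hypothetical, and it assumes that setting HADOOP_HOME in os.environ before the SparkContext is created is enough for the Spark JVM to pick it up, which may not hold if a context is already running.

```python
import os


def configure_hadoop_home(hadoop_home):
    """Set HADOOP_HOME to hadoop_home if bin/winutils.exe exists there.

    Hypothetical helper for illustration. Returns the full path to
    winutils.exe, or raises IOError when the file is missing.
    """
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    if not os.path.isfile(winutils):
        raise IOError("winutils.exe not found at %s" % winutils)
    # Assumption: this runs before the SparkContext (and its JVM) is started,
    # so the child process inherits the variable.
    os.environ["HADOOP_HOME"] = hadoop_home
    return winutils
```

With the layout from the steps above, the call would be configure_hadoop_home(r"C:\hadoop"), made before creating the SparkContext. Failing early with a clear message is easier to diagnose than the chmod error raised later by saveAsTextFile().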
Second way:
You can set the Hadoop home directly in your Java program with a call like this:
System.setProperty("hadoop.home.dir", "C:\\hadoop");