Spark. Simple "No space available in any of the local directories."
Question
Here is a simple test program; the test data is obviously tiny.
from pyspark.sql.types import Row
from pyspark.sql.types import *
import pyspark.sql.functions as spark_functions

schema = StructType([
    StructField("cola", StringType()),
    StructField("colb", IntegerType()),
])

rows = [
    Row("alpha", 1),
    Row("beta", 2),
    Row("gamma", 3),
    Row("delta", 4),
]

# "spark" is the SparkSession predefined by the pyspark shell / EMR notebook.
data_frame = spark.createDataFrame(rows, schema)
print("count={}".format(data_frame.count()))

data_frame.write.save("s3a://my-bucket/test_data.parquet", mode="overwrite")
print("done")
This fails with:
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:366)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:416)
This is running on Amazon EMR with S3 storage. There is plenty of disk space. Can anyone explain?
Answer
I ran into the same error while using Spark 2.2 on EMR. The settings fs.s3a.fast.upload=true and fs.s3a.buffer.dir="/home/hadoop,/tmp" (or any other folder, for that matter) did not help me. It seems my issue was related to shuffle space.
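For reference, here is a minimal sketch (my illustration, not part of the original answer) of how those two s3a settings can be applied when building the SparkSession; the spark.hadoop. prefix forwards a setting into the underlying Hadoop configuration:

# Sketch (illustrative): applying the s3a settings mentioned above.
# The "spark.hadoop." prefix passes the option through to the Hadoop
# configuration used by the s3a filesystem.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.fast.upload", "true")
    .config("spark.hadoop.fs.s3a.buffer.dir", "/home/hadoop,/tmp")
    .getOrCreate()
)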
I had to add --conf spark.shuffle.service.enabled=true to my spark-submit / spark-shell invocation to resolve this error.
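The same flag can also be set programmatically instead of on the command line; a minimal sketch, assuming the external shuffle service is actually running on the worker nodes (on YARN it runs as a NodeManager auxiliary service):

# Sketch (illustrative): programmatic equivalent of
#   spark-submit --conf spark.shuffle.service.enabled=true ...
# This must be set before the application starts and only takes effect
# when an external shuffle service is available on the cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)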