Spark. Simple "No space available in any of the local directories."


Question

Here is a simple test program that writes a tiny amount of test data.

from pyspark.sql import Row, SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# "spark" is provided automatically in the pyspark shell / EMR notebook;
# getOrCreate() also makes the script runnable standalone.
spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("cola", StringType()),
    StructField("colb", IntegerType()),
])

rows = [
    Row("alpha", 1),
    Row("beta", 2),
    Row("gamma", 3),
    Row("delta", 4),
]

data_frame = spark.createDataFrame(rows, schema)

print("count={}".format(data_frame.count()))

# Write the DataFrame to S3 as Parquet.
data_frame.write.save("s3a://my-bucket/test_data.parquet", mode="overwrite")

print("done")

This fails with:

Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:366)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:416)

This is running on Amazon EMR with S3 storage. There is plenty of disk space. Can anyone explain?

Answer

I ran into the same error while using Spark 2.2 on EMR. The settings fs.s3a.fast.upload=true and fs.s3a.buffer.dir="/home/hadoop,/tmp" (or any other folder, for that matter) did not help me. It seems my issue was related to shuffle space.
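For reference, a minimal sketch of how those S3A settings can be passed from PySpark (the spark.hadoop.* prefix forwards properties to the Hadoop configuration; the buffer directory path here is just an example, not something confirmed by the original post):

from pyspark.sql import SparkSession

# Sketch: forwarding the S3A settings mentioned above to Hadoop via the
# spark.hadoop.* prefix. The buffer directory is a hypothetical example path.
spark = (
    SparkSession.builder
    .appName("s3a-buffer-config-example")
    .config("spark.hadoop.fs.s3a.fast.upload", "true")
    .config("spark.hadoop.fs.s3a.buffer.dir", "/home/hadoop,/tmp")
    .getOrCreate()
)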

I had to add --conf spark.shuffle.service.enabled=true to my spark-submit / spark-shell to resolve this error.
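The same flag can also be set when the session is built in code; a minimal sketch (it assumes the YARN external shuffle service is actually available on the cluster nodes, and the property must be set before the SparkContext starts):

from pyspark.sql import SparkSession

# Sketch: enabling the external shuffle service at session creation instead of
# on the spark-submit command line. Assumes the YARN shuffle service is
# running on the worker nodes.
spark = (
    SparkSession.builder
    .appName("shuffle-service-example")
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)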

