Hadoop can't finish job because "No space left on device"

Problem description

I am trying to run a very simple Hadoop job. It is a modification of the classic wordCount which, instead of counting words, counts lines in a file. I want to use it to clean up a bunch of big log files (around 70GB each) that I know contain duplicates. Each line is a "record", and I am only interested in getting each record once.

I know my code works, because it does what it should when I run it with small, normal files. When I run it with the big files, Hadoop misbehaves. First it starts working correctly on the MAP phase, which normally reaches 100% without problems. The REDUCE phase, however, never gets past 50%: it reaches maybe 40% and then falls back to 0% after showing some "No space left on device" exceptions:

FSError: java.io.IOException: No space left on device

It then tries to run REDUCE again, and when it reaches 40% it drops back to 0%, and so on. It does this two or three times before giving up, without success of course.

The problem with this exception, though, is that it can't be related to the actual space on the disks. The disks never get full: neither the total (global) space on HDFS nor the individual disks on each node. I check the filesystem status with:

$ hadoop dfsadmin -report > report

This report never shows a node actually reaching 100%. In fact, no node comes anywhere close to that.
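Note that dfsadmin -report only covers HDFS capacity, not the local disks on the workers. A rough way to spot-check the local filesystems as well, assuming passwordless SSH and that the list of workers sits in the usual slaves file (the path below is only a guess for this setup), would be something like:

$ for h in $(cat /etc/hadoop/conf/slaves); do ssh "$h" 'hostname; df -h'; done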

I have around 60GB of disk available on each node, and I run this on a cluster with 60 data nodes, which gives me a total space of more than 3TB. The file I am trying to process is only 70GB.

Looking around on the internet, I found that this can be related to Hadoop creating too many files while processing a lot of data. The original wordCount code reduces the data substantially (since words repeat a lot): a 70GB file can shrink to an output of just 7MB. In my case, however, I only expect something like a 1/3 reduction, i.e. an output of around 20-30GB.

Unix-type systems come with a limit of 1024 open files per process:

$ ulimit -n
1024

If Hadoop is creating more files than that, it could be a problem. I asked the system admin to raise that limit to 65K, so the limit is now:

$ ulimit -n
65000

The problems continue. Could it be that I need to raise this limit even further? Is there something else going on here?
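One way to see how close the tasks actually get to that limit (assuming the classic TaskTracker/child-JVM layout of this Hadoop version, and that lsof is installed on the node) is to count the descriptors held by a running task process, for example:

$ lsof -p $(pgrep -f -n org.apache.hadoop.mapred.Child) | wc -l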

Thanks a lot for your help!

Here is the code:

package ...;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class LineCountMR {

  public static class MapperClass 
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    private String token = new String();        

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
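        // Emit the whole line (with spaces replaced by underscores) as the key,
        // paired with a count of 1, so identical records meet in the same reduce call.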

        token = value.toString().replace(' ', '_');
        word.set(token);
        context.write(word, one);   
    }
  }

  public static class ReducerClass 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
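      // Sum the counts for this distinct line; this class is also used as the combiner.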
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
 }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    if (args.length != 2) {
      System.err.println("Parameters: <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "line count MR");
    job.setJarByClass(LineCountMR.class);
    job.setMapperClass(MapperClass.class);
    job.setCombinerClass(ReducerClass.class);
    job.setReducerClass(ReducerClass.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
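For reference, a typical way to run it once packaged into a jar (the jar name, package prefix and paths below are just placeholders) would be:

$ hadoop jar linecount.jar your.package.LineCountMR /logs/raw /logs/deduped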

Answer

I have seen this issue on a cluster while processing 10TB of data. The issue is not related to the space available on HDFS, but to the space available on the local file system (check it with df -h): the intermediate data generated during the map-reduce operation is stored locally, not in HDFS, and that is what fills up.
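That intermediate map output spills into the node-local scratch directories (configured via mapred.local.dir / hadoop.tmp.dir on the MRv1 releases this code targets; newer releases use different property names), so those local partitions are the ones to watch. A minimal sketch of one mitigation, compressing the map output so that less spill data lands on the local disks, assuming the classic MRv1 property names:

// Sketch only: enable intermediate (map output) compression in the driver.
// These are the old MRv1 property names; newer releases call them
// "mapreduce.map.output.compress" and "mapreduce.map.output.compress.codec".
Configuration conf = new Configuration();
conf.setBoolean("mapred.compress.map.output", true);
conf.setClass("mapred.map.output.compression.codec",
              org.apache.hadoop.io.compress.GzipCodec.class,
              org.apache.hadoop.io.compress.CompressionCodec.class);
Job job = new Job(conf, "line count MR");
// ... rest of the job setup as in the question ...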
