Accessing file in Mapper through Distributed Cache
Problem description
I want to access the contents of the distributed cache file in my Mapper. Below is the code I have written, which generates the name of the file for the Distributed Cache. Please help me access the contents of the file.
    public class DistCacheExampleMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        Text a = new Text();
        Path[] dates = new Path[0];

        public void configure(JobConf conf) {
            try {
                dates = DistributedCache.getLocalCacheFiles(conf);
                String astr = dates.toString();
                a = new Text(astr);
            } catch (IOException ioe) {
                System.err.println("Caught exception while getting cached files: "
                        + StringUtils.stringifyException(ioe));
            }
        }

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            for (Path cacheFile : dates) {
                output.collect(new Text(line), new Text(cacheFile.getName()));
            }
        }
    }
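A side note on the question's configure(): calling toString() on the Path[] array does not list the cached files. Java arrays inherit Object.toString(), which returns a type descriptor plus a hash code; java.util.Arrays.toString() formats the elements instead. This minimal sketch uses plain strings with hypothetical file names in place of Hadoop Path objects so that it runs without Hadoop on the classpath.

```java
import java.util.Arrays;

public class ArrayToStringDemo {
    public static void main(String[] args) {
        // Stand-in for the Path[] returned by the cache lookup; the file names
        // are hypothetical and plain strings avoid any Hadoop dependency.
        String[] cachedNames = {"dates.csv", "holidays.csv"};

        // Object.toString() on an array: type descriptor plus hash code,
        // e.g. "[Ljava.lang.String;@1b6d3586" (the hash differs between runs).
        String identity = cachedNames.toString();

        // Arrays.toString() formats the elements themselves.
        String elements = Arrays.toString(cachedNames);

        System.out.println(identity);
        System.out.println(elements); // prints [dates.csv, holidays.csv]
    }
}
```

Either way, this only yields file names; reading the contents still requires opening the local file, as the answer does.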
Solution
Try this instead in your configure() method:
    List<String[]> lines;
    Path[] files = new Path[0];

    public void configure(JobConf conf) {
        lines = new ArrayList<>();
        try {
            Path[] cached = DistributedCache.getLocalCacheFiles(conf);
            if (cached == null || cached.length == 0) {
                return; // nothing was added to the Distributed Cache
            }
            files = cached;
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader reader =
                         new BufferedReader(new FileReader(files[0].toString()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // now, each lines entry is a String array, with each element being a column
                    lines.add(line.split(","));
                }
            }
        } catch (IOException ioe) {
            System.err.println("Caught exception while getting cached files: "
                    + StringUtils.stringifyException(ioe));
        }
    }
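Once the cache file has been localized on the task node, the reading step no longer depends on Hadoop, so it can be tried on its own. This is a minimal sketch using java.nio.file instead of FileReader; the class and method names (CachedCsvReader, readCsv) are hypothetical, and a temporary file stands in for the task-local copy. Inside a mapper, assuming as the answer does that files[0].toString() is a plain local path, the argument could be built with java.nio.file.Paths.get(files[0].toString()). Note that java.nio.file.Path and org.apache.hadoop.fs.Path share a simple name, so one of them must be fully qualified.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CachedCsvReader {

    // Hypothetical helper: reads a local file and splits each line on commas,
    // mirroring the loop in the answer's configure() method.
    static List<String[]> readCsv(Path localFile) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(","));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the task-local copy of the cache file.
        Path tmp = Files.createTempFile("cache", ".csv");
        try {
            Files.write(tmp, Arrays.asList("key1,x,10", "key2,y,20"), StandardCharsets.UTF_8);
            List<String[]> rows = readCsv(tmp);
            System.out.println(rows.size() + " rows; first column of row 0: " + rows.get(0)[0]);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Passing an explicit charset to Files.newBufferedReader avoids relying on the JVM's default charset, which is what FileReader uses.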
This way, you will have the contents of the cached file (in this case, the first file in the Distributed Cache) in the variable lines. Each lines entry is a String array holding one line of the file split on ','. So the first column of the first row is lines.get(0)[0], the third column of the second row is lines.get(1)[2], and so on.
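The indexing described above can be checked without a cluster. This minimal sketch parses a few hypothetical in-memory rows the same way the answer does (split(",")) and reads the cells mentioned above. It also shows one caveat of String.split with the default limit: trailing empty columns are dropped, so a row ending in a comma yields a shorter array, while a negative limit keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvIndexingDemo {

    // Same parsing as the answer: one String[] per row, one element per column.
    static List<String[]> parse(List<String> rawLines) {
        List<String[]> lines = new ArrayList<>();
        for (String raw : rawLines) {
            lines.add(raw.split(","));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for the cached file's lines.
        List<String> raw = Arrays.asList(
                "id,name,city",
                "1,Alice,Paris",
                "2,Bob,");   // last column is empty

        List<String[]> lines = parse(raw);

        System.out.println(lines.get(0)[0]);     // first column of the first row: id
        System.out.println(lines.get(1)[2]);     // third column of the second row: Paris
        System.out.println(lines.get(2).length); // 2: split(",") drops trailing empty strings
        System.out.println("2,Bob,".split(",", -1).length); // 3: a negative limit keeps them
    }
}
```

If the cached file can contain empty trailing columns, using line.split(",", -1) in configure() keeps every row at the same width, so indexing like lines.get(i)[2] does not throw ArrayIndexOutOfBoundsException.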