How can I use Mahout's sequencefile API code?
Problem description
Mahout has a command for creating sequence files:
bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>
I want to use this command as a code API.
You can do something like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path outputPath = new Path("c:\\temp");
Text key = new Text(); // Example, this can be another type of class
Text value = new Text(); // Example, this can be another type of class
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, outputPath, key.getClass(), value.getClass());
while (condition) { // placeholder: loop while there are records left to write
    key.set("Some text");   // set the key for this record
    value.set("Some text"); // set the value for this record
    writer.append(key, value);
}
writer.close();
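The loop above leaves open where the key/value pairs come from. As a rough sketch only, the pairs for a directory of text files could be collected with plain JDK code like the following. The class name `DirectoryPairs`, the method `readPairs`, and the key format (a leading slash plus the relative file path) are assumptions made for illustration; they are not taken from Mahout, so check the output of `seqdirectory` if you need to match its keys exactly.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path; // note: java.nio Path, not org.apache.hadoop.fs.Path
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: gathers (relative path, file content) pairs from a directory.
class DirectoryPairs {

    // Walks inputDir recursively and returns one entry per regular file,
    // in sorted path order. The key format is an assumption for illustration.
    static Map<String, String> readPairs(Path inputDir, Charset charset) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(inputDir)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (Path file : files) {
            // Key: "/" + path relative to inputDir, with forward slashes.
            String key = "/" + inputDir.relativize(file).toString().replace('\\', '/');
            // Value: whole file content decoded with the given charset.
            String value = new String(Files.readAllBytes(file), charset);
            pairs.put(key, value);
        }
        return pairs;
    }
}
```

Each entry could then be written inside the loop with `key.set(entry.getKey())`, `value.set(entry.getValue())` and `writer.append(key, value)`. Reading whole files into memory is only reasonable for small documents.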
You can find more information here and here.

Additionally, you could call the exact same functionality you described from Mahout by using the org.apache.mahout.text.SequenceFilesFromDirectory class. Then the call looks something like this:

ToolRunner.run(new SequenceFilesFromDirectory(), args); // args: a String[] holding your parameters

ToolRunner comes from org.apache.hadoop.util.ToolRunner.

Hope this was of help.
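For the ToolRunner approach, the parameters are presumably the same flags the command line takes. The sketch below only builds that argument array from the `-i`, `-o` and `-c` flags shown in the question; the class name `SeqDirectoryArgs` and its `build` method are hypothetical, and the Hadoop/Mahout call itself is left as a comment because it needs those libraries on the classpath.

```java
// Hypothetical helper: builds the String[] that mirrors
//   bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>
class SeqDirectoryArgs {

    // Returns the flags in the form a command-line parser would receive them.
    static String[] build(String inputDir, String outputDir, String charset) {
        return new String[] {
            "-i", inputDir,   // input directory of text files
            "-o", outputDir,  // output location for the sequence files
            "-c", charset     // character encoding of the input, e.g. UTF-8
        };
    }
}

// Intended use (requires Hadoop and Mahout; ToolRunner.run returns an exit code
// and declares "throws Exception"):
//
//   String[] args = SeqDirectoryArgs.build("<input address>", "<output address>", "UTF-8");
//   int exitCode = ToolRunner.run(new SequenceFilesFromDirectory(), args);
```

A non-zero exit code from `ToolRunner.run` conventionally signals failure, so it is worth checking rather than discarding.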