How can I use Mahout's sequencefile API code?

Problem description

Mahout provides a command for creating sequence files: bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>. I want to use this command from code, as an API.

Solution

You can do something like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;


Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

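// Path of the sequence file to write (example value)
Path outputPath = new Path("c:\\temp");

Text key = new Text(); // Example, this can be another Writable type
Text value = new Text(); // Example, this can be another Writable type

// This constructor is deprecated in newer Hadoop releases in favor of SequenceFile.createWriter(...)
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, outputPath, key.getClass(), value.getClass());

while (condition) { // placeholder: loop while there are records left to write

    key.set("Some text");   // reuse the Text objects instead of reassigning them
    value.set("Some text");

    writer.append(key, value);
}

writer.close();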
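The loop above leaves open where the keys and values come from. As a rough sketch, assuming you want one record per file with the file's path relative to the input directory as the key and its decoded content as the value (an assumption about a reasonable layout, not confirmed Mahout behavior), the records could be gathered with plain Java as shown here. DirectoryRecords and collect are hypothetical names, and Path in this block is java.nio.file.Path, not the Hadoop class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop or Mahout.
class DirectoryRecords {

    // Returns one (key, value) record per regular file under inputDir:
    // key = "/" + path relative to inputDir, value = file content decoded with charset.
    static Map<String, String> collect(Path inputDir, Charset charset) {
        Map<String, String> records = new TreeMap<>(); // sorted keys for stable output
        try (Stream<Path> files = Files.walk(inputDir)) {
            files.filter(Files::isRegularFile).forEach(file -> {
                // Join the relative path elements with "/" regardless of platform.
                StringBuilder key = new StringBuilder();
                for (Path part : inputDir.relativize(file)) {
                    key.append('/').append(part);
                }
                try {
                    records.put(key.toString(), new String(Files.readAllBytes(file), charset));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: DirectoryRecords <input directory>");
            return;
        }
        Map<String, String> records = collect(Paths.get(args[0]), StandardCharsets.UTF_8);
        // Each entry would be copied into key and value and passed to writer.append(key, value).
        records.forEach((k, v) -> System.out.println(k + " -> " + v.length() + " chars"));
    }
}
```

Collecting everything into a map keeps the sketch short; for large directories you would probably append each record to the writer as soon as it is read instead of holding all contents in memory.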

You can find more information here and here

Additionally, you can call the exact same functionality you described from Mahout by using the org.apache.mahout.text.SequenceFilesFromDirectory class.

Then the call looks something like this:

String[] args = {"-c", "UTF-8", "-i", "<input address>", "-o", "<output address>"}; // your parameters
ToolRunner.run(new SequenceFilesFromDirectory(), args);

The ToolRunner comes from org.apache.hadoop.util.ToolRunner
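The parameters passed to ToolRunner.run are the same tokens you would type after bin/mahout seqdirectory on the command line. A minimal sketch of building them, assuming only the -c, -i and -o flags from the question; SeqDirectoryArgs and build are hypothetical names, and the paths in main are placeholders:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: builds the argument array for seqdirectory.
class SeqDirectoryArgs {

    // Mirrors "bin/mahout seqdirectory -c <charset> -i <input> -o <output>".
    static String[] build(String charset, String inputDir, String outputDir) {
        return new String[] {"-c", charset, "-i", inputDir, "-o", outputDir};
    }

    public static void main(String[] args) {
        // Placeholder paths; substitute your own input and output addresses.
        String[] toolArgs = build("UTF-8", "/path/to/input", "/path/to/output");
        System.out.println(Arrays.toString(toolArgs));
        // With Hadoop and Mahout on the classpath, the call would then be:
        //   int exitCode = ToolRunner.run(new SequenceFilesFromDirectory(), toolArgs);
        // ToolRunner.run returns the tool's exit code and declares "throws Exception".
    }
}
```

Checking the returned exit code (non-zero conventionally signals failure) is usually worthwhile, since ToolRunner.run does not necessarily throw when the tool itself reports an error.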

Hope this was of help.
