Vertices with complex values in Apache Giraph


Question

I am trying to read a text file containing vertex information into Giraph. Each line is

vertex_id attribute_1 attribute_2 ... attribute_n

where each attribute is a string.

The goal is to create a vertex where all these attributes are part of the vertex's value.

Looking up the various input formats I could not find anything out of the box, so I assume I have to derive my vertex input class from VertexValueInputFormat (I have a separate reader for edges).

The problem is: how? I have created a Value class which contains a String[] array, but how do I hand it over to Giraph/Hadoop? Here is a reader for a single line:

https://giraph.apache.org/giraph-core/apidocs/org/apache/giraph/io/formats/TextVertexValueInputFormat.TextVertexValueReaderFromEachLine.html

protected abstract V getValue(org.apache.hadoop.io.Text line)

My thought was that V would be an ArrayWritable, but Giraph does not seem to like it.

Any clue? Thanks

Answer

If your vertex has a custom value (in your case, an array of strings), then you need a custom vertex value class and a custom vertex input format. As an example, take a look at this very simple custom vertex value class, which holds a double, an int, and a long (https://gist.github.com/sar-vivek/df09cca17cc3f6b5ac60). Note that you must override readFields() and write() accordingly.
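For the string-array case from the question, the same idea might look roughly like the sketch below. It is not the code from the gist: the class name StringArrayValue and the toBytes/fromBytes helpers are hypothetical, and the class deliberately uses only java.io so it can be compiled and tried without Hadoop on the classpath. In a real job it would additionally declare `implements org.apache.hadoop.io.Writable`, whose two methods have the signatures shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical vertex value holding the attribute strings of one vertex.
// Assumption: in a Giraph job this class would also declare
// "implements org.apache.hadoop.io.Writable".
class StringArrayValue {
    private String[] attributes = new String[0];

    // Hadoop instantiates values reflectively, so keep a no-argument constructor.
    StringArrayValue() { }

    StringArrayValue(String[] attributes) {
        this.attributes = attributes.clone();
    }

    String[] getAttributes() {
        return attributes.clone();
    }

    // Serialize: element count first, then each string.
    public void write(DataOutput out) throws IOException {
        out.writeInt(attributes.length);
        for (String attribute : attributes) {
            out.writeUTF(attribute);
        }
    }

    // Deserialize in exactly the order used by write().
    public void readFields(DataInput in) throws IOException {
        int count = in.readInt();
        String[] read = new String[count];
        for (int i = 0; i < count; i++) {
            read[i] = in.readUTF();
        }
        attributes = read;
    }

    // Round-trip helpers for local experiments only; not part of any Giraph API.
    static byte[] toBytes(StringArrayValue value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            value.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static StringArrayValue fromBytes(byte[] bytes) {
        try {
            StringArrayValue value = new StringArrayValue();
            value.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The no-argument constructor matters: Hadoop's ArrayWritable does not have one (its constructors take the element class or an initial array), which may be why using it directly as V did not work. Also note that writeUTF limits each string to 65535 encoded bytes, so very long attributes would need a different encoding.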

Then you need a custom vertex input format. For the vertex class above, I slightly modified the built-in JSON vertex reader. Here is the example: https://gist.github.com/sar-vivek/f39edacec6d9a43c3717 (notice how the value of a vertex is set on line 68).
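For the plain-text layout in the question (vertex_id followed by the attributes), the parsing step inside such a reader could be sketched as below. This is only an illustration under assumptions: VertexLineParser is a hypothetical helper, fields are assumed to be separated by whitespace, and attributes are assumed not to contain whitespace themselves. It uses only the JDK so it can be tested in isolation.

```java
import java.util.Arrays;

// Hypothetical helper that splits a line of the form
// "vertex_id attribute_1 attribute_2 ... attribute_n".
// Assumption: fields are whitespace-separated and contain no whitespace.
final class VertexLineParser {
    private VertexLineParser() { }

    // Split the line into tokens, ignoring leading and trailing whitespace.
    static String[] tokens(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty vertex line");
        }
        return trimmed.split("\\s+");
    }

    // The first token is the vertex id.
    static String vertexId(String line) {
        return tokens(line)[0];
    }

    // Everything after the id is an attribute; may be an empty array.
    static String[] attributes(String line) {
        String[] all = tokens(line);
        return Arrays.copyOfRange(all, 1, all.length);
    }
}
```

In a subclass of TextVertexValueReaderFromEachLine, getValue(Text line) could then call VertexLineParser.attributes(line.toString()) and wrap the result in your own vertex value class, while the id would be derived from VertexLineParser.vertexId(line.toString()) in the corresponding id method of that reader.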
