Vertices with complex values in Apache Giraph
Problem description
I am trying to read a text file containing vertex information into Giraph. Each line has the form
vertex_id attribute_1 attribute_2 ... attribute_n
where each attribute is a string.
The goal is to create a vertex whose value holds all of these attributes.
Looking through the various input formats, I could not find anything that does this out of the box, so I assume I have to derive my vertex input class from VertexValueInputFormat (I have a separate reader for edges).
The problem is: how? I have created a Value class that contains a String[] array, but how do I hand it over to Giraph/Hadoop? Here is the reader method for a single line:
protected abstract V getValue(org.apache.hadoop.io.Text line)
My thought was that V would be an ArrayWritable, but that does not seem to be accepted.
Any clue? Thanks
Answer
If your vertex has a custom value (in your case, an array of strings), then you need both a custom vertex value class and a custom vertex input format.
As an example, take a look at a very simple custom vertex value class. This class holds a double value, an int, and a long: https://gist.github.com/sar-vivek/df09cca17cc3f6b5ac60
Note: you must override readFields() and write() accordingly.
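For the String[] case in the question, the same idea might look like the sketch below. It is not taken from the linked gist: the class name StringArrayValue and the helper copyViaSerialization are hypothetical, and the class deliberately omits "implements org.apache.hadoop.io.Writable" so that it compiles without Hadoop on the classpath. The write() and readFields() signatures match that interface, so adding the clause should be all that is needed in a real job. One possible reason a plain ArrayWritable is rejected is that it has no no-argument constructor, which Hadoop needs in order to instantiate values reflectively; the sketch provides one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical vertex value holding the string attributes of one vertex.
// Assumption: in a Giraph job this class would also declare
// "implements org.apache.hadoop.io.Writable"; it is left out here so the
// sketch compiles with the JDK alone.
class StringArrayValue {
    private String[] attributes = new String[0];

    // No-argument constructor: values are created reflectively before
    // readFields() is called, so this constructor must exist.
    StringArrayValue() { }

    StringArrayValue(String[] attributes) {
        this.attributes = attributes.clone();
    }

    String[] getAttributes() {
        return attributes.clone();
    }

    // Serialize as a length prefix followed by each string.
    // Note: writeUTF limits each string to 65535 encoded bytes.
    public void write(DataOutput out) throws IOException {
        out.writeInt(attributes.length);
        for (String attribute : attributes) {
            out.writeUTF(attribute);
        }
    }

    // Read the fields back in exactly the order write() emitted them.
    public void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        String[] read = new String[length];
        for (int i = 0; i < length; i++) {
            read[i] = in.readUTF();
        }
        attributes = read;
    }

    // Illustration only: round-trip a value through a byte array.
    static StringArrayValue copyViaSerialization(StringArrayValue value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            value.write(new DataOutputStream(buffer));
            StringArrayValue copy = new StringArrayValue();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The round-trip helper is a quick way to check that write() and readFields() agree on field order, which is the usual source of bugs in hand-written value classes.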
Then you need a custom vertex input format. For the vertex class above, I modified the built-in JSON vertex reader slightly. Here is the example: https://gist.github.com/sar-vivek/f39edacec6d9a43c3717 (notice how the value of a vertex is set on line 68).
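For the whitespace-separated format in the question, the parsing that such a reader performs on each line could be sketched as follows. This is only the line-splitting logic, written against plain strings so it runs without Giraph. The class and method names are hypothetical, and it assumes the vertex id is a long and that attributes contain no whitespace. In a real reader, the getValue(Text line) method from the question would call something like parseAttributes(line.toString()) and wrap the result in the custom value class.

```java
import java.util.Arrays;

// Hypothetical helper: splits "vertex_id attribute_1 ... attribute_n".
// Assumptions: ids are numeric (long) and fields are whitespace-separated.
class VertexLineParser {

    // Split a line into its whitespace-separated tokens.
    private static String[] tokens(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty vertex line");
        }
        return trimmed.split("\\s+");
    }

    // The first token is the vertex id.
    static long parseId(String line) {
        return Long.parseLong(tokens(line)[0]);
    }

    // Every remaining token is one attribute of the vertex value.
    static String[] parseAttributes(String line) {
        String[] all = tokens(line);
        return Arrays.copyOfRange(all, 1, all.length);
    }
}
```

Keeping the parsing in a small helper like this makes it possible to unit-test the line format separately from the Hadoop job.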