Convert CSV data to graph data

Question

I am experimenting with Apache Giraph. I need to create a simple graph from a CSV file residing in HDFS that shows the relationship between two columns (victim related to store name). My data is more than 1 GB in CSV format. I initially tried Neo4j from Java with a local file, but in my attempts it could only load small amounts of data and could not import directly from HDFS. My data may grow, so I thought of using Apache Giraph.

But how do I achieve this?

As far as I understand, Apache Giraph only accepts input in a vertex-based text format, but my data is in CSV format. Is there a tool that converts my CSV into a graph format that I can supply to Giraph as input for graph computations?

Solution

I had the same doubts. While many responses suggest rewriting the graph into a standard format outside of Giraph, this is not necessary.

You should check out the implementation of the standard class:

https://apache.googlesource.com/giraph/+/refs/heads/trunk/giraph-core/src/main/java/org/apache/giraph/io/formats/IntNullTextEdgeInputFormat.java

This reads a TSV file (this is the "Text" part of the class name) containing pairs of integer vertex IDs (this is the "Int" part) of the form:

1   2
2   4
3   2
4   1
...

No edge metadata is considered, just a pair of vertices (this is the "Null" part).
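To make the line format concrete, here is a minimal sketch in plain Java (no Giraph dependency) of the per-line parsing that an edge input format of this kind has to perform. The class `IntEdgeLineSketch` and its `parseEdge` method are hypothetical names chosen for illustration; they are not part of Giraph, and the separator pattern is an assumption about the input.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Giraph: turns one "source target" line
// into a pair of integer vertex IDs.
final class IntEdgeLineSketch {
    // Assumption: the two endpoints are separated by one or more tabs or spaces.
    private static final Pattern SEPARATOR = Pattern.compile("[\t ]+");

    static int[] parseEdge(String line) {
        String[] tokens = SEPARATOR.split(line.trim());
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected two vertex IDs: " + line);
        }
        // Only the two endpoints are read; no edge value is attached.
        return new int[] {Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1])};
    }

    public static void main(String[] args) {
        String[] lines = {"1\t2", "2   4", "3 2", "4\t1"};
        for (String line : lines) {
            int[] edge = parseEdge(line);
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

Each input line yields exactly one directed edge, which is why a plain two-column file is enough to describe the whole graph.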

This example can be readily adapted to CSV by changing the SEPARATOR, or to string IDs by replacing IntWritable with Text (likewise for other types).
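The CSV adaptation described above could look roughly like the following plain-Java sketch. It assumes a simple CSV with no quoted fields or embedded commas, and it assumes the two related columns sit at indexes 0 and 1. `CsvEdgeLineSketch`, `SOURCE_COLUMN`, and `TARGET_COLUMN` are hypothetical names, and the sample row is made up. In an actual Giraph input format the two strings would be wrapped in `Text` instead of `IntWritable`.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Giraph: extracts a (source, target) pair
// of string vertex IDs from one CSV row.
final class CsvEdgeLineSketch {
    // The separator is now a comma instead of a tab or space.
    private static final Pattern SEPARATOR = Pattern.compile(",");
    // Assumption: which CSV columns hold the two related values.
    private static final int SOURCE_COLUMN = 0;
    private static final int TARGET_COLUMN = 1;

    static String[] parseEdge(String line) {
        // Limit -1 keeps trailing empty fields, so column indexes stay stable.
        String[] tokens = SEPARATOR.split(line, -1);
        int needed = Math.max(SOURCE_COLUMN, TARGET_COLUMN) + 1;
        if (tokens.length < needed) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        return new String[] {tokens[SOURCE_COLUMN].trim(), tokens[TARGET_COLUMN].trim()};
    }

    public static void main(String[] args) {
        // Made-up row: victim, store name, plus an extra column that is ignored.
        String[] edge = parseEdge("alice,Corner Store,2014-01-05");
        System.out.println(edge[0] + " -> " + edge[1]);
    }
}
```

Splitting on a bare comma is only safe for unquoted data; a file with quoted fields would need a real CSV parser at this step.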

You select the input format later, when you launch the job, as a property passed to the framework: its value is the fully qualified name of the class you want to use to parse the input data.
