How to load millions of vertices from CSV into Titan 1.0.0 using BulkLoaderVertexProgram?


Question

I am trying to load millions of nodes from CSV files into Titan 1.0.0 with a Cassandra backend, using Java. How should I load them?

I checked whether they can be loaded with BulkLoaderVertexProgram, but it loads data from the GraphSON format.

How do I start writing Java code to bulk load the data from CSV? Can you point me to a starting reference that I can study before writing code?

Do I need Spark/Hadoop running on my system to use SparkGraphComputer, which BulkLoaderVertexProgram uses?

I have not been able to start writing code, because I do not understand how to read data from CSV with BulkLoaderVertexProgram. Can you provide some links to get me started with the Java code?

Thanks.

Recommended answer

This was cross-posted on the Titan mailing list: https://groups.google.com/d/msg/aureliusgraphs/BawnoCvhKEk/gb1MkSHGCgAJ

If you're looking to use Java code, check out Alex's and Matthew's Marvel graph example:

https://github.com/awslabs/dynamodb-titan-storage-backend/blob/1.0.0/src/main/java/com/amazon/titan/example/MarvelGraphFactory.java

It creates a Titan schema, parses a CSV, and then uses basic Gremlin addVertex() and addEdge() calls to build the graph. You'll notice that the TitanGraph isn't instantiated in the factory itself, so even though the code lives inside a Titan-DynamoDB example, you can use it with any Titan backend (Cassandra, HBase, Berkeley).
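The parse-then-add pattern described above can be sketched roughly as follows. This is not the MarvelGraphFactory code. `CsvBatchLoader` and `VertexSink` are hypothetical names introduced only for illustration, and the sketch assumes a header row and plain comma-separated values with no quoted commas. With Titan, an implementation of `VertexSink` would presumably call `graph.addVertex(...)` for each row and `graph.tx().commit()` for each batch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a batched CSV-to-vertex loader (hypothetical, not part of Titan). */
final class CsvBatchLoader {

    /**
     * Hypothetical stand-in for the graph. With Titan, addVertex would wrap
     * graph.addVertex(...) and commit would wrap graph.tx().commit().
     */
    interface VertexSink {
        void addVertex(String label, Map<String, String> properties);
        void commit();
    }

    /**
     * Reads a CSV with a header row and emits one vertex per data row,
     * committing every batchSize rows so that no single transaction has to
     * hold the whole file. Assumes no quoted or escaped commas.
     * Returns the number of vertices emitted.
     */
    static long load(BufferedReader csv, String label, int batchSize, VertexSink sink)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        String headerLine = csv.readLine();
        if (headerLine == null) {
            return 0; // empty input: nothing to load
        }
        String[] header = headerLine.split(",", -1);
        long count = 0;
        String line;
        while ((line = csv.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1);
            Map<String, String> props = new LinkedHashMap<>();
            // Pair each header name with the cell in the same column.
            for (int i = 0; i < header.length && i < cells.length; i++) {
                props.put(header[i].trim(), cells[i].trim());
            }
            sink.addVertex(label, props);
            count++;
            if (count % batchSize == 0) {
                sink.commit(); // close the current batch
            }
        }
        if (count % batchSize != 0) {
            sink.commit(); // flush the final partial batch
        }
        return count;
    }
}
```

Committing in batches is a design choice of this sketch rather than something the answer prescribes; the right batch size depends on the backend and the size of each row.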

If your graph data is in the low millions, you could use a Titan-BerkeleyJE graph on your own machine, which might be an easier backend to start with than a Cassandra cluster. I'd recommend not getting too caught up in loading a lot of data initially. Get comfortable using Titan and TinkerPop with OLTP first, and then move on to OLAP approaches.
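As a rough illustration of how small the difference between the two backends is on the configuration side, the sketch below builds the storage settings for each one. `TitanConfigSketch` and its method names are hypothetical. The property keys (`storage.backend`, `storage.directory`, `storage.hostname`) are Titan configuration options, while the directory and hostname values are placeholders you would supply yourself.

```java
import java.util.Properties;

/** Hypothetical helper that assembles Titan storage settings. */
final class TitanConfigSketch {

    /** Settings for a local BerkeleyJE graph stored under the given directory. */
    static Properties berkeley(String directory) {
        Properties p = new Properties();
        p.setProperty("storage.backend", "berkeleyje");
        p.setProperty("storage.directory", directory);
        return p;
    }

    /** Settings for a Cassandra-backed graph reachable at the given host. */
    static Properties cassandra(String hostname) {
        Properties p = new Properties();
        p.setProperty("storage.backend", "cassandra");
        p.setProperty("storage.hostname", hostname);
        return p;
    }
}
```

One way to use these settings, assuming the usual Titan setup, is to save them to a `.properties` file and pass that file's path to `TitanFactory.open`. The loading code itself would not need to change when you later switch from BerkeleyJE to Cassandra.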
