What are the methods to migrate millions of nodes and edges from Titan 0.4.4 to 0.5?


Question

I'm migrating an entire Titan graph database from 0.4.4 to 0.5. It holds about 120 million nodes and 90 million edges, which amounts to gigabytes of data. I tried the GraphML format, but it didn't work.

Can you suggest methods to do the migration?

Recommended answer

At the size you are describing, the most efficient migration would probably be done with Titan-Hadoop/Faunus. The general process would be to:

  1. Use Faunus 0.4.x to extract the data from your graph as GraphSON and store it in HDFS.
  2. Use Titan-Hadoop 0.5.x to read the GraphSON and write it back to your storage backend.

Make sure that you've created your schema in the target backend before executing step 2.
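Between the two steps, it can be worth sanity-checking the exported GraphSON against the vertex and edge counts you expect before loading it into 0.5.x. The following is only a minimal sketch, not part of Faunus or Titan-Hadoop. It assumes the export is written one vertex per line as a JSON object, with outgoing edges listed under an `_outE` key; verify that layout against your own output first. The function name `count_graphson` and the sample records are hypothetical.

```python
import json
from typing import Iterable, Tuple


def count_graphson(lines: Iterable[str]) -> Tuple[int, int]:
    """Count vertices and edges in a line-per-vertex GraphSON export.

    Assumption: each non-empty line is one JSON object describing a vertex,
    and its outgoing edges are listed under "_outE". Only "_outE" is counted,
    so an edge that also appears under another vertex's "_inE" is not
    counted twice.
    """
    vertices = 0
    edges = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        vertex = json.loads(line)
        vertices += 1
        edges += len(vertex.get("_outE", []))
    return vertices, edges


# Two hypothetical vertices joined by a single "knows" edge.
sample = [
    '{"_id": 1, "name": "a", "_outE": [{"_id": 10, "_label": "knows", "_inV": 2}]}',
    '{"_id": 2, "name": "b", "_inE": [{"_id": 10, "_label": "knows", "_outV": 1}]}',
]
print(count_graphson(sample))  # prints (2, 1)
```

Because the function accepts any iterable of lines, it can be fed an open file handle and will stream through the export rather than loading it all into memory. For a multi-gigabyte export you would likely run it per part file and sum the results.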

As an aside, GraphML is not a good format for a graph of this size: it would take too long and require a lot of resources, if it worked at all. You might wonder why you wouldn't use sequence files if you are using Faunus/Titan-Hadoop. The reason you can't in this case is that, I believe, the sequence file format changed between 0.4.x and 0.5.x. In other words, 0.5.x can't read 0.4.x sequence files. GraphSON is readable by both versions, so it makes for an ideal migration format.
