Setup and configuration of JanusGraph for a Spark cluster and Cassandra
Question
I am running JanusGraph (0.1.0) with Spark (1.6.1) on a single machine. I configured it as described here. When I access the graph from the Gremlin Console with the SparkGraphComputer, it is always empty. I cannot find any errors in the log files; it is just an empty graph.
Has anyone managed to use JanusGraph with Spark, and could you share your configuration and properties?
Using a JanusGraph, I get the expected output:
gremlin> graph=JanusGraphFactory.open('conf/test.properties')
==>standardjanusgraph[cassandrathrift:[127.0.0.1]]
gremlin> g=graph.traversal()
==>graphtraversalsource[standardjanusgraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> g.V().count()
14:26:10 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>1000001
gremlin>
Using a HadoopGraph with Spark as the GraphComputer, the graph is empty:
gremlin> graph=GraphFactory.open('conf/test.properties')
==>hadoopgraph[cassandrainputformat->gryooutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
==>0
My conf/test.properties:
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.cassandra.CassandraInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
#
# Titan Cassandra InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=cassandrathrift
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.keyspace=janusgraph
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.keyspace=janusgraph
#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.keyspace=janusgraph
cassandra.input.predicate=0c00020b0001000000000b000200000000020003000800047fffffff0000
cassandra.input.columnfamily=edgestore
cassandra.range.batch.size=2147483647
#
# SparkGraphComputer Configuration
#
spark.master=spark://127.0.0.1:7077
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.memory=100g
gremlin.spark.persistContext=true
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
HDFS seems to be configured correctly, as described here:
gremlin> hdfs
==>storage[DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_178390072_1, ugi=cassandra (auth:SIMPLE)]]]
Recommended answer
Try fixing these properties:
janusgraphmr.ioformat.conf.storage.keyspace=janusgraph
storage.keyspace=janusgraph
Replace them with:
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
storage.cassandra.keyspace=janusgraph
The default keyspace name is janusgraph, so despite the mistakes in the property names, I don't think you would have observed that problem unless you loaded your data using a different keyspace name.
The latter property is described in the Configuration Reference. Also, keep an eye on this open issue about improving the docs for Hadoop-Graph usage.
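For reference, here is a sketch of how the storage section of the question's conf/test.properties might look with both renames applied. Only the two keyspace keys change. The backend, hostname, and keyspace values are copied from the question, and the comment lines are added for illustration. As noted above, if the data was loaded into the default janusgraph keyspace, this change alone may not explain the empty graph.

```properties
#
# JanusGraph Cassandra InputFormat configuration (keyspace keys renamed)
#
janusgraphmr.ioformat.conf.storage.backend=cassandrathrift
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
# was: janusgraphmr.ioformat.conf.storage.keyspace
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
# was: storage.keyspace
storage.cassandra.keyspace=janusgraph
```

The rest of the file (Hadoop graph settings, Cassandra input settings, and SparkGraphComputer settings) stays as posted in the question.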