Making my Titan DB graph with Cassandra and Elasticsearch backend
Question
My problem is that I want to store product, customer, and seller data in a Titan graph database, with Cassandra as the storage backend and Elasticsearch as the indexing backend. I will then query that data to make recommendations to both customers and sellers. I have not been able to get to the point where I can store my own data. Since the data is going to be huge, I will be using Cassandra and Elasticsearch.
What I have done so far is set up Cassandra and Elasticsearch. I can now run bin/titan.sh start to start Cassandra, Elasticsearch, and Gremlin Server. I can also play with the Graph of the Gods data by running:
gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> GraphOfTheGodsFactory.load(graph)
==>null
Now I am trying to find a way to store my product, customer, and seller graph data so that it is stored in Cassandra and the indices are in Elasticsearch.
What steps should I take to do that? The main language for my project is Node.js, and Java is out of the question due to project constraints.
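My questions in short:
- How do I store my own data for Titan DB to process?
- Once the data is available for processing, I will expose some HTTP APIs to make recommendations. Writing them in Java is out of the question due to some constraints. How should I proceed? (I think I only have Gremlin as an alternative.)
I would be grateful if you could point out my mistakes and drop some breadcrumbs in the right direction.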
Recommended answer
If you can't use Java, then you are limited to using Groovy. As for
"how to store my own data for titan db to process":
Side note
With a graph DB there are a multitude of ways to store this data. If you really want to formalise the structure of your data, I would recommend looking into ontologies, OWL, and Topic Maps; these can serve as great inspiration for how to formalise and structure data in a graph DB. These reads are only worthwhile if you are looking for ways of structuring data in graphs very formally.
Example structure
For now, let's assume you just want to track customers and the products they have bought. One simple structure is to make both customers and products vertices, with an edge from a customer to a product recording the fact that the customer bought that product. We can even put additional data on that edge, such as the time of purchase and the quantity. Here is an example of how to do that in Groovy:
gremlin> g = TitanFactory.open("titan-cassandra-es.properties")
gremlin> customerBob = g.addVertex("Bob");
==>v[12]
gremlin> customerAlice = g.addVertex("Alice");
==>v[13]
gremlin> productFish = g.addVertex("Fish");
==>v[14]
gremlin> productMeat = g.addVertex("Meat");
==>v[15]
gremlin> edge = customerBob.addEdge("purchased", productMeat, "Day", "Friday", "Quantity", 2);
==>e[16][12-purchased->15]
gremlin> edge = customerBob.addEdge("purchased", productFish, "Day", "Friday", "Quantity", 1);
==>e[17][12-purchased->14]
gremlin> edge = customerAlice.addEdge("purchased", productMeat, "Day", "Monday", "Quantity", 3);
==>e[18][13-purchased->15]
The above basically says that Bob bought some Meat and Fish on Friday, while Alice bought some Meat on Monday. If we wanted to find out what Bob bought on Friday, we could run the following traversal:
gremlin> g.traversal().V().hasLabel("Bob").outE("purchased").has("Day", "Friday").otherV().label();
==>Meat
==>Fish
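Since the project is written in Node.js, the same Groovy statements could be sent to Gremlin Server as scripts instead of being typed into the console. The following is only a rough sketch of that idea, not a definitive client. It assumes that Gremlin Server has its HTTP channelizer enabled and that the server binds the Titan graph to a variable named `graph`. The names `ADD_PURCHASE`, `PURCHASED_ON_DAY`, `buildGremlinRequest`, and `runGremlin` are made up for illustration, and the insert script records only the day to keep it short.

```javascript
// Sketch only. Assumptions not taken from the answer above: Gremlin Server runs
// with its HTTP channelizer enabled and binds the Titan graph to `graph`.

// Gremlin-Groovy script that records one purchase. It reuses an existing vertex
// with the given label, or creates one, then adds the "purchased" edge.
const ADD_PURCHASE = [
  "def t = graph.traversal()",
  "def c = t.V().hasLabel(customer).tryNext().orElseGet { graph.addVertex(customer) }",
  "def p = t.V().hasLabel(product).tryNext().orElseGet { graph.addVertex(product) }",
  'c.addEdge("purchased", p, "Day", day)',
  "graph.tx().commit()",
].join("\n");

// Gremlin-Groovy script for the traversal shown above.
const PURCHASED_ON_DAY =
  'graph.traversal().V().hasLabel(customer).outE("purchased").has("Day", day).otherV().label()';

// Builds the JSON body for Gremlin Server's HTTP endpoint. Values travel as
// bindings, so they are never concatenated into the script text.
function buildGremlinRequest(script, bindings) {
  return JSON.stringify({ gremlin: script, bindings: bindings });
}

// Hypothetical helper: POSTs a script and returns the parsed JSON response.
// Uses the global fetch available in Node.js 18 and later.
async function runGremlin(url, script, bindings) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildGremlinRequest(script, bindings),
  });
  if (!response.ok) {
    throw new Error("Gremlin Server returned HTTP " + response.status);
  }
  return response.json();
}

// Building the request bodies needs no server, so this part runs anywhere.
const addBody = buildGremlinRequest(ADD_PURCHASE, {
  customer: "Bob",
  product: "Meat",
  day: "Friday",
});
const queryBody = buildGremlinRequest(PURCHASED_ON_DAY, {
  customer: "Bob",
  day: "Friday",
});
console.log(addBody);
console.log(queryBody);
```

With a server running, a call such as `runGremlin("http://localhost:8182", PURCHASED_ON_DAY, { customer: "Bob", day: "Friday" })` would send the query. The URL here is an assumption based on Gremlin Server's default port, so adjust it to your own setup.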
Indexing
Before really diving into indexing, play around with the data until you understand the structure. The following is a VERY skeletal explanation of indexing with Elasticsearch and Titan:
With regard to indexing, know that Titan has different types of indices: composite, vertex-centric, and mixed. Each serves its own purpose, and the Titan documentation on indexing has more detail.
Indexing is used to speed up traversals and lookups, so you need to decide what to index. For our example, we want to quickly find all purchases made on a given day. This means we can put a mixed index on edges to help us (a composite index would serve just as well, but you are asking about Elasticsearch, so we are going to use a mixed index).
To define a mixed index, we start by defining a simple schema (the Titan schema documentation has more detail):
mgmt = g.openManagement(); // g is the TitanGraph opened earlier
purchased = mgmt.makeEdgeLabel("purchased").multiplicity(MULTI).make();
day = mgmt.makePropertyKey("Day").dataType(String.class).make();
You don't need to explicitly define the schema for everything, but it is essential for anything you want to index. Now you can create your index:
// "search" is the index backend name defined in your Titan properties file
mgmt.buildIndex("productsPurchased", Edge.class).addKey(day).buildMixedIndex("search")
mgmt.commit()
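The name "search" passed to buildMixedIndex has to match an index backend configured in the properties file that TitanFactory.open loads. As a rough sketch, the relevant entries usually look like the following. The hostnames are assumptions for a single-machine setup, so check them against your own file:

```properties
# Storage backend: Cassandra over Thrift (matches the cassandrathrift output in the question)
storage.backend=cassandrathrift
storage.hostname=127.0.0.1

# Index backend registered under the name "search"
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
```

If the index backend were registered under a different name in that file, the same name would have to be passed to buildMixedIndex.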
With this index, queries such as:
g.traversal().E().has("Day", "Friday")
will be faster.
Note: You should create your indices and schema before loading data. It just makes things simpler in the long run.