Making my Titan DB graph with a Cassandra and Elasticsearch backend

Problem description

My problem is that I want to store product, customer and seller data in a Titan graph database that uses Cassandra as the storage backend and Elasticsearch as the indexing backend. I will then query that data to make recommendations to both customers and sellers. I have not been able to get to the point where I can store my own data. Since the data is going to be huge, I'll be using Cassandra and Elasticsearch.

What I have done so far is set up Cassandra and Elasticsearch. I can now run bin/titan.sh start to start Cassandra, Elasticsearch and the Gremlin Server. I can also play with the Graph of the Gods data with:

gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> GraphOfTheGodsFactory.load(graph)
==>null

Now I am trying to find a way to store my product, customer and seller graph data so that it is stored in Cassandra and the indices are in Elasticsearch.

What steps should I take to do that? My main language for the project is Node.js, and Java is out of the question due to project constraints.

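My questions in short:

  1. How do I store my own data for Titan DB to process?

  2. Once the data is available for processing, I will expose some HTTP APIs to make recommendations. Writing them in Java is out of the question
    due to some constraints. How should I proceed? (I think Gremlin is my only alternative.)

I'll be grateful if you can point out my mistakes and drop some breadcrumbs in the right direction.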

Recommended answer

If you can't use Java then you are limited to using Groovy. As for


how to store my own data for Titan DB to process

Side note

With a graph DB there are a multitude of ways to store this data. If you want to really formalise the structure of your data, I would recommend looking into Ontologies, OWL, and Topic Maps. These can serve as great inspiration for how to formalise and structure data in a graph DB, but they are only worth reading if you are looking for very formal ways of structuring data in graphs.

Example structure

For now let's assume you just want to track customers and the products they have bought. One simple structure is to make both customers and products vertices, with an edge from a customer to a product recording the fact that the customer bought that product. We can even put additional data on that edge, such as the time of purchase and the quantity. Here is an example of how to do that in Groovy:

gremlin> g = TitanFactory.open("titan-cassandra-es.properties")
gremlin> customerBob = g.addVertex("Bob"); 
==>v[12]
gremlin> customerAlice = g.addVertex("Alice");
==>v[13]
gremlin> productFish = g.addVertex("Fish");
==>v[14]
gremlin> productMeat = g.addVertex("Meat");
==>v[15]
gremlin> edge = customerBob.addEdge("purchased", productMeat, "Day", "Friday", "Quantity", 2);
==>e[16][12-purchased->15]
gremlin> edge = customerBob.addEdge("purchased", productFish, "Day", "Friday", "Quantity", 1);
==>e[17][12-purchased->14]
gremlin> edge = customerAlice.addEdge("purchased", productMeat, "Day", "Monday", "Quantity", 3);
==>e[18][13-purchased->15]

The above basically says that Bob bought some Meat and Fish on Friday while Alice bought some Meat on Monday. If we wanted to find out what Bob bought on Friday, we could make the following traversal:

gremlin> g.traversal().V().hasLabel("Bob").outE("purchased").has("Day", "Friday").otherV().label();
==>Meat
==>Fish
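
Since the end goal in the question is recommendations, the same structure can be walked a little further. The following is only a rough sketch, assuming `g` is the graph opened above and that the Bob and Alice sample data has been loaded. It starts at Bob, follows his purchases to the products, follows those products back to everyone who bought them, and then collects everything those customers bought:

```groovy
// Sketch only: assumes `g` is the Titan graph opened earlier and that the
// sample vertices and "purchased" edges from the example have been added.
t = g.traversal()

// Bob -> products Bob bought -> customers who bought those products
//     -> products those customers bought (labels are product names here).
t.V().hasLabel("Bob").out("purchased").in("purchased").out("purchased").dedup().label()
```

This does not exclude Bob himself or the products he already owns, so a real recommendation query would need an extra filtering step. It is meant as a starting point rather than a finished recommender.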

Indexing

Before really diving into indexing, play around with the data until you understand the structure. The following is a VERY skeletal explanation of indexing with Elasticsearch and Titan:

With regard to indexing, know that Titan has different types of indices: Composite, Vertex-Centric, and Mixed. Each serves its own purpose, and you should read the Titan documentation on indexing for more information.

Indexing is used to speed up traversals and lookups, so you need to decide what to index. For our example we want to quickly find all purchases made on a given day. This means that we can put a mixed index on edges to help us (a composite index would serve just as well, but you are asking about Elasticsearch, so we are going to use a mixed index).

To define a mixed index we start by defining a simple schema (see the Titan schema documentation for more):

mgmt = g.openManagement(); // g is the graph opened with TitanFactory.open above
purchased = mgmt.makeEdgeLabel("purchased").multiplicity(MULTI).make();
day = mgmt.makePropertyKey("Day").dataType(String.class).make();

You don't need to explicitly define the schema for everything, but it is essential for anything you want to index. Now you can create your index:

// "search" is the name of the index backend defined in your Titan properties file
mgmt.buildIndex("productsPurchased", Edge.class).addKey(day).buildMixedIndex("search")
mgmt.commit()
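
For reference, the "search" name comes from the `index.search.*` keys in the properties file passed to TitanFactory.open. A minimal fragment might look like the following. The hostnames are assumptions based on the local setup shown in the question, and your own file may contain additional settings:

```properties
# Storage backend: Cassandra over Thrift on the local machine (assumed host)
storage.backend=cassandrathrift
storage.hostname=127.0.0.1

# Index backend named "search", backed by Elasticsearch (assumed host)
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
```

The middle segment of `index.search.backend` is the name that buildMixedIndex refers to, so a different name in the file means a different argument in the call.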

With this index, queries such as:

g.traversal().E().has("Day", "Friday")

will be faster.

Note: You should create your indices and schema before loading data. It just makes things simpler in the long run.
