Using Neo4j to build a Master Data Management (MDM) system


Question

I am trying to use Neo4j to build an MDM (master data management) system. For now I just want to model our customer database with a few properties, such as email, documentNumber, address, phone, and mobilephone.

The problem is that our database is very dirty. For example, I have several users with the same documentNumber (it works like an SSN), and when I look at those records I can see that they are actually the same person.

To discover patterns through relationships, I need to dedup/clean the records. But I am afraid of losing information when I dedup them.
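One way to find candidate duplicates without touching the raw records is to group them by a normalized key. The Python below is only a minimal sketch: the helper names (`normalize_digits`, `group_by_document`) and the rule of stripping every non-digit character are assumptions made for illustration, and the two sample records mirror the customer data used in this question.

```python
import re
from collections import defaultdict


def normalize_digits(value: str) -> str:
    """Strip every character that is not 0-9 (assumed cleaning rule)."""
    return re.sub(r"[^0-9]", "", value)


def group_by_document(records: list[dict]) -> dict[str, list[dict]]:
    """Group raw records by normalized document number, leaving them unmodified."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[normalize_digits(record["document"])].append(record)
    return dict(groups)


records = [
    {"name": "Maria da Silva", "document": "108518037-92"},
    {"name": "Maria da S.", "document": "10851803792"},
]

# Keep only the keys shared by more than one record: these are dedup candidates.
duplicates = {
    doc: group for doc, group in group_by_document(records).items() if len(group) > 1
}
```

Because the grouping only reads the records, the original values (including the dash in the first document number) stay available for later inspection.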

First approach:

<customer>
    <name>Maria da Silva</name>
    <document>108518037-92</document>
    <phone>
        <areaCode>21</areaCode>
        <number>2247223A</number>
    </phone>
</customer>

<customer>
    <name>Maria da S.</name>
    <document>10851803792</document>
    <phone>
        <areaCode>21</areaCode>
        <number>2247-2236</number>
    </phone>
</customer>

So I could store the graph like this (using the Cypher language):

CREATE (person1:Person {name: "Maria da Silva", document: "108518037-92"})
CREATE (phone1:Phone {areaCode: "21", number: "2247223A"})
CREATE (person1)-[:OWNS]->(phone1)

CREATE (person2:Person {name: "Maria da S", document: "10851803792"})
CREATE (phone2:Phone {areaCode: "21", number: "2247-2236"})
CREATE (person2)-[:OWNS]->(phone2)

And then I could create normalized/cleaned nodes:

CREATE (person_mdm:PersonMdm {name: "MARIA DA SILVA", document: "10851803792"}) // now I have to choose a name
CREATE (phone_mdm:PhoneMdm {areaCode: "21", number: "22472236"}) // and choose a phone too

and then link the original nodes to the normalized nodes:

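// part of the same statement that created the nodes, so the variables are still bound
CREATE (person_mdm)-[:REFERENCES]->(person1)
CREATE (person_mdm)-[:REFERENCES]->(person2)

CREATE (phone_mdm)-[:REFERENCES]->(phone1)
CREATE (phone_mdm)-[:REFERENCES]->(phone2)
CREATE (person_mdm)-[:OWNS]->(phone_mdm)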
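The "choose a name" step is the part that needs an explicit rule. As a rough sketch, the Python below builds one normalized person record from a group of source records and keeps a reference to each of them. The function name `build_person_mdm`, the `id` field, and the survivorship rule (the longest name wins) are all assumptions for illustration; a real MDM would likely need richer rules.

```python
def build_person_mdm(sources: list[dict]) -> dict:
    """Build one normalized record that points back at every source record."""
    # Assumed survivorship rule: the longest name is treated as the most complete.
    best_name = max((s["name"] for s in sources), key=len)

    # Normalize each document number by keeping only the characters 0-9.
    documents = {
        "".join(ch for ch in s["document"] if ch in "0123456789") for s in sources
    }
    if len(documents) != 1:
        raise ValueError("sources do not share one normalized document number")

    return {
        "label": "PersonMdm",
        "name": best_name.upper(),
        "document": documents.pop(),
        # Mirrors the REFERENCES relationships: nothing from the sources is discarded.
        "references": [s["id"] for s in sources],
    }


person_mdm = build_person_mdm(
    [
        {"id": "person1", "name": "Maria da Silva", "document": "108518037-92"},
        {"id": "person2", "name": "Maria da S", "document": "10851803792"},
    ]
)
```

Refusing to merge records whose normalized documents differ keeps a bad grouping from silently producing a wrong master record.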

Second approach:

Store the MDM nodes with a list property holding hashes. Each hash references a record in another database (MongoDB, for example):

CREATE (person_mdm:PersonMdm {name: "MARIA DA SILVA", document: "10851803792", hash: ["XXX", "YYY"]})
CREATE (phone_mdm:PhoneMdm {areaCode: "21", number: "22472236", hash: ["ZZZ", "KKK"]})
CREATE (person_mdm)-[:OWNS]->(phone_mdm)
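The question does not say how the hashes would be produced. One possible scheme, sketched below in Python, is a SHA-256 digest of a canonical JSON form of each raw record. The names `record_hash`, `store_raw`, and `raw_store` are hypothetical, and a plain dict stands in for the external document store.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a stable SHA-256 hex digest of a record (assumed hashing scheme)."""
    # Sorting keys makes the digest independent of field order.
    canonical = json.dumps(
        record, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Stand-in for the second database (for example a MongoDB collection keyed by hash).
raw_store: dict[str, dict] = {}


def store_raw(record: dict) -> str:
    """Save the untouched record under its hash and return that hash."""
    key = record_hash(record)
    raw_store[key] = record
    return key


sources = [
    {"name": "Maria da Silva", "document": "108518037-92"},
    {"name": "Maria da S", "document": "10851803792"},
]

# The graph node keeps only cleaned values plus the hashes of its raw records.
person_mdm = {
    "name": "MARIA DA SILVA",
    "document": "10851803792",
    "hash": [store_raw(s) for s in sources],
}

# Looking the hashes up again recovers the original, uncleaned records.
originals = [raw_store[h] for h in person_mdm["hash"]]
```

This also shows the maintenance cost of the second approach: every read of the original data needs a second lookup outside the graph.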

First approach:

(+) It's simple to implement compared with the second approach

(+) I will have all nodes in a single database

(-) The number of nodes explodes

(-) Queries are more complex

Second approach:

(+) Clean and easy to query

(-) The MDM information is stored in two different databases (maintenance)

(-) Must maintain two separate databases

Recommended answer

We typically go for the first approach. Something along the lines of:

CREATE (person1:Person {name: "Maria da Silva", document: "108518037-92"})
CREATE (phone1:Phone {areaCode: "21", number: "2247223A"})
CREATE (person1)-[:OWNS]->(phone1)

CREATE (person2:Person {name: "Maria da S", document: "10851803792"})
CREATE (phone2:Phone {areaCode: "21", number: "2247-2236"})
CREATE (person2)-[:OWNS]->(phone2)

CREATE (person1)-[:SAME_AS]->(person2)

I wouldn't worry about the number of nodes, as long as you don't have billions. Neo4j can handle a lot of nodes, as each one has a very small footprint.

Queries get a little more complicated, sure. But on the other hand, you have to do the cleanup/de-duplication somewhere, and doing it at query time ensures you don't lose any of the original information. It also gives you the flexibility to change/evolve the de-duplication logic, or even have a different one per use case.
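To make the query-time idea concrete, here is a small Python sketch that treats SAME_AS links as edges, groups nodes into connected components with union-find, and then applies a pluggable rule to pick one representative per group. The function names and both selection rules are assumptions for illustration; in practice this logic would more likely live in the Cypher query itself.

```python
from collections import defaultdict
from typing import Callable


def clusters(node_ids: list[str], same_as: list[tuple[str, str]]) -> list[list[str]]:
    """Group node ids into connected components over SAME_AS edges (union-find)."""
    parent = {n: n for n in node_ids}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the trees shallow
            n = parent[n]
        return n

    for a, b in same_as:
        parent[find(a)] = find(b)

    groups: dict[str, list[str]] = defaultdict(list)
    for n in node_ids:
        groups[find(n)].append(n)
    return list(groups.values())


def resolve(
    nodes: dict[str, dict],
    same_as: list[tuple[str, str]],
    pick: Callable[[list[dict]], dict],
) -> list[dict]:
    """Return one representative per cluster, chosen by the supplied rule."""
    return [pick([nodes[i] for i in group]) for group in clusters(list(nodes), same_as)]


def longest_name(group: list[dict]) -> dict:
    """Rule A (assumed): prefer the record with the most complete name."""
    return max(group, key=lambda n: len(n["name"]))


def clean_document(group: list[dict]) -> dict:
    """Rule B (assumed): prefer a record whose document is already digits only."""
    return next((n for n in group if n["document"].isdigit()), group[0])


nodes = {
    "person1": {"name": "Maria da Silva", "document": "108518037-92"},
    "person2": {"name": "Maria da S", "document": "10851803792"},
}
same_as = [("person1", "person2")]

by_name = resolve(nodes, same_as, longest_name)
by_document = resolve(nodes, same_as, clean_document)
```

The two rules return different representatives for the same cluster, and the stored nodes are never modified, which is the flexibility described above.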
