Merge identical databases into one

Problem description

We have 15 databases of 75 tables each, with an average of a million rows, all with the same schema but different data. The client now requires us to bring all 15 into one database, with each set of data filtered by the user's login.

The changes to the application have been completed to do the filtering. We are now left with the task of merging all databases into one.

The issue is conflicting PKs and FKs: both are of type int, so we will have 15 PK ids of 1.
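
To make the collision concrete, here is a minimal sketch in Python with sqlite3 (not the SQL Server setup from the question). The `customer` table and the names in it are made up for illustration. Two source databases each hold a row with id 1, and a naive copy that keeps the original ids fails on the second one.

```python
import sqlite3

def make_db(customer_name):
    """Create an in-memory database with one customer whose id is 1."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO customer (id, name) VALUES (1, ?)", (customer_name,))
    return db

def copy_with_original_ids(source, target):
    """Naive copy that keeps the source primary keys."""
    rows = source.execute("SELECT id, name FROM customer").fetchall()
    target.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)

source_a = make_db("Alice")  # hypothetical data
source_b = make_db("Bob")    # hypothetical data
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

copy_with_original_ids(source_a, target)
try:
    copy_with_original_ids(source_b, target)
    collided = False
except sqlite3.IntegrityError:
    collided = True  # the second id 1 violates the primary key

row_count = target.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```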

One idea is to use .NET and the DBML to insert the records as new records into the new database, letting LINQ deal with the PKs and FKs and using code to deal with duplicate data.

What other ways are there to do this?

Recommended answer

It's never a trivial job to integrate databases when the records don't have unique primary keys in all databases. A few weeks ago I built a similar integration script for which I decided to use Entity Framework.

First the good news. With EF's DbContext API it's ridiculously easy to insert a complete object graph and make EF take care of all newly generated primary keys and foreign keys. The reason this is so easy is that when an object's state is changed to Added, all of its related objects become Added as well, and EF figures out the right order of inserts. This is truly great! It let me build the core of the copy routine in a few hours, which would have taken many days if I had done it in T-SQL, for example. The latter is much more error-prone too.
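
For a sense of the bookkeeping EF takes over here, this is a hand-rolled sketch in Python with sqlite3 rather than EF code. It assumes a hypothetical two-table schema (`company` and `customer`): parents are inserted first, the target assigns new keys, and an old-to-new id map rewrites the foreign keys of the children.

```python
import sqlite3

SCHEMA = """
CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (id INTEGER PRIMARY KEY,
                       company_id INTEGER REFERENCES company(id),
                       name TEXT);
"""

def new_db():
    """Create an empty in-memory database with the illustrative schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def copy_companies(source, target):
    """Copy companies and their customers, letting the target assign new keys."""
    company_ids = {}  # old company id -> new company id
    for old_id, name in source.execute("SELECT id, name FROM company").fetchall():
        cur = target.execute("INSERT INTO company (name) VALUES (?)", (name,))
        company_ids[old_id] = cur.lastrowid
    # Children go in after their parents, with the foreign key rewritten.
    for old_company_id, name in source.execute(
            "SELECT company_id, name FROM customer").fetchall():
        target.execute(
            "INSERT INTO customer (company_id, name) VALUES (?, ?)",
            (company_ids[old_company_id], name))
    target.commit()
    return company_ids

# Two sources that both use id 1 for their company and their customer.
target = new_db()
for company, customer in [("Acme", "Alice"), ("Globex", "Bob")]:
    source = new_db()
    source.execute("INSERT INTO company (id, name) VALUES (1, ?)", (company,))
    source.execute(
        "INSERT INTO customer (id, company_id, name) VALUES (1, 1, ?)", (customer,))
    copy_companies(source, target)

pairs = target.execute(
    "SELECT company.name, customer.name FROM customer "
    "JOIN company ON company.id = customer.company_id "
    "ORDER BY customer.id").fetchall()
```

With 75 tables this map has to be maintained per table and in dependency order, which is the part that gets error-prone when written by hand.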

Of course life isn't that easy. Now the bad news:

  1. This takes tons of machine resources. Of course I used a new context instance for each copy step, but I still had to run the program on a machine with a decent processor and a fair amount of internal memory. The exact specifications don't matter; the message is: test with the largest databases and see what kind of beast you need. If the memory consumption can't be managed by any machine at your disposal, you have to split up the routine into smaller chunks, but that will take more programming.

  2. The object graph that's changed to Added must be divergent. By this I mean that there should only be 1-n associations starting from the root. The reason is that EF will really mark all objects as Added. So if somewhere in the graph a few branches refer back to the same object (because there is an n-1 association), these "new" objects will be multiplied, because EF doesn't know their identity. An example of this could be Company -< Customer -< Order >- OrderType: when there are only 2 order types, inserting one root company with 10 customers with 10 orders each will create 100 order type records instead of 2.

     So the hard part is to find paths in your class structure that are as divergent as possible. This won't always be possible. When it isn't, you'll have to add the leaves of the converging paths first. In the example: first insert the order types. When a new company is inserted, you first load the existing order types into the context and then add the company. Now link the new orders to the existing order types. This can only be done if you can match objects by natural keys (in this example: the order type names), but usually this is possible.

  3. You must take care not to insert multiple copies of master data. Suppose the order types in the previous example are the same in all databases (although their primary keys may differ!). The order types from the source database should not be reinserted in the target database. Moreover, you must fix the references in the source data so they point to the correct records in the target database (again by matching on natural key).
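
The natural-key matching from points 2 and 3 can be sketched as follows. This is plain Python with sqlite3, not EF, and the `order_type` and `orders` tables and their columns are assumptions made for the example. Order types are matched by name, inserted only when the target doesn't have them yet, and the orders are relinked to whatever id the target uses.

```python
import sqlite3

SCHEMA = """
CREATE TABLE order_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     order_type_id INTEGER REFERENCES order_type(id),
                     reference TEXT);
"""

def new_db(order_types=(), orders=()):
    """Create an in-memory database, optionally seeded with rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO order_type (id, name) VALUES (?, ?)", order_types)
    db.executemany(
        "INSERT INTO orders (order_type_id, reference) VALUES (?, ?)", orders)
    return db

def merge_orders(source, target):
    """Copy orders, reusing target order types that match by name."""
    # Natural key (name) -> primary key in the target.
    target_types = dict(target.execute("SELECT name, id FROM order_type").fetchall())
    type_ids = {}  # source order_type id -> target order_type id
    for old_id, name in source.execute("SELECT id, name FROM order_type").fetchall():
        if name not in target_types:
            cur = target.execute("INSERT INTO order_type (name) VALUES (?)", (name,))
            target_types[name] = cur.lastrowid
        type_ids[old_id] = target_types[name]
    for old_type_id, reference in source.execute(
            "SELECT order_type_id, reference FROM orders").fetchall():
        target.execute(
            "INSERT INTO orders (order_type_id, reference) VALUES (?, ?)",
            (type_ids[old_type_id], reference))
    target.commit()

# Same order types in both sources, but with different primary keys.
source_a = new_db([(1, "Retail"), (2, "Wholesale")], [(1, "A-1")])
source_b = new_db([(1, "Wholesale"), (2, "Retail")], [(1, "B-1")])
target = new_db()
merge_orders(source_a, target)
merge_orders(source_b, target)

type_count = target.execute("SELECT COUNT(*) FROM order_type").fetchone()[0]
linked = target.execute(
    "SELECT orders.reference, order_type.name FROM orders "
    "JOIN order_type ON order_type.id = orders.order_type_id "
    "ORDER BY orders.id").fetchall()
```

Both sources use order type id 1, but for different types; matching on the name keeps the target at two order types and links each order to the right one.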

So although it wasn't trivial, it was doable and the job was done in a relatively short time. I'm sure that other alternatives (T-SQL, Integration Services, BIDS, if doable at all) would have taken more time or would have been buggier. And the problem with bugs in this area is that they may become apparent much later.

I later found out that the issues I describe under 2) are related to fetching the source objects with AsNoTracking. See this interesting post: Entity Framework 6 - use my getHashCode(). I used AsNoTracking because it performs better and it reduces memory consumption.
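
One way to picture why identity matters here is the language-neutral sketch below, in plain Python rather than EF. It rests on an assumption: that an untracked fetch hands back a separate object for every reference to the same row, whereas a tracking context keeps one object per row. The classes and rows are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class OrderType:
    name: str

@dataclass
class Order:
    reference: str
    order_type: OrderType

# Hypothetical joined rows: (order reference, order type name).
ROWS = [("A-1", "Retail"), ("A-2", "Retail"), ("A-3", "Wholesale")]

def load_untracked(rows):
    """Build a fresh OrderType per order, as if nothing tracked identity."""
    return [Order(ref, OrderType(type_name)) for ref, type_name in rows]

def load_tracked(rows):
    """Reuse one OrderType per name via a small identity map."""
    identity_map = {}
    orders = []
    for ref, type_name in rows:
        order_type = identity_map.setdefault(type_name, OrderType(type_name))
        orders.append(Order(ref, order_type))
    return orders

def distinct_order_types(orders):
    """Count order types by object identity, as an 'insert everything' pass would."""
    return len({id(order.order_type) for order in orders})

untracked_orders = load_untracked(ROWS)
tracked_orders = load_tracked(ROWS)
untracked_count = distinct_order_types(untracked_orders)
tracked_count = distinct_order_types(tracked_orders)
```

In this sketch the untracked load yields one order type object per order, while the identity map yields one per distinct name, which mirrors the multiplication described under 2).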
