Copy Multiple Tables into ONE Table (From Multiple Databases)

Question

I've got multiple identical databases (distributed across several servers) and need to gather them in one single place to do data mining, etc.

The idea is to take Table1, Table2, ..., TableN from each database, merge them, and put the result into one single big database.

To be able to write queries, and to know which database each row came from, we will add a single column DatabaseID to the target table, describing where the row came from. Editing the source tables is not an option; they belong to some proprietary software.
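One consequence of the DatabaseID column worth illustrating: identical databases typically reuse the same key values, so DatabaseID probably needs to be part of the target table's key. A minimal sketch, assuming the source tables have a key column; Python's built-in sqlite3 stands in for SQL Server, and the names `Merged`, `id`, and `name` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical target table: DatabaseID is part of the primary key, so the
# same source key (id) can appear once per source database.
conn.execute(
    """CREATE TABLE Merged (
           DatabaseID TEXT    NOT NULL,
           id         INTEGER NOT NULL,
           name       TEXT,
           PRIMARY KEY (DatabaseID, id)
       )"""
)

# Both sources contain a row with id = 1; tagging each row with its origin
# keeps them distinct instead of causing a primary-key violation.
rows = [("Database1", 1, "alice"), ("Database2", 1, "bob")]
conn.executemany("INSERT INTO Merged (DatabaseID, id, name) VALUES (?, ?, ?)", rows)

for row in conn.execute("SELECT DatabaseID, id, name FROM Merged ORDER BY DatabaseID"):
    print(row)
```

Without DatabaseID in the key, the second insert would fail (or silently overwrite, depending on how the load is written).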

We've got ~40 servers, ~170 databases and need to copy ~40 tables.

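Now, given that it should be:

  • Easy to set up
  • Easy to maintain
  • Preferably able to adapt if the database schema changes
  • Reliable, with logging/alerting if something fails
  • Easy to extend with more tables to copy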

We've looked into SSIS, but it seemed that we would have to add each table as a source/transformation/destination. I'm guessing it would also be quite tied to the database schema. Right?

Another option would be to use SQL Server Replication, but I don't see how to add the DatabaseID column to each table. It seems it's only possible to copy data, not modify it. Maybe we could replicate all the data into separate databases and then run a local job on the target server to merge the tables? It also seems like a lot of work if we need to add more tables to copy, as we'd have to redistribute new publications for each database (manual work?).
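If replication left one copy per source database on the target server, the "local job" could simply rebuild each merged table from those copies, supplying DatabaseID as it goes. A sketch under that assumption, with sqlite3 in-memory databases (attached under hypothetical names `Database1`, `Database2`) standing in for the replicated SQL Server databases; `Table1` and its columns are also hypothetical.

```python
import sqlite3

# Autocommit mode (isolation_level=None): sqlite refuses ATTACH inside an
# open transaction, which Python's default mode would start on INSERT.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Hypothetical per-source copies: one attached database per source,
# each holding an identical Table1.
sources = {"Database1": [(1, "alice")], "Database2": [(1, "bob"), (2, "carol")]}
for db_id, rows in sources.items():
    conn.execute(f"ATTACH DATABASE ':memory:' AS {db_id}")
    conn.execute(f"CREATE TABLE {db_id}.Table1 (id INTEGER, name TEXT)")
    conn.executemany(f"INSERT INTO {db_id}.Table1 VALUES (?, ?)", rows)

# Merge job: rebuild the combined table, passing DatabaseID as a bound parameter.
conn.execute("DROP TABLE IF EXISTS main.Table1")
conn.execute("CREATE TABLE main.Table1 (DatabaseID TEXT, id INTEGER, name TEXT)")
for db_id in sources:
    conn.execute(
        f"INSERT INTO main.Table1 (DatabaseID, id, name) "
        f"SELECT ?, id, name FROM {db_id}.Table1",
        (db_id,),
    )
```

The copying step stays "dumb" (no schema changes on the sources), and the only place that knows about DatabaseID is the merge job on the target.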

The last option (?) is to write a custom application tailored to our needs. It's a bigger time investment, but it would at least do precisely what we'd like.

To make it worse... we're using Microsoft SQL Server 2000. We will upgrade to SQL Server 2008 R2 within 6 months, but we'd like the project to be usable sooner.

Let me know what you think!

Update 20110721

We ended up with an F# program that opens a connection to the SQL Server where we want the aggregated databases. From there we query the 40 linked SQL Servers to fetch all rows (but not all columns) from some tables, and add an extra column to each table recording which DatabaseID the row came from. The configuration of which servers, tables, and columns to fetch is a combination of text-file configuration and hard-coded values (heh :D). It's not super fast (fetching is sequential so far), but it's absolutely manageable, and the data processing we do afterwards takes far longer.
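The approach described in this update can be sketched roughly as follows. This is not the author's F# code: Python stands in for F#, sqlite3 in-memory connections stand in for the linked SQL Servers, and the configuration format plus `parse_config`, `fetch_all`, `make_source`, and the table/column names are all hypothetical.

```python
import sqlite3

# Hypothetical text-file configuration: one "Table: col1, col2" entry per line.
CONFIG = """
# table: columns to copy
Table1: id, name
"""

def parse_config(text):
    """Return {table: [columns]} from 'Table: col1, col2' lines."""
    tables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        table, cols = line.split(":", 1)
        tables[table.strip()] = [c.strip() for c in cols.split(",")]
    return tables

def fetch_all(sources, target, tables):
    """Sequentially copy the configured columns from every source into
    target, prefixing each row with the DatabaseID it came from."""
    for table, cols in tables.items():
        col_list = ", ".join(cols)
        placeholders = ", ".join(["?"] * (len(cols) + 1))
        for db_id, src in sources.items():
            rows = src.execute(f"SELECT {col_list} FROM {table}").fetchall()
            target.executemany(
                f"INSERT INTO {table} (DatabaseID, {col_list}) VALUES ({placeholders})",
                [(db_id, *row) for row in rows],
            )
    target.commit()

def make_source(rows):
    """Hypothetical source database; 'secret' is a column we don't copy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (id INTEGER, name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    return conn

sources = {
    "Database1": make_source([(1, "alice", "x")]),
    "Database2": make_source([(1, "bob", "y")]),
}
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE Table1 (DatabaseID TEXT, id INTEGER, name TEXT)")
fetch_all(sources, target, parse_config(CONFIG))
```

Keeping the table/column list in configuration means adding a table is a config change plus a target table, not a code change.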

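Possible future improvements:

  • Better error handling when something goes wrong (e.g., a server is offline).
  • Parallel fetching, to reduce the total time a complete fetch takes.
  • Finding out whether it's enough to fetch only some rows, e.g., only added/updated rows.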
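The first two improvements fit together: fetch from the sources concurrently, and record per-source failures so that one offline server doesn't abort the whole run. A minimal sketch using Python's `concurrent.futures`; `fetch_source` is a hypothetical stand-in that simulates one unreachable server, and the default of 8 workers is an arbitrary assumption. Threads are a reasonable fit here because the work is waiting on remote queries, not CPU.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_source(db_id):
    """Hypothetical per-source fetch; a real one would query a linked server.
    'Database3' simulates a server that is offline."""
    if db_id == "Database3":
        raise ConnectionError(f"{db_id}: server not reachable")
    return [(db_id, 1, "row")]

def fetch_parallel(db_ids, max_workers=8):
    """Fetch all sources concurrently; collect failures instead of aborting."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_source, db_id): db_id for db_id in db_ids}
        for future in as_completed(futures):
            db_id = futures[future]
            try:
                results[db_id] = future.result()
            except Exception as exc:  # a real job would log/alert here
                failures[db_id] = exc
    return results, failures

results, failures = fetch_parallel(["Database1", "Database2", "Database3"])
for db_id, exc in sorted(failures.items()):
    print(f"FAILED {db_id}: {exc}")
```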

All in all it turned out to be quite simple, has no dependencies on other products, and works well in practice.

Recommended answer

Nothing fancy, but couldn't you do something like this:

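-- Rebuild the merged table from scratch on each run.
IF OBJECT_ID('dbo.Merged') IS NOT NULL
    DROP TABLE dbo.Merged

-- SELECT ... INTO in the first branch creates dbo.Merged from the union's columns.
-- Linked servers need four-part names (Server.Database.Schema.Table);
-- string literals use single quotes (double quotes denote identifiers).
SELECT  [DatabaseID] = 'Database1', * INTO dbo.Merged FROM ServerA.Database1.dbo.[Table]
UNION ALL SELECT  [DatabaseID] = 'Database2', * FROM ServerB.Database2.dbo.[Table]
...
UNION ALL SELECT  [DatabaseID] = 'DatabaseX', * FROM ServerX.DatabaseX.dbo.[Table]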
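With ~170 databases, writing the UNION ALL query above by hand is error-prone; generating it from a list of sources keeps "add another database" down to one list entry. A sketch in Python that only builds the T-SQL text (it does not execute anything), using SELECT ... INTO in the first branch to create the target table. `build_merge_sql` and the (DatabaseID, linked server, database) tuples are hypothetical.

```python
def build_merge_sql(target, table, sources):
    """Build a SELECT ... INTO ... UNION ALL statement.

    sources: list of (database_id, linked_server, database) tuples.
    The INTO clause is only placed in the first SELECT, as T-SQL requires.
    """
    selects = []
    for i, (db_id, server, database) in enumerate(sources):
        into = f" INTO {target}" if i == 0 else ""
        literal = db_id.replace("'", "''")  # escape quotes in the string literal
        selects.append(
            f"SELECT [DatabaseID] = '{literal}', *{into} "
            f"FROM [{server}].[{database}].[dbo].[{table}]"
        )
    return "\nUNION ALL ".join(selects)

sources = [("Database1", "ServerA", "Database1"), ("Database2", "ServerB", "Database2")]
sql = build_merge_sql("dbo.Merged", "Table", sources)
print(sql)
```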

Pros

  • Easy to set up
  • Easy to maintain
  • Easy to adapt
  • Easy to add more tables

Cons

  • Performance
  • No reliable logging
