What model/pattern should I use for handling multiple data sources?

Problem Description

As part of an ecommerce system I need to design and implement a black box of sorts. It needs to accept customers and orders from various internal and external data sources (e.g., web site, extranet, Yahoo Store, Amazon XML feeds) and insert/update a backend system. The APIs to insert/update customer and order data are in place for the web site and working well. Now we need to add the ability to process orders from other data sources.

I've been leaning towards the provider model (one provider for each data source), using it to standardize data into SQL Server tables before calling the APIs that actually add customers and place orders. Are there other models or patterns I should consider? Have you dealt with this issue before, and how did you solve it? Are there any resources (articles, books, projects, etc.) I should look at?
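A minimal sketch of the provider idea described above (class and field names such as `OrderProvider` and `AmazonXmlProvider` are illustrative assumptions, not from the question): each source gets its own provider, and every provider normalizes its records to one common shape before the backend APIs ever see them.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class OrderProvider(ABC):
    """One provider per data source; each normalizes to a common dict shape."""

    @abstractmethod
    def fetch_orders(self) -> list[dict]:
        """Return orders in the standardized shape expected by the backend."""


class AmazonXmlProvider(OrderProvider):
    """Hypothetical provider for an Amazon-style XML order feed."""

    def __init__(self, feed: str):
        self.feed = feed

    def fetch_orders(self) -> list[dict]:
        # Parse the XML feed and map source fields onto the standard shape.
        root = ET.fromstring(self.feed)
        return [
            {"order_id": o.findtext("AmazonOrderID"),
             "customer": o.findtext("BuyerName"),
             "total": float(o.findtext("OrderTotal"))}
            for o in root.iter("Order")
        ]


def ingest(providers: list[OrderProvider]) -> list[dict]:
    # The calling code only ever sees the standardized shape,
    # regardless of which source the orders came from.
    return [order for p in providers for order in p.fetch_orders()]
```

Adding a new data source then means adding one provider class; the standardization and backend-API calls stay untouched.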

Recommended Answer

You may find that an ETL (Extract-Transform-Load) tool will make your life easier than trying to solve the problem in code:

  • SSIS (SQL Server Integration Services)
  • ODI (Oracle Data Integrator)
  • Informatica PowerCenter
  • Many others

These are designed specifically for the type of data loading work that you described.

Edit

While I still maintain that an ETL tool will best serve your needs, if you insist on doing it in code, you should think about implementing ETL as a pattern. The reason is that ETL is a well-established best practice for loading data from various sources, and you should take some time to study how it is implemented.

At a basic level, you should have three layers, an extraction layer, a transformation layer, and a loading layer.
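Sketched in code, the three-layer separation might look like the following (the layer split is from the answer; the function and field names are illustrative assumptions):

```python
# Extraction layer: only fetches; no reshaping happens here.
def extract(source) -> list[dict]:
    return source.read_raw()  # raw records, in whatever shape the source uses

# Transformation layer: maps each raw record onto the destination's shape.
def transform(raw: list[dict]) -> list[dict]:
    return [{"customer": r["cust_name"].strip().title(),
             "total": round(float(r["amt"]), 2)}
            for r in raw]

# Load layer: writes the transformed rows to the final destination.
def load(rows: list[dict], destination: list) -> None:
    destination.extend(rows)

def run_pipeline(source, destination) -> None:
    # Each layer has one job; swapping a source touches only extract().
    load(transform(extract(source)), destination)
```

Keeping the layers this narrow is what lets you add or replace a data source without touching the transformation or load code.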

The extraction layer should be responsible for retrieving the data from the source. It should not worry about the shape of the data at this point. To keep the layer clean, implement only code that "gets" the data here; worry about shaping it in the transformation layer.

The transformation layer should be responsible for taking data extracted from various sources and transforming it to the destination's shape. ETL tools do this very efficiently by treating the data as pipelines. These can be split and parallelized. You probably won't have the time or resources to do this. An alternative may be to load the data into staging tables (a less normalized representation of the data).
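The staging-table alternative can be sketched like this, with SQLite standing in for SQL Server (the table and column names are hypothetical): extracted rows land as-is in a staging table tagged with their source, and the load layer reads only from there.

```python
import sqlite3


def stage_raw_orders(conn: sqlite3.Connection,
                     source: str, rows: list[dict]) -> None:
    """Land extracted data in a staging table, tagged with its source."""
    conn.execute("""CREATE TABLE IF NOT EXISTS staging_orders (
                        source TEXT, order_id TEXT,
                        customer TEXT, total REAL)""")
    conn.executemany(
        "INSERT INTO staging_orders VALUES (?, ?, ?, ?)",
        [(source, r["order_id"], r["customer"], r["total"]) for r in rows])


def load_from_staging(conn: sqlite3.Connection) -> list[tuple]:
    # The load layer reads the standardized staging rows,
    # never the raw per-source feeds.
    return conn.execute(
        "SELECT source, order_id, customer, total "
        "FROM staging_orders").fetchall()
```

The staging table gives you a single, uniform place to inspect and clean data from every source before it reaches the backend.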

The load layer takes the transformed data (in the case above, from the staging tables) and loads it into the final destination.

This sufficiently separates your layers so that you can protect yourself from future change. Keep in mind, however, that you're really just doing what an ETL tool will do for you out of the box.
