What is the best approach to pull "Delta" data into an analytics DB from a highly transactional DB?


Question

What is the best approach to load only the delta into the analytics DB from a highly transactional DB?

Note: We have a highly transactional system, and we are building an analytics database from it. At present, we wipe all the fact and dimension tables in the analytics DB and reload the entire "processed" data set at midnight. The problem with this approach is that we load the same data again and again, along with the small amount of data that was added or updated that day. We need to load only the "Delta" (newly inserted rows and existing rows that were updated). Is there an efficient way to do this?

Recommended answer

It is difficult to give a specific answer without knowing details such as the database schema and the database engine. However, the most natural approach for me is to use timestamps. This solution assumes that the entities (a single record in a table, or a group of related records) that are loaded/migrated from the transactional DB into the analytics one have a timestamp.

This timestamp records when the given entity was created or last updated. While loading/migrating data, you should take into account only those entities whose timestamp is later than the date of the last migration. The advantage of this approach is that it is quite simple and does not require any specific tool. The question is whether you already have timestamps in your DB.
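As a rough illustration of the timestamp approach, here is a minimal sketch using Python's built-in sqlite3 module. The table names (orders, fact_orders), the updated_at column, and the load_delta helper are all hypothetical and only stand in for whatever your schema actually contains; timestamps are assumed to be stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3


def load_delta(source, target, last_sync):
    """Copy rows changed after last_sync from orders into fact_orders.

    Returns the new high-water mark to remember for the next run.
    Table and column names are assumptions made for this sketch.
    """
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    # INSERT OR REPLACE adds new ids and overwrites rows whose id already exists.
    target.executemany(
        "INSERT OR REPLACE INTO fact_orders (id, amount, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    # The latest timestamp seen becomes the next starting point.
    return rows[-1][2] if rows else last_sync


# Two in-memory databases stand in for the transactional and analytics DBs.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
target.execute(
    "CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T09:00:00"), (2, 20.0, "2024-01-01T10:00:00")],
)

# First run: everything is newer than the initial mark.
mark = load_delta(source, target, "1970-01-01T00:00:00")

# Later, one row is updated and one row is inserted in the transactional DB.
source.execute(
    "UPDATE orders SET amount = 25.0, updated_at = '2024-01-02T08:00:00' WHERE id = 2"
)
source.execute("INSERT INTO orders VALUES (3, 30.0, '2024-01-02T09:00:00')")

# Second run: only the two changed rows are transferred.
mark2 = load_delta(source, target, mark)
```

Note that a timestamp filter like this only sees inserts and updates; rows deleted from the transactional DB leave nothing behind to compare, so deletions would need separate handling.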

Another approach might be to use some kind of change tracking mechanism. For example, MS SQL Server has something like that (see this article). However, I have to admit that I've never used it, so I'm not sure whether it is suitable in this case. If your database doesn't support change tracking, you can try to build it yourself based on triggers, but in general that is not an easy thing to do.
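To give an idea of what a home-grown, trigger-based mechanism could look like, here is a small sketch using SQLite through Python's sqlite3 module. It is not SQL Server's built-in change tracking; the orders table, the orders_changes log table, the trigger names, and the changes_since helper are all hypothetical names chosen for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);

-- Hypothetical change log: one row per insert/update/delete on orders.
CREATE TABLE orders_changes (
    seq      INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id INTEGER NOT NULL,
    op       TEXT NOT NULL
);

CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO orders_changes (order_id, op) VALUES (NEW.id, 'I');
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_changes (order_id, op) VALUES (NEW.id, 'U');
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO orders_changes (order_id, op) VALUES (OLD.id, 'D');
END;
""")


def changes_since(conn, last_seq):
    """Return (seq, order_id, op) rows logged after last_seq, oldest first."""
    return conn.execute(
        "SELECT seq, order_id, op FROM orders_changes "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


# Ordinary writes to the transactional table; the triggers log each one.
db.execute("INSERT INTO orders VALUES (1, 10.0)")
db.execute("INSERT INTO orders VALUES (2, 20.0)")
db.execute("UPDATE orders SET amount = 15.0 WHERE id = 1")
db.execute("DELETE FROM orders WHERE id = 2")

# The loader reads everything after the last sequence number it processed.
changes = changes_since(db, 0)
```

The loader would store the highest seq it has processed and pass it as last_seq on the next run. Unlike the timestamp filter, this log also records deletes, at the cost of extra writes on every transaction and more objects to maintain, which is part of why doing it by hand is not easy.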
