How to Store Historical Data


Problem description

Some co-workers and I got into a debate on the best way to store historical data. Currently, for some systems, I use a separate table to store historical data, and I keep an original table for the current, active record. So, let's say I have table FOO. Under my system, all active records will go in FOO, and all historical records will go in FOO_Hist. Many different fields in FOO can be updated by the user, so I want to keep an accurate account of everything updated. FOO_Hist holds the exact same fields as FOO with the exception of an auto-incrementing HIST_ID. Every time FOO is updated, I perform an insert statement into FOO_Hist similar to: insert into FOO_HIST select * from FOO where id = @id.
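To make the setup concrete, here is a minimal sketch of the separate history table pattern. It uses Python's built-in sqlite3 module as a stand-in for any DBMS. The columns `name` and `status` and the helper `update_foo` are hypothetical, and the sketch assumes the snapshot is taken just before the update, so that FOO_Hist only ever holds superseded versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FOO (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    -- Same columns as FOO plus an auto-incrementing HIST_ID.
    CREATE TABLE FOO_Hist (
        HIST_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        id INTEGER, name TEXT, status TEXT
    );
""")

def update_foo(conn, foo_id, **changes):
    """Snapshot the current FOO row into FOO_Hist, then apply the update."""
    # Column names are interpolated into the SQL, so they must come from
    # trusted code, never from user input.
    assignments = ", ".join(f"{col} = ?" for col in changes)
    with conn:  # one transaction: snapshot and update commit together
        conn.execute(
            "INSERT INTO FOO_Hist (id, name, status) "
            "SELECT id, name, status FROM FOO WHERE id = ?",
            (foo_id,),
        )
        conn.execute(
            f"UPDATE FOO SET {assignments} WHERE id = ?",
            (*changes.values(), foo_id),
        )

conn.execute("INSERT INTO FOO (id, name, status) VALUES (1, 'widget', 'new')")
update_foo(conn, 1, status="shipped")
update_foo(conn, 1, name="widget v2")

# FOO holds only the active version; FOO_Hist holds every superseded one.
history = conn.execute(
    "SELECT HIST_ID, name, status FROM FOO_Hist ORDER BY HIST_ID"
).fetchall()
current = conn.execute("SELECT name, status FROM FOO WHERE id = 1").fetchone()
```

Listing the columns explicitly rather than using `select *` keeps the insert working when the two tables' column orders drift apart.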

My co-worker says that this is bad design because I shouldn't have an exact copy of a table for historical reasons and should just insert another record into the active table with a flag indicating that it's for historical purposes.
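For comparison, one possible reading of the single-table alternative is sketched below, again with Python's sqlite3 module as a stand-in. The column names `row_id` and `is_history` and the helper `update_name` are assumptions made for illustration, not something the co-worker specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FOO (
        row_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- one per stored version
        id         INTEGER NOT NULL,                   -- shared by all versions
        name       TEXT,
        is_history INTEGER NOT NULL DEFAULT 0          -- 0 = active, 1 = historical
    );
    -- Partial unique index: at most one active row per id.
    CREATE UNIQUE INDEX foo_one_active ON FOO (id) WHERE is_history = 0;
""")

def update_name(conn, foo_id, new_name):
    """Copy the active row as a flagged historical row, then update in place."""
    with conn:  # both statements commit or roll back together
        conn.execute(
            "INSERT INTO FOO (id, name, is_history) "
            "SELECT id, name, 1 FROM FOO WHERE id = ? AND is_history = 0",
            (foo_id,),
        )
        conn.execute(
            "UPDATE FOO SET name = ? WHERE id = ? AND is_history = 0",
            (new_name, foo_id),
        )

conn.execute("INSERT INTO FOO (id, name) VALUES (1, 'widget')")
update_name(conn, 1, "widget v2")

# Every query for live data now has to filter on the flag.
active = conn.execute(
    "SELECT id, name FROM FOO WHERE is_history = 0"
).fetchall()
historical = conn.execute(
    "SELECT id, name FROM FOO WHERE is_history = 1"
).fetchall()
```

The trade-off shows up in the last two queries: any query that forgets the `is_history` filter silently mixes old versions into live results, and `id` can no longer serve as the primary key on its own.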

Is there a standard for dealing with historical data storage? It seems to me that I don't want to clutter my active records with all of my historical records in the same table considering that it may be well over a million records (I'm thinking long term).

How do you or your company handle this?

I'm using MS SQL Server 2008, but I'd like to keep the answer generic and independent of any particular DBMS.

Thanks in advance for any input.

Solution

Supporting historical data directly within an operational system will make your application much more complex than it would otherwise be. Generally, I would not recommend doing it unless you have a hard requirement to manipulate historical versions of a record within the system.

If you look closely, most requirements for historical data fall into one of two categories:

  • Audit logging: This is better off done with audit tables. It's fairly easy to write a tool that generates scripts to create audit log tables and triggers by reading metadata from the system data dictionary. This type of tool can be used to retrofit audit logging onto most systems. You can also use this subsystem for changed data capture if you want to implement a data warehouse (see below).

  • Historical reporting: Reporting on historical state, 'as-at' positions or analytical reporting over time. It may be possible to fulfil simple historical reporting requirements by querying audit logging tables of the sort described above. If you have more complex requirements, then it may be more economical to implement a data mart for the reporting than to try to integrate history directly into the operational system.

    Slowly changing dimensions are by far the simplest mechanism for tracking and querying historical state, and much of the history tracking can be automated. Generic handlers aren't that hard to write. Generally, historical reporting does not have to use up-to-the-minute data, so a batched refresh mechanism is normally fine. This keeps your core and reporting system architecture relatively simple.
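As a rough illustration of the audit logging tool described in the first bullet, the sketch below generates an audit table and triggers from catalog metadata. It assumes SQLite, where `PRAGMA table_info` plays the role of the data dictionary; on another DBMS the metadata query and trigger syntax would differ. The function `audit_ddl` and the `audit_*` column and trigger names are hypothetical.

```python
import sqlite3

def audit_ddl(conn, table):
    """Build audit-table and trigger DDL for `table` from catalog metadata."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    cols = [(row[1], row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')]
    col_defs = ", ".join(f'"{name}" {ctype}' for name, ctype in cols)
    col_list = ", ".join(f'"{name}"' for name, _ in cols)
    old_vals = ", ".join(f'OLD."{name}"' for name, _ in cols)
    # Assumes the source table has no columns named audit_*.
    return f"""
        CREATE TABLE "{table}_audit" (
            audit_id     INTEGER PRIMARY KEY AUTOINCREMENT,
            audit_action TEXT NOT NULL,
            audit_time   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            {col_defs}
        );
        CREATE TRIGGER "{table}_audit_upd" AFTER UPDATE ON "{table}"
        BEGIN
            INSERT INTO "{table}_audit" (audit_action, {col_list})
            VALUES ('UPDATE', {old_vals});
        END;
        CREATE TRIGGER "{table}_audit_del" AFTER DELETE ON "{table}"
        BEGIN
            INSERT INTO "{table}_audit" (audit_action, {col_list})
            VALUES ('DELETE', {old_vals});
        END;
    """

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FOO (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.executescript(audit_ddl(conn, "FOO"))  # retrofit auditing onto FOO

# Application code stays unaware of the audit trail.
conn.execute("INSERT INTO FOO VALUES (1, 'widget', 'new')")
conn.execute("UPDATE FOO SET status = 'shipped' WHERE id = 1")
conn.execute("DELETE FROM FOO WHERE id = 1")

# Each audit row records the values the row had before the change.
audit_rows = conn.execute(
    "SELECT audit_action, id, name, status FROM FOO_audit ORDER BY audit_id"
).fetchall()
```

Because the triggers do the logging, the application's update code does not need to know the audit tables exist, which is what makes this approach easy to retrofit.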

If your requirements fall into one of these two categories, you are probably better off not storing historical data in your operational system. Separating the historical functionality into another subsystem will probably be less effort overall and produce transactional and audit/reporting databases that work much better for their intended purpose.
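For the reporting side, a batched refresh into a slowly changing dimension might look roughly like the sketch below. It assumes a Type 2 dimension with validity dates. For brevity it keeps the operational table and the dimension in one SQLite database, whereas a real data mart would normally live in a separate one. The names `dim_foo`, `refresh_dim` and `status_as_at`, and the dates used, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FOO (id INTEGER PRIMARY KEY, status TEXT);  -- operational
    CREATE TABLE dim_foo (                                   -- reporting
        foo_key    INTEGER PRIMARY KEY AUTOINCREMENT,
        id         INTEGER NOT NULL,
        status     TEXT,
        valid_from TEXT NOT NULL,
        valid_to   TEXT            -- NULL marks the current version
    );
""")

def refresh_dim(conn, load_date):
    """Batch refresh: close changed versions, then open new ones."""
    with conn:
        # Close current versions whose source row changed or disappeared.
        conn.execute("""
            UPDATE dim_foo SET valid_to = ?
            WHERE valid_to IS NULL
              AND NOT EXISTS (SELECT 1 FROM FOO f
                              WHERE f.id = dim_foo.id
                                AND f.status IS dim_foo.status)
        """, (load_date,))
        # Open a version for every source row lacking a current one.
        conn.execute("""
            INSERT INTO dim_foo (id, status, valid_from)
            SELECT f.id, f.status, ? FROM FOO f
            WHERE NOT EXISTS (SELECT 1 FROM dim_foo d
                              WHERE d.id = f.id AND d.valid_to IS NULL)
        """, (load_date,))

def status_as_at(conn, foo_id, as_at):
    """Return the status that was in effect on the given ISO date."""
    row = conn.execute("""
        SELECT status FROM dim_foo
        WHERE id = ? AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
    """, (foo_id, as_at, as_at)).fetchone()
    return row[0] if row else None

conn.execute("INSERT INTO FOO VALUES (1, 'new')")
refresh_dim(conn, "2024-01-01")
conn.execute("UPDATE FOO SET status = 'shipped' WHERE id = 1")
refresh_dim(conn, "2024-02-01")

january = status_as_at(conn, 1, "2024-01-15")
february = status_as_at(conn, 1, "2024-02-01")
```

A refresh like this only sees the state at each load, so changes made and reverted between two batches are lost. If every intermediate change matters, the audit tables are the better source.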
