History rows management in database


Problem description



As in many databases, I am designing a database that should keep a record of the previous versions of rows changed in each table.

The standard solution to this problem is to keep a history table for each data table; whenever a row needs to be updated in the data table, a copy of the current row is inserted into the history table, and then the row in the data table is updated.
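
In Oracle SQL, that standard pattern looks roughly like the sketch below. The names SOME_TABLE and SOME_TABLE_HIST come from the question; the column and key values are invented for illustration, and the sketch assumes the history table has the same column layout as the data table.

```sql
-- Conventional history-table pattern: copy the current row into the
-- history table, then update the row in place in the data table.
INSERT INTO some_table_hist
  SELECT * FROM some_table WHERE id = 1042;

UPDATE some_table
   SET data_col = 'new value'
 WHERE id = 1042;
```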

The disadvantages of this solution for me:

  • maintenance of 2 tables instead of 1 (in case the structure of the table needs to change)
  • the application needs to know about both tables instead of one
  • the table names might need to be kept short to maintain the naming convention between a table and its history table (SOME_TABLE and SOME_TABLE_HIST, for example)

I am considering a different solution and would like to know if it's OK. For each table, we add the column IS_LAST:

  • When a row gets inserted into the table, it is inserted with IS_LAST=1.
  • When a row gets updated, a copy of the original row is duplicated into the same table with IS_LAST=0, and the original row is then updated as needed (still keeping IS_LAST=1); see the sketch after this list.
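
A minimal sketch of that update path in Oracle SQL, assuming a hypothetical table SOME_TABLE with columns ID, DATA_COL and IS_LAST (the question does not give real column names):

```sql
-- Proposed scheme: copy the current row into the same table with
-- IS_LAST = 0, then update the original row, which keeps IS_LAST = 1.
INSERT INTO some_table (id, data_col, is_last)
  SELECT id, data_col, 0
    FROM some_table
   WHERE id = 1042
     AND is_last = 1;

UPDATE some_table
   SET data_col = 'new value'
 WHERE id = 1042
   AND is_last = 1;
```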

Assume that in my case, rows are updated an average of 10 times. Also assume that at least 90% of the actions performed by the application happen only on the most recent version of the rows.

My database is Oracle 10g, so to keep the "active" table slim, we can split the table into 2 partitions: the IS_LAST=1 partition and the IS_LAST=0 partition.
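
As an illustration, list partitioning on the flag could look like the following sketch (again with hypothetical column names; this is one way to express the idea, not something prescribed in the question or answer):

```sql
-- The IS_LAST flag drives list partitioning: current rows and
-- historical rows end up in separate partitions.
CREATE TABLE some_table (
  id       NUMBER        NOT NULL,
  data_col VARCHAR2(100),
  is_last  NUMBER(1)     DEFAULT 1 NOT NULL
)
PARTITION BY LIST (is_last) (
  PARTITION p_current VALUES (1),
  PARTITION p_history VALUES (0)
);

-- Queries that touch only current data can be pruned to p_current:
SELECT * FROM some_table WHERE is_last = 1;
```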

Is partitioning a good way of solving the history data keeping problem?

Does this solution limit other partitioning potential for these tables?

Thanks!

Solution

The first question should be: what would you do with that data? If you don't have a clear business requirement, don't do it.

I did something similar, and after 3 years of running, about 20% of the data is "valid data" and the rest is "previous versions". That is 10 million + 40 million records. In the last three years we had 2 (two) requests to investigate the history of changes, and both times the requests were silly - we record the timestamp of each record change, and we were asked to check whether people had worked overtime (after 5 pm).

Now we are stuck with an oversized database, 80% of which is data that nobody needs.

EDIT:

Since you asked for possible solutions, I'll describe what we did. It's a bit different from the solution you are considering.

  1. All tables have a surrogate primary key.
  2. All primary keys are generated from a single sequence (sketched after this list). This works fine because Oracle can generate and cache the numbers, so there are no performance problems here. We use an ORM, and we wanted each object in memory (and the corresponding record in the database) to have a unique identifier.
  3. We use an ORM, and the mapping information between database tables and classes is expressed as attributes.
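
A sketch of the shared-sequence idea; the sequence name, the INVOICE table and its columns are hypothetical, and the CACHE size is arbitrary:

```sql
-- One sequence feeds the surrogate primary keys of every table.
-- CACHE lets Oracle pre-allocate numbers, keeping generation cheap.
CREATE SEQUENCE global_id_seq START WITH 1 INCREMENT BY 1 CACHE 1000;

-- Any table draws its key from the same sequence:
INSERT INTO invoice (id, customer_id, amount)
VALUES (global_id_seq.NEXTVAL, 317, 150);
```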

We record all changes in a single archive table with the following columns (a DDL sketch follows the list):

  • id (surrogate primary key)
  • timestamp
  • original table
  • id of the original record
  • user id
  • transaction type (insert, update, delete)
  • record data as a varchar2 field
    • this is the actual data in the form of fieldname/value pairs.
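
One possible shape for that table; the column names and sizes below are illustrative, not taken from the answer:

```sql
-- Single archive table that receives every change from every table.
CREATE TABLE archive (
  id          NUMBER        NOT NULL,  -- surrogate primary key
  change_ts   TIMESTAMP     NOT NULL,  -- when the change was recorded
  orig_table  VARCHAR2(30)  NOT NULL,  -- name of the original table
  orig_id     NUMBER        NOT NULL,  -- id of the original record
  user_id     NUMBER        NOT NULL,  -- who made the change
  tx_type     VARCHAR2(10)  NOT NULL,  -- INSERT / UPDATE / DELETE
  record_data VARCHAR2(4000),          -- fieldname/value pairs
  CONSTRAINT archive_pk PRIMARY KEY (id)
);
```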

Thing works this way:

  • The ORM has insert/update and delete commands.
  • We created one base class for all our business objects that overrides the insert/update and delete commands.
    • The insert/update/delete commands build a string of fieldname/value pairs using reflection. The code looks up the mapping information and reads the field name, the associated value and the field type. Then we create something similar to JSON (we added some modifications). When the string representing the current state of the object has been created, it is inserted into the archive table.
  • When a new or updated object is saved to the database, it is saved to its target table and at the same time we insert one record with the current values into the archive table (see the SQL sketch after this list).
  • When an object is deleted, we delete it from its target table and at the same time we insert one record into the archive table with transaction type = "DELETE".
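
In SQL terms, the effect of an update is roughly the pair of statements below. The INVOICE table, the ids and the JSON-like payload are invented for illustration; in the real system the string is built by the C# base class rather than by hand-written SQL.

```sql
-- The object is saved to its target table...
UPDATE invoice
   SET amount = 150
 WHERE id = 1042;

-- ...and at the same time its current state is written to the archive.
INSERT INTO archive (id, change_ts, orig_table, orig_id,
                     user_id, tx_type, record_data)
VALUES (global_id_seq.NEXTVAL, SYSTIMESTAMP, 'INVOICE', 1042, 77, 'UPDATE',
        '{"id":1042,"customer_id":317,"amount":150,"status":"OPEN"}');
```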

Pros:

  • We don't have an archive table for each table in the database. We also don't need to worry about updating archive tables when the schema changes.
  • The complete archive is separated from the "current data", so the archive does not impose any performance hit on the database. We put it in a separate tablespace on a separate disk and it works fine.
  • We created 2 forms for viewing the archive:
    • A general viewer that lists the archive table according to a filter the user enters on the form (time span, user, ...). We show each record as fieldname/value pairs, and each change is color-coded. Users can see all versions of each record, and they can see who made each change and when.
    • An invoice viewer - this one was complex, but we created a form that shows an invoice much like the original invoice entry form, with some additional buttons that can show the different generations. It took considerable effort to create this form. It was used a few times and then forgotten, because it was not needed in the current workflow.
  • The code for creating archive records is located in a single C# class. There is no need for triggers on every table in the database.
  • Performance is very good. At peak times, the system is used by around 700-800 users. This is an ASP.Net application. Both ASP.Net and Oracle run on one dual-XEON machine with 8 GB of RAM.

Cons:

  • The single-table archive format is harder to read than a solution with one archive table for each data table.
  • Searching on non-id fields in the archive table is hard - we can only use the LIKE operator on the string (see the example below).
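
For example, finding all archived versions of invoices for one customer means pattern-matching the serialized string (hypothetical payload format as above):

```sql
-- Non-id searches have to scan the serialized fieldname/value strings.
SELECT orig_id, change_ts, tx_type, record_data
  FROM archive
 WHERE orig_table = 'INVOICE'
   AND record_data LIKE '%"customer_id":317%';
```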

So, again, check the requirements for an archive. It is not a trivial task, but the gains and actual use can be minimal.
