Fastest way to compute the hash of a whole table


Problem description

We need to be able to compute a table hash for an external environment and compare it to a pre-computed hash from an internal environment. This is used to ensure that data in the external environment has not been tampered with by a "rogue" database administrator. Users insist on this feature.

Currently, we do this by computing the individual hashes of each column value, performing a bit-XOR over the column hashes to get the row hash, and then performing a bit-XOR over all the row hashes to arrive at the table hash. Pseudo-script below:

cursor hash_cur is
  select /*+ PARALLEL(4) */
         dbms_crypto.mac(column1_in_raw_type, dbms_crypto.HMAC_SH512, string_to_raw('COLUMN1_NAME')) as COLUMN1_NAME
         ...
    from TABLE_NAME;

open hash_cur;
fetch hash_cur bulk collect into hashes;
close hash_cur;

for i in 1..hashes.count
loop
  rec := hashes(i);
  record_xor := rec.COLUMN1;
  record_xor := bit_xor(record_xor, rec.COLUMN2);
  ...
  record_xor := bit_xor(record_xor, rec.COLUMNN);

  table_xor := bit_xor(table_xor, record_xor);
end loop;
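Conceptually, the XOR-folding scheme in the pseudo-script looks like the following sketch (Python for illustration only, not the actual PL/SQL; the HMAC key and row/column names are hypothetical stand-ins for the `dbms_crypto.mac` calls, which here are keyed by column name):

```python
import hashlib
import hmac
from functools import reduce

def column_hash(column_name: str, value: str) -> bytes:
    # Per-column HMAC-SHA512, keyed by column name, mirroring
    # dbms_crypto.mac(value, HMAC_SH512, string_to_raw('COLUMN_NAME')).
    return hmac.new(column_name.encode(), value.encode(), hashlib.sha512).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def table_hash(rows: list) -> bytes:
    table_xor = bytes(64)  # SHA-512 digests are 64 bytes; start at all-zero
    for row in rows:
        # Fold all column hashes of the row into one row hash ...
        row_xor = reduce(xor_bytes, (column_hash(c, v) for c, v in row.items()))
        # ... then fold the row hash into the running table hash.
        table_xor = xor_bytes(table_xor, row_xor)
    return table_xor

rows = [{"ID": "1", "NAME": "a"}, {"ID": "2", "NAME": "b"}]
# XOR folding is commutative, so row order does not affect the result.
assert table_hash(rows) == table_hash(list(reversed(rows)))
```

Because XOR is commutative and associative, this table hash is independent of row order, which is what makes the per-row work embarrassingly parallel.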

The pseudo-script above is run in parallel using dbms_job.

The problem is that we have terabytes of data in certain tables, and current performance does not meet what we want to achieve. Hashing must be done "on the fly" because users want to perform the hash check themselves.

  1. Do you have a better way of performing a whole-table hash, or more generally, of comparing tables across environments connected by a low-latency, relatively low-bandwidth network?

It seems to me that the operation is more CPU-bound than I/O-bound. I am thinking of storing the table data in a blob instead, where the data is arranged first by record, then by column, and then performing the hash on that output. This should make the operation completely I/O-bound.

  2. What is the fastest way to do this? Is there any way to perform it in the SELECT clause of the query, to remove the overhead of PL/SQL-to-SQL engine context switches?
    • I was thinking of modifying a global blob for this.
    • I would also like to eliminate the I/O overhead of bulk-collecting the results.

Any suggestions that could lead me to a better-performing script would be greatly appreciated. Thanks.

Answer

First of all, I think the way to approach "rogue administrators" is with a combination of Oracle's audit trail and Database Vault features.

That said, here's what I might try:

1) Create a custom ODCI aggregate function to compute the hash of multiple rows as an aggregate.
2) Create a VIRTUAL NOT NULL column on the table that is a SHA hash of all the columns in the table -- or all the ones you care about protecting. You'd keep this around all the time -- basically trading away some insert/update/delete performance in exchange for being able to compute hashes more quickly.
3) Create a non-unique index on that virtual column.
4) SELECT my_aggregate_hash_function(virtual_hash_column) FROM my_table to get the results.

Here's the code:

CREATE OR REPLACE TYPE matt_hash_aggregate_impl AS OBJECT
(
  hash_value RAW(32000),
  CONSTRUCTOR FUNCTION matt_hash_aggregate_impl(SELF IN OUT NOCOPY matt_hash_aggregate_impl ) RETURN SELF AS RESULT,  
-- Called to initialize a new aggregation context
-- For analytic functions, the aggregation context of the *previous* window is passed in, so we only need to adjust as needed instead 
-- of creating the new aggregation context from scratch
  STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_hash_aggregate_impl) RETURN NUMBER,
-- Called when a new data point is added to an aggregation context  
  MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_hash_aggregate_impl, value IN raw ) RETURN NUMBER,
-- Called to return the computed aggregate from an aggregation context
  MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_hash_aggregate_impl, returnValue OUT raw, flags IN NUMBER) RETURN NUMBER,
-- Called to merge two aggregation contexts into one (e.g., merging results of parallel slaves)
  MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_hash_aggregate_impl, ctx2 IN matt_hash_aggregate_impl) RETURN NUMBER,
  -- ODCIAggregateDelete
  MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_hash_aggregate_impl, value raw) RETURN NUMBER  
);

/

CREATE OR REPLACE TYPE BODY matt_hash_aggregate_impl IS

CONSTRUCTOR FUNCTION matt_hash_aggregate_impl(SELF IN OUT NOCOPY matt_hash_aggregate_impl ) RETURN SELF AS RESULT IS
BEGIN
  SELF.hash_value := null;
  RETURN;
END;


STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_hash_aggregate_impl) RETURN NUMBER IS
BEGIN
  sctx := matt_hash_aggregate_impl ();
  RETURN ODCIConst.Success;
END;


MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_hash_aggregate_impl, value IN raw ) RETURN NUMBER IS
BEGIN
  IF self.hash_value IS NULL THEN
    self.hash_value := dbms_crypto.hash(value, dbms_crypto.hash_sh1);
  ELSE 
      self.hash_value := dbms_crypto.hash(self.hash_value || value, dbms_crypto.hash_sh1);
  END IF;
  RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_hash_aggregate_impl, returnValue OUT raw, flags IN NUMBER) RETURN NUMBER IS
BEGIN
  returnValue := dbms_crypto.hash(self.hash_value,dbms_crypto.hash_sh1);
  RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_hash_aggregate_impl, ctx2 IN matt_hash_aggregate_impl) RETURN NUMBER IS
BEGIN
    self.hash_value := dbms_crypto.hash(self.hash_value || ctx2.hash_value, dbms_crypto.hash_sh1);
  RETURN ODCIConst.Success;
END;

-- ODCIAggregateDelete
MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_hash_aggregate_impl, value raw) RETURN NUMBER IS
BEGIN
  raise_application_error(-20001, 'Invalid operation -- hash aggregate function does not support windowing!');
END;  

END;
/

CREATE OR REPLACE FUNCTION matt_hash_aggregate ( input raw) RETURN raw
PARALLEL_ENABLE AGGREGATE USING matt_hash_aggregate_impl;
/
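For intuition, the iterate/merge/terminate flow of the ODCI aggregate above can be modeled in a few lines of Python (a conceptual sketch only; the real work happens inside Oracle, and the class/method names are just labels for the corresponding ODCI callbacks):

```python
import hashlib

def sha1(b: bytes) -> bytes:
    return hashlib.sha1(b).digest()

class HashAggregate:
    """Python model of matt_hash_aggregate_impl's aggregation context."""

    def __init__(self):
        # ODCIAggregateInitialize: start with an empty context
        self.hash_value = None

    def iterate(self, value: bytes) -> None:
        # ODCIAggregateIterate: chain each new value into the running SHA-1
        if self.hash_value is None:
            self.hash_value = sha1(value)
        else:
            self.hash_value = sha1(self.hash_value + value)

    def merge(self, other: "HashAggregate") -> None:
        # ODCIAggregateMerge: combine two contexts (e.g., parallel slaves)
        self.hash_value = sha1(self.hash_value + other.hash_value)

    def terminate(self) -> bytes:
        # ODCIAggregateTerminate: one final hash over the accumulated state
        return sha1(self.hash_value)

agg = HashAggregate()
for v in [b"row1", b"row2", b"row3"]:
    agg.iterate(v)
digest = agg.terminate()  # 20-byte SHA-1 digest
```

Note that because each step hashes the previous state concatenated with the new value, the result depends on the order in which rows are fed in, which is exactly the caveat raised in the comments at the end of this answer.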

Create a test table to work with (skip this, since you have your own real table):

create table mattmsi as select * from mtl_system_items where rownum <= 200000;

Create a virtual column hash of each row's data. Make sure it is NOT NULL:

alter table mattmsi add compliance_hash generated always as ( dbms_crypto.hash(to_clob(inventory_item_id || segment1 || last_update_date || created_by || description), 3 /*dbms_crypto.hash_sh1*/) ) VIRTUAL not null ;

Create an index on the virtual column; this way you can compute the hash with a full scan of the narrow index instead of a full scan of the fat table:

create index msi_compliance_hash_n1 on mattmsi (compliance_hash);  

Put it all together to compute your hash:

SELECT matt_hash_aggregate(compliance_hash) from (select compliance_hash from mattmsi order by compliance_hash);
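The `ORDER BY compliance_hash` in the inner query is what pins down the row order. The following sketch (Python, illustrative values only) shows why the chained hash needs it: without sorting, two different physical scan orders of the same rows produce different digests.

```python
import hashlib

def chained_hash(values):
    # Same chaining as the aggregate: h = SHA1(h || v), then one final SHA1.
    h = b""
    for v in values:
        h = hashlib.sha1(h + v).digest()
    return hashlib.sha1(h).digest()

scan1 = [b"c", b"a", b"b"]   # one possible physical scan order
scan2 = [b"b", b"c", b"a"]   # another scan order of the same rows

# Without an ORDER BY, the chained hash depends on retrieval order:
assert chained_hash(scan1) != chained_hash(scan2)

# Sorting first (the ORDER BY compliance_hash) makes the result stable:
assert chained_hash(sorted(scan1)) == chained_hash(sorted(scan2))
```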

A few comments:

  1. I think it is important to use a hash to compute the aggregate (rather than merely doing a SUM() over the row-level hashes), because an attacker could forge the correct sum very easily.
  2. I don't think you can (easily?) use parallel query, because it is important that the rows be fed to the aggregate function in a consistent order, or else the hash value will change.
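Comment #1 is easy to demonstrate (a Python sketch with made-up integer "row hashes"): an attacker who tampers with one row-level hash can compensate in another so that SUM() (and likewise XOR) is preserved, while a chained hash over the same values changes.

```python
import hashlib

def chained_hash(values):
    # Chained aggregate: h = SHA1(h || v) per value, then one final SHA1.
    h = b""
    for v in values:
        h = hashlib.sha1(h + v).digest()
    return hashlib.sha1(h).digest()

# Hypothetical row-level hashes, represented as small integers for the SUM() case.
original = [10, 20, 30]
# Attacker adds 5 to row 0 and subtracts 5 from row 2: the SUM is unchanged.
forged = [15, 20, 25]
assert sum(original) == sum(forged)

# The chained hash over the same values detects the tampering.
orig_bytes = [v.to_bytes(4, "big") for v in original]
forged_bytes = [v.to_bytes(4, "big") for v in forged]
assert chained_hash(orig_bytes) != chained_hash(forged_bytes)
```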
