在Oracle中使用大量数据在同一张表中进行字段比较的最快方法 [英] Fastest way of doing field comparisons in the same table with large amounts of data in oracle

查看:665
本文介绍了在Oracle中使用大量数据在同一张表中进行字段比较的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从一个部门的csv文件中接收信息,以便与不同部门的相同信息进行比较以检查差异(每百万行数据中约3/4的行,每行44列).将数据存储在表格中之后,我就有了一个程序,该程序将获取数据并基于总部发送报告.我觉得我要解决的方法不是最有效的.我正在使用oracle进行此比较.

I am recieving information from a csv file from one department to compare with the same inforation in a different department to check for discrepencies (About 3/4 of a million rows of data with 44 columns in each row). After I have the data in a table, I have a program that will take the data and send reports based on a HQ. I feel like the way I am going about this is not the most efficient. I am using oracle for this comparison.

这就是我所拥有的:

  • 我有一个vb.net程序,该程序可以解析数据并将其插入到提取表中

  • I have a vb.net program that parses the data and inserts it into an extract table

我运行一个过程,对两个表进行完全外部联接,使其成为新表,其中一个部门中的字段以'_c'

I run a procedure to do a full outer join on the two tables into a new table with the fields in one department prefixed with '_c'

我运行另一个过程来比较旧/新数据,并使用详细信息和摘要信息更新2个不同的表.这是过程内部的代码:

I run another procedure to compare the old/new data and update 2 different tables with detail and summary information. Here is code from inside the procedure:

DECLARE 
  CURSOR Cur_Comp IS SELECT * FROM T.AEC_CIS_COMP;
BEGIN 
FOR compRow in Cur_Comp LOOP

    --If service pipe exists in CIS but not in FM and the service pipe has status of retired in CIS, ignore the variance
    If(compRow.pipe_num = '' AND cis_status_c = 'R')
        continue
    END IF

    --If there is not a summary record for this HQ in the table for this run, create one
    INSERT INTO t.AEC_CIS_SUM (HQ, RUN_DATE)
    SELECT compRow.HQ, to_date(sysdate, 'DD/MM/YYYY') from dual WHERE NOT EXISTS
    (SELECT null FROM t.AEC_CIS_SUM WHERE HQ = compRow.HQ AND RUN_DATE = to_date(sysdate, 'DD/MM/YYYY'))

    -- Check fields and update the tables accordingly
    If (compRow.cis_loop <> compRow.cis_loop_c) Then
        --Insert information into the details table
        INSERT INTO T.AEC_CIS_DET( Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl, 
                                              DateTime, Changed_Field, CIS_Value, FM_Value)
        VALUES(compRow.Fac_ID, compRow.Pipe_Num, compRow.Hq, compRow.Street_Num || ' ' || compRow.Street_Name,
               'Y', sysdate, 'Cis_Loop', compRow.cis_loop, compRow.cis_loop_c); 

        -- Update information into the summary table        
        UPDATE AEC_CIS_SUM                 
        SET cis_loop = cis_loop + 1
        WHERE Hq = compRow.Hq
          AND Run_Date = to_date(sysdate, 'DD/MM/YYYY')               
    End If;       
END LOOP;

END;

有没有建议为表的所有44列提供一个比if语句更简单的方法? (如果需要的话,每周运行一次)

Any suggestions of an easier way of doing this rather than an if statement for all 44 columns of the table? (This is run once a week if it matters)

更新:仅需澄清一下,有88列数据(44个重复项与后缀_c进行比较).一张表列出了一行中每个字段的不同之处,因此一行可以表示该表中写入了30多个记录.另一个表记录了每周的差异数量.

Update: Just to clarify, there are 88 columns of data (44 of duplicates to compare with one suffixed with _c). One table lists each field in a row that is different so one row can mean 30+ records written in that table. The other table keeps tally of the number of discrepencies for each week.

推荐答案

首先,我相信您的任务可以使用稳定的SQL实现(并且应该实际上是).没有花哨的光标,没有循环,只需选择,插入和更新.我将从取消数据源入手开始(目前尚不清楚您是否有主键来连接两组数据集):

First of all I believe that your task can be implemented (and should be actually) with staight SQL. No fancy cursors, no loops, just selects, inserts and updates. I would start with unpivotting your source data (it is not clear if you have primary key to join two sets, I guess you do):

Col0_PK    Col1    Col2    Col3    Col4
----------------------------------------
Row1_val   A       B       C       D
Row2_val   E       F       G       H

以上是您的源数据.使用 UNPIVOT子句,我们将其转换为:

Above is your source data. Using UNPIVOT clause we convert it to:

Col0_PK     Col_Name    Col_Value
------------------------------
Row1_val    Col1        A
Row1_val    Col2        B
Row1_val    Col3        C
Row1_val    Col4        D
Row2_val    Col1        E
Row2_val    Col2        F
Row2_val    Col3        G
Row2_val    Col4        H

我想你明白了.假设我们的table1具有一组数据,而相同的结构化table2具有第二组数据.最好使用索引组织的表.

I think you get the idea. Say we have table1 with one set of data and the same structured table2 with the second set of data. It is good idea to use index-organized tables.

下一步是将行相互比较并存储差异明细.像这样:

Next step is comparing rows to each other and storing difference details. Something like:

insert into diff_details(some_service_info_columns_here)
 select some_service_info_columns_here_along_with_data_difference
  from table1 t1 inner join table2 t2
     on t1.Col0_PK = t2.Col0_PK
    and t1.Col_name = t2.Col_name
    and nvl(t1.Col_value, 'Dummy1') <> nvl(t2.Col_value, 'Dummy2');

在最后一步,我们更新差异汇总表:

And on the last step we update difference summary table:

insert into diff_summary(summary_columns_here)
 select diff_row_id, count(*) as diff_count
  from diff_details
 group by diff_row_id;

这只是显示我的方法的粗略草稿,我相信应该考虑更多细节.总而言之,我建议两件事:

It's just rough draft to show my approach, I'm sure there is much more details should be taken into account. To summarize I suggest two things:

  1. UNPIVOT数据
  2. 使用SQL语句代替游标
  1. UNPIVOT data
  2. Use SQL statements instead of cursors

这篇关于在Oracle中使用大量数据在同一张表中进行字段比较的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆