Data load to huge partitioned table


Problem description

I have a huge table, first range partitioned by price_date, then hash partitioned by fund_id. The table has 430 million rows, and every day a batch job inserts 1.5 to 3 million rows.
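For reference, a composite range-hash table of this shape might be declared as follows; the table name, column types, and partition boundaries here are hypothetical, a minimal sketch of the layout described above:

```sql
create table fund_prices (
  price_date  date          not null,
  fund_id     number        not null,
  price       number(18,6)
)
partition by range (price_date)
subpartition by hash (fund_id) subpartitions 16
(
  partition p2011q1 values less than (to_date('2011-04-01','yyyy-mm-dd')),
  partition p2011q2 values less than (to_date('2011-07-01','yyyy-mm-dd')),
  partition pmax    values less than (maxvalue)
);
```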

We are looking for a way to enable and disable local indexes (not all of the indexes, only those on the partitions actually touched by the incoming data).
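Oracle supports this directly: a local index subpartition can be marked UNUSABLE before the load and rebuilt afterwards, so only the affected index segments pay the maintenance cost. A sketch, with hypothetical index and subpartition names:

```sql
-- before the load: take only the affected subpartition's index segment offline
alter index fund_prices_ix1 modify subpartition sys_subp123 unusable;

-- ... run the batch insert ...

-- after the load: rebuild just that subpartition
alter index fund_prices_ix1 rebuild subpartition sys_subp123;
```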

Does anyone have experience making inserts into a large table run faster, without the drop-and-rebuild technique?

Does anyone have any suggestions for this situation?

Answer

Please read the following:

http://www.evdbt.com/TGorman%20TD2005%20DWScale.doc

This works.

Do you face the challenge of the staging area having to be accessible to online queries, or of late-arriving data (for example, can you receive a row today for any day other than today/yesterday)?

I've got code which scans through the data set of records I'm going to be loading, and marks the local index subpartitions whose table subpartitions are going to be modified. (I'm using this instead of Tim Gorman's reference above because I have late-arriving data and need to have both the staging area and the warehouse proper available to end users simultaneously.)

My table is range/list, not range/hash, so you're going to have to modify the code some, probably using the ORA_HASH function to find the right subpartition(s). I also write out to a table which subpartitions I'm going to mark as unusable, so I can do all of that in a single pass. It may be slightly more efficient to mark all of a subpartition's indexes as unusable in a single ALTER TABLE statement. I was originally only disabling the bitmap indexes, but taking even a single B*tree index offline during the data load improved efficiency significantly.
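For a hash-subpartitioned table, ORA_HASH can be used to compute which hash bucket each key falls into; this is generally reliable only when the number of hash subpartitions is a power of two, which is how Oracle recommends sizing them anyway. A sketch assuming 16 hash subpartitions on a hypothetical source table, where bucket N corresponds to the (N+1)th subpartition by position:

```sql
-- ORA_HASH(expr, max_bucket) returns a bucket in 0..max_bucket,
-- so for 16 subpartitions the second argument is 15
select distinct fund_id,
       ora_hash(fund_id, 15) as hash_bucket
  from my_source_table;
```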

  procedure DISABLE_LOCAL_INDEXES as
     l_part_name varchar2(30);
     l_subpart_name varchar2(30);
     l_sql varchar2(2000);
     type partition_rec_type is record
     (table_name         varchar2(30),
      partition_name     varchar2(30),
      subpartition_name  varchar2(30),
      list_value         varchar2(10),
      min_ts             timestamp,
      max_ts             timestamp);
     type partition_recs_type
                         is table of partition_rec_type;
     l_partition_recs    partition_recs_type := partition_recs_type();
     l_partition_rec     partition_rec_type;
     l_subpart_id        number := 1;
     l_start_ts          timestamp;
     l_end_ts            timestamp;
     l_found_list_part boolean;
   begin
     -- build set of subpartitions
     l_start_ts := to_timestamp ('1970-01-01', 'yyyy-mm-dd');
     for i in (select p.table_name, p.partition_name, sp.subpartition_name,
                      p.high_value as part_high_value, 
                      sp.high_value as subpart_high_value,
                      p.partition_position, sp.subpartition_position
                 from user_tab_subpartitions sp
                      inner join user_tab_partitions p
                         on p.table_name     = sp.table_name
                        and p.partition_name = sp.partition_name
                where p.table_name = 'MY_TARGET_TABLE'
                order by p.partition_position, sp.subpartition_position)
     loop
       if ( (i.partition_position <> 1) and (i.subpartition_position = 1) ) then
         l_start_ts    := l_end_ts + to_dsinterval('0 00:00:00.000000001');
       end if;
       if (i.subpartition_position = 1) then
         -- high_val_to_ts is a user-defined helper that evaluates the
         -- LONG high_value string and returns it as a timestamp
         l_end_ts := high_val_to_ts (i.part_high_value);
         l_end_ts := l_end_ts - to_dsinterval('0 00:00:00.000000001');
       end if;
       l_partition_rec.table_name        := i.table_name;
       l_partition_rec.partition_name    := i.partition_name;
       l_partition_rec.subpartition_name := i.subpartition_name;
       l_partition_rec.list_value        := i.subpart_high_value;
       l_partition_rec.min_ts            := l_start_ts;
       l_partition_rec.max_ts            := l_end_ts;
       l_partition_recs.extend();
       l_partition_recs(l_subpart_id) := l_partition_rec;
       l_subpart_id := l_subpart_id + 1;
     end loop;
     -- for every combination of list column and date column
     -- which is going to be pushed to MY_TARGET_TABLE
     -- find the subpartition
     -- otherwise find the partition and default subpartition
     for i in (select distinct LIST_COLUMN, DATE_COLUMN as DATE_VALUE
                 from MY_SOURCE_TABLE
                where IT_IS_BEING_MOVED_TO_TARGET IS TRUE)
     loop
       -- iterate over the partitions
       l_found_list_part := false;
       for k in l_partition_recs.first..l_partition_recs.last
       loop
         -- find the right partition / subpartition for list_value / date_value
         if (    (i.DATE_VALUE >= l_partition_recs(k).min_ts)
             and (i.DATE_VALUE <= l_partition_recs(k).max_ts) ) then
           if (l_found_list_part = false) then
             if (to_char(i.LIST_COLUMN, '9999') = l_partition_recs(k).list_value) then
               -- exact list-value match: remember this subpartition
               l_partition_rec := l_partition_recs(k);
               l_found_list_part := true;
             elsif (l_partition_recs(k).list_value = 'DEFAULT') then
               -- fall back to the DEFAULT subpartition unless a match is found
               l_partition_rec := l_partition_recs(k);
             end if;
           end if;
         end if;
       end loop;  -- over l_partition_recs
       -- log those partitions for later index rebuild
       begin
         insert into index_subpart_rebuild
           (table_name, partition_name, subpartition_name)
         values
           (l_partition_rec.table_name, l_partition_rec.partition_name,
            l_partition_rec.subpartition_name);
       exception
         when dup_val_on_index then null;
         when others then raise;
       end;
     end loop;  -- over MY_TARGET_TABLE.DATE_VALUE values
     commit;
     for i in (select ui.index_name, uis.subpartition_name
                 from user_indexes ui
                      inner join user_ind_subpartitions uis
                         on ui.index_name = uis.index_name
                      inner join index_subpart_rebuild re
                         on re.subpartition_name = uis.subpartition_name
                where ui.table_name = 'MY_TARGET_TABLE')
     loop
       l_sql := 'alter index ' || i.index_name ||
                ' modify subpartition ' || i.subpartition_name || ' unusable';
       execute immediate l_sql;
     end loop;
   end DISABLE_LOCAL_INDEXES;
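The counterpart after the load would rebuild whatever was logged in index_subpart_rebuild. A minimal sketch along the same lines as the procedure above (not from the original answer, and assuming the same table names):

```sql
procedure ENABLE_LOCAL_INDEXES as
  l_sql varchar2(2000);
begin
  for i in (select ui.index_name, uis.subpartition_name
              from user_indexes ui
                   inner join user_ind_subpartitions uis
                      on ui.index_name = uis.index_name
                   inner join index_subpart_rebuild re
                      on re.subpartition_name = uis.subpartition_name
             where ui.table_name = 'MY_TARGET_TABLE'
               and uis.status = 'UNUSABLE')
  loop
    l_sql := 'alter index ' || i.index_name ||
             ' rebuild subpartition ' || i.subpartition_name;
    execute immediate l_sql;
  end loop;
  -- clear the log so the next batch starts fresh
  delete from index_subpart_rebuild;
  commit;
end ENABLE_LOCAL_INDEXES;
```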
