Bulk extraction of Oracle BLOBS into files - advice/tuning help needed


Problem description

I am working on a project that needs migration of existing Oracle blobs into files. The environment to read from is a shared Oracle 10gR2 server. Currently I have a script using UTL_FILE. However, the process is pretty slow. It takes around 3 hours to extract 25 GB of sample data. The actual data to be moved is in the order of 1 TB. I need help/advice in tuning this significantly.

Here is my process:

  1. Open a cursor to get the list of blob IDs and names
  2. Loop over each blob
  3. Extract the blob using a custom stored procedure, BLOB2FILE (picked up from a website and slightly modified)

Here is the code:

create or replace
PROCEDURE BLOB2File(
    lngBlobID IN NUMBER,
    sFileName IN VARCHAR2,
    sDir      IN VARCHAR2)
AS
  iFileLen INTEGER;
  iLineLen INTEGER := 32000; -- max line size for utl_file
  vStart   NUMBER  := 1;
  vBlob BLOB;
  l_output utl_file.file_type;
  my_vr RAW(32000);
  iTmp INTEGER;
BEGIN
  -- get blob details
  LOG_IT('Entered. Blob Id: ' || lngBlobID || ', File Name: ' || sFileName || ', Directory: ' || sDir);
  SELECT blobData,
    lengthb(blobData)
  INTO vBlob,
    iFileLen
  FROM blobTable
  WHERE id = lngBlobID;
  LOG_IT('Acquired the blob. Blob size: ' || TO_CHAR(iFileLen));
  l_output := utl_file.fopen(sDir, sFileName,'wb', iLineLen);
  vStart   := 1;
  iTmp     := iFileLen;
  -- if small enough for a single write
  IF iFileLen < iLineLen THEN
    utl_file.put_raw(l_output,vBlob);
    utl_file.fflush(l_output);
  ELSE -- write in pieces
    vStart      := 1;
    WHILE vStart < iFileLen AND iLineLen > 0
    LOOP
      dbms_lob.read(vBlob,iLineLen,vStart,my_vr);
      utl_file.put_raw(l_output,my_vr);
      utl_file.fflush(l_output);
      -- set the start position for the next cut
      vStart := vStart + iLineLen;
      -- set the end position if less than 32000 bytes
      iTmp       := iTmp - iLineLen;
      IF iTmp     < iLineLen THEN
        iLineLen := iTmp;
      END IF;
    END LOOP;
  END IF;
  utl_file.fclose(l_output);
  LOG_IT('Exited');

  EXCEPTION
  WHEN OTHERS THEN
  LOG_IT('**ERROR** ' || SQLERRM, SQLCODE, DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
END;

LOG_IT is a stored proc logging to a table. There should not be any significant hit there. I tried optimizing Step 1 by using BULK FETCH instead of a normal FETCH. However, it didn't yield any significant result.
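
For reference, a minimal sketch of what the Step 1 driver with BULK COLLECT could look like; the fileName column, the LIMIT of 500 and the EXTRACT_DIR directory are assumptions, since the actual driving loop is not shown here:

DECLARE
  CURSOR c_blobs IS
    SELECT id, fileName          -- fileName column is an assumption
    FROM blobTable;
  TYPE t_ids   IS TABLE OF blobTable.id%TYPE;
  TYPE t_names IS TABLE OF VARCHAR2(255);
  l_ids   t_ids;
  l_names t_names;
BEGIN
  OPEN c_blobs;
  LOOP
    -- fetch id/name pairs in chunks rather than one row at a time
    FETCH c_blobs BULK COLLECT INTO l_ids, l_names LIMIT 500;
    EXIT WHEN l_ids.COUNT = 0;
    FOR i IN 1 .. l_ids.COUNT LOOP
      BLOB2File(l_ids(i), l_names(i), 'EXTRACT_DIR');
    END LOOP;
  END LOOP;
  CLOSE c_blobs;
END;

Even with the bulk fetch, the per-blob work inside BLOB2File (the SELECT INTO and the UTL_FILE writes) dominates the runtime, which is consistent with the bulk fetch making no noticeable difference.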

Can anybody suggest any ideas for improvement or, even better, a more performant way of approaching this?

Recommended answer

Assuming that your hardware is sufficient to handle far more than 8 GB/hour of sustained writes to sDir (and to handle reading a similar amount from blobTable, plus whatever other I/O your system needs), the simplest option would likely be to spawn a few parallel sessions, each of which calls this procedure. For example, if you wanted to run three jobs in parallel, each extracting one LOB, you could do something like this.

DECLARE
  l_jobno INTEGER;
BEGIN
  dbms_job.submit( l_jobno, 'begin BLOB2File( 1, ''1.lob'', ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  dbms_job.submit( l_jobno, 'begin BLOB2File( 2, ''2.lob'', ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  dbms_job.submit( l_jobno, 'begin BLOB2File( 3, ''3.lob'', ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  commit;
END;

In reality, you probably don't want a separate job for every BLOB; you probably want to generate a smaller number of jobs and give each one a range of lngBlobID values to work on (a sketch of that approach follows). The number of jobs Oracle will run at any one time is limited by the JOB_QUEUE_PROCESSES parameter, so you could submit thousands of jobs and just let Oracle limit how many run simultaneously.
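
As a sketch of that range-based approach (BLOB2File_Range is a hypothetical wrapper, and the fileName column and the 1-30000 id split are assumptions), each job extracts every blob whose id falls in its assigned range:

CREATE OR REPLACE PROCEDURE BLOB2File_Range(
    lowID  IN NUMBER,
    highID IN NUMBER,
    sDir   IN VARCHAR2)
AS
BEGIN
  -- reuse BLOB2File for every blob in the assigned id range
  FOR r IN (SELECT id, fileName        -- fileName column is an assumption
            FROM blobTable
            WHERE id BETWEEN lowID AND highID)
  LOOP
    BLOB2File(r.id, r.fileName, sDir);
  END LOOP;
END;

DECLARE
  l_jobno INTEGER;
BEGIN
  -- three jobs, each owning a third of an assumed 1..30000 id range
  dbms_job.submit( l_jobno, 'begin BLOB2File_Range( 1, 10000, ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  dbms_job.submit( l_jobno, 'begin BLOB2File_Range( 10001, 20000, ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  dbms_job.submit( l_jobno, 'begin BLOB2File_Range( 20001, 30000, ''DIRECTORY'' ); end;', sysdate + interval '5' second );
  commit;
END;

JOB_QUEUE_PROCESSES then caps how many of these ranges are actually being extracted at the same time.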
