使用Perl将8位以外的GIF图像添加到PDF [英] Use Perl to Add GIF Image Other Than 8-bit to PDF

查看:93
本文介绍了使用Perl将8位以外的GIF图像添加到PDF的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图将8位以外的非隔行GIF图像添加到PDF文档中,而无需使用

I am attempting to add non-interlaced GIF images other than 8-bit to a PDF document without having to fully decode the bitstream using PDF::Create for Perl.

LZWDecode算法是 PDF标准的一部分要求所有图像的LZW码最小为8位,并且PDF::Create硬编码为仅嵌入8位图像.

The LZWDecode algorithm that is part of the PDF standard requires all images to have a minimum LZW code size of 8-bits, and PDF::Create is hard-coded to only embed 8-bit images.

到目前为止,我已经修改了PDF::Create中的>图像加载器读取5位图像并完全解码LZW流.然后,我可以使用PDF::Create中的编码器算法将图像重新打包为8位.

So far, I have adapted the image loader from PDF::Create to read a 5-bit image and to fully decode the LZW stream. I am then able to use the encoder algorithm from PDF::Create to re-pack the image as 8-bit.

我想做的是消除占用大量内存的解码/编码步骤. 此线程建议通过加宽或移位位"以使LZW代码具有适当的长度来实现用于LZWDecode.

What I'd like to do is to eliminate the memory-intensive decode/encode step. This thread suggests that this is possible by "widening or shifting bits" to make LZW codes the proper length for LZWDecode.

我联系了线程作者,他提供了一些其他详细信息,特别是颜色索引的代码保持不变,但用零填充(例如,[10000]变为[000010000]),即<Clear><End>代码分别更改为<256><257>,所有其他代码都偏移256-原始的<Clear>代码.

I contacted the thread author and he provided some additional details, in particular that codes for color indices remain the same but are padded with zeros (e.g., [10000] becomes [000010000]), that <Clear> and <End> codes are changed to <256> and <257> respectively, and that all other codes are offset by 256 - original <Clear> code.

但是,由于雇主的限制,他无法进一步详细说明.特别是,我不确定在修改后的值超过<4095>(LZW代码表的最大索引)时如何处理代码.我也不确定如何将修订后的代码重新打包到比特流中.

However, he was unable to elaborate further due to restrictions by his employer. In particular, I am uncertain how to handle a code when its modified value exceeds <4095> (the maximum index for the LZW code table). I am also unsure how to re-pack the revised codes into a bitstream.

我当前正在使用的算法如下.

The algorithms I am currently using are below.

# Read 5-bit data stream

sub ReadData5 {

    my $data = shift;

    my $c_size = 6;                # minimium LZW code size
    my $t_size = 33;               # initial code table size
    my ($i_buff,$i_bits) = (0,0);  # input buffer
    my ($o_buff,$o_bits) = (0,0);  # output buffer

    my $stream = '';               # bitstream
    my $pos    = 0;

    SUB_BLOCK: while (1){
        my $s = substr($data, $pos++, 1);

        # get sub-block size
        my $n_bytes  = unpack('C', $s) or last SUB_BLOCK;
        my $c_mask   = (1 << $c_size) - 1;

        BYTES: while (1){
            # read c_size bits
            while ($i_bits < $c_size){

                # end of sub-block
                !$n_bytes-- and next SUB_BLOCK;

                $s = substr($data, $pos++, 1);
                my $c = unpack('C', $s);

                $i_buff |= $c << $i_bits;
                $i_bits += 8;
            }

            # write c_size bits
            my $code   = $i_buff & $c_mask;

            my $w_bits = $c_size;
            $i_buff  >>= $c_size;
            $i_bits   -= $c_size;
            $t_size++;

            if ($o_bits > 0){
                $o_buff |= $code >> ($c_size - 8 + $o_bits);
                $w_bits -= 8 - $o_bits;
                $stream .= pack('C', $o_buff & 0xFF);
            }

            if ($w_bits >= 8){
                $w_bits -= 8;
                $stream .= pack('C', ($code >> $w_bits) & 0xFF);
            }

            if (($o_bits = $w_bits) > 0){
                $o_buff = $code << (8 - $o_bits);
            }

            # clear code
            if ($code == 32){
                $c_size   = 6;
                $t_size   = 33;
                $c_mask   = (1 << $c_size) - 1;
            }

            # end code
            if ($code == 33){
                $stream .= pack('C', $o_buff & 0xFF);
                last SUB_BLOCK;
            }

            if ($t_size == (1 << $c_size)){
                if (++$c_size > 12){
                    $c_size--;
                } else {
                    $c_mask = (1 << $c_size) - 1;
                }
            }
        }
    }

    # Pad with zeros to byte boundary
    $stream .= '0' x (8 - length($stream) % 8);

    return $stream;
}

#---------------------------------------------------------------------------

# Decode 5-bit data stream

sub UnLZW5 {
    my $data = shift;

    my $c_size = 6;                 # minimium LZW code size
    my $t_size = 33;                # initial code table size
    my ($i_buff,$i_bits) = (0,0);   # input buffer

    my $stream = '';                # bitstream
    my $pos    = 0;

    # initialize code table
    my @table  = map { chr($_) } 0..$t_size-2;
    $table[32] = '';
    my $prefix = '';
    my $suffix = '';

    # get first code word
    while ($i_bits < $c_size){
        my $d     = unpack('C', substr($data, $pos++, 1));
        $i_buff   = ($i_buff << 8) + $d;
        $i_bits += 8;
    }

    my $c2     = $i_buff >> ($i_bits - $c_size);
    $i_bits   -= $c_size;
    my $c_mask = (1 << $i_bits) - 1;
    $i_buff   &= $c_mask;

    # get remaining code words
    DECOMPRESS: while ($pos < length($data)){
        my $c1 = $c2;

        while ($i_bits < $c_size){
            my $d     = unpack('C', substr($data, $pos++, 1));
            $i_buff   = ($i_buff << 8) + $d;
            $i_bits  += 8;
        }

        $c2      = $i_buff >> ($i_bits - $c_size);
        $i_bits -= $c_size;
        $c_mask  = (1 << $i_bits) - 1;
        $i_buff &= $c_mask;

        # clear code
        if ($c2 == 32){
            $stream  .= $table[$c1];
            $#table   = 32;
            $c_size   = 6;
            $t_size   = 33;
            next DECOMPRESS;
        }

        # end code
        if ($c2 == 33){
            $stream .= $table[$c1];
            last DECOMPRESS;
        }

        # get prefix and suffix
        $prefix = $table[$c1] if $c1 < $t_size;
        $suffix = $c2 < $t_size ? substr($table[$c2], 0, 1) : substr($prefix, 0, 1);

        # write prefix
        $stream .= $prefix;

        # write multiple-character sequence
        $table[$t_size++] = $prefix . $suffix;

        # increase code size
        if ($t_size == 2 ** $c_size){
            if (++$c_size > 12){
                $c_size--;
            }
        }
    }

    return $stream;
}

推荐答案

一次执行一次很慢.一次完成所有这些操作会占用您太多内存.一次做一大块.

Doing one at a time is slow. Doing them all at once takes too much memory for you. Do them a chunk at a time.

my $BUFFER_SIZE = 5 * 50_000;  # Must be a multiple of 5.

my $in_bytes = ...;
my $out_bytes = '';
while (my ($bytes) = $in_bytes =~ s/^(.{1,$BUFFER_SIZE})//s) {
   # Unpack from 5 bit fields.
   my @vals = map { pack('B*', "000$_") } unpack('B*', $bytes) =~ /(.{5})/g;

   # Transform @vals into 8 bit values here.

   # Pack to 8 bit fields.
   $out_bytes .= pack('C*', @vals);

}

由于您根本不需要转换值(只是存储值的方式),因此可以简化为:

Since you're not transforming the values at all (just how they are stored), that simplifies to:

my $BUFFER_SIZE = 5 * 50_000;  # Must be a multiple of 40.

my $in_bytes = ...;
my $out_bytes = '';
while (my ($bytes) = $in_bytes =~ s/^(.{1,$BUFFER_SIZE})//s) {
   # Unpack from 5 bit fields.
   my $bits = unpack('B*', $bytes);
   $bits =~ s/(.{5})/000$1/g;
   $out_bytes .= pack('B*', $bits);

}

您没有说明如何处理这些多余的位.我只是忽略了它们.

You didn't say what to do with the extra bits. I simply ignored them.

不创建位字符串的替代方法:

Alternative approach with no bit string creation:

my $in_bytes = ...;
my $out_bytes = '';
while (my ($bytes) = $in_bytes =~ s/^(.{1,5})//s) {
    my @bytes = map ord, split //, $bytes;

    # 00000111 11222223 33334444 45555566 66677777

    $out_bytes .= chr(                            (($bytes[0] >> 3) & 0x1F));
    last if @bytes == 1;
    $out_bytes .= chr((($bytes[0] << 2) & 0x1C) | (($bytes[1] >> 6) & 0x03));
    $out_bytes .= chr(                            (($bytes[1] >> 1) & 0x1F));
    last if @bytes == 2;
    $out_bytes .= chr((($bytes[1] << 4) & 0x10) | (($bytes[2] >> 4) & 0x0F));
    last if @bytes == 3;
    $out_bytes .= chr((($bytes[2] << 1) & 0x1E) | (($bytes[3] >> 7) & 0x01));
    $out_bytes .= chr(                            (($bytes[3] >> 2) & 0x1F));
    last if @bytes == 4;
    $out_bytes .= chr((($bytes[3] << 3) & 0x18) | (($bytes[4] >> 5) & 0x07));
    $out_bytes .= chr(                            ( $bytes[4]       & 0x1F));
}

上述解决方案的优点是它在C语言中特别有效.

The advantage of the above solution is that it's particularly efficient in C.

STRLEN in_len;
const char* in = SvPVbyte(sv, in_len);

STRLEN out_len = (in_len * 8 / 5) * 8;
char* out = (char*)malloc(out_len);

char* out_cur = out;
char* in_end = in + in_len;

while (in != in_end) {
    *(out_cur++) =                          ((*in >> 3) & 0x1F));
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 2) & 0x1C) | ((*in >> 6) & 0x03));
    *(out_cur++) =                          ((*in >> 1) & 0x1F));
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 4) & 0x10) | ((*in >> 4) & 0x0F));
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 1) & 0x1E) | ((*in >> 7) & 0x01));
    *(out_cur++) =                          ((*in >> 2) & 0x1F));
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 3) & 0x18) | ((*in >> 5) & 0x07));
    *(out_cur++) =                          ( *in       & 0x1F));
}

return newSVpvn(out, out_len);

这篇关于使用Perl将8位以外的GIF图像添加到PDF的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆