How to modify a compressed iTXt record of an existing file in Python?

Problem description

I know this looks too simple, but I couldn’t find a straightforward solution.

Once saved, the iTXt should be compressed again.

Solution

It's not as simple as it looks at first glance, which is probably why you could not find a straightforward solution.

Let's start with the basics.

Can PyPNG read all chunks?

An important question, because modifying an existing PNG file is a large task. Reading its documentation, it doesn't start out well:

PNG: Chunk by Chunk

Ancillary Chunks

.. iTXt
Ignored when reading. Not generated.

(https://pythonhosted.org/pypng/chunk.html)

But lower on that page, salvation!

Non-standard Chunks
Generally it is not possible to generate PNG images with any other chunk types. When reading a PNG image, processing it using the chunk interface, png.Reader.chunks, will allow any chunk to be processed (by user code).

So all I have to do is write this 'user code', and PyPNG can do the rest. (Oof.)

What about the iTXt chunk?

Let's take a peek at what you are interested in.

4.2.3.3. iTXt International textual data

.. the textual data is in the UTF-8 encoding of the Unicode character set instead of Latin-1. This chunk contains:

Keyword:             1-79 bytes (character string)
Null separator:      1 byte
Compression flag:    1 byte
Compression method:  1 byte
Language tag:        0 or more bytes (character string)
Null separator:      1 byte
Translated keyword:  0 or more bytes
Null separator:      1 byte
Text:                0 or more bytes

(http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html#C.iTXt)

Looks clear to me. The optional compression ought not to be a problem, since

.. [t]he only value presently defined for the compression method byte is 0, meaning zlib ..

and I am pretty confident there is something existing for Python that can do this for me.
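That "something" is the standard library's zlib module. A minimal Python 3 sketch of the round trip follows; the sample text is made up for illustration, and it is encoded as UTF-8 first because that is what the iTXt text field holds:

```python
import zlib

# Hypothetical sample text; iTXt text is UTF-8, so encode before compressing.
text = "Grüße aus Köln".encode("utf-8")

# What a compressed iTXt text field would contain (compression method 0 = zlib).
compressed = zlib.compress(text)

# What a reader has to do to get the original bytes back.
restored = zlib.decompress(compressed)

print(restored.decode("utf-8"))
```

Only the text field is compressed; the keyword, the flags, and the language fields in front of it stay as they are.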

Back to PyPNG's chunk handling then.

Can we see the chunk data?

PyPNG offers an iterator, so indeed checking if a PNG contains an iTXt chunk is easy:

chunks()
Return an iterator that will yield each chunk as a (chunktype, content) pair.

(https://pythonhosted.org/pypng/png.html?#png.Reader.chunks)

So let's write some code in interactive mode and check. I got a sample image from http://pmt.sourceforge.net/itxt/, repeated here for convenience. (If the iTXt data is not conserved here, download and use the original.)

>>> import png
>>> imageFile = png.Reader("itxt.png")
>>> print imageFile
<png.Reader instance at 0x10ae1cfc8>
>>> for c in imageFile.chunks():
...   print c[0],len(c[1])
... 
IHDR 13
gAMA 4
sBIT 4
pCAL 44
tIME 7
bKGD 6
pHYs 9
tEXt 9
iTXt 39
IDAT 4000
IDAT 831
zTXt 202
iTXt 111
IEND 0

Success!

What about writing back? Well, PyPNG is usually used to create complete images, but fortunately it also offers a method to explicitly create one from custom chunks:

png.write_chunks(out, chunks)
Create a PNG file by writing out the chunks.

So we can iterate over the chunks, change the one(s) you want, and write back the modified PNG.
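To make clear what reading and writing chunks involves at the file-format level, here is a standard-library-only sketch for Python 3. It is not PyPNG's implementation; the function names read_png_chunks and write_png_chunks are made up for this example, and the chunk list is a hand-made stand-in rather than a viewable image. Each chunk is stored as a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over type plus data:

```python
import io
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_chunks(stream):
    """Yield (chunk_type, data) pairs from a binary PNG stream."""
    if stream.read(8) != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of stream
        length, chunk_type = struct.unpack(">I4s", header)
        data = stream.read(length)
        (crc,) = struct.unpack(">I", stream.read(4))
        # The CRC covers the chunk type and the data, not the length field.
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != crc:
            raise ValueError("CRC mismatch in %r chunk" % chunk_type)
        yield chunk_type, data


def write_png_chunks(stream, chunks):
    """Write the signature, then each chunk as length, type, data, CRC."""
    stream.write(PNG_SIGNATURE)
    for chunk_type, data in chunks:
        crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
        stream.write(struct.pack(">I", len(data)) + chunk_type + data)
        stream.write(struct.pack(">I", crc))


# Hand-made chunk list for illustration only: not a viewable image.
original = [
    (b"iTXt", b"Title\x00" + b"\x00\x00" + b"\x00" + b"\x00" + b"Hello"),
    (b"IEND", b""),
]

buffer = io.BytesIO()
write_png_chunks(buffer, original)

buffer.seek(0)
round_tripped = list(read_png_chunks(buffer))
print(round_tripped == original)
```

Because the CRC is recomputed on writing, a chunk whose data was modified in between still ends up valid in the output.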

Unpacking and packing iTXt data

This is a task in itself. The data format is well described, but its variable-length, zero-terminated fields do not fit the fixed layouts that Python's native struct.unpack and struct.pack expect. So we have to invent something ourselves.

The text strings are stored in ASCIIZ format: a string ending with a zero byte. We need a small function to split on the first 0:

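def cutASCIIZ(data):
   # Split a byte string at its first zero byte; the zero itself is dropped.
   # The b'\0' literal keeps this working on both Python 2 and Python 3.
   end = data.find(b'\0')
   if end >= 0:
      return [data[:end], data[end+1:]]
   # No zero byte found: empty first part, everything else untouched.
   return [b'', data]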

This quick-and-dirty function returns a [before, after] list and discards the zero itself. If there is no zero byte at all, "before" is empty and "after" is the whole input.
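On Python 3 the same split can be written with bytes.partition. This is only an alternative sketch, with the hypothetical name split_asciiz, that mirrors the behaviour described above for input without a zero byte:

```python
def split_asciiz(data):
    """Split bytes at the first zero byte (hypothetical helper name)."""
    head, sep, tail = data.partition(b"\x00")
    if not sep:
        # No terminator found: empty first part, data left untouched.
        return [b"", data]
    return [head, tail]


print(split_asciiz(b"Author\x00rest of chunk"))
print(split_asciiz(b"no terminator"))
```

Note that partition on its own would return the whole input as the first element when no separator is found, which is why the extra check is there.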

To handle the iTXt data as transparently as possible, I make it a class:

class Chunk_iTXt:
  # All fields are byte strings (plain str on Python 2, bytes on Python 3).
  def __init__(self, chunk_data):
    tmp = cutASCIIZ(chunk_data)
    self.keyword = tmp[0]
    # Slicing instead of indexing keeps ord() valid on Python 2 and 3.
    if len(tmp[1]):
      self.compressed = ord(tmp[1][0:1])
    else:
      self.compressed = 0
    if len(tmp[1]) > 1:
      self.compressionMethod = ord(tmp[1][1:2])
    else:
      self.compressionMethod = 0
    tmp = tmp[1][2:]
    tmp = cutASCIIZ(tmp)
    self.languageTag = tmp[0]
    tmp = tmp[1]
    tmp = cutASCIIZ(tmp)
    self.languageTagTrans = tmp[0]
    if self.compressed:
      if self.compressionMethod != 0:
        raise TypeError("Unknown compression method")
      self.text = zlib.decompress(tmp[1])
    else:
      self.text = tmp[1]

  def pack(self):
    result = self.keyword + b'\0'
    # Compression flag and method are one byte each.
    result += bytes(bytearray([self.compressed, self.compressionMethod]))
    result += self.languageTag + b'\0'
    result += self.languageTagTrans + b'\0'
    if self.compressed:
      if self.compressionMethod != 0:
        raise TypeError("Unknown compression method")
      result += zlib.compress(self.text)
    else:
      result += self.text
    return result

  def show(self):
    # Decode only for display; invalid UTF-8 is replaced rather than raising.
    print('iTXt chunk contents:')
    print('  keyword: "' + self.keyword.decode('latin-1') + '"')
    print('  compressed: ' + str(self.compressed))
    print('  compression method: ' + str(self.compressionMethod))
    print('  language: "' + self.languageTag.decode('latin-1') + '"')
    print('  tag translation: "' + self.languageTagTrans.decode('utf-8', 'replace') + '"')
    print('  text: "' + self.text.decode('utf-8', 'replace') + '"')

Since this uses zlib, it requires an import zlib at the top of your program.

The class constructor accepts 'too short' strings, in which case it will use defaults for everything undefined.

The show method lists the data for debugging purposes.

Using my custom class

With all of this, examining, modifying, and adding iTXt chunks is finally straightforward:

import png
import zlib

# insert helper and class here

sourceImage = png.Reader("itxt.png")
chunkList = []
for chunk in sourceImage.chunks():
  # Byte literals compare correctly on Python 2 and on Python 3.
  if chunk[0] == b'iTXt':
    itxt = Chunk_iTXt(chunk[1])
    itxt.show()
    # modify existing data
    if itxt.keyword == b'Author':
      itxt.text = b'Rad Lexus'
      itxt.compressed = 1
    chunk = [chunk[0], itxt.pack()]
  chunkList.append(chunk)

# insert new data just before the final IEND chunk
newData = Chunk_iTXt(b'')
newData.keyword = b'Custom'
newData.languageTag = b'nl'
newData.languageTagTrans = b'Aangepast'
newData.text = b'Dat was leuk.'
chunkList.insert(-1, [b'iTXt', newData.pack()])

with open("foo.png", "wb") as outFile:
  png.write_chunks(outFile, chunkList)

When adding a totally new chunk, be careful not to append it, because then it will appear after the required last IEND chunk, which is an error. I did not try, but you should probably also not insert it before the required first IHDR chunk or (as commented by Glenn Randers-Pehrson) in between consecutive IDAT chunks.
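The placement rule can be wrapped in a small helper. This is a minimal Python 3 sketch under the assumption that chunk types are byte strings; insert_before_iend is a made-up name and the chunk payloads are placeholders, not real image data:

```python
def insert_before_iend(chunks, new_chunk):
    """Return a copy of chunks with new_chunk placed just before IEND."""
    types = [chunk_type for chunk_type, _ in chunks]
    if not types or types[-1] != b"IEND":
        raise ValueError("chunk list does not end with IEND")
    # Placing it directly before IEND also keeps it after IHDR
    # and out of any run of consecutive IDAT chunks.
    return chunks[:-1] + [new_chunk] + chunks[-1:]


# Placeholder payloads for illustration only.
chunk_list = [(b"IHDR", b"..."), (b"IDAT", b"..."), (b"IDAT", b"..."), (b"IEND", b"")]
result = insert_before_iend(chunk_list, (b"iTXt", b"placeholder payload"))
print([chunk_type for chunk_type, _ in result])
```

Returning a new list instead of modifying the input keeps the original chunk list available for comparison.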

Note that, according to the specification, all text in iTXt should be UTF-8 encoded.
