python - Steganographer File Handling Error for non plain-text files


Problem description


I've built a Python steganographer and am trying to add a GUI to it. This follows my previous question about reading all kinds of files in Python. Since the steganographer can only encode bytes in an image, I want to add support for directly encoding a file of any extension and encoding in it. For this, I am reading the file in binary mode and trying to encode it. It works fine for files that basically contain plain-text UTF-8: it can easily encode .txt and .py files.

My updated code is:

from PIL import Image

import os

class StringTooLongException(Exception):
    pass

class InvalidBitValueException(Exception):
    pass

def str2bin(message):       
    binary = bin(int.from_bytes(message, 'big'))
    return binary[2:]

def bin2str(binary):
    n = int(binary, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big')

def hide(filename, message, bits=2):
    image = Image.open(filename)
    binary = str2bin(message) + '00000000'

    if (len(binary)) % 8 != 0:
        binary = '0'*(8 - ((len(binary)) % 8)) + binary

    data = list(image.getdata())

    newData = []

    if len(data) * bits < len(binary):
        raise StringTooLongException

    if bits > 8:
        raise InvalidBitValueException

    index = 0
    for pixel in data:
        if index < len(binary):
            pixel = list(pixel)
            pixel[0] >>= bits
            pixel[0] <<= bits
            pixel[0] += int('0b' + binary[index:index+bits], 2)
            pixel = tuple(pixel)
            index += bits

        newData.append(pixel)

    image.putdata(newData)
    image.save(os.path.dirname(filename) + '/coded-'+os.path.basename(filename), 'PNG')

    return len(binary)

def unhide(filename, bits=2):
    image = Image.open(filename)
    data = image.getdata()

    if bits > 8:
        raise InvalidBitValueException

    binary = ''

    index = 0

    while not (len(binary) % 8 == 0 and binary[-8:] == '00000000'):
        value = '00000000' + bin(data[index][0])[2:]
        binary += value[-bits:]
        index += 1

    message = bin2str(binary)
    return message

The problem comes when I try to hide .pdf or .docx files in an image. Several things happen:

1) Microsoft Word or Adobe Acrobat shows that the file is corrupt.

2) The file size is considerably reduced, from 40KB to 3KB, which is a clear sign of error.

I think the reason could be that the file contains a NULL byte, and my program does not read any further once it reaches one. Do you have an alternative idea for handling this?

I thought of changing the terminating byte, but the result may be the same, since a file may contain that byte too.

Thanks, again!

Solution

You can use an end-of-stream (EOS) marker when you are certain the marker sequence will not show up in your message stream. When you can't guarantee that, you have two options:

  • Create a more complicated EOS marker, composed of many bytes. Proving that the same problem won't arise as before can be quite a nuisance, or
  • Add a header at the beginning of your message, which encodes how many bits/bytes to read for the complete message extraction.

Generally, I'd use a header whenever I have information beforehand that I want to transmit and only rely on EOS markers when I don't know when my byte stream will terminate, e.g., on-the-fly compression.
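
To see why a single zero byte is a fragile EOS marker for binary files, here is a minimal sketch. The helpers `to_bits` and `extract_until_null` are hypothetical names, and the scan is a simplified stand-in for the terminator check in `unhide`, not the original code.

```python
def to_bits(data):
    """Turn bytes into a bitstring, 8 bits per byte (leading zeros kept)."""
    return ''.join(format(byte, '08b') for byte in data)

def extract_until_null(bitstring):
    """Read byte by byte and stop at the first all-zero byte (the EOS marker)."""
    out = bytearray()
    for i in range(0, len(bitstring), 8):
        byte = bitstring[i:i + 8]
        if byte == '00000000':
            break
        out.append(int(byte, 2))
    return bytes(out)

# A binary payload that happens to contain a zero byte in the middle.
payload = b'%PDF\x00rest of file'
recovered = extract_until_null(to_bits(payload) + '00000000')
print(recovered)  # b'%PDF' -- everything after the first zero byte is lost
```

Plain-text files rarely contain a zero byte, which is why .txt and .py files survive, while formats such as .pdf or .docx usually contain many. An early stop like this would be consistent with the shrunken output file described in the question.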

For embedding, you should aim to:

  • get your binary string
  • measure its length
  • convert that integer to a binary string of fixed size, say, 32 bits
  • attach that bitstring in front of your message bitstring
  • embed all of that into your cover medium
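
The embedding steps above can be sketched as follows. `bytes_to_bits` and `build_payload` are hypothetical helpers, and the 32-bit big-endian length field is only the size suggested above; the resulting bitstring would then go through the pixel loop in `hide` in place of the message plus terminator.

```python
HEADER_BITS = 32  # fixed header size, as suggested in the steps above

def bytes_to_bits(data):
    """Convert bytes to a bitstring, always 8 bits per byte."""
    return ''.join(format(byte, '08b') for byte in data)

def build_payload(message):
    """Prefix the message bitstring with its length as a 32-bit binary number."""
    message_bits = bytes_to_bits(message)
    if len(message_bits) >= 2 ** HEADER_BITS:
        raise ValueError('message too long for a 32-bit length header')
    header = format(len(message_bits), '032b')
    return header + message_bits

payload = build_payload(b'\x00\x01PDF')
print(len(payload))          # 72: 32 header bits + 5 bytes * 8 bits
print(int(payload[:32], 2))  # 40: the message length in bits
```

Converting byte by byte also keeps leading zero bytes, which `bin(int.from_bytes(...))` in `str2bin` drops because an integer has no leading zeros.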

And for extraction:

  • extract the first 32 bits
  • convert those to an integer to get your message bitstring length
  • start from index 32 and extract the necessary number of bits
  • convert back to a bytestream and save to a file
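
A matching sketch for the extraction steps, assuming the bits have already been pulled out of the image into one string. `bits_to_bytes` and `parse_payload` are hypothetical helpers, and the header layout is the 32-bit length in bits described above.

```python
HEADER_BITS = 32  # must match the header size used when embedding

def bits_to_bytes(bits):
    """Convert a bitstring whose length is a multiple of 8 back to bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def parse_payload(extracted_bits):
    """Read the 32-bit length header, then exactly that many message bits."""
    length = int(extracted_bits[:HEADER_BITS], 2)
    end = HEADER_BITS + length
    if end > len(extracted_bits):
        raise ValueError('header announces more bits than were extracted')
    return bits_to_bytes(extracted_bits[HEADER_BITS:end])

# Round trip on a hand-built bitstring; zero bytes in the message are harmless.
original = b'\x00\x01PDF\x00'
bits = format(len(original) * 8, '032b') + ''.join(format(b, '08b') for b in original)
recovered = parse_payload(bits + '1011')  # trailing bits from the cover are ignored
print(recovered == original)  # True
```

In an image-based extractor you would typically read only the first 32 bits from the pixels, compute the length, and then read just that many more. The recovered bytes can be written out with a file opened in 'wb' mode.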

As a bonus, you can add all sorts of information to your header, e.g., the name of the original file, as long as it's all encoded in a way that lets you extract it later. For example:

header = 4 bytes for the length of the message string +
         1 byte for the number of characters in the filename +
         that many bytes for the filename
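
One possible way to build such a header is with the standard `struct` module. This is a sketch under assumptions: `pack_header` and `unpack_header` are hypothetical names, the length is counted in bytes rather than bits, the byte order is big-endian, and the filename is stored as UTF-8 (so the 1-byte field counts encoded bytes, not characters).

```python
import struct

def pack_header(message, filename):
    """4-byte message length + 1-byte filename length + filename bytes."""
    name = filename.encode('utf-8')
    if len(name) > 255:
        raise ValueError('filename does not fit in a 1-byte length field')
    # '>IB' = big-endian unsigned 32-bit int followed by an unsigned byte.
    return struct.pack('>IB', len(message), len(name)) + name

def unpack_header(blob):
    """Return (message length, filename, offset where the message starts)."""
    msg_len, name_len = struct.unpack('>IB', blob[:5])
    name = blob[5:5 + name_len].decode('utf-8')
    return msg_len, name, 5 + name_len

header = pack_header(b'x' * 40960, 'report.pdf')
msg_len, name, offset = unpack_header(header + b'message bytes follow here')
print(msg_len, name, offset)  # 40960 report.pdf 15
```

The header bytes would be prepended to the file contents before everything is converted to a bitstring and embedded.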
