python - Steganographer File Handling Error for non plain-text files
Problem description
I've built a Python steganographer and am trying to add a GUI to it. This follows my previous question about reading all kinds of files in Python. Since the steganographer can only encode bytes in an image, I want to add support for directly encoding a file of any extension and encoding. To do this, I read the file in binary mode and try to encode its bytes. It works fine for files that basically contain plain-text UTF-8: it can easily encode .txt and .py files.
My updated code is:
from PIL import Image
import os

class StringTooLongException(Exception):
    pass

class InvalidBitValueException(Exception):
    pass

def str2bin(message):
    binary = bin(int.from_bytes(message, 'big'))
    return binary[2:]

def bin2str(binary):
    n = int(binary, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big')

def hide(filename, message, bits=2):
    image = Image.open(filename)
    binary = str2bin(message) + '00000000'
    if (len(binary)) % 8 != 0:
        binary = '0'*(8 - ((len(binary)) % 8)) + binary
    data = list(image.getdata())
    newData = []
    if len(data) * bits < len(binary):
        raise StringTooLongException
    if bits > 8:
        raise InvalidBitValueException
    index = 0
    for pixel in data:
        if index < len(binary):
            pixel = list(pixel)
            pixel[0] >>= bits
            pixel[0] <<= bits
            pixel[0] += int('0b' + binary[index:index+bits], 2)
            pixel = tuple(pixel)
            index += bits
        newData.append(pixel)
    image.putdata(newData)
    image.save(os.path.dirname(filename) + '/coded-'+os.path.basename(filename), 'PNG')
    return len(binary)

def unhide(filename, bits=2):
    image = Image.open(filename)
    data = image.getdata()
    if bits > 8:
        raise InvalidBitValueException
    binary = ''
    index = 0
    while not (len(binary) % 8 == 0 and binary[-8:] == '00000000'):
        value = '00000000' + bin(data[index][0])[2:]
        binary += value[-bits:]
        index += 1
    message = bin2str(binary)
    return message
Now, the problem comes when I try to hide .pdf or .docx files in it. Several things happen:
1) Microsoft Word or Adobe Acrobat reports that the file is corrupt.
2) The file size is considerably reduced, from 40KB to 3KB, which is a clear sign of error.
I think the reason could be that the file contains a NULL byte, which my program treats as the end of the message, so it does not read any further. Do you have an alternative idea for this?
I have an idea to change the ending byte, but that may still give the same result, as a file may contain that byte too.
Thanks, again!
You can use an end-of-stream (EOS) marker when you are certain the marker sequence will not show up in your message stream. When you can't guarantee that, you have two options:
- create a more complicated EOS marker, made up of many bytes. Proving that the same problem won't arise as before can be quite a nuisance, or
- add a header at the beginning of your message, which encodes how many bits/bytes to read to extract the complete message.
Generally, I'd use a header whenever I have information beforehand that I want to transmit and only rely on EOS markers when I don't know when my byte stream will terminate, e.g., on-the-fly compression.
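To make the failure mode concrete, here is a minimal sketch of what happens when a payload contains a zero byte and the extractor stops at the first byte-aligned `00000000`, as the loop condition in `unhide` does. The helper name `find_terminator` and the sample payload are hypothetical and only serve the illustration; no image is involved.

```python
def find_terminator(bits):
    """Return the bit index just past the first byte-aligned run of eight
    zero bits, or None if there is none. This mimics the stop condition
    used by the unhide() loop in the question."""
    for end in range(8, len(bits) + 1, 8):
        if bits[end - 8:end] == '00000000':
            return end
    return None

# Hypothetical binary payload with a zero byte in the middle.
payload = b'%PDF\x00rest of the file'

# Bitstring of the payload followed by the single-byte EOS marker.
bits = ''.join(format(byte, '08b') for byte in payload) + '00000000'

cut = find_terminator(bits)
recovered = int(bits[:cut], 2).to_bytes(cut // 8, 'big')

print(len(payload), len(recovered))  # the recovered data is shorter
print(recovered)                     # ends at the first zero byte
```

Binary formats such as .pdf and .docx routinely contain zero bytes, which would be consistent with the 40KB file shrinking to 3KB.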
For embedding, you should aim to:
- get your binary string
- measure its length
- convert that integer to a binary string of fixed size, say, 32 bits
- attach that bitstring in front of your message bitstring
- embed all of that in your cover medium
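The embedding steps above could be sketched roughly as follows. This is an illustration under assumptions, not the asker's final code: the function names `bytes_to_bits`, `build_bitstream` and `embed_bits` are made up here, and a flat list of integers in the range 0-255 stands in for the red channel values that `hide` modifies.

```python
def bytes_to_bits(data):
    """Convert bytes to a bitstring, keeping leading zero bits."""
    return ''.join(format(byte, '08b') for byte in data)

def build_bitstream(message):
    """Prefix the message bits with a 32-bit big-endian bit count."""
    body = bytes_to_bits(message)
    if len(body) >= 2 ** 32:
        raise ValueError('message too long for a 32-bit length header')
    header = format(len(body), '032b')
    return header + body

def embed_bits(values, bitstream, bits=2):
    """Write the bitstream into the lowest `bits` bits of each value.

    `values` is assumed to be a list of ints in 0..255, for example the
    first channel of every pixel."""
    if not 1 <= bits <= 8:
        raise ValueError('bits must be between 1 and 8')
    if len(values) * bits < len(bitstream):
        raise ValueError('cover is too small for this message')
    out = list(values)
    for i, start in enumerate(range(0, len(bitstream), bits)):
        # Pad the last chunk with zeros if the stream length is not a
        # multiple of `bits`; the header tells the reader where to stop.
        chunk = bitstream[start:start + bits].ljust(bits, '0')
        out[i] = (out[i] >> bits << bits) | int(chunk, 2)
    return out

stream = build_bitstream(b'\x00\x01')
stego = embed_bits([255] * 24, stream, bits=2)
print(len(stream), stego[:4], stego[-1])
```

Building the bitstring byte by byte, rather than through one large integer, keeps leading zero bytes, which the `bin(int.from_bytes(...))` approach drops.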
And for extraction:
- extract the first 32 bits
- convert those to an integer to get your message bitstring length
- start from index 32 and extract the necessary number of bits
- convert back to a bytestream and save to a file
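A matching sketch for extraction, under the same assumptions: a 32-bit header that counts message bits, and a flat list of integers standing in for the pixel channel values. The names `read_bits` and `extract_message` are hypothetical.

```python
from itertools import islice

def read_bits(values, bits=2):
    """Yield the hidden bits, taking the lowest `bits` bits of each value."""
    mask = (1 << bits) - 1
    for value in values:
        yield from format(value & mask, f'0{bits}b')

def extract_message(values, bits=2):
    """Read the 32-bit length header, then exactly that many message bits."""
    stream = read_bits(values, bits)
    header = ''.join(islice(stream, 32))
    if len(header) < 32:
        raise ValueError('cover is too small to hold a header')
    length = int(header, 2)
    if length % 8 != 0:
        raise ValueError('header does not describe a whole number of bytes')
    body = ''.join(islice(stream, length))
    if len(body) < length:
        raise ValueError('cover ends before the message does')
    if length == 0:
        return b''
    # Passing the byte count explicitly keeps leading zero bytes.
    return int(body, 2).to_bytes(length // 8, 'big')

# Hand-built cover: header (24 bits of payload) plus three message bytes,
# two bits per value, followed by unrelated values.
message = b'\x00A\x00'
bitstream = format(24, '032b') + ''.join(format(b, '08b') for b in message)
cover = [0b10101000 | int(bitstream[i:i + 2], 2)
         for i in range(0, len(bitstream), 2)] + [255] * 10

print(extract_message(cover))
```

Because the reader stops after a known number of bits, zero bytes inside the payload no longer end the extraction early.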
As a bonus, you can add all sorts of information to your header, e.g., the name of the original file, as long as it's all encoded in a way that lets you extract it later. For example:
header = 4 bytes for the length of the message string +
         1 byte for the number of characters in the filename +
         that many bytes for the filename
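One way such a header might be packed is with the standard `struct` module. This is only a sketch: `pack_header` and `unpack_header` are hypothetical names, the length is assumed to count message bytes, and the one-byte field counts the UTF-8 encoded bytes of the filename rather than its characters, so that non-ASCII names still round-trip.

```python
import struct

HEADER_FORMAT = '>IB'  # 4-byte big-endian length, 1-byte filename length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 5 bytes, no padding

def pack_header(message, filename):
    """Build the header bytes for a message and its original filename."""
    name = filename.encode('utf-8')
    if len(name) > 255:
        raise ValueError('filename does not fit in a one-byte length field')
    return struct.pack(HEADER_FORMAT, len(message), len(name)) + name

def unpack_header(blob):
    """Return (message length, filename, offset where the message starts)."""
    length, name_len = struct.unpack_from(HEADER_FORMAT, blob, 0)
    payload_start = HEADER_SIZE + name_len
    name = blob[HEADER_SIZE:payload_start].decode('utf-8')
    return length, name, payload_start

message = b'abc'
blob = pack_header(message, 'a.pdf') + message
length, name, start = unpack_header(blob)
print(length, name, blob[start:start + length])
```

The resulting bytes can then be converted to a bitstring and embedded like any other payload.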