Python - How can I change bytes in a file
Problem description
I'm making an encryption program and I need to open a file in binary mode to access non-ASCII and non-printable characters. I need to check whether each character from the file is a letter, number, symbol, or unprintable character. That means I have to check, one by one, whether the bytes (when they are decoded to ASCII) match any of these characters:
{^9,dzEV=Q4ciT+/s};fnq3BFh% #2!k7>YSU<GyD\I]|OC_e.W0M~ua-jR5lv1wA`@8t*xr'K"[P)&b:g$p(mX6Ho?JNZL
I think I could encode these characters above to binary and then compare them with bytes. I don't know how to do this.
P.S. Sorry for my bad English and my misunderstanding of binary. (I hope you know what I mean by bytes; I mean characters in binary mode like this):
\x01\x00\x9a\x9c\x18\x00
Solution
There are two major string types in Python: bytestrings (a sequence of bytes) that represent binary data and Unicode strings (a sequence of Unicode codepoints) that represent human-readable text. It is simple to convert one into another (☯):

    unicode_text = bytestring.decode(character_encoding)
    bytestring = unicode_text.encode(character_encoding)
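As a small illustration of the conversion above, here is a sketch of a round trip between text and bytes. The sample string and the choice of UTF-8 and latin-1 are assumptions made for the example, not part of the original answer.

```python
# Round trip between text and bytes; UTF-8 is an assumed encoding here.
text = "naïve ☯"
data = text.encode("utf-8")      # str -> bytes
restored = data.decode("utf-8")  # bytes -> str

print(data)
print(restored == text)

# Arbitrary binary data is usually not valid ASCII, so decoding may fail.
raw = b"\x01\x00\x9a\x9c\x18\x00"
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)

# latin-1 maps every byte value 0-255 to a code point, so it never fails
# and the round trip back to bytes is lossless.
as_latin1 = raw.decode("latin-1")
print(as_latin1.encode("latin-1") == raw)
```

For an encryption program it is usually simpler to stay with bytes throughout and skip decoding entirely, since the data may not be text in any encoding.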
If you open a file in binary mode, e.g., 'rb', then file.read() returns a bytestring (bytes type):

    >>> b'A' == b'\x41' == chr(0b1000001).encode()
    True
There are several methods that can be used to classify bytes:

string methods such as bytes.isdigit():

    >>> b'1'.isdigit()
    True

string constants such as string.printable:

    >>> import string
    >>> b'!' in string.printable.encode()
    True

regular expressions such as \d:

    >>> import re
    >>> bool(re.match(br'\d+$', b'123'))
    True

classification functions in the curses.ascii module, e.g., curses.ascii.isprint():

    >>> from curses import ascii
    >>> bytearray(filter(ascii.isprint, b'123'))
    bytearray(b'123')
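The classification approaches above can be combined into one helper that labels each byte as the question asks. This is only a sketch: the names `classify_byte` and `classify` are hypothetical, and treating tab and newline as "unprintable" is an assumption (it matches what `curses.ascii.isprint()` reports for them).

```python
import string

# Byte sets built once from the standard string constants.
# Iterating over bytes yields ints, so these are sets of ints.
LETTERS = frozenset(string.ascii_letters.encode())
DIGITS = frozenset(string.digits.encode())
SYMBOLS = frozenset(string.punctuation.encode())


def classify_byte(b):
    """Return a label for one byte value (an int in range(256))."""
    if b in LETTERS:
        return "letter"
    if b in DIGITS:
        return "digit"
    if b in SYMBOLS:
        return "symbol"
    if b == 0x20:
        return "space"
    return "unprintable"


def classify(data):
    """Label every byte in a bytes-like object."""
    return [classify_byte(b) for b in data]


sample = b"a1!\x01\x9a "
print(classify(sample))
# ['letter', 'digit', 'symbol', 'unprintable', 'unprintable', 'space']
```

To test against a custom character list such as the one in the question, the same pattern applies: encode the list once into a `frozenset` and check `b in that_set` for each byte.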
bytearray is a mutable sequence of bytes: unlike a bytestring, you can change it in place, e.g., to lowercase every 3rd byte that is uppercase:

    >>> import string
    >>> a = bytearray(b'ABCDEF_')
    >>> uppercase = string.ascii_uppercase.encode()
    >>> a[::3] = [b | 0b0100000 if b in uppercase else b
    ...           for b in a[::3]]
    >>> a
    bytearray(b'aBCdEF_')

Notice: b'a' and b'd' are lowercase but b'_' remained the same.
To modify a binary file in place, you could use the mmap module, e.g., to lowercase the 4th column in every other line in 'file':

    #!/usr/bin/env python3
    import mmap
    import string

    uppercase = string.ascii_uppercase.encode()
    ncolumn = 3  # select 4th column

    with open('file', 'r+b') as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        while True:
            mm.readline()  # ignore every other line
            pos = mm.tell()  # remember current position
            if not mm.readline():  # EOF
                break
            if mm[pos + ncolumn] in uppercase:
                mm[pos + ncolumn] |= 0b0100000  # lowercase

Note: Python 2 and 3 APIs differ in this case. The code uses Python 3.
Input

    ABCDE1
    FGHIJ
    ABCDE
    FGHI

Output

    ABCDE1
    FGHiJ
    ABCDE
    FGHi

Notice: the 4th column became lowercase on the 2nd and 4th lines.
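The mmap loop indexes the 4th column without checking that the line is that long, so a short line would touch a byte on the following line (or raise IndexError at the end of the file). A possible variant with a length check is sketched below; the function name `lowercase_column` and the throwaway demo file are assumptions for illustration.

```python
import mmap
import os
import string
import tempfile

UPPERCASE = string.ascii_uppercase.encode()


def lowercase_column(path, ncolumn):
    """Lowercase the byte at index `ncolumn` on every second line, in place."""
    with open(path, "r+b") as file, \
            mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        while True:
            mm.readline()            # skip the odd-numbered line
            start = mm.tell()        # start of the even-numbered line
            line = mm.readline()
            if not line:             # EOF
                break
            content = line.rstrip(b"\r\n")
            # Only touch the byte if the line is long enough to have that column.
            if ncolumn < len(content) and mm[start + ncolumn] in UPPERCASE:
                mm[start + ncolumn] |= 0b0100000  # set the ASCII lowercase bit


# Demo on a throwaway file; the second line is too short for a 4th column.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"ABCDE\nFG\nABCDE\nFGHIJ\n")

lowercase_column(tmp.name, 3)
with open(tmp.name, "rb") as f:
    result = f.read()
os.remove(tmp.name)
print(result)  # b'ABCDE\nFG\nABCDE\nFGHiJ\n'
```

Note that mmap cannot map an empty file, so a caller would need to handle that case separately.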
Typically, if you want to change a file, you read from the file, write the modifications to a temporary file, and on success move the temporary file in place of the original file:

    #!/usr/bin/env python3
    import os
    import string
    from tempfile import NamedTemporaryFile

    caesar_shift = 3
    filename = 'file'

    def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
        shifted_alphabet = alphabet[shift:] + alphabet[:shift]
        return plaintext.translate(plaintext.maketrans(alphabet, shifted_alphabet))

    dest_dir = os.path.dirname(filename)
    chunksize = 1 << 15
    with open(filename, 'rb') as file, \
         NamedTemporaryFile('wb', dir=dest_dir, delete=False) as tmp_file:
        while True:  # encrypt
            chunk = file.read(chunksize)
            if not chunk:  # EOF
                break
            tmp_file.write(caesar_bytes(chunk, caesar_shift))
    os.replace(tmp_file.name, filename)

Input

    abc
    def
    ABC
    DEF

Output

    def
    ghi
    ABC
    DEF

To convert the output back, set caesar_shift = -3.
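To see the shift and its inverse without touching any file, the same translation can be tried on an in-memory bytestring. This is a sketch that reuses the answer's `caesar_bytes` function (calling `bytes.maketrans` directly); the sample data is made up for the example.

```python
import string


def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
    """Shift bytes found in `alphabet` by `shift` positions; leave others alone."""
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return plaintext.translate(bytes.maketrans(alphabet, shifted_alphabet))


original = b"abc def ABC DEF xyz"
encrypted = caesar_bytes(original, 3)    # lowercase letters move 3 places
decrypted = caesar_bytes(encrypted, -3)  # the negative shift undoes it

print(encrypted)              # b'def ghi ABC DEF abc'
print(decrypted == original)  # True
```

Because the default alphabet holds only lowercase ASCII letters, uppercase letters and every other byte pass through unchanged, which is why the ABC and DEF lines in the output above are untouched.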