Python - How can I change bytes in a file


Problem description

I'm making an encryption program and I need to open a file in binary mode to access non-ASCII and non-printable characters. I need to check whether each character from the file is a letter, number, symbol, or unprintable character. That means I have to check, one by one, whether the bytes (when they are decoded to ASCII) match any of these characters:

{^9,dzEV=Q4ciT+/s};fnq3BFh% #2!k7>YSU<GyD\I]|OC_e.W0M~ua-jR5lv1wA`@8t*xr'K"[P)&b:g$p(mX6Ho?JNZL

I think I could encode these characters above to binary and then compare them with bytes. I don't know how to do this.

P.S. Sorry for my bad English and my misunderstanding of binary. (I hope you know what I mean by bytes: I mean characters in binary mode like this):

\x01\x00\x9a\x9c\x18\x00

Solution

There are two major string types in Python: bytestrings (a sequence of bytes) that represent binary data and Unicode strings (a sequence of Unicode codepoints) that represent human-readable text. It is simple to convert one into another (☯):

unicode_text = bytestring.decode(character_encoding)
bytestring = unicode_text.encode(character_encoding)
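
As a rough illustration of the conversion above, the sketch below round-trips a string through UTF-8 and then tries to decode the byte values from the question. The variable names are made up for this example. It assumes you want to see why plain ASCII decoding fails on bytes such as \x9a, and it uses latin-1 only as one possible encoding that maps every byte value to a codepoint.

```python
# Text -> bytes -> text round trip with an explicit encoding.
text = "naïve café"
data = text.encode("utf-8")
round_trip = data.decode("utf-8")

# The sample bytes from the question contain values >= 0x80,
# which ASCII cannot represent, so decoding as ASCII fails.
raw = b"\x01\x00\x9a\x9c\x18\x00"
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# latin-1 maps each byte value 0..255 to the codepoint with the same number,
# so it never fails and the conversion is reversible.
as_latin1 = raw.decode("latin-1")
```

For classifying arbitrary binary data it is usually simpler to stay with bytes and skip decoding altogether.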

If you open a file in binary mode, e.g., 'rb', then file.read() returns a bytestring (bytes type):

>>> b'A' == b'\x41' == chr(0b1000001).encode()
True
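
A minimal sketch of reading a file in binary mode follows. The file name sample.bin and the sample content are assumptions made for this example, and the file is created in a temporary directory so the snippet is self-contained. Note that in Python 3, indexing or iterating over a bytes object gives integers, while slicing gives bytes.

```python
import os
import tempfile

sample = b"\x01\x00\x9a\x9c\x18\x00A1!"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(sample)
    with open(path, "rb") as f:
        data = f.read()          # a bytes object

first_value = data[0]            # indexing gives an int (here 1)
first_byte = data[:1]            # slicing gives bytes (here b'\x01')
values = list(data)              # iterating gives ints in range 0..255
```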


There are several methods that can be used to classify bytes:

  • string methods such as bytes.isdigit():

    >>> b'1'.isdigit()
    True
    

  • string constants such as string.printable

    >>> import string
    >>> b'!' in string.printable.encode()
    True
    

  • regular expressions such as \d

    >>> import re
    >>> bool(re.match(br'\d+$', b'123'))
    True
    

  • classification functions in curses.ascii module e.g., curses.ascii.isprint()

    >>> from curses import ascii
    >>> bytearray(filter(ascii.isprint, b'123'))
    bytearray(b'123')
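
Putting the string constants above to work, here is one possible way to sort each byte into the four categories the question asks about. The function name classify_byte and the category labels are invented for this sketch. It assumes that a space counts as a symbol (it appears in the question's character list) and that everything else outside ASCII letters, digits and punctuation, including tabs and newlines, counts as unprintable.

```python
import string

LETTERS = string.ascii_letters.encode("ascii")
DIGITS = string.digits.encode("ascii")
# Assumption: treat the space character as a symbol.
SYMBOLS = (string.punctuation + " ").encode("ascii")


def classify_byte(value: int) -> str:
    """Classify one byte value (0..255) as letter, number, symbol or unprintable."""
    if value in LETTERS:
        return "letter"
    if value in DIGITS:
        return "number"
    if value in SYMBOLS:
        return "symbol"
    return "unprintable"


# Iterating over bytes yields ints, which is what classify_byte expects.
kinds = [classify_byte(b) for b in b"a1!\x00\x9a"]
```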
    

bytearray is a mutable sequence of bytes. Unlike a bytestring, you can change it in place, e.g., to lowercase every 3rd byte if it is uppercase:

>>> import string
>>> a = bytearray(b'ABCDEF_')
>>> uppercase = string.ascii_uppercase.encode()
>>> a[::3] = [b | 0b0100000 if b in uppercase else b 
...           for b in a[::3]]
>>> a
bytearray(b'aBCdEF_')

Notice: the bytes at indices 0 and 3 (b'a' and b'd') are now lowercase, but b'_' stayed the same because it is not an uppercase letter.
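
The following sketch shows the difference between bytes and bytearray, and why OR-ing with 0b0100000 lowercases a letter: in ASCII, an uppercase letter and its lowercase form differ only in that one bit. The variable names are chosen for this example only.

```python
data = b"ABC"

# bytes objects are immutable: item assignment raises TypeError.
try:
    data[0] = ord("a")
    is_immutable = False
except TypeError:
    is_immutable = True

# A bytearray copy can be changed in place.
buf = bytearray(data)
buf[0] |= 0b0100000      # set bit 5: ord('A') == 0x41 becomes ord('a') == 0x61
lowered = bytes(buf)     # convert back to an immutable bytes object
```

The bit trick is only meaningful for bytes already known to be ASCII uppercase letters, which is why the answer checks membership in uppercase first.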


To modify a binary file in place, you could use the mmap module, e.g., to lowercase the 4th column in every other line in 'file':

#!/usr/bin/env python3
import mmap
import string

uppercase = string.ascii_uppercase.encode()
ncolumn = 3 # select 4th column
with open('file', 'r+b') as file, \
     mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
    while True:
        mm.readline()   # ignore every other line
        pos = mm.tell() # remember current position
        if not mm.readline(): # EOF
            break
        if mm[pos + ncolumn] in uppercase:
            mm[pos + ncolumn] |= 0b0100000 # lowercase

Note: Python 2 and 3 APIs differ in this case. The code uses Python 3.

Input

ABCDE1
FGHIJ
ABCDE
FGHI

Output

ABCDE1
FGHiJ
ABCDE
FGHi

Notice: the 4th column became lowercase on the 2nd and 4th lines.
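
A variant of the mmap approach is sketched below as a function, with one extra guard that is not in the answer's code: it skips lines too short to have the requested column, so the index cannot spill into the following line. The function name lowercase_column_on_even_lines and the demo file name are assumptions for this example. An empty file cannot be memory-mapped with length 0, so the sketch assumes a non-empty file.

```python
import mmap
import os
import string
import tempfile

UPPERCASE = string.ascii_uppercase.encode("ascii")


def lowercase_column_on_even_lines(path, ncolumn=3):
    """Lowercase the byte at column index `ncolumn` on every 2nd line, in place."""
    with open(path, "r+b") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        while True:
            mm.readline()                # skip the odd-numbered line
            pos = mm.tell()              # start of the even-numbered line
            line = mm.readline()
            if not line:                 # EOF
                break
            content = line.rstrip(b"\r\n")
            # Guard: only touch the column if this line really has it.
            if ncolumn < len(content) and mm[pos + ncolumn] in UPPERCASE:
                mm[pos + ncolumn] |= 0b0100000


def run_demo(content):
    """Write `content` to a temporary file, apply the edit, return the result."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "file")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(content)
        lowercase_column_on_even_lines(path)
        with open(path, "rb") as f:
            return f.read()


result = run_demo(b"ABCDE1\nFGHIJ\nABCDE\nFGHI\n")
short_result = run_demo(b"AB\nCD\nEFGH\nIJKL")
```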


Typically, if you want to change a file, you read from the file, write the modifications to a temporary file, and on success move the temporary file in place of the original file:

#!/usr/bin/env python3
import os
import string
from tempfile import NamedTemporaryFile

caesar_shift = 3
filename = 'file'

def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return plaintext.translate(plaintext.maketrans(alphabet, shifted_alphabet))

dest_dir = os.path.dirname(filename)
chunksize = 1 << 15
with open(filename, 'rb') as file, \
     NamedTemporaryFile('wb', dir=dest_dir, delete=False) as tmp_file:
    while True: # encrypt
        chunk = file.read(chunksize)
        if not chunk: # EOF
            break
        tmp_file.write(caesar_bytes(chunk, caesar_shift))
os.replace(tmp_file.name, filename)

Input

abc
def
ABC
DEF

Output

def
ghi
ABC
DEF

To convert the output back, set caesar_shift = -3.
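
The round trip can be checked in memory with a small sketch like the one below. It restates caesar_bytes with one assumed tweak, reducing the shift modulo the alphabet length so that any integer shift is accepted. Bytes outside the alphabet (uppercase letters, newlines, non-ASCII values) pass through unchanged.

```python
import string

ALPHABET = string.ascii_lowercase.encode("ascii")


def caesar_bytes(data, shift, alphabet=ALPHABET):
    """Substitute each byte of `alphabet` in `data` by the one `shift` places later."""
    shift %= len(alphabet)   # assumption: normalise negative or large shifts
    shifted = alphabet[shift:] + alphabet[:shift]
    return data.translate(bytes.maketrans(alphabet, shifted))


plaintext = b"abc\ndef\nABC\nDEF\n\x00\x9a"
ciphertext = caesar_bytes(plaintext, 3)
restored = caesar_bytes(ciphertext, -3)
```

Because the substitution works byte by byte, translating a file chunk by chunk gives the same result as translating it in one piece.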
