How to change the bytes in a file?

Question

I'm making an encryption program and I need to open a file in binary mode to access non-ASCII and non-printable characters. I need to check whether each character from the file is a letter, a number, a symbol or an unprintable character. That means I have to check, one by one, whether the bytes (when they are decoded to ASCII) match any of these characters:

{^9,dzEV=Q4ciT+/s};fnq3BFh% #2!k7>YSU<GyD\I]|OC_e.W0M~ua-jR5lv1wA`@8t*xr'K"[P)&b:g$p(mX6Ho?JNZL

I think I could encode the characters above to binary and then compare them with the bytes, but I don't know how to do this.

P.S. Sorry for my bad English and for any misunderstanding of binary. (I hope you know what I mean by bytes: I mean characters in binary mode like this):

\x01\x00\x9a\x9c\x18\x00

Recommended answer

There are two major string types in Python: bytestrings (a sequence of bytes) that represent binary data, and Unicode strings (a sequence of Unicode code points) that represent human-readable text. It is simple to convert one into the other (☯):

unicode_text = bytestring.decode(character_encoding)
bytestring = unicode_text.encode(character_encoding)
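The round trip above can be sketched with concrete values. The sample text and the choice of UTF-8 are assumptions made for illustration only; the question itself does not name an encoding.

```python
# Round trip between text (str) and binary data (bytes).
# The sample string and the UTF-8 encoding are illustrative assumptions.
unicode_text = 'na\u00efve'                  # 'naive' with a diaeresis on the i
bytestring = unicode_text.encode('utf-8')    # str -> bytes
round_trip = bytestring.decode('utf-8')      # bytes -> str

print(bytestring)                  # the non-ASCII character takes two bytes
print(round_trip == unicode_text)
```

Decoding with a different encoding than the one used for encoding may raise UnicodeDecodeError or silently produce different text, so the same encoding name should be used in both directions.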

If you open a file in binary mode, e.g., 'rb', then file.read() returns a bytestring (the bytes type):

>>> b'A' == b'\x41' == chr(0b1000001).encode()
True
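To make the binary-mode read concrete, here is a minimal sketch. The sample bytes and the temporary file are hypothetical and exist only so the example is self-contained.

```python
import os
import tempfile

# Hypothetical sample data: a few unprintable bytes followed by b'A1!'.
sample = b'\x01\x00\x9a\x9c\x18\x00A1!'
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    with open(path, 'rb') as file:   # binary mode: no decoding is applied
        data = file.read()           # a bytes object
finally:
    os.remove(path)

first = data[0]            # indexing gives an int in range(256)
first_slice = data[:1]     # slicing gives a bytes object of length 1
values = list(data[-3:])   # iterating over bytes also yields ints
print(type(data), first, first_slice, values)
```

In Python 3, a single byte taken by index is an int, which is why later comparisons use `b in some_bytes` rather than comparing against one-character strings.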


There are several methods that can be used to classify bytes:

  • string methods e.g., bytes.isdigit():

    >>> b'1'.isdigit()
    True

  • string constants e.g., string.printable

    >>> import string
    >>> b'!' in string.printable.encode()
    True
    

  • regular expressions e.g., \d

    >>> import re
    >>> bool(re.match(br'\d+$', b'123'))
    True
    

  • classification functions in the curses.ascii module e.g., curses.ascii.isprint()

    >>> from curses import ascii
    >>> bytearray(filter(ascii.isprint, b'123'))
    bytearray(b'123')
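Putting the string constants to work on the original question, the following sketch sorts each byte into letter, digit, symbol or unprintable. The function name classify, the category names, and the decision to count the space character as a symbol are assumptions made for this example.

```python
import string

LETTERS = string.ascii_letters.encode()
DIGITS = string.digits.encode()
# Assumption: the space character is counted as a symbol here.
SYMBOLS = (string.punctuation + ' ').encode()

def classify(byte):
    """Return a category name for one byte given as an int in range(256)."""
    if byte in LETTERS:
        return 'letter'
    if byte in DIGITS:
        return 'digit'
    if byte in SYMBOLS:
        return 'symbol'
    return 'unprintable'   # control bytes and everything outside ASCII

data = b'\x01\x00\x9a\x9c\x18\x00aZ7 %'
categories = [classify(b) for b in data]
print(categories)
```

Because iterating over a bytes object yields ints, no decoding step is needed before the membership tests.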
    

    bytearray is a mutable sequence of bytes. Unlike a bytestring, you can change it in place, e.g., to lowercase every 3rd byte if it is uppercase:

    >>> import string
    >>> a = bytearray(b'ABCDEF_')
    >>> uppercase = string.ascii_uppercase.encode()
    >>> a[::3] = [b | 0b0100000 if b in uppercase else b 
    ...           for b in a[::3]]
    >>> a
    bytearray(b'aBCdEF_')
    

    Notice: b'a' and b'd' (positions 0 and 3) became lowercase, but b'_' (position 6) remained the same.

    To modify a binary file in place, you could use the mmap module, e.g., to lowercase the 4th column in every other line in 'file':

    #!/usr/bin/env python3
    import mmap
    import string
    
    uppercase = string.ascii_uppercase.encode()
    ncolumn = 3 # select 4th column
    with open('file', 'r+b') as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        while True:
            mm.readline()   # ignore every other line
            pos = mm.tell() # remember current position
            if not mm.readline(): # EOF
                break
            if mm[pos + ncolumn] in uppercase:
                mm[pos + ncolumn] |= 0b0100000 # lowercase
    

    Note: the Python 2 and 3 APIs differ in this case. The code uses Python 3.

    Input

    ABCDE1
    FGHIJ
    ABCDE
    FGHI
    

    Output

    ABCDE1
    FGHiJ
    ABCDE
    FGHi
    

    Notice: the 4th column became lowercase on the 2nd and 4th lines.
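The mmap script indexes pos + ncolumn without checking the line length, so a line shorter than four bytes would make it touch the following line (or raise IndexError at the end of the file). A hedged variant that adds a length check is sketched below; the helper name lowercase_column and the temporary test file are hypothetical.

```python
import mmap
import os
import string
import tempfile

uppercase = string.ascii_uppercase.encode()

def lowercase_column(path, ncolumn):
    """Lowercase the byte at index `ncolumn` of every second line, in place."""
    with open(path, 'r+b') as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        while True:
            mm.readline()            # skip one line
            pos = mm.tell()          # start of the line to modify
            line = mm.readline()
            if not line:             # EOF
                break
            # Only touch the column if it lies inside this line.
            if ncolumn < len(line.rstrip(b'\r\n')) and mm[pos + ncolumn] in uppercase:
                mm[pos + ncolumn] |= 0b0100000   # set the ASCII lowercase bit

# Hypothetical test data; note the short 4th line b'FG'.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'ABCDE1\nFGHIJ\nABCDE\nFG\nABCDE\nFGHI\n')
    path = tmp.name
try:
    lowercase_column(path, 3)        # 3 selects the 4th column
    with open(path, 'rb') as file:
        result = file.read()
finally:
    os.remove(path)
print(result)
```

Note that mmap cannot map an empty file, so a caller would need to handle that case separately.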

    Typically, if you want to change a file, you read from the file, write the modifications to a temporary file, and on success move the temporary file into place of the original file:

    #!/usr/bin/env python3
    import os
    import string
    from tempfile import NamedTemporaryFile
    
    caesar_shift = 3
    filename = 'file'
    
    def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
        shifted_alphabet = alphabet[shift:] + alphabet[:shift]
        return plaintext.translate(plaintext.maketrans(alphabet, shifted_alphabet))
    
    dest_dir = os.path.dirname(filename)
    chunksize = 1 << 15
    with open(filename, 'rb') as file, \
         NamedTemporaryFile('wb', dir=dest_dir, delete=False) as tmp_file:
        while True: # encrypt
            chunk = file.read(chunksize)
            if not chunk: # EOF
                break
            tmp_file.write(caesar_bytes(chunk, caesar_shift))
    os.replace(tmp_file.name, filename)
    

    Input

    abc
    def
    ABC
    DEF
    

    Output

    def
    ghi
    ABC
    DEF
    

    To convert the output back, set caesar_shift = -3.
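The claim that a shift of -3 undoes a shift of 3 can be checked in memory, without touching any file. This sketch reuses the answer's caesar_bytes function (calling bytes.maketrans directly, which is equivalent); the sample input is an assumption.

```python
import string

def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
    # Rotate the alphabet by `shift` and map each byte through the rotation;
    # bytes outside the alphabet (uppercase, newlines, ...) pass through.
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return plaintext.translate(bytes.maketrans(alphabet, shifted_alphabet))

original = b'abc\ndef\nABC\nDEF\nxyz\n'    # hypothetical sample input
encrypted = caesar_bytes(original, 3)
decrypted = caesar_bytes(encrypted, -3)
print(encrypted)
print(decrypted == original)
```

Only bytes in the lowercase alphabet are shifted, which is why the uppercase lines in the example output above are unchanged.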
