Python - How can I change bytes in a file

Views: 179. Posted: 2016/8/6 22:14:59. Tags: python unicode binary ascii encode


Question

I'm making an encryption program and I need to open a file in binary mode to access non-ASCII and non-printable characters. I need to check whether each character from the file is a letter, number, symbol, or unprintable character. That means I have to check, one by one, whether the bytes (when they are decoded to ASCII) match any of these characters:
{^9,dzEV=Q4ciT+/s};fnq3BFh% #2!k7>YSU<GyD\I]|OC_e.W0M~ua-jR5lv1wA`@8t*xr'K"[P)&b:g$p(mX6Ho?JNZL
I think I could encode these characters above to binary and then compare them with bytes. I don't know how to do this.

P.S. Sorry for my bad English and for misunderstanding binary. (I hope you know what I mean by bytes: I mean characters in binary mode like this):
\x01\x00\x9a\x9c\x18\x00
Solution
There are two major string types in Python: bytestrings (a sequence of bytes) that represent binary data and Unicode strings (a sequence of Unicode codepoints) that represent human-readable text. It is simple to convert one into another (☯):
unicode_text = bytestring.decode(character_encoding)
bytestring = unicode_text.encode(character_encoding)
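A minimal sketch of that round trip, assuming UTF-8 as the character encoding (the answer leaves character_encoding unspecified, so this choice is only illustrative). It also shows that arbitrary binary data, such as the bytes from the question, is not necessarily valid text in a given encoding:

```python
# Round trip between text and bytes, assuming UTF-8 (an illustrative choice).
unicode_text = 'naïve ☯'
bytestring = unicode_text.encode('utf-8')
assert bytestring.decode('utf-8') == unicode_text

# Arbitrary binary data is not necessarily valid text in a given encoding.
binary_data = b'\x01\x00\x9a\x9c\x18\x00'
try:
    binary_data.decode('ascii')
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False  # bytes >= 0x80 cannot be decoded as ASCII

# latin-1 maps every byte value 0..255 to one code point, so decoding never fails.
as_latin1 = binary_data.decode('latin-1')
```

For an encryption program it is usually simpler to stay with bytes throughout and decode only when text is actually needed.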
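If you open a file in binary mode, e.g., 'rb', then file.read() returns a bytestring (the bytes type). A bytes literal, an escaped byte, and an encoded character all compare equal when they hold the same byte value: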
>>> b'A' == b'\x41' == chr(0b1000001).encode()
True
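As a hedged sketch of the approach the question describes, the character set can be encoded to bytes once and each byte read from the file tested for membership. The helper name count_allowed and the sample file content are made up for illustration, and the sketch reads the whole file into memory, which assumes the file is small:

```python
import os
import tempfile

# Character set copied from the question, encoded once to bytes.
ALLOWED = r'''{^9,dzEV=Q4ciT+/s};fnq3BFh% #2!k7>YSU<GyD\I]|OC_e.W0M~ua-jR5lv1wA`@8t*xr'K"[P)&b:g$p(mX6Ho?JNZL'''.encode('ascii')
ALLOWED_SET = frozenset(ALLOWED)  # set of ints in range(256) for fast lookup

def count_allowed(path):
    """Return (allowed, other) byte counts for the file at path (hypothetical helper)."""
    allowed = other = 0
    with open(path, 'rb') as file:
        for byte in file.read():  # iterating over bytes yields ints in Python 3
            if byte in ALLOWED_SET:
                allowed += 1
            else:
                other += 1
    return allowed, other

# Demo on a throwaway file; the content is invented for the example.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'Hi\x01\x00\x9a')
counts = count_allowed(tmp.name)
os.remove(tmp.name)
```

No decoding to ASCII is needed: the comparison happens on byte values directly.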
There are several methods that can be used to classify bytes:
string methods such as bytes.isdigit():
>>> b'1'.isdigit()
True
string constants such as string.printable:
>>> import string
>>> b'!' in string.printable.encode()
True
regular expressions such as \d:
>>> import re
>>> bool(re.match(br'\d+$', b'123'))
True
classification functions in the curses.ascii module, e.g., curses.ascii.isprint():
>>> from curses import ascii
>>> bytearray(filter(ascii.isprint, b'123'))
bytearray(b'123')
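Putting these together for the question's goal, here is a rough sketch that labels each byte as a letter, digit, symbol, whitespace, or unprintable. It uses only the string constants, so it does not depend on curses, which is not available on every platform. The function name classify and the category names are assumptions made for this example:

```python
import string

# Byte values for each category, built from the ASCII string constants.
LETTERS = frozenset(string.ascii_letters.encode())
DIGITS = frozenset(string.digits.encode())
SYMBOLS = frozenset(string.punctuation.encode())
WHITESPACE = frozenset(string.whitespace.encode())

def classify(byte):
    """Classify one byte value (an int in range(256)); names are illustrative."""
    if byte in LETTERS:
        return 'letter'
    if byte in DIGITS:
        return 'digit'
    if byte in SYMBOLS:
        return 'symbol'
    if byte in WHITESPACE:
        return 'whitespace'
    return 'unprintable'  # control characters and all non-ASCII bytes

labels = [classify(b) for b in b'a7!\x00 \x9a']
```

Iterating over a bytes object in Python 3 yields integers, which is why the sets hold ints rather than one-byte strings.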
bytearray is a mutable sequence of bytes. Unlike a bytestring, you can change it in place, e.g., to lowercase every 3rd byte that is uppercase:
>>> import string
>>> a = bytearray(b'ABCDEF_')
>>> uppercase = string.ascii_uppercase.encode()
>>> a[::3] = [b | 0b0100000 if b in uppercase else b 
...           for b in a[::3]]
>>> a
bytearray(b'aBCdEF_')
Notice: b'a' and b'd' are now lowercase, but b'_' remained the same.
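The reason the OR with 0b0100000 works is that, in ASCII, an uppercase letter and its lowercase form differ only in bit 5 (value 32). A small sketch, with the name CASE_BIT chosen here for illustration, also shows why the membership check is needed before setting the bit:

```python
# In ASCII, an uppercase letter and its lowercase form differ only in bit 5.
CASE_BIT = 0b0100000  # 32

assert ord('A') | CASE_BIT == ord('a')    # setting the bit lowercases
assert ord('a') & ~CASE_BIT == ord('A')   # clearing the bit uppercases

# Applied to a non-letter, the same OR produces an unrelated character,
# which is why the code checks for uppercase letters first.
assert chr(ord('@') | CASE_BIT) == '`'

data = bytearray(b'Hello, World_')
for i, b in enumerate(data):
    if ord('A') <= b <= ord('Z'):  # only touch ASCII uppercase letters
        data[i] = b | CASE_BIT     # in-place change; bytes would not allow this
```

Without the check, b'_' (0x5F) would turn into 0x7F, the unprintable DEL character.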

To modify a binary file in place, you could use the mmap module, e.g., to lowercase the 4th column in every other line of 'file':
#!/usr/bin/env python3
import mmap
import string

uppercase = string.ascii_uppercase.encode()
ncolumn = 3 # select 4th column
with open('file', 'r+b') as file, \
     mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
    while True:
        mm.readline()   # ignore every other line
        pos = mm.tell() # remember current position
        if not mm.readline(): # EOF
            break
        if mm[pos + ncolumn] in uppercase:
            mm[pos + ncolumn] |= 0b0100000 # lowercase
Note: Python 2 and 3 APIs differ in this case. The code uses Python 3.

Input
ABCDE1
FGHIJ
ABCDE
FGHI
Output
ABCDE1
FGHiJ
ABCDE
FGHi
Notice: the 4th column became lowercase on the 2nd and 4th lines.
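The mmap code above indexes pos + ncolumn without checking the line length, so a line shorter than 4 bytes would cause a byte on a following line to be modified (or an IndexError at the end of the file). A hedged variant with a bounds check is sketched below; the function name lowercase_column and the sample data are assumptions for this example, and it still assumes a non-empty file, since mmap cannot map an empty one:

```python
import mmap
import os
import string
import tempfile

UPPERCASE = string.ascii_uppercase.encode()

def lowercase_column(path, ncolumn, uppercase=UPPERCASE):
    """Lowercase column ncolumn (0-based) on every other line, in place."""
    with open(path, 'r+b') as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        while True:
            mm.readline()        # skip a line
            pos = mm.tell()      # start of the line to modify
            line = mm.readline()
            if not line:         # EOF
                break
            # Only touch the byte if the column exists on this line.
            if (ncolumn < len(line.rstrip(b'\r\n'))
                    and mm[pos + ncolumn] in uppercase):
                mm[pos + ncolumn] |= 0b0100000  # set the ASCII case bit

# Demo on a throwaway file; the second line is too short to have a 4th column.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'ABCDE1\nFG\nABCDE\nFGHI')
lowercase_column(tmp.name, 3)
with open(tmp.name, 'rb') as f:
    result = f.read()
os.remove(tmp.name)
```

In the demo the short line 'FG' is left alone and only the 'I' on the last line is lowercased.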

Typically, if you want to change a file, you read from the file, write the modifications to a temporary file, and on success move the temporary file into the place of the original file:
#!/usr/bin/env python3
import os
import string
from tempfile import NamedTemporaryFile

caesar_shift = 3
filename = 'file'

def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return plaintext.translate(plaintext.maketrans(alphabet, shifted_alphabet))

dest_dir = os.path.dirname(filename)
chunksize = 1 << 15
with open(filename, 'rb') as file, \
     NamedTemporaryFile('wb', dir=dest_dir, delete=False) as tmp_file:
    while True: # encrypt
        chunk = file.read(chunksize)
        if not chunk: # EOF
            break
        tmp_file.write(caesar_bytes(chunk, caesar_shift))
os.replace(tmp_file.name, filename)
Input
abc
def
ABC
DEF
Output
def
ghi
ABC
DEF
To convert the output back, set caesar_shift = -3.
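A quick in-memory check of that round trip, reusing caesar_bytes from the script above; the sample bytes are made up, and bytes.maketrans is called on the class rather than on the instance, which is equivalent:

```python
import string

def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
    # Rotate the alphabet by `shift` positions; a negative shift rotates back.
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return plaintext.translate(bytes.maketrans(alphabet, shifted_alphabet))

original = b'abc\ndef\nABC\nDEF\n\x00\x9a'
encrypted = caesar_bytes(original, 3)
decrypted = caesar_bytes(encrypted, -3)
```

Bytes outside the alphabet, including uppercase letters and non-ASCII values, pass through unchanged, which matches the sample output above.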
