Passing a sequence of bits to a file in Python


Question

As part of a bigger project, I want to save a sequence of bits in a file so that the file is as small as possible. I'm not talking about compression; I want to save the sequence as it is, but using the fewest characters. My initial idea was to turn mini-sequences of 8 bits into chars using ASCII encoding and save those chars, but due to some unknown problem with strange characters, the characters retrieved when reading the file are not the same as the ones originally written. I've tried opening the file with utf-8 and latin-1 encodings, but neither seems to work. I'm wondering if there's another way, maybe by turning the sequence into a hexadecimal number?

Recommended answer

Technically, you cannot write less than one byte, because the OS organizes memory in bytes (see "write individual bits to a file in python"), so this is binary file I/O; see https://docs.python.org/2/library/io.html. There are also modules like struct for packing values into bytes.

Open the file with the 'b' flag, which indicates a binary read/write operation, then use e.g. the to_bytes() function (see "Writing bits to a binary file") or struct.pack() (see "How to write individual bits to a text file in python?").

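>>> import struct
>>> # "<h" = little-endian signed 16-bit integer; 824 == 0x0338
>>> struct.pack("<h", 824)
b'8\x03'
>>> bits = "10111111111111111011110"
>>> # reverse the string so the first bit ends up as the least significant one
>>> int(bits[::-1], 2).to_bytes(4, 'little')
b'\xfd\xff=\x00'
>>> # 'wb' opens the file for binary writing; write() returns the byte count
>>> with open('somefile.bin', 'wb') as f:
...     f.write(struct.pack("<h", 824))
...
2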
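A minimal sketch of a full round trip for the bit string from the question, assuming the bits arrive as a string of '0' and '1' characters. The helper names bits_to_bytes and bytes_to_bits, the 4-byte length header, and the temporary file path are illustrative choices, not part of the original answer. The header records the true bit count so that the zero padding added to fill the last byte can be dropped when reading.

```python
import os
import tempfile


def bits_to_bytes(bits: str) -> bytes:
    """Pack a '0'/'1' string into bytes, prefixed by a 4-byte bit count."""
    n = len(bits)
    padded = bits + "0" * (-n % 8)  # pad on the right up to a whole byte
    body = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return n.to_bytes(4, "big") + body


def bytes_to_bits(data: bytes) -> str:
    """Inverse of bits_to_bytes: read the bit count, then strip the padding."""
    n = int.from_bytes(data[:4], "big")
    bits = "".join(f"{byte:08b}" for byte in data[4:])
    return bits[:n]


bits = "10111111111111111011110"
path = os.path.join(tempfile.mkdtemp(), "somefile.bin")

with open(path, "wb") as f:  # binary mode: no text encoding is applied
    f.write(bits_to_bytes(bits))

with open(path, "rb") as f:
    restored = bytes_to_bits(f.read())

print(restored == bits)
```

Because the file is opened in binary mode, no character encoding is involved, which avoids the utf-8/latin-1 mismatch described in the question. The 23 bits here occupy 3 bytes of payload plus the 4-byte header.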


If you want to get around the 8-bit (byte) structure of memory, you can use bit manipulation and techniques like bitmasks and BitArrays; see https://wiki.python.org/moin/BitManipulation and https://wiki.python.org/moin/BitArrays

However, as you said, the problem is reading the data back if you use BitArrays of differing lengths. For example, storing decimal 7 takes 3 bits (0b111) and storing decimal 2 takes 2 bits (0b10). How can your program know whether to read a value back as a 3-bit value or as a 2-bit value? In unorganized memory, the sequence of decimals 7, 2 looks like 11110, which is meant as 111|10, so how can your program know where the | is?

In normal byte-ordered memory, the decimals 7 and 2 are stored as 0000011100000010 -> 00000111|00000010, which has the advantage that it is clear where the | is.
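The ambiguity can be reproduced in a few lines. This is only an illustrative sketch; the variable names and the second value list [3, 6] are made up for the demonstration.

```python
values = [7, 2]

# Variable width: each value uses only the bits it needs.
variable = "".join(format(v, "b") for v in values)  # '111' + '10'

# A different list of values produces exactly the same bit string,
# so the boundaries cannot be recovered from the bits alone.
other = "".join(format(v, "b") for v in [3, 6])  # '11' + '110'
print(variable, other, variable == other)

# Fixed width: every value takes WIDTH bits, so boundaries are implicit.
WIDTH = 8
fixed = "".join(format(v, f"0{WIDTH}b") for v in values)
decoded = [int(fixed[i:i + WIDTH], 2) for i in range(0, len(fixed), WIDTH)]
print(fixed, decoded)
```

With a fixed width the reader only needs to know WIDTH, at the cost of storing leading zeros for small values.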

This is why memory at its lowest level is organized in fixed clusters of 8 bits = 1 byte. If you want to access single bits inside a byte (an 8-bit cluster), you can use bitmasks in combination with logic operators (http://www.learncpp.com/cpp-tutorial/3-8a-bit-flags-and-bit-masks/). In Python, the built-in integer operators &, |, ^, ~, << and >> cover this; the ctypes module, whose structures support bit fields, is another option for single-bit manipulation.

If you know that your values are all 6 bits wide, maybe it is worth the effort; however, this is also tough...
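One possible way to handle the fixed 6-bit case is sketched below. The functions pack_fixed and unpack_fixed are hypothetical helpers, and the sketch assumes the number of values is known to the reader (for instance stored separately), since the last byte may contain padding bits.

```python
def pack_fixed(values, width=6):
    """Concatenate width-bit values and pad with zero bits to whole bytes."""
    acc = 0
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        acc = (acc << width) | v
    total_bits = len(values) * width
    pad = -total_bits % 8  # zero bits appended to fill the last byte
    acc <<= pad
    return acc.to_bytes((total_bits + pad) // 8, "big")


def unpack_fixed(data, count, width=6):
    """Read back `count` values of `width` bits each from packed bytes."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << width) - 1
    out = []
    for i in range(count):
        shift = total_bits - (i + 1) * width
        out.append((acc >> shift) & mask)
    return out


values = [7, 2, 63, 0]
packed = pack_fixed(values)
print(packed, unpack_fixed(packed, len(values)))
```

Four 6-bit values fit into 3 bytes here instead of 4, which is the saving the fixed-width approach buys.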

(see "How do you set, clear, and toggle a single bit?")
