Python extension - construct and inspect large integers efficiently

Question

I have a native library for which a natural interface would involve passing potentially large numbers. I anticipate about half being < 32 bits; another quarter < 64 bits; the next eighth < 128 bits - and so on, without a fixed length limit.

PyLong_FromUnsignedLongLong() and PyLong_AsUnsignedLongLong() would be suitable if I could constrain values to fit in a single register.

PyLong_FromString() overcomes this - but at the undesirable expense of requiring an intermediate representation. _PyLong_FromByteArray() and _PyLong_AsByteArray() mitigate this cost (by making this intermediate representation simple) but the leading underscore makes me wonder if this may lead to portability problems.

In longintrepr.h, I've found struct _longobject... which hints that it might be a way to interact directly with the internal representation... though an absence of detailed documentation about this structure remains a hurdle.

What approach will result in optimal throughput between Python and the library? Is there documentation I've overlooked?

Solution

The underscore prefix largely means the same thing in the C API as in normal Python: "this function is an implementation detail subject to change, so watch yourself if you use it". You're not forbidden to use such functions, and if it's the only way to achieve a particular goal (e.g. significant efficiency gains in your case), then it's fine to use the API as long as you are aware of the hazard.

If the _PyLong_FromByteArray API were truly private, it would be a static function and wouldn't be fully documented and exported in longobject.h. In fact, Tim Peters (a well-known Python core developer) explicitly blesses its use in the following exchange:

[Dan Christensen]

My student and I are writing a C extension that produces a large integer in binary which we'd like to convert to a python long. The number of bits can be a lot more than 32 or even 64. My student found the function _PyLong_FromByteArray in longobject.h which is exactly what we need, but the leading underscore makes me wary. Is it safe to use this function?

[Tim Peters]

Python uses it internally, so it better be ;-)

[Dan Christensen]

Will it continue to exist in future versions of python?

[Tim Peters]

No guarantees, and that's why it has a leading underscore: it's not an officially supported, externally documented, part of the advertised Python/C API. It so happens that I added that function, because Python needed some form of its functionality internally across different C modules. Making it an official part of the Python/C API would have been a lot more work (which I didn't have time for), and created an eternal new maintenance burden (which I'm not keen on regardless ;-)).

In practice, few people touch this part of Python's implementation, so I don't /expect/ it will go away, or even change, for years to come. The biggest insecurity I can think of offhand is that someone may launch a crusade to make some other byte-array <-> long interface "official" based on a different way of representing negative integers. But even then I expect the current unofficial functions to remain, since the 256's-complement representation remains necessary for the struct module's "q" format, and for the pickle module's protocol=2 long serialization format.

[Dan Christensen]

Or is there some other method we should use?

[Tim Peters]

No. That's why these functions were invented to begin with ;-)

Here's the documentation (from Python 3.2.1):

/* _PyLong_FromByteArray:  View the n unsigned bytes as a binary integer in
   base 256, and return a Python long with the same numeric value.
   If n is 0, the integer is 0.  Else:
   If little_endian is 1/true, bytes[n-1] is the MSB and bytes[0] the LSB;
   else (little_endian is 0/false) bytes[0] is the MSB and bytes[n-1] the
   LSB.
   If is_signed is 0/false, view the bytes as a non-negative integer.
   If is_signed is 1/true, view the bytes as a 2's-complement integer,
   non-negative if bit 0x80 of the MSB is clear, negative if set.
   Error returns:
   + Return NULL with the appropriate exception set if there's not
     enough memory to create the Python long.
*/
PyAPI_FUNC(PyObject *) _PyLong_FromByteArray(
    const unsigned char* bytes, size_t n,
    int little_endian, int is_signed);
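
The byte-order and sign rules in that comment can be checked from pure Python. The sketch below uses int.from_bytes, which exposes the same base-256 conventions at the Python level; treating it as a stand-in for the C function is an assumption made for illustration, and the sample bytes are made up.

```python
import struct

data = bytes([0x01, 0x80])

# little_endian=1: bytes[0] is the LSB and bytes[n-1] is the MSB.
little = int.from_bytes(data, byteorder="little", signed=False)

# little_endian=0: bytes[0] is the MSB and bytes[n-1] is the LSB.
big = int.from_bytes(data, byteorder="big", signed=False)

# is_signed=1: bit 0x80 of the MSB decides the sign (two's complement).
signed_little = int.from_bytes(data, byteorder="little", signed=True)

# Here the MSB is 0x01, so the value stays non-negative.
signed_big = int.from_bytes(data, byteorder="big", signed=True)

# n == 0 gives the integer 0.
empty = int.from_bytes(b"", byteorder="little", signed=False)

# The struct module's "q" format uses the same signed layout for 8 bytes.
packed = struct.pack("<q", -2)
round_trip = int.from_bytes(packed, byteorder="little", signed=True)

print(little, big, signed_little, signed_big, empty, round_trip)
```

The same two-byte buffer yields four different integers depending on the two flags, which is why a wrapper around the C call should fix both flags in one place.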

The main reason it's an "underscore-prefixed" API is that it depends on the implementation of the Python long as an array of words in a power-of-two base. This isn't likely to change, but since you're implementing your own API on top of it, you can confine the calls to a few functions of your own and so insulate your callers from later changes in the Python API.
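
One way to read that last point is to route every conversion through a single pair of helpers, so that only those helpers change if the underlying call does. Here is a minimal Python-level sketch of that shape, assuming a hypothetical native library that exchanges unsigned 64-bit limbs, least significant first. The names limbs_to_int and int_to_limbs are invented for this example; in a real C extension the bodies would call _PyLong_FromByteArray and _PyLong_AsByteArray instead.

```python
import struct


def limbs_to_int(limbs):
    """Combine unsigned 64-bit limbs (least significant first) into one int."""
    # Pack the limbs into a little-endian byte buffer, then convert once.
    raw = struct.pack("<%dQ" % len(limbs), *limbs)
    return int.from_bytes(raw, byteorder="little", signed=False)


def int_to_limbs(value):
    """Split a non-negative int into unsigned 64-bit limbs, least significant first."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    # Always emit at least one limb so that 0 is represented as [0].
    n_limbs = max(1, (value.bit_length() + 63) // 64)
    raw = value.to_bytes(n_limbs * 8, byteorder="little", signed=False)
    return list(struct.unpack("<%dQ" % n_limbs, raw))


big = (1 << 100) + 5
limbs = int_to_limbs(big)
print(limbs, limbs_to_int(limbs) == big)
```

Callers only ever see limbs_to_int and int_to_limbs, so swapping the conversion mechanism later would not touch them.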
