What is the actual impact of calling socket.recv with a bufsize that is not a power of 2?

Question

To read data from a socket in Python, you call socket.recv, which has this signature:

socket.recv(bufsize[, flags])

The Python docs for socket.recv vaguely state:

Note: For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096.

Question: What does "best match with hardware and network realities" mean? What is the actual impact of setting bufsize to a non-power-of-two?

I've seen many other recommendations to make this read a power of 2. I'm also well aware of why it is often useful to have array lengths that are powers of two (bit-shift/masking operations on the length, optimal FFT array sizes, etc.), but those reasons are application-dependent. I just don't see a general reason for it with socket.recv, and certainly not one strong enough to justify the specific recommendation in the Python documentation. I also don't see any power-of-two optimizations in the underlying Python code that would make it a Python-specific recommendation.

For example, suppose you have a protocol where the incoming packet length is exactly known. It is obviously preferable to read only "at most" what the current packet needs. Otherwise you could eat into the next packet, which would be irritating. If the packet I'm currently processing only has 42 bytes pending, I'm only going to set bufsize to 42.
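The sketch below shows the "read only what the current packet needs" approach. It assumes the caller already knows the packet length. The helper name recv_exact is hypothetical and is not part of the socket module. The bufsize passed to recv() is whatever is still missing, so it is rarely a power of 2:

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from sock, never asking for more than is still needed."""
    chunks = []
    remaining = n
    while remaining > 0:
        # bufsize is whatever is still missing (e.g. 42), not a power of 2,
        # so bytes belonging to the next packet stay in the kernel buffer.
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError(f"peer closed with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    a, b = socket.socketpair()
    # Two back-to-back "packets": 42 bytes, then 5 bytes.
    a.sendall(b"x" * 42 + b"hello")
    first = recv_exact(b, 42)
    second = recv_exact(b, 5)
    print(len(first), second)  # 42 b'hello'
    a.close()
    b.close()
```

socket.socketpair() is used only to keep the demo self-contained. The loop is needed because a single recv() may return fewer bytes than requested, whatever bufsize you pass.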

What am I missing? When I have to choose an arbitrary buffer/array size, I usually (always?) make the length a power of two, just in case. This is just a habit developed over many years. Are the Python docs also just a victim of habit?

This isn't exclusive to Python, but since I'm specifically referencing the Python docs, I'll tag it as such.


UPDATE: I just checked the size of the kernel-level receive buffer on my system (or at least I think I did, with cat /proc/sys/net/core/rmem_default), and it was 124928. Not a power of two. rmem_max was 131071, also clearly not a power of two.
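The same numbers can be checked from Python. This is a sketch, and the helper names are hypothetical:

- getsockopt(SOL_SOCKET, SO_RCVBUF) gives the receive buffer of one socket.
- The /proc values are read directly, which works on Linux only.

For reference, 131071 is 2**17 - 1. Also, the SO_RCVBUF a socket reports can differ from rmem_default. For example, Linux reports double the value set via setsockopt.

```python
import socket
from pathlib import Path
from typing import Optional


def kernel_rcvbuf(sock: socket.socket) -> int:
    """Return the receive buffer size the kernel reports for this socket (SO_RCVBUF)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def read_sysctl(name: str) -> Optional[int]:
    """Read a Linux /proc/sys/net/core value, or return None where it is unavailable."""
    path = Path("/proc/sys/net/core") / name
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def is_power_of_two(value: int) -> bool:
    """True for 1, 2, 4, 8, ...; a power of 2 has exactly one bit set."""
    return value > 0 and (value & (value - 1)) == 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("SO_RCVBUF:", kernel_rcvbuf(s))
    for name in ("rmem_default", "rmem_max"):
        value = read_sysctl(name)
        if value is not None:
            label = "power of 2" if is_power_of_two(value) else "not a power of 2"
            print(name, value, label)
```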

Looking into this more, I still cannot see any benefit to the power-of-two recommendation(s). I'm about ready to call it a bogus recommendation...

I also added the tcp and C tags, since they are also relevant.

Solution

I'm pretty sure the 'power of 2' advice is based on an error in editing, and should not be taken as a requirement.

That specific piece of advice was added to the Python 2.5 documentation (and backported to the Python 2.4.3 docs) in response to Python issue #756104 (https://bugs.python.org/issue756104). The reporter was using an unreasonably large buffer size for socket.recv(), which prompted the update.
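Instead of one huge recv(), the usual pattern is to loop with a modest bufsize until the peer closes. The sketch below shows that loop; the name recv_all is hypothetical. The size only caps how much one call may return. A non-power-of-two value such as 1000 or 1500 therefore yields exactly the same bytes as 4096. The comment about allocation describes current CPython behaviour, and it is why a 1.9GB bufsize is a problem even if it "works".

```python
import socket


def recv_all(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Read until the peer closes the connection, at most bufsize bytes per call."""
    chunks = []
    while True:
        # CPython allocates a bufsize-byte buffer for each call and then
        # shrinks it to the bytes actually received, so keep bufsize modest.
        chunk = sock.recv(bufsize)
        if not chunk:  # b"" signals an orderly shutdown by the peer
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 20  # 5120 bytes, small enough to sit in the socket buffer
    for size in (1000, 1500, 4096):  # powers of 2 and not
        a, b = socket.socketpair()
        a.sendall(payload)
        a.shutdown(socket.SHUT_WR)  # b's recv() returns b"" once the data is drained
        assert recv_all(b, size) == payload
        a.close()
        b.close()
    print("same bytes for every bufsize")
```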

It was Tim Peters who introduced the 'power of 2' concept:

I expect you're the only person in history to try passing such a large value to recv() -- even if it worked, you'd almost certainly run out of memory trying to allocate buffer space for 1.9GB. sockets are a low-level facility, and it's common to pass a relatively small power of 2 (for best match with hardware and network realities).

I've worked with Tim, and he has a huge amount of experience with network programming and hardware. Generally speaking, I'd take him at his word when he makes a remark like that. He was particularly 'fond' of the Windows 95 stack; he called it his canary in a coal mine for its ability to fail under stress. But note that he says it is common to pass a power of 2, not that it is required.

It was that wording that then led to the documentation update:

This is a documentation bug; something the user should be "warned" about.

This caught me once, and two different persons asked about this in #python, so maybe we should put something like the following in the recv() docs.

"""
For best match with hardware and network realities, the
value of "buffer" should be a relatively small power of 2,
for example, 4096.
"""

If you think the wording is right, just assign the bug to me, I'll take care of it.

No one challenged the 'power of 2' assertion here, but within a few replies the editor moved from "it is common" to "should be".

To me, those proposing the documentation update were more concerned with making sure you use a small buffer than with whether it is a power of 2. That's not to say it is bad advice, however; any low-level buffer that interacts with the kernel benefits from alignment with the kernel's data structures.

There may well be an esoteric stack where power-of-2 buffer sizes matter more. Even so, I doubt Tim Peters ever meant for his experience (that it is common practice) to be cast in such iron-clad terms. Just ignore the advice if a different buffer size makes more sense for your specific use case.
