Packet is sometimes sent completely and sometimes not


Question

@Grismar recommended that I create a new topic for the following problem:

I wrote a server and a client with the socket module. To handle multiple connections I used the selectors module instead of threads or fork().

Scenario: I have to generate a massive string and send it to the client. The string is generated from a query string supplied by the client: the client sends a query, and the server generates a result and sends it back. I have no problem sending the query to the server.

Because the string is massive, I decided to split it into chunks, like this:

if sys.getsizeof(search_result_string) > 1024: #131072:
    if sys.getsizeof(search_result_string) % 1024 == 0:
        chunks = int(sys.getsizeof(search_result_string) / 1024 )
    else:
        chunks = int(sys.getsizeof(search_result_string) / 1024) + 1
for chunk in range(chunks):
    packets.append(search_result_string[:1024])
    search_result_string = search_result_string[1024:]

So I have a list of packets. Then:

conn.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1000000)
for chunk in packets:
    conn.sendall(bytes(chunk,'utf-8'))

Sometimes the client has no problem at all, and sometimes I get the following error:

Traceback (most recent call last):
  File "./multiconn-client.py", line 116, in <module>
    service_connection(key, mask)
  File "./multiconn-client.py", line 89, in service_connection
    target_string += recv_data.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 42242: unexpected end of data

In my client I use the following callback:

def service_connection(key, mask):
    buff = 10000
    sock = key.fileobj
    data = key.data
    target_string = str()
    if mask & selectors.EVENT_READ:
        buff = sock.getsockopt(SOL_SOCKET,SO_RCVBUF)
        recv_data = sock.recv( 128*1024 |buff)
        if recv_data:
            buff = sock.getsockopt(SOL_SOCKET,SO_RCVBUF)
            data.recv_total += len(recv_data)
        target_string += recv_data.decode('utf-8')
        print(target_string)
        if not recv_data: #or data.recv_total == data.msg_total:
            print("closing connection", data.connid)
            sel.unregister(sock)
            sock.close()
    if mask & selectors.EVENT_WRITE:
        if not data.outb and data.messages:
            data.outb = data.messages.pop(0)
        if data.outb:
            print("sending", repr(data.outb), "to connection", data.connid)
            sent = sock.send(data.outb)  # Should be ready to write
            data.outb = data.outb[sent:]

By the way, I use TCP sockets, and both server and client run on localhost.
I use the same string for every run.

The question is: why is everything fine sometimes, while at other times the string is not sent completely?

Recommended answer

What's happening is that your data is being chunked by the operating system (in addition to the chunking you're doing yourself). When the operating system does this, it may split your data in the middle of a UTF-8 encoding sequence. In other words, consider this block of code:

foo = '\xce\xdd\xff'       # three non-ascii characters
print(len(foo))            # => 3
bar = foo.encode('utf-8')
print(bar)                 # => b'\xc3\x8e\xc3\x9d\xc3\xbf'
bar[:3].decode()           # => raises:
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2: unexpected end of data

What's going on: characters above 0x7f are encoded as multi-byte UTF-8 sequences (two bytes each for the characters in this example). You cannot decode a character if its byte sequence gets truncated in the middle.

So, to easily fix your problem, receive all the data first (as a byte string), then decode the entire byte string as a unit.
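If the client needs to handle text as it arrives rather than waiting for the whole result, an alternative not mentioned above is the standard library's incremental decoder, which holds back an incomplete trailing sequence until the next chunk arrives. The following is a minimal sketch; the helper name `decode_chunks` is made up for illustration, and the chunk list stands in for successive `recv` results.

```python
import codecs

def decode_chunks(chunks):
    """Decode byte chunks that may split UTF-8 sequences at arbitrary points."""
    decoder = codecs.getincrementaldecoder('utf-8')()
    parts = []
    for chunk in chunks:
        # An incomplete trailing sequence is buffered inside the decoder
        # and completed by the next chunk.
        parts.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if the stream ended mid-character.
    parts.append(decoder.decode(b'', final=True))
    return ''.join(parts)

data = '\xce\xdd\xff'.encode('utf-8')   # six bytes, two per character
chunks = [data[:3], data[3:]]           # split in the middle of a character
text = decode_chunks(chunks)
```

In a selectors callback, the decoder object would have to live alongside the connection state (for example on `key.data`) so that it survives between calls.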

This brings up another related issue: you needn't create your own data chunks. TCP will do that for you. And as you've seen, TCP won't preserve your message boundaries anyway. So your best option is to properly "frame" your data.

That is, take some part of your string (or all of it, if it isn't hundreds of megabytes) and encode it in UTF-8. Take the length of the resulting byte buffer. Send, as binary data, a fixed-length size field containing that length (use the struct module to create it). On the receiving side, first receive the fixed-length size field. This tells you how many bytes of string data you actually need to receive. Receive all of those bytes, then decode the entire buffer at once.

In other words, ignoring error handling, sending side:

import struct
import socket
...
str_to_send = "blah blah\xce"
bytes_to_send = str_to_send.encode('utf-8')
len_bytes = len(bytes_to_send)
sock.sendall(struct.pack("!I", len_bytes))     # Send 4-byte size header
sock.sendall(bytes_to_send)                    # Let TCP handle chunking bytes

Receiving side:

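import struct
import sys

def recv_exact(sock, n):
    # recv may return fewer bytes than requested, so loop until we have n
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))    # Never read past this message
        if not chunk:
            print("Unexpected EOF!")
            sys.exit(1)
        buf += chunk
    return buf

header = recv_exact(sock, 4)                   # Receive 4-byte size header
len_bytes = struct.unpack("!I", header)[0]     # Convert to number (unpack returns a tuple)
bytes_sent = recv_exact(sock, len_bytes)       # Receive exactly the announced payload
str_sent = bytes_sent.decode('utf-8')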

Final word: socket.send does not guarantee that it sends the entire buffer (although it typically does); socket.sendall keeps sending until all of the data has been sent or an error occurs. And socket.recv does not guarantee that it receives as many bytes as you specified in the argument. So robust TCP sending/receiving code needs to accommodate those caveats.
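The framing scheme and the caveats above can be put together in one runnable sketch. It assumes blocking sockets and uses `socket.socketpair()` in place of a real client/server connection; the names `send_msg`, `recv_msg`, `recv_exact` and `demo` are illustrative, not part of any library.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")   # 4-byte unsigned length, network byte order

def send_msg(sock, text):
    """Send one length-prefixed UTF-8 message."""
    payload = text.encode('utf-8')
    # sendall loops internally until every byte has been handed to the OS
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    """Receive exactly n bytes; a single recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        buf += chunk
    return bytes(buf)

def recv_msg(sock):
    """Receive one length-prefixed message and decode it as a whole."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length).decode('utf-8')

def demo():
    a, b = socket.socketpair()
    message = "blah blah\xce" * 100_000      # large, with non-ASCII characters
    # Send from a second thread so the receiver can drain the socket
    # while sendall is still pushing data.
    sender = threading.Thread(target=send_msg, args=(a, message))
    sender.start()
    received = recv_msg(b)
    sender.join()
    a.close()
    b.close()
    return received == message

if __name__ == "__main__":
    print(demo())
```

With non-blocking sockets driven by selectors, the same framing applies, but the partially received header and payload would need to be kept in per-connection state between callbacks instead of being read in a loop.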
