Truncating Unicode so it fits a maximum size when encoded for wire transfer

Views: 74 · Posted: 2021/2/13 20:07:33 · Tags: python json unicode truncate


Question


Given a Unicode string and these requirements:

The string is encoded into some byte-sequence format (e.g. UTF-8 or JSON Unicode escapes).
The encoded string has a maximum length.
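A small illustration of why the limit has to be checked on the encoded bytes rather than on the character count. Python is assumed because the question is tagged python, and the sample string is arbitrary: the same text has a different size in each format.

```python
import json

s = "héllo 😀"  # arbitrary sample: ASCII letters, one accented letter, one emoji

code_points = len(s)                              # characters as Python counts them
utf8_bytes = len(s.encode("utf-8"))               # é takes 2 bytes, 😀 takes 4
json_bytes = len(json.dumps(s).encode("ascii"))   # includes the two quotes; é -> \u00e9, 😀 -> \ud83d\ude00

print(code_points, utf8_bytes, json_bytes)  # 7 11 25
```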


For example, the iPhone push service requires JSON encoding with a maximum total packet size of 256 bytes.

What is the best way to truncate the string so that it re-encodes to valid Unicode and still displays reasonably correctly?


(Human language comprehension is not necessary: the truncated version can look odd, e.g. with an orphaned combining character or a Thai vowel, as long as the software doesn't crash when handling the data.)

See also:



Related Java question: How do I truncate a java string to fit in a given number of bytes, once UTF-8 encoded?
Related JavaScript question: Using JavaScript to truncate text to a certain size

Recommended answer

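def unicode_truncate(s, length, encoding='utf-8'):
    # Encode, then keep at most `length` bytes (this may cut a character in half).
    encoded = s.encode(encoding)[:length]
    # 'ignore' drops the incomplete byte sequence left at the end by the cut.
    return encoded.decode(encoding, 'ignore')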
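Why `'ignore'` is enough for UTF-8: the bytes came from a valid string, so after slicing the only possible defect is an incomplete final character. For UTF-8 specifically, the same prefix can be found without an error handler by moving the cut point back while it lands on a continuation byte (bit pattern `10xxxxxx`). This is a sketch, not part of the original answer, and the name `utf8_truncate` is hypothetical.

```python
def utf8_truncate(s: str, max_bytes: int) -> str:
    """Return the longest prefix of s whose UTF-8 encoding is at most max_bytes long."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = s.encode("utf-8")
    if len(encoded) <= max_bytes:
        return s  # already fits, nothing to cut
    cut = max_bytes
    # encoded[cut] is the first byte being dropped. If it is a continuation byte
    # (0b10xxxxxx), its character started before `cut`, so step back to that
    # character's lead byte.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    # encoded[:cut] now ends on a character boundary, so strict decoding succeeds.
    return encoded[:cut].decode("utf-8")
```

Since a UTF-8 character is at most four bytes long, the loop steps back at most three times.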


Here is an example with a Unicode string in which each character takes 2 bytes in UTF-8, so a 5-byte limit keeps two whole characters and discards the dangling first byte of the third:

>>> unicode_truncate('абвгд', 5)
'аб'
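The answer relies on a codec, so it covers the UTF-8 case but not the JSON Unicode-escape case from the question: `json.dumps` (with its default `ensure_ascii=True`) turns each non-ASCII character into a 6-byte `\uXXXX` escape, or 12 bytes for characters outside the Basic Multilingual Plane. A minimal sketch for that case, assuming the 256-byte limit applies to the whole serialized payload, is to binary-search the longest prefix whose payload fits. `build_payload`, `fit_alert` and the `{"aps": {"alert": ...}}` shape are illustrative assumptions, not taken from the original answer.

```python
import json

def build_payload(alert: str) -> bytes:
    # Assumed payload shape for illustration; compact separators save bytes.
    return json.dumps({"aps": {"alert": alert}}, separators=(",", ":")).encode("utf-8")

def fit_alert(message: str, max_bytes: int = 256) -> bytes:
    """Return the payload for the longest prefix of `message` that fits in max_bytes."""
    if len(build_payload("")) > max_bytes:
        raise ValueError("payload does not fit even with an empty alert")
    # The serialized size never shrinks as the prefix grows, so binary-search
    # the prefix length. It is measured in code points, so no character is split.
    lo, hi = 0, len(message)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(build_payload(message[:mid])) <= max_bytes:
            lo = mid
        else:
            hi = mid - 1
    return build_payload(message[:lo])
```

For example, the empty payload is 20 bytes, so `fit_alert("é" * 100)` keeps 39 characters (20 + 39 × 6 = 254 bytes). Because the size check runs on the real serializer output, the same approach works for any escaping rules the serializer applies.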

