Test whether a string is Unicode, which UTF standard it uses, and get its length in bytes?

Views: 82 | Posted: 2020/7/13 2:44:33 | Tags: python string unicode utf-8 python-2.5

Question

I need to test whether a string is Unicode, and then whether it is UTF-8. After that, I need the string's length in bytes, including the BOM if it uses one. How can this be done in Python?

Also, for didactic purposes, what does a byte list representation of a UTF-8 string look like? I am curious how a UTF-8 string is represented in Python.

Later edit: pprint does that pretty well.

Recommended answer

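# Python 2: `string` is a str, i.e. a sequence of bytes.
try:
    string.decode('utf-8')  # raises UnicodeError if the bytes are not valid UTF-8
    print "string is UTF-8, length %d bytes" % len(string)
except UnicodeError:
    print "string is not UTF-8"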

In Python 2, str is a sequence of bytes and unicode is a sequence of characters. You use str.decode to decode a byte sequence to unicode, and unicode.encode to encode a sequence of characters to str. For example, u"é" is the unicode string containing the single character U+00E9 and can also be written u"\xe9"; encoding it as UTF-8 gives the two-byte sequence "\xc3\xa9".
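The round trip described above, and the "byte list" view asked about in the question, can be sketched as follows. This is a minimal illustration rather than part of the original answer; the variable names `text` and `data` are made up for the example, and it assumes Python 2.6+ or Python 3.3+ (it is not written for Python 2.5 specifically).

```python
text = u"\xe9"               # one character, U+00E9
data = text.encode("utf-8")  # the UTF-8 encoding of that character

print(len(text))                     # 1 (characters)
print(len(data))                     # 2 (bytes)
print(list(bytearray(data)))         # [195, 169], i.e. 0xC3 0xA9
print(data.decode("utf-8") == text)  # True
```

Wrapping the encoded value in bytearray is one way to get integer byte values on both major Python versions.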

In Python 3, this has changed: bytes is a sequence of bytes and str is a sequence of characters.
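Under Python 3 naming, the same check might look like the sketch below. It is an adaptation, not code from the original answer: `describe_utf8` is a hypothetical helper name, and the BOM test relies on the standard library constant codecs.BOM_UTF8.

```python
import codecs


def describe_utf8(data):
    """Return (is_utf8, byte_length, has_bom) for a bytes object.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")  # strict decoding raises on invalid UTF-8
    except UnicodeDecodeError:
        return False, len(data), False
    # len() on bytes counts bytes, so a leading 3-byte BOM is included.
    return True, len(data), data.startswith(codecs.BOM_UTF8)


print(describe_utf8(b"\xc3\xa9"))                    # (True, 2, False)
print(describe_utf8(codecs.BOM_UTF8 + b"\xc3\xa9"))  # (True, 5, True)
print(describe_utf8(b"\xe9"))                        # (False, 1, False)
```

A successful decode only shows that the bytes are valid UTF-8. Pure ASCII data also decodes successfully, so this cannot prove that UTF-8 was the intended encoding.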
