Why does encoding in UTF-8 still result in ASCII?


Question

Given this code:

# coding=utf-8
import sys
import chardet

print(sys.getdefaultencoding())

a = 'abc'

print(type(a))
print(chardet.detect(a))

b = a.decode('ascii')

print(type(b))


c = '中文'

print(type(c))
print(chardet.detect(c))


m = b.encode('utf-8')
print(type(m))
print(chardet.detect(m))

n = u'abc'

print(type(n))

x = n.encode(encoding='utf-8')

print(type(x))
print(chardet.detect(x))

I encode n with UTF-8, but chardet still reports the result as ascii.

So I would like to know: what is the relationship between UTF-8, ASCII, and Unicode?

I am running this with Python 2.

===================result=================================

=======================end result =============================

Recommended answer

UTF-8 is a variable-width encoding, and ASCII characters map directly to the same single bytes in UTF-8.

Since your UTF-8 string contains only ASCII characters, the resulting bytes are, quite honestly, both a valid ASCII string and a valid UTF-8 string. That is why a detector such as chardet reports them as ascii.
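As a minimal sketch of that point (written for Python 3, unlike the Python 2 code in the question; the variable names are only illustrative), the following compares the bytes produced by the two encodings for ASCII-only text, then shows that UTF-8 bytes for non-ASCII text no longer decode as ASCII:

```python
# Python 3 sketch: ASCII-only text yields identical bytes under both encodings.
text = "abc"
utf8_bytes = text.encode("utf-8")
ascii_bytes = text.encode("ascii")

print(utf8_bytes == ascii_bytes)   # the two byte strings are compared directly
print(utf8_bytes.decode("ascii"))  # the UTF-8 bytes also decode cleanly as ASCII

# Text containing non-ASCII characters produces bytes that are not valid ASCII.
mixed_bytes = "中文abc".encode("utf-8")
try:
    mixed_bytes.decode("ascii")
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False
print(decodes_as_ascii)
```

A byte-level detector has nothing to distinguish the two encodings in the first case, so reporting ascii is a reasonable answer rather than a sign that the encode step failed.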

This example (a Python 3 session) might help:

>>> c = '中文abc中文'
>>> c
'中文abc中文'
>>> c.encode(encoding="UTF-8")
b'\xe4\xb8\xad\xe6\x96\x87abc\xe4\xb8\xad\xe6\x96\x87'

Notice how each character of "abc" in the UTF-8 output takes only a single byte? Those are exactly the same bytes as their ASCII counterparts, while each Chinese character takes three bytes.
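On the relationship asked about in the question: Unicode assigns each character a code point, UTF-8 is one way of turning those code points into bytes, and the code points below 128 are the ASCII characters. A short Python 3 sketch, assuming an illustrative sample string, that prints the code point and UTF-8 byte count per character:

```python
# Python 3 sketch: Unicode code point and UTF-8 byte count for each character.
sample = "中文abc"  # illustrative sample, not taken from the question's code

# Each entry is (character, code point, number of UTF-8 bytes).
widths = [(ch, ord(ch), len(ch.encode("utf-8"))) for ch in sample]

for ch, code_point, n_bytes in widths:
    print(f"{ch!r}: U+{code_point:04X} -> {n_bytes} byte(s)")
```

Characters whose code point is below 128 come out as one byte each, which is the ASCII-compatible part of UTF-8.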
