Converting UTF-16 to UTF-8


Problem description

I've loaded a string from a file. When I print it with:

    import binascii

    print my_string
    print binascii.hexlify(my_string)

I get:

    2DF5
    0032004400460035

This suggests the string is UTF-16. I would like to convert it to UTF-8 so that the above code produces this output:

    2DF5
    32444635

I've tried:

    my_string.decode('utf-8')

which outputs:

    32004400460035

EDIT:

Here's a quick sample:

    hello = 'hello'.encode('utf-16')
    print hello
    print binascii.hexlify(hello)

    hello = hello[2:].decode('utf-8')
    print hello
    print binascii.hexlify(hello)

Which produces this output:

    ��hello
    fffe680065006c006c006f00
    hello
    680065006c006c006f00

Expected output would be:

    ��hello
    fffe680065006c006c006f00
    hello
    68656c6c6f

Solution

Your string appears to have been encoded using utf-16be, since each ASCII character is preceded by a zero byte:

    In [9]: s = "2DF5".encode("utf-16be")
    In [11]: print binascii.hexlify(s)
    0032004400460035
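
To make the byte-order reasoning concrete, here is a minimal Python 3 sketch (the session above is Python 2). It compares the big-endian and little-endian encodings of the same text and guesses the codec from a byte string. The helper name `guess_utf16_codec` is hypothetical, and its zero-byte heuristic only holds for text that starts with an ASCII character.

```python
import binascii

text = "2DF5"

# Big-endian: for ASCII characters the zero byte comes first.
be = text.encode("utf-16-be")
# Little-endian: the zero byte comes second.
le = text.encode("utf-16-le")

print(binascii.hexlify(be).decode("ascii"))  # 0032004400460035
print(binascii.hexlify(le).decode("ascii"))  # 3200440046003500


def guess_utf16_codec(data):
    """Hypothetical helper: guess a UTF-16 codec name for `data`.

    Assumes the text starts with an ASCII character when no BOM is present.
    Returns None when the guess is not possible.
    """
    # A byte order mark identifies the byte order; the generic "utf-16"
    # codec reads it and strips it while decoding.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"
    if len(data) >= 2 and data[0] == 0 and data[1] != 0:
        return "utf-16-be"
    if len(data) >= 2 and data[0] != 0 and data[1] == 0:
        return "utf-16-le"
    return None


print(guess_utf16_codec(be))  # utf-16-be
```

For the bytes in the question (`0032004400460035`) the zero byte leads each pair, which is why big-endian is the likely encoding.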

So, in order to convert it to utf-8, you first need to decode it, then encode it:

    In [14]: uni = s.decode("utf-16be")
    In [15]: uni
    Out[15]: u'2DF5'

    In [16]: utf = uni.encode("utf-8")
    In [17]: utf
    Out[17]: '2DF5'

or, in one step:

    In [13]: s.decode("utf-16be").encode("utf-8")
    Out[13]: '2DF5'
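
The same decode-then-encode step can be sketched for Python 3, where the file contents would be `bytes` rather than `str`. This is only an illustration under that assumption: the function name `utf16_to_utf8` and its `source_encoding` parameter are made up for the example, and the byte strings are rebuilt from the hex dumps shown in the question.

```python
import binascii


def utf16_to_utf8(data, source_encoding="utf-16-be"):
    """Decode UTF-16 bytes to text, then re-encode that text as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# The bytes from the question: "2DF5" in big-endian UTF-16, no BOM.
my_string = bytes.fromhex("0032004400460035")
converted = utf16_to_utf8(my_string)
print(converted.decode("utf-8"))                     # 2DF5
print(binascii.hexlify(converted).decode("ascii"))   # 32444635

# The "hello" sample starts with a BOM (ff fe), so the generic "utf-16"
# codec can read the byte order from it and drop the BOM while decoding.
hello = bytes.fromhex("fffe680065006c006c006f00")
hello_utf8 = utf16_to_utf8(hello, source_encoding="utf-16")
print(binascii.hexlify(hello_utf8).decode("ascii"))  # 68656c6c6f

# Decoding the BOM-less bytes as UTF-8 raises no error because a zero
# byte is valid UTF-8 (NUL); the zero bytes simply stay in the result,
# which is why the hex dump in the question did not change.
wrong = hello[2:].decode("utf-8")
print(len(wrong))  # 10
```

Slicing off the BOM with `hello[2:]` is not needed when decoding with `"utf-16"`; if the BOM is removed by hand, the explicit `"utf-16-le"` codec would be the matching choice for that sample.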
