UTF-8 to unicode or latin-1 (and yes, I read the FAQ)


Problem description


Hi!

I'm struggling with the conversion of a UTF-8 string to latin-1. As far
as I know the way to go is to decode the UTF-8 string to unicode and
then encode it back again to latin-1?

So I tried:

'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König',
contains a German 'umlaut'

but failed since python assumes every string to decode to be ASCII?

How can I convert this string to latin-1?

How would you write a function like:

def encode_string(string, from_encoding, to_encoding):
    # ????

Best regards,
Noel

Solution

No*******@gmx.net wrote:

I'm struggling with the conversion of a UTF-8 string to latin-1. As far
as I know the way to go is to decode the UTF-8 string to unicode and
then encode it back again to latin-1?

So I tried:

'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König',

"Köni", to be precise.

contains a German 'umlaut'

but failed since python assumes every string to decode to be ASCII?

that should work, and it sure works for me:

>>> s = 'K\xc3\xb6ni'.decode('utf-8')
>>> s
u'K\xf6ni'
>>> print s
Köni

what did you do, and how did it fail?

</F>


No*******@gmx.net wrote:

'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König',
contains a German 'umlaut'

but failed since python assumes every string to decode to be ASCII?

No, Python would assume the string to be utf-8 encoded in this case:

>>> 'K\xc3\xb6ni'.decode('utf-8').encode('latin1')
'K\xf6ni'

Your code must have failed somewhere else. Try posting the actual failing
code and the actual traceback.
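
The decode-then-encode chain in this reply can be wrapped into the kind of helper the question asks for. What follows is only a minimal sketch, written for Python 3, where the byte string from the thread has to be spelled b'K\xc3\xb6ni' and decode() exists only on bytes objects; in the Python 2 used in this thread the same two calls work on a plain str. The errors parameter is an addition for illustration and is not part of the original question.

```python
def encode_string(data, from_encoding, to_encoding, errors="strict"):
    """Re-encode a byte string from one encoding to another."""
    # Step 1: bytes -> text (unicode), reading the input as from_encoding.
    text = data.decode(from_encoding)
    # Step 2: text -> bytes in the target encoding.
    # errors="strict" raises UnicodeEncodeError for characters the target
    # encoding cannot represent; errors="replace" substitutes "?" instead.
    return text.encode(to_encoding, errors)


latin1_bytes = encode_string(b"K\xc3\xb6ni", "utf-8", "latin-1")
print(repr(latin1_bytes))  # b'K\xf6ni'
```

Going through unicode text as the intermediate step is what makes the helper work for any pair of encodings, as long as every character in the input also exists in the target encoding.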


'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König',

"Köni", to be precise.

Äh, yes.
;o)

contains a German 'umlaut'

but failed since python assumes every string to decode to be ASCII?


that should work, and it sure works for me:

>>> s = 'K\xc3\xb6ni'.decode('utf-8')
>>> s
u'K\xf6ni'
>>> print s
Köni

what did you do, and how did it fail?

First, thank you so much for answering so fast. I proposed Python for a
project and it would be very embarrassing for me if I failed to
convert a UTF-8 string to latin-1.

I realized that my problem is not the decoding from UTF-8. The exception
is raised by print when I try to print the unicode string.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in
position 1: ordinal not in range(128)

But that is not a problem at all since I can now turn my UTF-8 strings
to unicode! Once again the problem was sitting right in front of my
screen. Silly me...
;o)

Again, thank you for your reply!

Best regards,
Noel
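
The UnicodeEncodeError quoted in the last post comes from the output side: print has to encode the unicode string for the output stream, and when that stream's encoding is ASCII the character ö cannot be represented. The sketch below, again written for Python 3 and only as an illustration, reproduces the failure with an explicit encode call and shows a few ways to encode explicitly before writing. The variable names are made up for this example.

```python
text = b"K\xc3\xb6ni".decode("utf-8")  # the unicode string from the thread

# Encoding to ASCII is what print effectively attempts when the output
# stream's encoding is ASCII, and it fails in the same way.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)

# Possible workarounds: pick an encoding that can represent the character,
# or tell the codec what to do with characters it cannot encode.
as_latin1 = text.encode("latin-1")
as_ascii_replaced = text.encode("ascii", errors="replace")
as_ascii_escaped = text.encode("ascii", errors="backslashreplace")
print(repr(as_latin1), repr(as_ascii_replaced), repr(as_ascii_escaped))
```

Which variant is appropriate depends on what the receiving terminal or file expects; the decoding step itself is unaffected, as the post concludes.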

