Python Character Encoding European Accents


Question

I know this is not an uncommon problem and that multiple SO questions about it have already been answered (1, 2, 3), but even after following the recommendations there, I am still seeing this error (for the code below):

uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

I am trying to build a URL from a list of artist names, many of which contain accents and other European characters, like so (each name is also shown as printed via repr):

Auberjonois, René -> Auberjonois, Ren\xc3\xa9
Bäumer, Eduard -> B\xc3\xa4umer, Eduard
Baur-Nütten, Gisela -> Baur-N\xc3\xbctten, Gisela
Bösken, Lorenz -> B\xc3\xb6sken, Lorenz
Čapek, Josef -> \xc4\x8capek, Josef
Großmann, Rudolf -> Gro\xc3\x9fmann, Rudolf

The block I am trying to run is:

def create_uri(artist_name):
  artist_name = artist_name
  name = artist_name.split(",")
  uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
  uri = 'http://example.com/' + uri_name
  print uri

create_uri('Name, Non_Accent')
create_uri('Auberjonois, René')

The first call works and produces http://example.com/Non_Accent_Name, but the second fails with the error above.

I have added # coding=utf-8 to the top of my script and have tried encoding the artist_name string at every point along the way, only to get the same error each time.

If it matters, I am using Atom as my text editor, and when I open the .csv file that these names come from, the accents all display correctly.

What else can I do to ensure that the script interprets UTF-8 as UTF-8 and not ASCII?

Recommended answer

Stop passing UTF-8 byte strings around. Use unicode objects everywhere, and decode/encode (if necessary) only at the interfaces.
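The error in the question comes from calling .encode('utf-8') on something that is already a byte string: Python 2 first has to turn the bytes into unicode, and it does so with its default codec, which is ASCII unless the interpreter has been reconfigured. The sketch below spells that hidden step out explicitly so it behaves the same on Python 2 and 3. The helper name implicit_ascii_decode is hypothetical and only illustrates the mechanism; it is not part of the original code.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes of " René", i.e. what name[1] holds in the question's code.
raw = b" Ren\xc3\xa9"


def implicit_ascii_decode(data):
    """Mimic the ASCII decode Python 2 performs before str.encode() runs.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte ASCII cannot decode.
        return "failed at position %d" % exc.start


# The ASCII decode trips over byte 0xc3, the first byte of the encoded "é".
result = implicit_ascii_decode(raw)
print(result)

# Decoding with the codec the bytes were actually written in succeeds.
decoded = raw.decode("utf-8").strip()
print(repr(decoded))
```

The offset reported here matches the "position 4" in the traceback from the question. Keeping the names as unicode from the start avoids this implicit step entirely.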

def create_uri(artist_name):
  name = artist_name.split(u",")
  uri_name = u"%s_%s" % (name[1].strip(), name[0].strip())
  uri = u'http://example.com/' + uri_name
  print uri

create_uri(u'Name, Non_Accent')
create_uri(u'Auberjonois, René')
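When the names come from a file rather than from literals, "only at interfaces" could look roughly like the following sketch: decode the incoming bytes once, work on text in between, and encode only when the URI leaves the program. The sample byte strings stand in for rows read from the UTF-8 .csv file, and the percent-encoding step with quote is an assumption about what the consumer of the URI expects; neither is taken from the original answer. This variant returns the URI instead of printing it so that it runs on both Python 2 and 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def create_uri(artist_name):
    """Build the URI from a text (unicode) name of the form 'Last, First'."""
    last, first = [part.strip() for part in artist_name.split(u",", 1)]
    return u"http://example.com/" + u"%s_%s" % (first, last)


# Input interface: raw bytes, as they would arrive from a UTF-8 encoded file.
raw_names = [b"Name, Non_Accent", b"Auberjonois, Ren\xc3\xa9"]
names = [raw.decode("utf-8") for raw in raw_names]

# Everything in between operates on text only.
uris = [create_uri(name) for name in names]

# Output interface (assumed): encode to UTF-8 and percent-encode the result,
# leaving ':' and '/' untouched so the scheme and path separators survive.
encoded = [quote(uri.encode("utf-8"), safe=":/") for uri in uris]

for line in encoded:
    print(line)
```

If the URI is only ever displayed or written to a UTF-8 file, the percent-encoding step can be dropped and a single uri.encode('utf-8') at the point of output is enough.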
