Python Character Encoding: European Accents
Question
I know this is not an uncommon problem and that there are already multiple SO questions answered about this (1, 2, 3), but even after following the recommendations there, I am still seeing this error for the code below:
uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
So I am trying to build a URL from a list of artist names, many of which contain accents and other European characters, like these (each name is followed by its repr, which shows the special characters as escaped bytes):
Auberjonois, René -> Auberjonois, Ren\xc3\xa9
Bäumer, Eduard -> B\xc3\xa4umer, Eduard
Baur-Nütten, Gisela -> Baur-N\xc3\xbctten, Gisela
Bösken, Lorenz -> B\xc3\xb6sken, Lorenz
Čapek, Josef -> \xc4\x8capek, Josef
Großmann, Rudolf -> Gro\xc3\x9fmann, Rudolf
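The escaped forms in that list are the UTF-8 byte sequences of the accented letters: é, ä, ü, ö, ß and Č each take two bytes. A minimal Python 3 sketch makes this visible (the question itself uses Python 2, where these bytes sit in a plain str; in Python 3, str plays the role of Python 2's unicode and bytes the role of Python 2's str). Python 3 prints the bytes with a b'' prefix, but the escapes are the same as in the list above.

```python
# The artist names from the question, as text (Python 3 str).
names = [
    "Auberjonois, René",
    "Bäumer, Eduard",
    "Baur-Nütten, Gisela",
    "Bösken, Lorenz",
    "Čapek, Josef",
    "Großmann, Rudolf",
]

for name in names:
    encoded = name.encode("utf-8")  # text (str) -> UTF-8 bytes
    print(name, "->", encoded)

# One accented character, two bytes: the lengths differ.
print(len("René"), len("René".encode("utf-8")))  # 4 5
```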
The block I am trying to run is:
def create_uri(artist_name):
    artist_name = artist_name
    name = artist_name.split(",")
    uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
    uri = 'http://example.com/' + uri_name
    print uri

create_uri('Name, Non_Accent')
create_uri('Auberjonois, René')
So the first one works and produces http://example.com/Non_Accent_Name
But the second fails with the error above.
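Where the message comes from: in the Python 2 code above, name[1] is a byte string (str) holding the UTF-8 bytes ' Ren\xc3\xa9', and calling .encode('utf-8') on a byte string makes Python 2 first decode it implicitly with the default ASCII codec. Byte 0xc3 sits at index 4 (after the leading space and 'Ren') and is outside ASCII, hence "position 4". The first call works only because every byte of 'Name, Non_Accent' is ASCII. Python 3 has no such implicit decode, so this sketch performs that hidden ASCII step explicitly to reproduce the same message.

```python
# The bytes name[1] held in the question's Python 2 code: b' Ren\xc3\xa9'
raw = " René".encode("utf-8")

message = None
try:
    # Stand-in for the implicit ASCII decode Python 2 performs before
    # it can run .encode('utf-8') on a byte string.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

# Decoding with the codec the data was actually written in succeeds.
text = raw.decode("utf-8").strip()
print(text)  # René
```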
I have added # coding=utf-8 to the top of my script and have tried encoding the artist_name string at every point along the way, only to get the same error each time.
If it matters, I am using Atom as a text editor, and when I open the .csv file these names come from, the accents all display correctly.
What else can I do to ensure that the script interprets UTF-8 as UTF-8 and not ascii?
Answer
Stop using UTF-8. Use unicode objects everywhere, and only decode/encode (if necessary) at interfaces.
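In Python 3 terms, the "interfaces" are where bytes enter or leave the program, such as the .csv file the names come from. A minimal sketch, assuming a hypothetical artists.csv with one name per row in the first column (the file name, layout and both helper functions are illustrative, not from the question): the file is opened in text mode with an explicit encoding="utf-8", so it is decoded exactly once on the way in, and everything after that is plain str with no .encode() or .decode() calls.

```python
import csv
import os
import tempfile


def read_artist_names(path):
    """Input interface (hypothetical helper): decode once, as UTF-8."""
    with open(path, encoding="utf-8", newline="") as f:
        return [row[0] for row in csv.reader(f) if row]


def write_artist_names(path, names):
    """Output interface (hypothetical helper): encode once, as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for name in names:
            writer.writerow([name])  # csv quotes fields that contain commas


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file standing in for the question's .csv
    csv_path = os.path.join(tmp_dir, "artists.csv")
    write_artist_names(csv_path, ["Auberjonois, René", "Čapek, Josef"])
    names = read_artist_names(csv_path)

print(names)  # ['Auberjonois, René', 'Čapek, Josef']
```

If the real file has a header row or keeps the names in another column, row[0] would need adjusting.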
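# -*- coding: utf-8 -*-
# Python 2: the coding declaration is needed for the u'René' literal below.

def create_uri(artist_name):
    # No .encode() calls: artist_name, the pieces and the result are all unicode.
    name = artist_name.split(u",")
    uri_name = u"%s_%s" % (name[1].strip(), name[0].strip())
    uri = u'http://example.com/' + uri_name
    print uri

create_uri(u'Name, Non_Accent')
create_uri(u'Auberjonois, René')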
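For reference, a Python 3 sketch of the same function, where str already is the unicode type and no u prefixes are needed. It additionally percent-encodes the result with urllib.parse.quote, an extra step that is not part of the answer above: it assumes the URL has to be ASCII-only when it leaves the program, and quote() encodes non-ASCII characters as UTF-8 by default. The base parameter is a hypothetical addition for illustration.

```python
from urllib.parse import quote


def create_uri(artist_name, base="http://example.com/"):
    """Turn 'Last, First' into base + 'First_Last', working on str throughout."""
    # maxsplit=1 keeps everything after the first comma in the first name
    last, first = artist_name.split(",", 1)
    uri_name = "%s_%s" % (first.strip(), last.strip())
    # Percent-encode only at the output boundary: ASCII letters and '_' are
    # left alone, non-ASCII characters become %XX escapes of their UTF-8 bytes.
    return base + quote(uri_name)


print(create_uri("Name, Non_Accent"))   # http://example.com/Non_Accent_Name
print(create_uri("Auberjonois, René"))  # http://example.com/Ren%C3%A9_Auberjonois
```

Whether to percent-encode depends on what consumes the URL; if it is only printed or stored as text, returning uri_name unencoded is equally reasonable.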