Python Character Encoding European Accents

Views: 197 | Published: 2016/11/19 16:52:29 | Tags: python python-2.7 unicode utf-8 character-encoding

Problem description

I know this is not an uncommon problem, and there are already multiple answered SO questions about it (1, 2, 3), but even after following the recommendations there, I still see this error for the code below:

uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

I am trying to build a URL from a list of artist names, many of which contain accents and other European characters, like so (each name is also shown as printed via repr):

Auberjonois, René -> Auberjonois, Ren\xc3\xa9
Bäumer, Eduard -> B\xc3\xa4umer, Eduard
Baur-Nütten, Gisela -> Baur-N\xc3\xbctten, Gisela
Bösken, Lorenz -> B\xc3\xb6sken, Lorenz
Čapek, Josef -> \xc4\x8capek, Josef
Großmann, Rudolf -> Gro\xc3\x9fmann, Rudolf

The block I am trying to run is:

def create_uri(artist_name):
  artist_name = artist_name
  name = artist_name.split(",")
  uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
  uri = 'http://example.com/' + uri_name
  print uri

create_uri('Name, Non_Accent')
create_uri('Auberjonois, René')

The first call works and produces http://example.com/Non_Accent_Name, but the second fails with the error above.

I have added # coding=utf-8 to the top of my script and have tried encoding the artist_name string at every point along the way, only to get the same error each time.

If it matters, I am using Atom as a text editor, and when I open the .csv file these names come from, the accents all display correctly.

What else can I do to ensure that the script interprets UTF-8 as UTF-8 and not as ASCII?

Recommended answer

Stop passing UTF-8 byte strings around. Use unicode objects everywhere, and decode/encode (if necessary) only at the interfaces, that is, where data enters or leaves the program.
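
To see why the original call fails: in Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, and that hidden step is what raises the error. The sketch below reproduces that step explicitly so it behaves the same on Python 2 and 3. The names raw, error and text are illustrative only, and raw is assumed to be the value of name[1] from the question (leading space included).

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the implicit ASCII decode that Python 2 performs when
# .encode() is called on a byte string. Runs on Python 2 and Python 3.

raw = b" Ren\xc3\xa9"  # assumed value of name[1]: UTF-8 bytes, leading space

try:
    raw.decode("ascii")  # the hidden step behind str.encode() in Python 2
    error = None
except UnicodeDecodeError as exc:
    error = exc  # the first non-ASCII byte sits at index 4

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("utf-8")
```

Here error.start is 4, which matches "position 4" in the traceback: the byte 0xc3 is the first of the two UTF-8 bytes of "é".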
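# -*- coding: utf-8 -*-
# Python 2: every string stays unicode, so no .encode() call is needed.
def create_uri(artist_name):
  # u"Last, First" -> [u"Last", u" First"]
  name = artist_name.split(u",")
  uri_name = u"%s_%s" % (name[1].strip(), name[0].strip())
  uri = u'http://example.com/' + uri_name
  print uri

create_uri(u'Name, Non_Accent')
create_uri(u'Auberjonois, René')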
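
As a rough sketch of what "only at interfaces" might look like when the names arrive as bytes (for example from the .csv file), the snippet below decodes once on input, works in unicode, and encodes once on output. The helper name create_uri_quoted and the sample raw_line are hypothetical, and percent-encoding the result is an assumption added here for illustration; the original answer does not mention it.

```python
# -*- coding: utf-8 -*-
# Sketch: decode at the input boundary, encode at the output boundary.
try:
    from urllib import quote        # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def create_uri_quoted(artist_name):
    """Build a URI from a 'Last, First' name given as unicode text."""
    last, first = [part.strip() for part in artist_name.split(u",", 1)]
    uri_name = u"%s_%s" % (first, last)
    # Output boundary: encode to UTF-8 bytes, then percent-encode them
    # (percent-encoding is an assumption, not part of the original answer).
    return u"http://example.com/" + quote(uri_name.encode("utf-8"))


# Input boundary: bytes as they might arrive from a file (hypothetical data).
raw_line = b"Auberjonois, Ren\xc3\xa9"
uri = create_uri_quoted(raw_line.decode("utf-8"))
```

Everything between the decode call and the quote call handles unicode text only, so no implicit ASCII conversion can occur in between.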

