Handling non-standard American English Characters and Symbols in a CSV, using Python


Problem description

I have a list of store names, with a few thousand names, some of which have non-standard American English characters that are posing a problem.

For example, my input file looks like this:

store_name
yéché
Ázak
ótndle

I want the output file to actually look like this (I think Google Docs made this happen, btw):

store_name  new_store_name
yéché       yÃ©chÃ©
Ázak        Ãzak
ótndle      Ã³tndle

There are only about 10 such rules that convert the non-standard American English characters into this format, so I went through and applied them with find-and-replace (Ctrl+F) in Excel. But I'd like to be able to do things like this computationally in the future, and was wondering if there is a quick way of doing it in Python. To be clear, what I want to do is make:

é become Ã©
Á become Ã

Solution

You can use decode and encode (see python.org/library/codecs.html#module-codecs):

print a
péché
Álak
óundle

print a.decode('latin9').encode('utf8'),
pÃ©chÃ©
Ãlak
Ã³undle
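
The snippet above uses Python 2 syntax, where a is a byte string. In Python 3 the same idea could be sketched roughly as follows: encode the text as UTF-8, then decode those bytes with the single-byte codec. The helper names to_mojibake and add_mojibake_column are hypothetical, and io.StringIO stands in for a real file; the column names are taken from the question.

```python
import csv
import io


def to_mojibake(text, wrong_encoding="latin9"):
    """Encode text as UTF-8, then misread the bytes with a single-byte codec."""
    return text.encode("utf-8").decode(wrong_encoding)


def add_mojibake_column(src, dst, column="store_name", new_column="new_store_name"):
    """Copy CSV rows from src to dst, appending a column with the converted name.

    src and dst are text-mode file objects (hypothetical helper, not from the answer).
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [new_column])
    writer.writeheader()
    for row in reader:
        row[new_column] = to_mojibake(row[column])
        writer.writerow(row)


if __name__ == "__main__":
    # In-memory stand-in for the input file from the question.
    source = io.StringIO("store_name\nyéché\nÁzak\nótndle\n")
    target = io.StringIO()
    add_mojibake_column(source, target)
    print(target.getvalue())
```

With real files, opening them with encoding="utf-8" and newline="" (as the csv module documentation recommends) should keep the accented characters intact on the way in and out.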

I had to do the reverse...
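
For that reverse direction (turning the garbled text back into the intended characters), one possible Python 3 sketch is to swap the two steps: encode with the single-byte codec, then decode the bytes as UTF-8. The name fix_mojibake is a hypothetical helper, and returning the input unchanged when it does not round-trip is a design choice made here, not something stated in the answer.

```python
def fix_mojibake(text, wrong_encoding="latin9"):
    """Undo UTF-8 text that was mistakenly decoded with a single-byte codec.

    Strings that cannot be round-tripped (for example, text that was never
    garbled in the first place) are returned unchanged.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    for name in ["pÃ©chÃ©", "Ã³undle", "péché"]:
        print(fix_mojibake(name))
```

The fallback matters for mixed data: a column where only some names were garbled can be passed through the function without the clean names raising an error.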
