Removing non-ASCII characters in a CSV file


Problem description


I am currently inserting data into my Django models from a CSV file. Below is the simple save function that I am using:

def save(self):
    # 'file.csv' is a placeholder path; 'rb' is the mode the Python 2 csv module expects
    myfile = open('file.csv', 'rb')
    data = csv.reader(myfile, delimiter=',', quotechar='"')
    i = 0
    for row in data:
        if i == 0:
            i = i + 1
            continue  # skipping the header row

        b = MyModel()
        b.create_from_csv_row(row)  # calls a method to save in models

The function works perfectly with ASCII characters. However, if the CSV file contains non-ASCII characters, an error is raised:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 1526: ordinal not in range(128)

My question is: how can I remove the non-ASCII characters before saving my CSV file, so that I avoid this error?

Thanks in advance.

Solution

If you really want to strip it, try:

import unicodedata

# title is the unicode string to clean
unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')

* WARNING: THIS WILL MODIFY YOUR DATA * It attempts to find a close match, e.g. ć -> c. Characters that have no ASCII decomposition are dropped entirely.
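To see what this normalization does in practice, here is a minimal sketch written for Python 3 (the question itself appears to be Python 2). The helper name `to_ascii` is made up for illustration and is not part of any library.

```python
import unicodedata


def to_ascii(text):
    """Return a lossy, ASCII-only approximation of `text`."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding with 'ignore' silently drops everything that is not ASCII,
    # i.e. the combining marks and any character with no decomposition.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('\u0107'))                # ć -> c
print(to_ascii('\u0141\u00f3d\u017a'))   # Łódź -> odz (Ł has no decomposition, so it is dropped)
```

The second call shows why the warning matters: the result is not always a readable transliteration, and the original text cannot be recovered from it.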

Perhaps a better answer is to use unicodecsv instead.
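The idea behind that suggestion is to decode the file properly rather than discard characters. As a rough sketch of the same idea on Python 3, where the standard `csv` module already works on decoded text, the example below uses only the standard library instead of unicodecsv. The function name `read_rows`, the sample data, and the `cp1252` encoding are assumptions. Byte 0x93 is a left curly quote in Windows-1252, which is a common source of this error, but the real file's encoding has to be checked.

```python
import csv
import io


def read_rows(binary_file, encoding='cp1252'):
    """Decode a binary CSV stream and return its data rows, skipping the header."""
    # newline='' lets the csv module handle line endings itself.
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline='')
    reader = csv.reader(text, delimiter=',', quotechar='"')
    next(reader, None)  # skip the header row
    return list(reader)


# Hypothetical sample: bytes 0x93 and 0x94 are curly quotes in cp1252.
sample = io.BytesIO(b'name,quote\r\nAnna,\x93hello\x94\r\n')
rows = read_rows(sample)
print(rows)
```

With the file decoded this way, the non-ASCII characters survive as ordinary text and nothing needs to be stripped before saving.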

----- EDIT -----
Okay, if you don't care whether the non-ASCII data is preserved at all, try the following:

# If row references a unicode string
b.create_from_csv_row(row.encode('ascii', 'ignore'))

If row is a collection, not a unicode string, you will need to iterate over the collection to the string level to re-serialize it.
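Since `csv.reader` yields each row as a list of fields, the collection case is probably the one that applies here. A minimal sketch of iterating down to the string level, written for Python 3 and using a hypothetical helper name `strip_non_ascii_row`:

```python
def strip_non_ascii_row(row):
    """Return a copy of `row` with non-ASCII characters removed from every cell."""
    # Encode each cell to ASCII, dropping anything that does not fit,
    # then decode back so the caller still receives str values.
    return [cell.encode('ascii', 'ignore').decode('ascii') for cell in row]


cleaned = strip_non_ascii_row(['na\u00efve', 'ok'])
print(cleaned)
```

The cleaned list could then be passed on in place of the original row, e.g. `b.create_from_csv_row(strip_non_ascii_row(row))`.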
