Removing non-ASCII characters in a CSV file
Problem description
I am currently inserting data into my Django models from a CSV file. Below is the simple save function I am using:
import csv

def save(self):
    # 'file.csv' stands in for the real path; Python 2's csv module expects binary mode
    with open('file.csv', 'rb') as myfile:
        data = csv.reader(myfile, delimiter=',', quotechar='"')
        next(data)  # skip the header row
        for row in data:
            b = MyModel()
            b.create_from_csv_row(row)  # calls a method to save in models
The function works perfectly with ASCII characters. However, if the CSV file contains non-ASCII characters, an error is raised:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 1526: ordinal not in range(128)
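The byte named in the error, 0x93, lies outside ASCII's 0x00-0x7F range, which is why the ascii codec refuses it. The minimal Python 3 sketch below assumes the CSV was saved by a Windows tool as Windows-1252 (cp1252), where 0x93 is a left curly double quote; that encoding is a guess based only on that byte, and `decode_cell` is a hypothetical helper, not part of the question's code.

```python
def decode_cell(raw: bytes) -> str:
    """Decode a CSV cell as ASCII, falling back to Windows-1252 (assumed encoding)."""
    try:
        # ASCII only covers 0x00-0x7F, so a byte such as 0x93 raises UnicodeDecodeError here.
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Assumption: the file uses cp1252, where 0x93 is U+201C LEFT DOUBLE QUOTATION MARK.
        return raw.decode("cp1252")


print(decode_cell(b"plain text"))  # plain text
print(decode_cell(b"\x93quoted\x94") == "\u201cquoted\u201d")  # True
```

If the guessed encoding is right, the data can be decoded correctly instead of stripped.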
My question is: how can I remove non-ASCII characters from the CSV data before saving it, to avoid this error?
Thanks in advance.
Solution
If you really want to strip them, try:
import unicodedata
# title: the unicode string you want to clean
unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
*WARNING: THIS WILL MODIFY YOUR DATA.* It attempts to find a close ASCII match, e.g. ć -> c. Characters with no ASCII equivalent are silently dropped.
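Here is a small Python 3 sketch of the same NFKD approach, wrapped in a hypothetical `to_ascii` helper. One difference from the Python 2 snippet: in Python 3, `encode()` returns `bytes`, so the result is decoded back to `str`.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate text in ASCII: decompose characters, then drop anything non-ASCII."""
    # NFKD splits 'ć' into 'c' + U+0301 (combining acute) and the 'ﬁ' ligature into 'fi'.
    decomposed = unicodedata.normalize("NFKD", text)
    # 'ignore' drops combining marks and any character that has no ASCII decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u0107evapi"))          # cevapi
print(to_ascii("\u201cquoted\u201d"))   # quoted (curly quotes have no ASCII form)
```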
Perhaps a better answer is to use unicodecsv instead.
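unicodecsv targets Python 2. On Python 3 the standard csv module already works with `str`, so the comparable fix is to decode the file with an explicit encoding rather than rely on a default codec. The sketch below assumes cp1252 as the file's encoding, and `rows_from_bytes` is a hypothetical helper that takes raw bytes so it can be tried without a file on disk.

```python
import csv
import io
from typing import List


def rows_from_bytes(data: bytes, encoding: str = "cp1252") -> List[List[str]]:
    """Parse CSV bytes into rows of str with an explicit encoding, skipping the header."""
    # newline="" is what the csv module documentation recommends for reader input.
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    reader = csv.reader(text, delimiter=",", quotechar='"')
    next(reader, None)  # skip the header row, as the question's save() does
    return list(reader)


print(rows_from_bytes(b"name,quote\r\nBob,\x93hi\x94\r\n"))
```

For a file on disk the same idea is to pass `open(path, newline="", encoding="cp1252")` to `csv.reader`; every row then reaches `create_from_csv_row` as proper text, and no characters are lost.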
----- EDIT -----
Okay, if you don't care whether the non-ASCII data is preserved at all, try the following:
# If row references a unicode string:
b.create_from_csv_row(row.encode('ascii', 'ignore'))
If row is a collection rather than a single unicode string (csv.reader yields each row as a list), you will need to iterate over the collection down to the individual strings and encode each one before passing the row on.
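The per-cell approach can be sketched in Python 3 with a hypothetical `strip_non_ascii_row` helper. Unlike the NFKD version, it maps nothing: ć is simply removed rather than turned into c.

```python
def strip_non_ascii_row(row):
    """Return a copy of a CSV row (a list of str) with every non-ASCII character removed."""
    # csv.reader yields one list per line, so encode each cell, not the list itself.
    # In Python 3, encode() returns bytes; decode() turns the result back into str.
    return [cell.encode("ascii", "ignore").decode("ascii") for cell in row]


print(strip_non_ascii_row(["Bob", "\u201chi\u201d", "\u0107"]))  # ['Bob', 'hi', '']
```

In the question's loop this would be called as `b.create_from_csv_row(strip_non_ascii_row(row))`.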